: 4 3 . .
.mameQM-ug

«r... k ;
1v. 3

«It I.
H.» a

.. x
313;!
1....
:5 3?! 1&1...
I): {:3} hi
45.53:: 3h ‘ .
a: a 50. u
{I u‘ ’ .h“.

s .
I"! (I519
fight) 11.!

1 .
5$v_IIBP\I .5. )
l 13.1.1213.
159...! $05..

‘ ‘ .. l 5‘
anti? . . lat .pL ‘ .

11-16513

41' ' {(0

This is to certify that the

dissertation entitled

CANDIDATE SUBCIRCUIT ENUMERATION
FOR MODULE IDENTIFICATION IN DIGITAL
CIRCUITS

presented by

Jennifer Lynn White

has been accepted towards fulﬁllment
of the requirements for

Doctoral degree in Computer Science
& Engineering

1 (f) '
9
Ma jyprofessor

 

Date .5, 0 50

MS U i: an Afﬁrmatiw Action/Equal Opportunity Institution 0-12771

 

. UBRARY
M'Chigan State
University

 

 

 

 

PLACE IN RETURN Box to remove this checkout from your record.
To AVOID FINES return on or before date due.
MAY BE RECALLED with earlier due date if requested.

 

DATE DUE DATE DUE DATE DUE

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

6/01 chIRC/DatoDuepSS—ats

CANDIDATE SUBCIRCUIT ENUMERATION
FOR MODULE IDENTIFICATION IN DIGITAL
CIRCUITS

By

Jennifer Lynn White

A DISSERTATION

Submitted to
Michigan State University
in partial fulﬁllment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY

Department of Computer Science and Engineering

2000

ABSTRACT
CANDIDATE SUBCIRCUIT ENUMERATION FOR
MODULE IDENTIFICATION IN DIGITAL CIRCUITS
By

Jennifer Lynn White

Reverse engineering involves the transformation of system speciﬁcations to a de-
scription that contains fewer details but presents a more abstract view of the system.
Within design automation, reverse engineering can be applied to low-level circuit de-
scriptions, such as the geometric description of a chip, to recover a higher level of
abstraction, such as transistor- or gate-level speciﬁcations.

Transforming a circuit speciﬁcation from a lower-level speciﬁcation (gates) to a
higher-level speciﬁcation (modules) is the focus of the Module Identiﬁcation Problem.
Transforming a speciﬁcation from the gate to module level provides a conceptual
description of the circuit, facilitating the reverse engineering effort. The Module
Identiﬁcation Problem has two subproblems: Candidate Subcircuit Enumeration and
Subcircuit Identiﬁcation. Candidate Subcircuit Enumeration locates clusters of gates

within the target circuit that may be equivalent to a known high-level module, and

Subcircuit Identiﬁcation tests them for equivalence.

All gate clusters within the target circuit that may be functionally equivalent
to a known high-level module must be enumerated to effectively allow Subcircuit
Indentiﬁcation. This thesis presents a technique for enumerating all candidate sub-
circuits. Each subcircuit is enumerated exactly once, to ensure complete coverage,
and only subcircuits that perform a viable function in the context of digital logic are
enumerated.

The number of subcircuits in a circuit can be exponential with respect to the
number of devices in the circuit, so a technique is presented to partition the subcircuits
into structural equivalence classes by generating an identiﬁer for each subcircuit that
describes the subcircuit structure. Only one instance of each structural equivalence
class must be tested for semantic equivalence to known high-level modules; all other
members of the class inherit the equivalence results. This classiﬁcation reduces the
number of times that semantic subcircuit identiﬁcation must be applied, and thus

improves the performance of module identiﬁcation without loss of efﬁcacy.

Copyright by
Jennifer Lynn White
2000

TO EVERYONE WHO BELIEVED THAT I WOULD CREATE THIS THESIS,
ESPECIALLY MY FAMILY,

WHO RAISED ME TO BELIEVE THAT I COULD.

ACKNOWLEDGMENTS

I would like to thank my advisor, Dr. Anthony S. Wojcik, for his guidance, en-
couragement, and patience throughout the process of researching and writing this
dissertation. I would also like to thank my committee members, who provided ex-
cellent suggestions to improve this work, particularly Dr. Moon-Jung Chung for
spending so much time considering the theoretical aspects, and Dr. Betty Cheng for
thoroughly editing the dissertation itself.

This work was ﬁrst conceived at Argonne National Laboratory under the advise-
ment of Gregory Chisholm. I would like to extend my appreciation to him and the
other members of the Reverse Engineering Team: Ken Dritz, Steve Eckmann, Chris
Lain, and Bob Veroff.

My family has believed in me through all of my endeavors, and for that I thank
them. They have always been willing to expend any effort necessary to help me during
this educational process. I received encouragement, emotional support, and technical
assistance from my dissertation support group: Mark Brehob, Scott Connell, 'Iiravis
Doom, Dave Paoletti, Delia Raymer, and Mike Raymer. I would especially like to
thank Mike and Delia, who happily welcomed me into their home for my frequent
visits to Michigan State University during the completion of this dissertation.

Most importantly, I would like to thank Travis Doom, who encouraged me to begin
work on a doctoral degree, applauded as each theory was formed, and supported me
through each step of the process. His personal and professional support made it

possible for me to travel this path.

vi

TABLE OF CONTENTS

LIST OF TABLES xiii
LIST OF FIGURES xiv
1 Introduction 1
1.1 Background .................................. 3
1.1.1 Conceptual Circuit Representation .................... 3
1.1.2 The Forward Engineering Process ..................... 6
1.1.3 Relevance .................................. 7
1.2 Motivation ................................... 7
1.2.1 Information Retrieval ......... . .................. 8
1.2.2 Reengineering ................................ 8
1.2.3 Alternatives to Reverse Engineering ................... 13
1.2.4 Reverse Engineering Focus ......................... 14
1.3 Problem Description ............................. 14
1.3.1 Syntactic and Semantic Matching ..................... 15
1.3.2 The Module Identiﬁcation Problem .................... 21
1.4 Problem Scope ................................ 24

vii

1.4.1 Reverse Engineering Focus ......................... 24

1.4.2 Approach Requirements and Limitations ................. 26
1.5 Contributions ................................. 27
1.5.1 Overview .................................. 27
1.5.2 Details .................................... 28
1.5.3 Summary .................................. 31
1.6 Dissertation Outline ............................. 32
2 Background and Related Work 33
2.1 Terminology .................................. 34
2.1.1 Design Automation and Digital Circuits ................. 35
2.1.2 Graph Theory ................................ 36
2.2 Transistor- to Gate-Level Transformation .................. 39
2.3 Reverse Engineering ............................. 42
2.3.1 Module Identiﬁcation ............................ 43
2.3.2 FINES .................................... 45
2.3.3 Hybrid Module Identiﬁcation ....................... 47
2.4 Application Areas ............................... 49
2.4.1 Regularity Extraction ........................... 49
2.4.2 Formal Veriﬁcation ............................. 52
2.4.3 Technology Mapping ............................ 54
2.5 Candidate Enumeration ........................... 56
2.5.1 Partitioning ................................. 57

viii

2.5.2 Subgraph Clustering ............................ 59

2.6 Circuit Representations for Structural Comparison ............ 60
2.6.1 Structural Labeling ............................. 61
2.6.2 K-Formulas ................................. 62
2.6.3 Circuit Matrix Manipulation ....................... 63
2.7 Syntactic Matching .............................. 64
2.7.1 Graph Isomorphism ............................ 65
2.7.2 Subgraph Isomorphism ........................... 68
2.7.3 Discussion .................................. 71
2.8 Semantic Matching .............................. 72
2.8.1 Functional Canonical Form ........................ 72
2.8.2 Cones of Logic ............................... 75
2.8.3 Hierarchical Semantic Matching ...................... 76
2.8.4 Pseudo-Semantic Matching ........................ 77
2.8.5 Discussion .................................. 78
2.9 Software Reverse Engineering ........................ 78
2.9.1 Functional Module Identiﬁcation ..................... 80
2.9.2 Discussion .................................. 82
2.10 Summary ................................... 83
3 Candidate Subcircuit Enumeration 85
3.0.1 Uniqueness ................................. 87
3.0.2 Focused Enumeration ........................... 87

ix

3.0.3 Overview of Technique ........................... 89

3.0.4 Heuristics .................................. 90
3.0.5 Chapter Outline .............................. 91
3.1 Background .................................. 92
3.1.1 Representation ............................... 92
3.2 Enumeration of Candidate Subcircuits ................... 100
3.2.1 Rules .................................... 103
3.2.2 Correctness of Rules ............................ 124
3.2.3 The Algorithm ............................... 129
3.3 Results ..................................... 135
3.3.1 Implementation ............................... 136
3.4 Summary ................................... 138
4 Subcircuit Equivalence Classes 140
4.0.1 Overview of Technique ........................... 141
4.0.2 Chapter Outline .............................. 143
4.1 Structural Identiﬁers ............................. 144
4.2 Vertex Weighting ............................... 145
4.2.1 Vertex Functionality Labels ........................ 146
4.2.2 Vertex Precedence ............................. 147
4.2.3 Correctness of Vertex Weighting ..................... 162
4.2.4 Algorithm .................................. 166
4.3 Structural Identiﬁer Generation ....................... 172

4.3.1 Example Algorithm ............................. 174

4.3.2 Structural Identiﬁer Example ....................... 174
4.4 Subcircuit Equivalence Classes ........................ 182
4.4.1 Local Information ............................. 183
4.4.2 Correctness of Structural Equivalence Classes .............. 184
4.5 Results ..................................... 187
4.5.1 Implementation ............................... 188
4.6 Summary ................................... 189
5 Practical Application of Techniques 191
5.1 Parallel Implementation ........................... 192
5.2 Circuit Graph Order Reduction ....................... 193
5.2.1 Preliminary Partitioning .......................... 194
5.2.2 Preliminary Syntactic Matching ...................... 195
5.3 Subgraph Order Limiting .......................... 196
5.4 Hierarchical Module Identiﬁcation ...................... 199
5.4.1 Module Replacement ............................ 199
5.4.2 Primitive Functional Modules ....................... 200
5.5 Subgraph Fitness Evaluation ......................... 201
5.6 Summary ................................... 202
6 Contributions and Future Directions 203
6.1 Contributions ................................. 206
6.1.1 Module Identiﬁcation Problem ...................... 206

xi

6.1.2 Unique Subgraph Enumeration ...................... 206

6.1.3 Restricted Subgraph Types ........................ 207
6.1.4 Circuit Structural Identiﬁer ........................ 209
6.1.5 Heuristics for Practical Application of Candidate Enumeration ..... 210
6.2 Future Directions ............................... 211
6.2.1 Candidate Subcircuit Enumeration .................... 211
6.2.2 Module Identiﬁcation Problem ...................... 212
6.2.3 System Integration ............................. 213
6.3 Conclusion ................................... 214
BIBLIOGRAPHY 216

xii

1.1

2.1

3.1

4.1

4.2

4.3

5.1

5.2

LIST OF TABLES

Hardware Speciﬁcation Levels [48] ..................... 5

Reverse engineering technologies and challenges, 1998 ........... 40

Focused Enumeration Results Demonstrating Reduction of Interesting

Subgraphs .................................. 136
Vertex Functionality and Precedence Example ............... 147
Vertex Identiﬁers for Circuit Graph in Figure 4.10 ............. 182
Structural Equivalence Classes Applied to Subcircuits and

Contained Subcircuits .......................... 187
Focused Enumeration Results of Parallel Implementation ......... 193
Order Limited Focused Enumeration Results ................ 198

xiii

LIST OF FIGURES

1.1 Gate-level 2—bit Full Adder. ......................... 16
1.2 Gate-level 2-bit Adder with Two l-bit Full Adders Identiﬁed. ...... 17
1.3 2-bit Adder as Two 1—bit Full Adder Modules ................ 18
1.4 2-bit Adder as One 2-bit Full Adder Module ................. 19
1.5 Overview of Gate Level to Register Level Transformation ......... 25
3.1 Circuit to Circuit Graph Transformation Example ............. 94
3.2 Circuit Graph G ............................... 96
3.3 Examples of Subcircuits and Fully Speciﬁed Vertices ........... 98
3.4 Examples of Contained Subcircuits and Vertices .............. 100
3.5 Algorithm for Naive Subgraph Enumeration ................. 103
3.6 Algorithm for Nai've Subgraph Enumeration (Diagram) ........... 104
3.7 Subgraph containing vertices [5, 3, 2] created by P(H) = {3, 2, 5}. . . . 106
3.8 Subgraph containing vertices [5, 3, 2] created by P(H) = {5, 2, 3}. . . . 106
3.9 Subgraph containing vertices [5, 3, 2] created by P(H) = {5, 3, 2}. . . . 107
3.10 Frontier and Reachable Frontier ....................... 109
3.11 Algorithm for Unique Subgraph Enumeration. ............... 110

xiv

3.12 Algorithm for Unique Subgraph Enumeration (Diagram) ..........
3.13 Rule 1 (Example 1). .............................
3.14 Rule 1 (Example 2). .............................
3.15 Algorithm for Unique Focused Enumeration .................
3.16 Algorithm for Focused Subgraph Enumeration (Diagram). ........
3.17 Rule 2 (Example 1). .............................
3.18 Rule 2 (Example 2). .............................
3.19 Rule 3 (Example 1). .............................
3.20 Rule 3 (Example 2). .............................

3.21 Algorithm for Unique Focused Enumeration .................

4.1 Conceptual Description of Input Cone ...................
4.2 Conceptual Description of Output Cone ..................

4.3 Example 1: First Pass Weighting with Local Vertex Information
Consideration ...............................

4.4 Example 2: Second Pass Weighting With Input Weight Consideration . .
4.5 Example 3: Second Pass Weighting with Output Weight Consideration. .

4.6 Example 4: Second Pass Weighting with Arbitrary Vertex Ordering
Required ..................................

4.7 Example 5: Second Pass Weighting with Arbitrary Ordering of
Distinguishable Vertices .........................

4.8 Algorithm for Circuit Graph Vertex Weighting. ..............

4.9 Example of a Canonical Identiﬁer Generation Algorithm for Weighted
Circuit Graphs. ..............................

4.10 Full Equivalence Class Example: 2-bit adder ................

XV

111

114

116

117

130

153

155

157

159

175

176

4.11 Vertex weighting for the circuit graph of the 2—bit full adder in

Figure 4.10. ................................ 178
4.12 Vertex weighting for the circuit graph of the 2-bit full adder in

Figure 4.10. ................................ 179
4.13 Vertex weighting for the circuit graph of the 2-bit full adder in

Figure 4.10. ................................ 180
5.1 Algorithm for Order Limited Focused Enumeration ............. 197

xvi

Chapter 1

Introduction

“Reverse engineering is considered as the process of developing a set of
speciﬁcations for a complex hardware system by an orderly examination
of specimens of that system.” [97]

Design automation iS concerned with the computer-aided transformation of a cir-
cuit design from one level to another. The usual focus of design automation is forward
engineering or synthesis, that is, the transformation of high-level design speciﬁcations
to a geometric layout which realizes the design. The inverse of this process, reverse
engineering, is a less extensively explored topic.

Reverse engineering involves the transformation of a lower-level speciﬁcation to
a higher-level speciﬁcation. Simply put, reverse engineering is the investigation of
the component parts of a system to discern the nature of the whole. This technique
has been applied in environments such as computer software [24, 30, 46, 49], mechan-
ical parts [101,107, 108], and digital circuits [32, 77, 97, 105], and may be motivated
by reeengineering for the purposes of modifying functionality or implementation, in-

1

formation retrieval, or replication of a component to extend the lifetime of a sys-
tem. Reverse engineering is typically applied to an existing speciﬁcation to recover
a higher-level description of a system, but in the extreme case it may require the
transformation of the physical implementation into a high-level speciﬁcation of the
system’s behavior. These varied applications have led to many approaches to reverse

engineering.

The research presented herein is directed toward developing an approach to re-
verse engineering in Situations in which no design information is available about the
component under study. The main focus of the research is the transformation from
a description at the logic gate level to a functional speciﬁcation, comprised of imple-

mentation independent functional modules.

Several techniques are presented in this dissertation to facilitate understanding
of digital circuits by abstracting away from the details (gate-level description) and

providing a more general overview (module-level description).

Section 1.1 discusses conceptual representations of circuits, including levels of ab-
straction, and the forward engineering process. Section 1.2 discusses why reverse
engineering is currently an important topic in design automation. The formal deﬁni-
tion of the Module Identiﬁcation Problem is deﬁned in Section 1.3. The sc0pe of this
research into the problem solution is discussed in Section 1.4. Section 1.5 presents
the contributions made by this research. Finally, Section 1.6 presents an outline of

this dissertation.

1 . 1 Background

Design automation is primarily concerned with developing techniques to facilitate
the synthesis of an implementation that realizes a design speciﬁcation. The synthesis
process consists of several steps, each of which deﬁnes the circuit at an increasingly
detailed level. These levels of abstraction are described in more detail in Section 1.1.1.

An integral part of developing a method for reverse engineering is understanding
the forward engineering process. Section 1.1.2 describes the process of designing a

hardware component, from the initial concept to the implementation speciﬁcations.

1.1.1 Conceptual Circuit Representation
Levels of Abstraction

A circuit may be described at many different levels by increasing or decreasing the
amount of detail in the description. For instance, an arithmetic logic unit may contain
several functional modules, such as adders and comparators, which are built of logic
gates, and each logic gate is built of transistors. Each step in the progression is an
example of one of the levels of abstraction [48] that may be used to describe digital

circuits.

0 Processor Level. The processor level is the highest tangible level of abstraction
for circuits. It consists of microchip components such as processors, memo-
ries, controllers, and interfaces, as well as application-speciﬁc integrated circuits

(ASICS).

0 Register Level. The register level contains modules that are used to build the

processor components, such as adders, registers, comparators, counters, etc.

0 Gate Level. The gate level describes how the register level components are built
from primitive logic elements. Only gates and ﬂip-ﬂ0ps are present at the gate

level.

0 Transistor Level. The transistor level is the lowest level of abstraction, de-
scribing all gates and ﬂip-ﬂops of the gate level by the transistors necessary to

implement them.

Design automation focuses on transformations between levels of abstraction. For
example, from the register level, technology mapping (Section 2.4.3) may be used to
locate appropriate gate-level implementations of necessary functionality. The research
in this dissertation focuses on transforming a gate-level speciﬁcation to an equivalent

register-level speciﬁcation.

Views

There are three major views that are encountered during the engineering or reverse

engineering of a digital circuit: behavioral, structural, and physical [48,84].

0 Behavioral View. The behavioral view describes the functionality of the circuit
to be engineered without considering the implementation. The behavior may

be described as a function of the input values and elapsed time.

0 Structural View. The structural view breaks the behavioral design into inter-
connected components. This view speciﬁes the organization of the circuit and

4

 

some of its implementation.

0 Physical View. The physical view includes the physical speciﬁcations of the

components deﬁned in the structural view, such as dimension and location.

The physical view is generated for the purpose of manufacturing the design.

 

 

 

 

 

 

 

 

 

 

View
Level Behavioral Structural Physical

Forms Components Objects

Transistor Differential equations, Transistors Analog and digital
current-voltage resistors, cells
diagrams capacitors

Gate Boolean equations Gates, Modules,
ﬁnite-state ﬂip-ﬂops units
machines

Register Algorithms, Adders, comparators, Microchips
ﬂowcharts, registers, counters,
instruction sets, register ﬁles, queues
generalized F SM datapaths

Processor Executable Processors, controllers, Printed-circuit boards,
speciﬁcation, memories, ASICS multichip modules
programs

 

Table 1.1: Hardware Speciﬁcation Levels [48]

Each view may be described at any of the four levels of abstraction, as summa-

rized in Table 1.1. By locating functional modules, datapath and control units may

become clearly delineated, allowing an architectural-level description of the device to

be recovered.

 

1.1.2 The Forward Engineering Process

The engineering process begins with a concept which is translated into a behavioral
speciﬁcation. This speciﬁcation provides details about the interface of the circuit with
the environment and the functionality that the circuit will perform. Behavioral-level
Speciﬁcations are often written in natural language, although executable speciﬁcations
are becoming more common because they may be veriﬁed, analyzed, and synthesized
more easily. Executable speciﬁcations include some hardware description languages
(HDLS), such as VHSIC Hardware Description Language (VHDL) [6] and Verilog
[15]. These HDLS may be used to describe the circuit at various levels, as well as

perform simulation and synthesis.

This overview of the Circuit’s functionality is then partitioned into blocks, each of
which has a well—understood functionality that can be described by a mathematical
formula or an algorithm, or possibly still in natural language, to be reﬁned later.
This represents the ﬁrst layer (processor) in the structural view of the circuit. The
processor level representation is then reﬁned to include more information about im-
plementation by determining which register-level modules will be used to provide the
functionality for the processor-level modules. In a Similar step, the register-level mod-
ules are replaced by equivalent gate-level implementations. This task is often auto-
mated, and focuses on determining the implementation of each register-level module
that optimizes the circuit for quality measures such as area, cost, or performance.
The gates are then replaced by their transistor representation, and the design may
be veriﬁed for correctness against the original behavioral design. The design is ﬁnally

6

manufactured by replacing the structural components in the speciﬁcations by their

physical counterparts.

1.1.3 Relevance

The overall goal of reverse engineering is to raise the level of abstraction of the circuit
speciﬁcation to recover a conceptual understanding of the circuit, based on functional
modules. The levels described in Section 1.1.1 (processor, register, gate, transistor)
are the four levels at which the circuit may be represented. The objective of the
Module Identiﬁcation Problem (Section 1.3) is to transform a gate-level speciﬁcation
to a register- or processor-level Speciﬁcation.

It is apparent by examining the process of forward engineering that module identi-
ﬁcation is a useful tool for reverse engineering. This top-down methodology in which
large modules are implemented by connecting smaller modules or gates results in
clusters of gates that perform interesting and identiﬁable functions.

Further, when synthesizing gate-level Speciﬁcations from the register transfer level
description, the engineer may use implementations from a standard cell library, or
create new implementations for the register modules. In either case, it is likely that

the circuit will contain gate clusters that may be identiﬁed as functional modules.

1.2 Motivation

Reverse engineering was deﬁned in 1985 by MC. Rekoff as “the act of creating a
set of speciﬁcations for a piece of hardware by someone other than the original de-

7

signers, primarily based upon analyzing and dimensioning a Specimen or collection
of specimens.” [97]. This landmark paper was the ﬁrst to formally discuss reverse
engineering of hardware, its difficulties, and to present a methodology for performing
reverse engineering of complex hardware systems. Rekoff identiﬁed two primary mo-
tivations for reverse engineering: the gaining of information and the reproduction of

a part.

1.2.1 Information Retrieval

Reverse engineering of a part to gain information has many applications. Large
corporations apply reverse engineering to the products of their competitors to main-
tain a competitive edge. With the development of design watermarking techniques
[25,63, 64], a chip manufacturer can determine via reverse engineering whether its
intellectual property has been stolen. It is likely that military powers apply reverse
engineering to captured weapons and military equipment. When correct and current
speciﬁcations of a component are not available, information retrieval becomes the
ﬁrst step in the process toward Rekoff’s second motivation for reverse engineering:

the reproduction of parts.

1.2.2 Reengineering

“An electronic product may need reengineering for a number of reasons re-
lated explicitly or implicitly to the passage of time. It needs reengineering
if its requirements (what it does) change over time or if its Speciﬁcations

8

(how it does it) change.” [77]

Reengineering or redesign is the modiﬁcation of a component to extend or modify
its functionality or implementation. This activity requires that an accurate speciﬁ-
cation of the component is available so that appropriate changes can be made and

subsequently re-implemented in hardware.

The reproduction of a part by reengineering often drives reverse engineering of
a device. Technology improvements may allow a faster, smaller, cheaper, or more
efficient implementation to be designed. Unexpected design flaws may be detected
after the unit is in service, requiring the implementation of an error-free replacement.
A device in a large system may degrade and require replacement. If no spares are
available, then the device may be reverse engineered and reproduced, forestalling the

replacement of the entire system.

There are two basic types of replacement parts [97]. A clone is an exact repro-
duction of the original device, including details such as the location and material of
each internal primitive element. A surrogate is a re-implementation of the device,
with identical functionality and input/output performance, but the internal design
and implementation may be completely different. Clones of digital logic devices are
seldom possible because the rate at which fabrication technology advances obviates a
design within 18 months. Fabrication facilities may cease to exist for older technolo—
gies within a Short span of time, causing cloning to be ﬁnancially infeasible. For most
reverse engineering motivations, clones are not necessary; the identical functionality

provided by surrogates is sufﬁcient.

EPOI [21, 105]

The United States Air Force is currently develOping a solution to their inability to
obtain needed parts. Just 20 years ago, before integrated circuits became ubiquitous
in the consumer market, the military accounted for over 15% of integrated circuit
purchases. Since integrated circuits began appearing in personal computers and other
household devices, the military has lost its inﬂuence over the market, and can no
longer expect manufacturers to maintain obsolete technology fabrication facilities to
provide parts indeﬁnitely. Consequently, the Air Force has developed the Electronic

Parts Obsolescence Initiative (EPOI) to explore other Options.

EPOI has identiﬁed four alternatives to handle parts obsolescence: life-of-time
buys, complete system upgrade, part substitution, and board redesign. Life-of-time
buys involve purchasing sufficient stock to support replacement throughout a sys-
tem’s expected lifetime. A complete system upgrade is so expensive that it is seldom
warranted. Part substitution and board redesign have been determined to be the
most cost-effective solution to parts obsolescence. Both of these alternatives involve

reverse engineering of the existing components.

The current cost of a replacement that has the same interface and functionality of
the original part can range from $10,000 to $30,000, primarily due to the fact that the
engineers must manually recover the design from incomplete original speciﬁcations.
This situation has lead the Air Force to investigate automated tools for reverse engi-
neering, with the help of commercial and research companies such as Litton—TASC,
TRW, Raytheon, Northrop Grumman and Aspect Development Inc.

10

Obtaining Speciﬁcations

Reengineering frequently requires information retrieval. The amount of effort that
is necessary to reengineer a component is directly related to the amount of design
information that is available. It is obvious that when a speciﬁcation that has been
proven to be accurate is available, then no reverse engineering is necessary and the
redesign process can begin immediately. All too frequently, however, the original
speciﬁcations for the component are not available. The original Speciﬁcations may
have been lost or damaged. A more common problem, particularly when dealing
with older (legacy) systems, is that the company originally contracted to design and
manufacture the device is no longer in business. Design knowledge and speciﬁcations
have disappeared as well. Such situations require extensive effort to recover the design

from the hardware.

If a complete and provably accurate design is available or can be obtained, then
it is possible to re-implement the hardware from that speciﬁcation. If some original
design information is available, then it can be used to guide the reverse engineering

process, thereby greatly simplifying the effort.

Even when speciﬁcations are available, it iS unlikely that they represent a reliable
description of the device. Changes made further along during the design or imple-
mentation process may not be accurately reﬂected in the documentation. For this
reason, it is prudent to treat information gathered from design documents carefully.

The only completely accurate information is contained within the device itself.

There are two types of speciﬁcations that are necessary to completely describe

11

complex hardware products: functional and dimensional [97]. The functional Spec-
iﬁcation describes how the hardware works and what capabilities it performs. Di-
mensional speciﬁcations describe the details about the fabrication of the component,
including materials, dimensions, distances between constituent components, coefﬁ-
cients of friction, etc.

Depending on the goal of the reverse engineering, it is likely that only the func-
tional speciﬁcation is required. If the intention is to redesign to take advantage of new
manufacturing technology, then it is not necessary to have the obsolete dimensional
speciﬁcations. Only the functional information is required, and the redesign engi-
neers or computer aided design (CAD) utilities can create apprOpriate dimensional
speciﬁcations.

Dimensional speciﬁcations may be helpful, however, if the goal is to correct a
design error. Rather than re-implement the entire component, only the erroneous
section must be repaired and re—implemented. Dimensional information from the

faulty component can reduce reengineering time.

Legacy Systems

The most frequent candidates for reverse engineering and reengineering are not sys-
tems that were designed last month or last year, but rather those that were designed
many years ago with now obsolete design tools and technology. These are referred to
as legacy systems.

As the number and age of existing legacy systems increases, so does the demand for
automated methods of design recovery. The problem of extracting design information

12

from hardware is gaining importance as many of these existing systems are reaching
a state at which they need to be reengineered to meet changing design speciﬁcations,

overcome defects, or replace failing parts.

Legacy systems are particularly likely to require reverse engineering. Original
design information is seldom available [39], and the designers or manufacturers of
these older systems no longer exist, so the speciﬁcations are no longer available or are

obsolete.

Legacy systems are often complex systems that perform a very speciﬁc task. De-
signing a new system to take over the functionality is usually prohibitively expensive.
The most practical way to maintain a legacy system is in an evolutionary manner,

incrementally redesigning and replacing portions of the system as necessary [78].

1.2.3 Alternatives to Reverse Engineering

The reverse engineering of hardware is a difﬁcult task, and it is worthwhile to consider
other Options before initiating the reverse engineering process. An obvious alternative
is to simply design the circuit again. If some of the original design speciﬁcations are
available, then this may be an appropriate solution. An expenditure of sufﬁcient man-
hours could result in an acceptable circuit. This approach may not be cost-effective
if original design information is not available.

Additionally, if the component to be redesigned interacts with other components
within the system, it is necessary to have a comprehensive understanding of the inter-
faces with those other components. Interface information is frequently not available,

13

rendering this approach difﬁcult or impossible. In this case, the entire system may

need to be replaced.

1.2.4 Reverse Engineering Focus

The research presented in this dissertation focuses upon the reverse engineering of
digital hardware in the absence of design information toward the goal of redesign or
replacement of the hardware component. The resulting speciﬁcation will contain a
functional register-level description of the component. The techniques presented in
this thesis utilize only the information that can be derived from the component itself,
such as a transistor—level netlist, transistor position information, circuit power lines,

and clock lines.

1.3 Problem Description

Reverse engineering can be described as abstracting away details to obtain a more
general understanding. In digital circuits, the details involve low-level components
such as transistors, resistors, logic gates, and buffers. To create a more easily un-
derstood description of the circuit, those components are subsumed into functional
modules such as arithmetic logic units, multiplexors, and adders.

The recovery of a gate-level speciﬁcation (netlist) from a physical piece of hardware
involves physically examining each layer of silicon composing the device to create a
low-level geometric description [31, 107], which can then be converted into a transistor-
level netlist by pattern matching or visual inspection [31]. Transistor implementations

14

of logic gates are generally small in number and can be easily identiﬁed by graph
isomorphism and pattern matching techniques [17, 72,91]. Techniques to perform
these three steps toward abstraction have been developed, and recovering a design
to the gate-level is considered to be an achievable step in the process. Section 2.2

presents more detailed descriptions of the applicable techniques.

The same cannot be said of a subsequent transformation from that netlist to a
functional module-level netlist. The transformation of a gate-level netlist describing a
combinational logic circuit into a modular-level representation of the circuit in terms
of functional components and glue logic is the focus of the general Reverse Engineering
Project, ﬁrst proposed at Argonne National Laboratory by Eckmann and Chisholm
[43].

To illustrate the difference between a gate-level description and a module-level
description, a 2—bit adder is shown in Figures 1.1, 1.2, 1.3 and 1.4. Figure 1.1 shows
the adder described at the gate-level, which is divided into two 1-bit full adders in
Figure 1.2. Those two l-bit full adders are functional modules, so the same 2-bit
adder can also be described by the modular description in Figure 1.3. Moving to
a higher level of abstraction within the modular description describes the 2-bit full

adder as a single module in Figure 1.4.

1.3.1 Syntactic and Semantic Matching

Many existing approaches to the transformation between gate-level and register-level
speciﬁcations use specialized solutions to the classic graph-theoretic subgraph iso—

15

 

cin

aO .

b0

 

\

l

 

 

 

 

 

 

 

7

 

 

 

 

1E}

 

 

 

\

 

a1
b1

 

 

 

I

 

 

 

 

 

1:):

 

 

 

_D—— cout

 

43

Figure 1.1: Gate-level 2-bit Full Adder.

 

16

 

 

 

 

 

 

 

 

 

 

 

   

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

cm I , D : 80
: ’ '
. ' I
a0 \. , _’ I
I .
b0 7) ~ :
: I
tun—ID- ___________ .
i 2 Si) ' J . 31
I l . . I
I. . I
a1 + x A. o—D__ :
l LW ‘DJ— '00“

 

Figure 1.2: Gate-level 2-bit Adder with Two 1-bit Full Adders Identiﬁed.

 

17

 

 

cin—
,_ so

30 " 1-blt Full Adder
b0 ~

 

 

——s1

31 ~ 1-bit Full Adder
b1 ~

— COUt

 

 

 

Figure 1.3: 2-bit Adder as Two l-bit Full Adder Modules.

 

18

 

 

cin -—
— so
.30 ..
b0 _
2-bit Full Adder
—— 81
a1 ._
b1 .
_ cout

 

 

 

Figure 1.4: 2-bit Adder as One 2-bit Full Adder Module.

 

19

morphism problem [110]. Given a Speciﬁc implementation of a functional module,
these approaches can locate all instances of that circuit structure within the cir-
cuit under examination, the target circuit. This is referred to as syntactic matching
[17, 72, 76, 91, 93] because it matches circuit structure (syntax) rather than meaning.

Systems employing syntactic matching are usually very fast, capable of locating
all instances within linear time with respect to the Size of the target circuit [91].
Unfortunately, not all functionally equivalent circuits are structurally equivalent. AS
described above, a 2—bit adder may be built by connecting two I-bit adders imple—
mented by NAND gates, but it may also be implemented completely differently, in
XOR-AND format, for example. These two implementations will perform exactly the
same function, but will bear no resemblance to one another.

Therefore, the drawback of the syntactic approach is that it is only capable of
locating known implementations of the functional modules. To locate all instances
of 2-bit adders in a circuit by applying a syntactic matching technique, it would
be necessary to create a library of all possible implementations of a 2-bit adder.
Even with extensive libraries, it is unlikely that all needed implementations of a
functional module will be present. In addition, the matching time to search for all
implementations would be prohibitive.

Syntactic matching approaches can provide valuable information, particularly
when the circuit contains many stock implementations from a known library. This is
likely in circuits that were designed by using glue logic to connect modules from a
library to implement the desired functionality. However, reverse engineering is often
applied to obsolete circuits or circuits designed by someone other than the reverse

20

engineer, so it is unlikely that the original implementation library used to design the

circuit is known and available.

A more powerful matching technique is semantic matching [32, 75, 92]. Semantic
matching compares the logical functionality of the circuits rather than their struc-
ture. When applying semantic matching, the module library need only contain one
implementation of each functional module. All gate clusters in the target circuit that
perform the same function, regardless of their structure, will be identiﬁed by semantic
matching.

In an effort to provide a general solution for reverse engineering, the research in
this dissertation focuses upon a problem formulation for module identiﬁcation that

applies semantic matching instead of syntactic matching.

1.3.2 The Module Identiﬁcation Problem

The following formal description has been developed to describe the gate-level to
module-level transformation of a digital circuit speciﬁcation.

Module Identiﬁcation Problem [37,38,114]. Given a gate-level logic descrip-
tion (netlist) of a target circuit, efﬁciently identify all gate clusters (subcircuits) that

perform the function of a known library module.

0 Candidate Subcircuit Enumeration Problem. Identiﬁcation of gate clus-
ters (candidate subcircuits) within the netlist that may comprise a functional
module.

21

o Subcircuit Identiﬁcation Problem. Proving functional equivalence between

a candidate subcircuit and a known standard library module.

The research presented in this dissertation provides an approach to the ﬁrst half

of the Module Identiﬁcation Problem: Candidate Subcircuit Enumeration.

Subcircuit Identiﬁcation The second part of the Module Identiﬁcation Problem
is determining whether the subcircuits produced by candidate subcircuit enumeration
are equivalent to a functional module. Several solutions to this problem have been

proposed in the literature, and are discussed in detail in Section 2.8.

Candidate Subcircuit Enumeration A candidate subcircuit is deﬁned as a clus-
ter of gates that possesses a well-deﬁned functionality in the context of digital sys-
tems. To provide the most complete coverage of a circuit with functional modules,
all candidate subcircuits should be enumerated.

If the circuit is viewed as a graph, in which vertices represent gates and edges
represent wires, enumerating all subgraphs is a very complex problem because the
number of subgraphs is not polynomially bound [50]. Even in a graph that conforms
to the structure of a planar binary tree, the number of subtrees of height n that

contain the root vertex is described by the doubly exponential recurrence relation:

xn+1=xi+1,n20;xo =1 (1.1)

which generates the sequence 1, 2, 5, 26, 677, 458330, 210066388901, [1].
The structure of a circuit is not as well-deﬁned as that of a binary tree, so it is not

22

possible to develop a simple equation to obtain an exact count of the subgraphs in an
arbitrary circuit. It is obvious that enumeration of all subgraphs is a computationally
difﬁcult problem. However, for the purpose of reverse engineering, not all subgraphs
need to be enumerated. Only those subgraphs that represent viable circuits need
to be considered. A viable subcircuit is represented by a subgraph in which each
gate represented is fully-speciﬁed with respect to the subgraph. If the subgraph
contains a 2-input AND gate, the subgraph may not contain only one of the gate’s
inputs. Either both or neither of the inputs must be present to fully-specify the
gate. Limiting enumeration to subcircuits reduces the subgraphs to be enumerated
considerably. For instance, in a 1—bit adder composed of eight logic gates, there are
114 subgraphs, but only 18 subgraphs that represent subcircuits. Only those 18 need
to be enumerated.

To further limit the number of subgraphs that must be enumerated, the class
of subcircuits may be restricted to include only those subcircuits that do not share
any functionality with neighboring subcircuits. These subcircuits are known as self-
contained subcircuits. Of the 18 subcircuits in the 1-bit adder, only 6 of those are
self-contained.

In addition to deﬁning the types of subgraphs to enumerate, it is important to
ensure that each of the subgraphs is enumerated exactly once. A naive algorithm
for subgraph enumeration, which iteratively adds neighboring vertices to an existing
subgraph, will produce each subgraph many times, an unnecessary expenditure of
computational effort. If each subgraph produced, including duplicates, is then tested
for semantic equivalence, module identiﬁcation becomes completely impractical.

23

Even when only a restricted class of subgraphs, subcircuits, is enumerated, the
number of subcircuits in the original circuit may still be unwieldy. Several heuristics
have been developed in this research to guide the enumeration without interfering

with the effectiveness of the overall technique.

1.4 Problem Scope

1.4.1 Reverse Engineering Focus

The process of reverse engineering for the purpose of transformation between gate-
level and register-level circuit speciﬁcations is presented in Figure 1.5. The process
begins with a gate-level netlist, which is partitioned into smaller segments to im-
prove performance. The current solution to the Module Identiﬁcation Problem does
not apply to sequential circuits (because current semantic matching approaches are
limited to combinational logic), so the combinational logic is extracted from these
segments and passed to the Candidate Subcircuit Enumeration module. All of the
subcircuits generated are placed into Structural Equivalence Classes, then matched
to known modules. A circuit covering technique may then be applied to produce a

register-level description of the circuit.

The work presented in this dissertation focuses on the steps outlined in bold,
Candidate Subcircuit Enumeration and Structural Equivalence classes.

24

 

 

    
   
     
  
 

 

circuit
partitions

 

A Circuit to be /93t.¢'le"el . . netlist
A reverse engineered / netlist '2 partitioning

 

 

  

Combinational I
logic extraction ~

      
  
 

 

 

 

 

 

 

eqmvalence classes , enumeration ' circuit partition-s-
structural
instance
Semantic I matched . Circuit , ~ Module-level
Matching I subcircuits covering ‘ circuit netlist ’

 

 

 

Figure 1.5: Overview of Gate Level to Register Level Transformation

 

25

1.4.2 Approach Requirements and Limitations

The approach to the Module Identiﬁcation Problem (Section 1.3) described in this
dissertation focuses on performing gate—level to register-level transformations on com-
binational logic circuits. The solution to Candidate Subcircuit Enumeration presented
in Chapter 3 has been designed to work on general directed graphs, including those
with cycles. The current solutions [32,36] available for Subcircuit Identiﬁcation per-
form combinational logic matching only, and cannot operate effectively on sequential
circuits.

Thus, this approach may depend upon the following information:

1. The circuit represents combinational logic only.

2. The functionality of the candidate subcircuit is assumed to have the same num-
ber of inputs and outputs as that of the library module. No output can be
“ignored”; every output of any library entity must either be a primary output

or be used as the input at some other point in the circuit.

3. The functional modules to be located are expected to be a single connected
subcircuit, not several disjoint subcircuits. If a module is built of disjoint sub-

circuits then each subcircuit should be deﬁned as a module.

4. The functionality of the candidate subcircuit and the library module must match

exactly; don’t care sets are not considered at this time.

AS research progresses in semantic matching of sequential circuits, new approaches

to subcircuit identiﬁcation can easily be incorporated into the module identiﬁcation

26

r- rum-q

system. It is also possible to perform preliminary partitioning into combinational
blocks [23], thus allowing reverse engineering of the combinational portions of the

design.

In addition, it is currently necessary that a library of functional modules be avail-
able. The technique presented in this thesis applies semantic matching to the Module
Identiﬁcation Problem, so only one implementation of each module need be included
in the library. The library modules may be represented in any format, including cir-
cuit netlist, circuit graph, or in a pure functional format, described by Binary Decision

Diagrams [19].

1.5 Contributions

The research presented in this dissertation provides both theoretical and practical
contributions to the ﬁeld of design automation. Techniques have been developed to
allow the reverse engineering of a gate-level circuit speciﬁcation to a more conceptual
module-level speciﬁcation. This module-level speciﬁcation can be more easily utilized

for the purposes of reengineering, reuse, replacement, and repair.

1.5.1 Overview

0 Problem formulation: The Module Identiﬁcation Problem, as discussed above,
has been developed to provide a framework for this research. Details presented
in Chapter 1.

27

?

0 Unique Subgraph Enumeration: A technique for uniquely enumerating each sub-
graph of a general graph has been developed by indexing the vertices and
subgraphs and applying ordering rules to guide subgraph expansion. Details

presented in Chapter 3.

o Restrictive Subgraph Types: Two types of subgraphs that are interesting within
the Module Identiﬁcation Problem have been deﬁned: subcircuits and contained
subcircuits. The enumeration of subgraphs can be focused solely on these sub—
graphs to reduce the effort required to enumerate candidate subcircuits. Details

presented in Chapter 3.

0 Structural Circuit Identiﬁers: A technique for developing structural circuit iden-
tiﬁers to place subcircuits into structural equivalence classes has been deﬁned.
This reduces applications of semantic matching. Details presented in Chapter

4.

o Heuristics for Practical Application: The techniques cannot be applied ‘as is’
to circuits of arbitrary size, so a set of heuristics has been deﬁned to allow effec-
tive candidate subcircuit enumeration to be performed for any circuit. Details

presented in Chapter 5.

1.5.2 Details

Module Identiﬁcation Problem A formal problem, the Module Identiﬁcation
Problem, has been deﬁned to describe the reverse engineering of digital circuits. The
Module Identiﬁcation Problem consists of two subproblems: Candidate Subcircuit

28

Enumeration and Subcircuit Identiﬁcation. The research in this dissertation focuses

on the problem of Candidate Subcircuit Enumeration.

Unique Subgraph Enumeration The unique and comprehensive enumeration of
subgraphs of an arbitrary graph is a theoretical result that has many applications
within design automation and graph theory. The most important contribution is
vertex indexing and associated rules that ensure that each subgraph will be created
exactly once. This indexing prevents unnecessary computation and improves execu-
tion time.

Unique enumeration may be applied within both syntactic and semantic solutions
to the Candidate Enumeration part of the Module Identiﬁcation Problem. When
coupled with standard partitioning heuristics, it can be used to improve bottom-up
partitioning methods. The vertex ordering that prevents duplication of subgraphs
may easily and efﬁciently be applied to netlist clustering and vertex clustering. Clus-
tering appears in the solution to many problems, so widespread beneﬁts can result

from improving its execution.

Restricted Subgraph Types The enumeration of all subgraphs of a graph is
intractable because the number of subgraphs is not polynomially bound. To focus
enumeration upon subgraphs that may be functionally equivalent to a module, and

to reduce enumeration complexity, two classes of subgraphs have been deﬁned.

A subcircuit is a subgraph that possesses well-deﬁned functionality. It is necessary
that all gates in the subcircuit are accompanied by either all or none of its inputs.

29

1-

Gates that are only partially speciﬁed are not allowed. Restricting enumeration to

subcircuits provides considerable reduction of effort without reducing effectiveness.

Further, a contained subcircuit is a subcircuit that does not share functionality
with neighboring subcircuits. These subcircuits are likely to represent a complete
functional module, and thus their enumeration and identiﬁcation represent an efﬁcient

step in the reverse engineering process.

Structural Circuit Identiﬁers A structural representation of circuits was de-
veloped to implement the concept of subcircuit equivalence classes. This structural
identiﬁer completely describes the structure of the circuit, so the subcircuits may
be partitioned into structural equivalence classes. Structurally equivalent subcircuits
are necessarily functionally equivalent, so only one instance of each equivalence class
must be tested for equivalence to register-level modules.

Although several other string representations for graphs and circuits are available
(see Section 2.6), they require extensive processing to determine equivalence of graphs
and circuits. The method presented in Chapter 4 is based on efﬁciently determining
a vertex ordering that can be used to generate a structural identiﬁer. This technique
allows fast and effective partitioning of the subcircuits into structural equivalence
classes. Although the application is Simplified by the vertex labels, indicating gate

type or functionality, it also may be applied to unlabeled directed graphs.

Heuristics for Practical Application The number of candidate subcircuits in a

circuit can grow exponentially with the number of devices in the circuit. Therefore,

30

the practicality of performing candidate subcircuit enumeration decreases as the size
of the circuit grows. Therefore, several heuristics have been deﬁned that allow module
identiﬁcation to be performed on circuits of any size. These heuristics include reduc-
tion of the circuit size by preliminary partitioning, limitation of candidate subcircuit

order, and hierarchical module replacement.

1.5.3 Summary

To solve the Module Identiﬁcation Problem for a digital circuit, each cluster of gates
in a circuit that may be equivalent to a known module must be enumerated. An
algorithm is presented in this dissertation that uniquely enumerates each and every
subcircuit that may be functionally equivalent to a known module. The number of
these subcircuits may be inherently intractable, so several guidelines have been de-
signed to effectively reduce the number of subcircuits enumerated. Properly applied,
these heuristics will permit circuits of arbitrary Size to be analyzed, without reducing

the accuracy of the module identiﬁcation.

In addition, the concept of structural equivalence classes and a technique for deter-
mining a efﬁcient representation for comparison of subcircuit structure are discussed

to further improve the efﬁciency of the solution to the Module Identiﬁcation Problem.

The combination of unique enumeration of subcircuits and structural circuit rep-
resentation provides an effective solution to the ﬁrst part of the Module Identiﬁca-
tion Problem, Candidate Subcircuit Enumeration. When combined with a semantic
matching approach to Subcircuit Identiﬁcation, the basis for a powerful reverse engi-

31

neering tool is created.

1.6 Dissertation Outline

Chapter 1 has discussed motivations for reverse engineering, presented the Speciﬁc
formulation that is the focus of this dissertation, and stated the scope of the problem.
Chapter 2 reviews the current literature on reverse engineering methods and solutions
to related problems. Concepts, terminology and notation that will be used throughout
this dissertation are also presented.

Chapter 3 discusses the Subcircuit Enumeration Problem and presents an algo-
rithm that enumerates each and every candidate subcircuit of a circuit exactly once.
Chapter 4 introduces the idea of subcircuit equivalence classes, which serves as an
effective method of reducing the number of necessary semantic comparisons. The
application of these theoretical techniques is discussed in Chapter 5, with several
methods to extend their application to circuits of any size. Chapter 6 contains a

summary of the contributions of this research as well as future work in this area.

32

Chapter 2

Background and Related Work

As the need for reverse engineering increases, so does the motivation for research into
better methods and techniques to accomplish reverse engineering tasks. Previous
work has provided important background to the topics discussed in this dissertation,
and many of these techniques may be applied in conjunction with the algorithm and
heuristics presented here to achieve improved results and useful contributions to the
reverse engineering effort.

The formulation of the Module Identiﬁcation Problem in Section 1.3 emphasizes its
relation to problems such as technology mapping, design validation, circuit clustering,
and regularity extraction. These problems are described and contrasted to the Module
Identiﬁcation Problem, and current research into their solutions is presented in this
chapter.

This research represents digital circuits as directed graphs. Section 2.1 presents the
general concepts and terminology of digital logic and graph theory that are necessary
to understand the discussions in the remainder of this dissertation. The techniques

33

presented in this dissertation focus on the transformation of a gate-level netlist to a
module-level netlist, where the module level may contain register- or processor-level
modules. Techniques developed by others that perform the transformation from the

physical-level to a gate-level netlist are discussed in Section 2.2.

Related work is described in the remainder of the chapter, beginning with a dis-
cussion of other approaches to reverse engineering in Section 2.3. There are several
application areas in which similar problems are being researched. These application
areas are presented in Section 2.4. Discussion of techniques used to solve candidate
enumeration within these domains follows in Section 2.5, including circuit partitioning

in Section 2.5.1 and netlist clustering in Section 2.5.2.

Techniques for representing circuits for comparison are presented in Section 2.6.
The two approaches to Subcircuit Identiﬁcation, syntactic and semantic matching,
are discussed in Sections 2.7 and 2.8, respectively. The reverse engineering of soft-
ware shares many goals with reverse engineering, such as the abstraction away from
details and location of functional modules. Section 2.9 discusses several approaches

to software reverse engineering. Section 2.10 summarizes this chapter.

2.1 Terminology

The following section provides background for understanding the Module Identiﬁ-
cation Problem and its subproblems, as well as the related work discussed in the
following sections.

34

2.1.1 Design Automation and Digital Circuits

Design automation is the study of the computer-aided transformation of a digital
design from one level of abstraction to another, ending with the ﬁnal synthesis of a
digital circuit. The research described here provides an approach to reverse engineer a
low—level (gate-level netlist) description of a circuit to a high-level (functional-module
netlist) speciﬁcation. The following terms are from digital logic and are used through-

out this dissertation [111].

Deﬁnition 2.1 A gate or logic gate is an electrical component that has one or

more inputs and produces an output that is a function of the current input values.

Examples: AND, XOR, NOT.

Deﬁnition 2.2 A module (functional module or high-level module) is a col-
lection of interconnected gates that perform a known and well-deﬁned functionality.

Examples: AL U, multiplexor, adder (Figure 1.3).

Deﬁnition 2.3 A circuit is an arrangement of logic gates that are interconnected

to perform a speciﬁed function. A circuit may include many modules.

Deﬁnition 2.4 The depth of a device g in a circuit C is the length of the maximum

path between g and any input of C.

Deﬁnition 2.5 A speciﬁcation is a description of a circuit or a digital design.
Speciﬁcations may exist in many different levels of abstraction (Section 1.1.1). Spec-

iﬁcations are written in hardware description languages, such as VHDL and Verilog.

35

' ﬁgmﬂ

Deﬁnition 2.6 A gate-level netlist is a circuit speciﬁcation containing only logic

gates.

Deﬁnition 2.7 A module-level netlist is a circuit speciﬁcation containing high-

level modules and glue logic (gates).

Deﬁnition 2.8 A ﬂip-ﬂop is a device that stores either a 0 or a 1. The value of a

ﬂip-ﬂop may change only at times determined by a clock input. Examples: D ﬂip—ﬂop,
S-R flip-flap.

Deﬁnition 2.9 A combinational logic circuit is a circuit that contains no storage

or memory capability.

Deﬁnition 2.10 A sequential logic circuit is a circuit that contains storage or mem—

ory capabilities. Sequential logic circuits include flip-ﬂops.

Deﬁnition 2.11 A synchronous logic circuit is a sequential circuit whose state

changes only at a speciﬁed point on a triggering input called the clock.

Deﬁnition 2.12 A library is a collection of high-level modules. Details are in Sec-

tion 1.4.

Deﬁnition 2.13 A partition of a circuit is a contiguous portion of the circuit.

2.1.2 Graph Theory

The technique of subcircuit enumeration presented in this dissertation represents
circuits as directed graphs. This section presents graph theory terminology [28,34]
that will be used during the discussion of the techniques develOped in this research.

36

I'-

Graphs

Deﬁnition 2.14 A graph G is a ﬁnite nonempty set V(G) of objects called vertices
and a (possibly empty) set E (G) of two element subsets of V(G) called edges. The

set V(G) is called the vertex set of G and E(G) its edge set.

Deﬁnition 2.15 The number of vertices in a graph G is called its order, denoted

|V(G)|, and the number of edges is its size, denoted |E(G)|.

Deﬁnition 2.16 A trivial graph is a graph consisting of a single vertex and no

edges.

Deﬁnition 2.17 Vertices u and v are adjacent to each other in a graph G if uv E

E(G). If u and v are adjacent, then u is a neighbor of v and v is a neighbor of u.

Deﬁnition 2.18 For a vertex v in a graph G, its neighborhood is deﬁned by
N(v) 2: {u E V(G) | vu E E(G)}. The degree of a vertex v is the number of

vertices adjacent to v, that is, IN (v)|

Deﬁnition 2.19 A graph H is a subgraph of a graph G if V(H) Q V(G) and

E(H) g E(G).

Deﬁnition 2.20 A subgraph H of G is an induced subgraph of G if E (H) contains

precisely those edges joining two vertices in V(G).

Deﬁnition 2.21 A walk in a graph G is an alternating sequence of vertices and
edges, v0,e1,v1,e2,v2, ...,emvn for n 2 0, beginning and ending with vertices, such
that e,- = v,-_1v,- fori = 1, 2, ..., n.

37

 

Deﬁnition 2.22 A path in a graph G is a walk in which no vertex is repeated.

Deﬁnition 2.23 A graph G is connected if there exists a path between u and v for

every pair u, v of vertices of G. Otherwise, it is disconnected.

Deﬁnition 2.24 A vertex v is a cut-vertex of G if G — v is disconnected.

Directed Graphs

Deﬁnition 2.25 A directed graph or digraph D is a ﬁnite, nonempty set V(D)
of vertices and a (possibly empty) set E (D) of ordered pairs of distinct vertices. The

elements of E(D) are called arcs.

Deﬁnition 2.26 If uv is an arc of D, then u is said to be adjacent to or a parent

of v, and v is said to be incident to or a child of u.

Deﬁnition 2.27 The outdegree of a vertex v in a digraph D is the number of
vertices incident to v. The indegree of v is the number of vertices adjacent to v.

The degree of v is outdegree(v) + indegree{v).

Deﬁnition 2.28 A semiwalk in a directed graph D is an alternating sequence of
vertices and arcs, v0,e1,v1,e2,v2, ...,emvn for n 2 0, such that either e,- = {v,-_1,v,-}

or e,- = {v,~,v,-_1} for each i(1 _<_ i g n).

Deﬁnition 2.29 A digraph D is connected if there exists a semiwalk between u and

v for every pair u, v of vertices of D.

38

[

 

2.2 Transistor- to Gate-Level Transformation

Beginning the reverse engineering process with a digital device and no external infor-
mation is the most difﬁcult type of reverse engineering. However, the hardware itself
is the most reliable information that is available, because speciﬁcations often become
incorrect as changes are made during the implementation process (Section 1.2.1).
The research presented in this thesis assumes that the hardware has already been
reverse engineered to a gate—level speciﬁcation. This section gives an overview of
the steps necessary to recover the gate-level netlist from a physical circuit. Table
2.1 presents the steps of design recovery for digital circuits, as developed at the
1998 Reverse Engineering Workshop [103] and presented in [36]. The well-deﬁned
techniques for transforming a circuit from the physical device implementation to the

gate-level implementation (Levels 0 - 4) are described in the following paragraphs.

Sample Preparation (Level 0) The ﬁrst step involves extraction of the design
layout of the device. A destructive technique (chemical etching or mechanical slicing)

is used to expose each layer of the device [31].

Image Acquisition (Level 1) High-resolution images of each level are obtained
by a scanning electron microscope. Many images are necessary to capture the entire
layer at a high enough resolution to allow further recovery. These images are then
assembled to present an image of the entire layer [31].

The problem inherent in these steps is the destructive nature of this etching pro-

cess. Care must be taken to ensure that the images are collected accurately. If only

39

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

] Level Technologies ] Technical Challenges |
0 Sample Preparation
Etching f Etching
1 Image Acquisition
SEM Accuracy
Image processing Geometry
BMP to GDS-II Staging
Unconventional technology
2 Geometric Description
Postprocessing Process information
Design rule checkers
3 Transistor Netlist
Syntactic matching Exact models of gates
Semantic matching Unconventional technology
Semantic matching
4 Gate Netlist
4a Layout
Pattern matching Combinatorics
Syntactic matching Optimizations
Semantic matching Library support
Contextual matching Incomplete information
Optimization tools Clustering
Function-centric naming
4b Timing
Simulation and modeling Unconventional technology
Technology-speciﬁc information
5 Register Transfer
Model generation Complexity
Sequential functionality Validation
Timing Automating process
% Domain-speciﬁc information
6 Behavioral

 

 

 

Table 2.1: Reverse engineering technologies and challenges, 1998. [36]

 

40

one instance of the device to be reverse engineered exists, then the images will be the

only information remaining about the device after image acquisition.

Geometric Description (Level 2) The high-resolution images obtained from the
Image Acquisition stage must be converted into a geometric data stream, a descrip-
tion of the elements on each layer including position information. This can be accom-
plished by standard pattern recognition techniques, provided that knowledge about

the implementation technology is available [8, 47, 55].

'Ii'ansistor-Level Description (Level 3) The transformation from the geometric
description to a transistor-level description can be performed by commercially avail-
able CAD tools that recognize physical structures in the geometric description such

as transistors, wires, and resistors [31].

Gate-Level Description (Level 4) The ﬁnal step to bring the design to a gate—
level description uses isomorphism techniques to locate transistor implementation of
logic gates. This step requires an extensive library of gate implementations. Because
the number of transistor implementations of gates is relatively small, syntactic library
matching is usually sufﬁcient for this purpose. Many tools are available to perform
this transformation, such as SubGemini [91], LOGEX [17], GateMaker [72], and BLEX
[76], among others.

41

2.3 Reverse Engineering

“Reverse engineering is the inverse of the design process. The design
process begins with an abstract description of a target device and, through
a succession of reﬁnements, produces an implementable design. Reverse
engineering begins with the disassembly of a manufactured device and

ends with an abstract description of its functionality.” [32]

The reverse engineering of a digital design has many purposes. It may be nec-
essary to change the speciﬁcations of the product to add new features or to meet
changing client needs. Portions of an existing system may be reverse engineered so
they may be used within another system [99]. With improvements in available tech-
nology, hardware may be re-implemented to improve power, size, or speed constraints.
The availability of cheaper technology or materials may drive reverse engineering for
redesign of a functionally equivalent but more cost effective part.

Technology is improving at such a fast rate that it is necessary to perform reverse
engineering and redesign quickly so the new part is not obsolete before it is released.
Thus, reverse engineering techniques must produce results quickly and effectively so
that the redesign process may begin.

The most practical way to implement recovery of the gate-level netlist is to incor-
porate recovery within the design process itself. A system that allows simple transi-
tions between design levels (higher or lower) produces designs that can be redesigned
within the system. Unfortunately, most hardware was not designed in such an inter-
active system, so reverse engineering is a “blind” process. Some design information

42

-31. vii-q

may be available, but it is not guaranteed to be current.

The following sections describe several approaches to reverse engineering. The ﬁrst
two, Module Identiﬁcation and FINES, are the techniques described in the literature
that are the most similar to the approach described in this dissertation. The last
approach, Hybrid Reverse Engineering, describes a general methodology for module
identiﬁcation that provides many interesting techniques for simplifying reverse engi-

neering. These techniques span both automated and manual identiﬁcation techniques.

2.3.1 Module Identiﬁcation

The technique that is most similar to the research presented in this dissertation is
proposed by Chisholm et al [32]. Two parts of the module identiﬁcation problem are

identiﬁed:
0 Partitioning - generate candidate subcircuits by partitioning the subject netlist.

0 Matching - determine matchings between candidate subcircuits and library

modules.

This technique considers both syntactic and semantic matching. The syntactic ap-
proach is used initially to locate subcircuits whose functionality is already understood.
Assuming that module identiﬁcation is for the purposes of veriﬁcation of one’s own
design, the library modules for this step can be built from the design documentation.

The syntactic matching technique used for this approach is the SubGemini algo—
rithm [91]. Details on this algorithm can be found in Section 2.7.2. Recognizing that
syntactic matching has limitations, such as requiring a large module library and an

43

unsung-_—

inability to match optimized modules, the researchers now apply a semantic technique
to locate all implementations of the library modules.

The semantic equivalence problem is solved by using Binary Decision Diagrams
(Section 2.8.1) as a canonical form for the circuits, then comparing those canonical
representations as necessary. The difﬁculty with using Binary Decision Diagrams as
a canonical form is that it is necessary to determine a correspondence between the
inputs and outputs of the pattern circuit and those of the target circuit.

The candidate subcircuits are identiﬁed by applying partitioning to the circuit.
Details on this technique are given in Section 2.5.1. Once the candidate subcircuits
have been determined, the canonical BDD description of their logic is generated for
each output. These canonical forms for the candidate outputs are compared to the
canonical forms for the outputs of the library elements. If all outputs of the candidate

match all outputs of a library element, a match has been found.

Discussion

The focus for the technique presented by Chisholm et al [32] is on reducing the com-
plexity of the second step of the Module Identiﬁcation Problem (Matching or Sub-
circuit Identiﬁcation) by identifying possible outputs of modules in the target circuit
and attempting to match those with outputs of library modules. This ensures that
the matching is done between smaller circuits with only one output, which simpliﬁes
each individual matching test considerably, though it requires a greater number of
equivalence tests because each output must be matched.

The approach to the Module Identiﬁcation Problem described in this thesis per-

44

 

forms matching between full subcircuits with multiple inputs and outputs. Candidate
Subcircuit Enumeration (Chapter 3) concentrates on reducing the complexity of the
ﬁrst step (Candidate Enumeration). By combining these improvements to each of
the Module Identiﬁcation steps, a powerful tool for module identiﬁcation may be

developed.

2.3.2 FINES

FINES (Functional INformation Extraction System) [92] is a prototype system de-
signed to extract logic functions from combinational systems for the purpose of de-
sign veriﬁcation and developing functional simulation models and test patterns. The
eventual goal of the project is to enable automated development of design documents

and manuals for logic circuits.

The interesting feature of FINES is that it focuses on extracting arithmetic func-
tions instead of logic formulas because they are sufﬁcient for the purposes of design
veriﬁcation and functional simulation and are generally simpler and more compact

than logic formulas.

The goal of FINES is to represent a circuit with a function table, which is an
intermediate form between logic formulas and truth tables. A function table is a
desirable representation because it is compact and is frequently used to describe
commercial integrated circuits in data books, so it is a familiar representation to
many engineers. Binary Decision Diagrams (presented in Section 2.8.1) are used to
create the function table, because they allow well-deﬁned manipulation to determine

45

I
l
i

 

a reduced BDD, which translates into a smaller but equivalent function table.

FINES operates under the assumption that additional information is available
from an engineer to guide the extraction process. This additional information can
take the form of inputs, outputs, and control data of important modules.

The input to FINES is a logic circuit description, a library of arithmetic functions,
and some additional information. The output is a function table that describes the
functionality of the logic circuit. The additional information is used to determine
the ordering of BDD variables that results in the most compact function table. The
authors determined by experimentation that by ordering the control inputs ﬁrst, the
resulting ROBDD (and thus the function table) was smaller.

Once the function table has been determined for a logic circuit, the known arith-
metic functions are extracted. FINES compares the BDDs of the library entities
(arithmetic functions) with the function table created for the circuit. The method
of determining correct input correspondences between the library entity BDD and a

possible match BDD is not discussed in [92].

Discussion

The interesting feature of this approach is that it describes the entire target circuit by
a table describing its functionality in logic formulas, then attempts to locate library
entities in this function table so that an algebraic function table may be created.
The function table is a more easily understood representation of a circuit, but its
readability and compactness is dependent upon additional information provided by
an engineer. Without this additional information, the function table is equivalent to

46

a truth table, which would be quite large for circuits with realistic functionality.

A limitation of this system is that it is designed to locate only arithmetic functions.
However, this approach can be easily extended by expanding the library to contain
more complex modules. The real difﬁculty with FINES is that the entire circuit
must be transformed into a BDD, which can quickly become prohibitively large. This

requirement severely limits the size of circuits to which FINES may be applied.

2.3.3 Hybrid Module Identiﬁcation

Hansen, Yalcin and Hayes [53] describe the reverse engineering of several of the
ISCAS-85 benchmark circuits to illustrate a hybrid module identiﬁcation technique
as well as the hierarchical nature of the ISCAS benchmarks.

The technique applied in this case study uses many different module identiﬁcation
techniques to identify all meaningful modules in the target circuit. These techniques

are as follows:

0 Library modules. Location of known library modules, such as multiplexors,
decoders and adders. These modules can be found in integrated circuit manu-

facturers data books and common cell libraries.

0 Repeated modules. Multiple instances of a subcircuit (with or without identiﬁed
functionality) may occur in the target circuit. These repeated elements are
most common in datapath circuits, and are the focus of regularity extraction

techniques (discussed in Section 2.4.1.

0 Expected global structures. After identifying several modules, a reverse engineer

47

 

can use that information as a guide to locate larger modules that may use the

known modules.

0 Computed functions. For small subcircuits (no more than four or ﬁve signals),

the logic function may be computed and aid in identiﬁcation of local modules.

0 Control Functions. Key control Signals may be identiﬁed that partition the

functionality of a subcircuit into simpler subcircuits on various control line

E

values.

0 Bus structures. After repeated modules have been identiﬁed, their outputs may
be grouped together into buses, allowing circuit partitioning based upon that

information.

0 Common names. If names are known for elements in the netlists, elements with
similar or identical names can be grouped together, in hope of sparking further

structural insight.

0 Black boxes. If a group of gates cannot be identiﬁed by any of the above

techniques, they may be encapsulated within a black box.

Discussion

This method requires a high-level of engineer input. It does apply a syntactic match-
ing technique to locate the library modules and repeated modules identiﬁed by the
engineer. This paper provides an excellent overview of module identiﬁcation tech-
niques, many of which may be applied by an engineer at any stage of the module

48

 

identiﬁcation process to guide an automated system.

A signiﬁcant drawback to the methodology is that the library module identiﬁca-
tion applies syntactic matching. However, the approach assumes that an engineer is
guiding the entire process, which provides a large degree of ﬂexibility. Once a module
has been identiﬁed by any of the methods listed above, it may be placed in the library

so other instances may be identiﬁed.

2.4 Application Areas

Although reverse engineering or design recovery is the stated application of Module
Identiﬁcation in this dissertation, there are several other problems in the digital logic
domain to which this technique may be applied. Three of these applications are
described here. Although the problem description may differ, the underlying focus is
the same. All of these problems can be solved by the location of functional modules

within a circuit netlist.

2.4.1 Regularity Extraction

Circuits are commonly designed to optimize a set of features, such as footprint, perfor-
mance, power consumption, etc. One of the powerful tools available for this optimiza-
tion is regularity extraction [5, 33, 95]. Regularity extraction involves the identiﬁcation
of templates: gate clusters within the circuit that may re-occur multiple times. By
identifying these templates, designers may be able to redesign the system to exploit
the duplication, rather than replicating the technology many times. In addition, the

49

designers can optimize a template for a speciﬁc factor (speed, size, etc), and that op-
timization effort applies to all instances of the template. Thus, regularity extraction
can improve the efﬁciency of the layout as well as reduce the necessary design effort.

The process of regularity extraction can be described by three steps:

1. Template Library Identiﬁcation: Identify a set of templates, the template library.

Many of the techniques currently available assume that the template library

 

was designed and provided by engineers [96]. Several techniques provide a
template library identiﬁcation step [33]. Essentially, this problem involves the
enumeration of subcircuit equivalence classes under isomorphism. However,
due to the potentially exponential number of subcircuit equivalence classes,
restrictions are placed on the type of subcircuit that qualiﬁes as a template.
For instance, one might only consider subcircuits that occur more than once
within the circuit, or subcircuits with only one output, or subcircuits that are

not subcircuits of another template.

2. Template Location: Locate all instances of each of the templates.
Using the templates provided by Step 1, locate all instances of each template.
This problem is equivalent to the subgraph isomorphism problem. The identi-
ﬁcation of the subcircuits for comparison to the templates is accomplished by

applying clustering methods [96] or by a syntactic matching technique [33].

3. Determine Template Cover. Position templates to cover circuit.
The goal of this step is to identify the templates that cover the largest portion of
the circuit. The problem of determining the optimal cover is difﬁcult, so many

50

regularity extraction techniques rely on designer input to facilitate the process.
Common heuristics used are Largest-Fit—First, in which the largest templates
are applied ﬁrst, then smaller templates are applied to cover the remaining area,
and Most~Frequent-Fit-First, in which the template with the largest number of
instances is applied ﬁrst. The remaining logic components comprise a module

that must be custom designed.

Relation to Module Identiﬁcation Problem

This problem is directly related to the Module Identiﬁcation Problem, except that
regularity extraction is commonly performed to locate syntactic matches. There are
many techniques that can be used to locate these syntactic matches that are more

efﬁcient for this purpose than the technique presented in this dissertation.

However, the generation of the template library and identiﬁcation of isomorphisms
could be performed by the techniques presented in this thesis. Regularity extraction
methods require that the template subcircuit occur at least once, and in fact attempt
to locate the templates with the most occurrences in the graph. This count can easily
be maintained while the structural equivalence classes (Chapter 4), which represent
templates, are developed by Candidate Subcircuit Enumeration (Chapter 3). The
result would be an exhaustive listing of the templates in the circuit, as well as a
count and identiﬁcation of their instances, which could then be used as an input to
the template covering step.

51

2.4.2 Formal Veriﬁcation

An important step in circuit synthesis is design veriﬁcation. Once the circuit has
been implemented from its speciﬁcations, it is necessary to prove that the intended
functionality was correctly realized. It is common for the veriﬁcation process to
consume 70% of the design time [2].

In the past, simulation has provided the basis of design veriﬁcation for digital
circuits [44]. Due to increasing circuit complexity, performing extensive simulation
tests of the implementation is no longer a reasonable approach. Exhaustive simulation
of a design requires a test set that represents all possible inputs to a system along
with expected outputs. This veriﬁcation technique is infeasible for most circuits, so
determining a set of input vectors to be tested is a possible approach, but may leave
some input vectors untested so cannot guarantee equivalence.

Consequently, many current veriﬁcation strategies take advantage of formal veri-
ﬁcation methods [61]. By applying formal veriﬁcation to the designs, it is no longer
necessary to exhaustively test all input vectors because the implementation can be
proved to be correct for all possibilities.

In a survey of current formal veriﬁcation methods for combinational circuits, Jain
et al [61] describe the available approaches as belonging to either functional or struc-

tural approaches to veriﬁcation:

0 functional: Circuits are represented by a functional canonical form and are
considered to be equivalent if their canonical forms are equivalent. Lai, Pedry,
and Sastry [74] use BDDs, which are a very common canonical representation

52

 

in veriﬁcation, to represent the circuits to be compared.

0 structural: Key equivalent vertices are located within the Boolean network ex-
tracted from the implementation and in the Boolean network representing the
original design. These starting points are used to simplify and focus the equiva-
lence checking process. The Gemini [42] and GeminiII [41] systems both follow

structural approaches to veriﬁcation.

Pelz and Roettcher [93] and Batra and Cooke [9] both present hierarchical struc-
tural techniques for formal veriﬁcation that successively match library modules
to higher and higher levels of abstraction. Many structural circuit equivalence

techniques have been derived for transistor-level implementations [60, 100,106].

The two types of formal veriﬁcation, structural and functional, correlate directly
to the syntactic and semantic reverse engineering techniques identiﬁed in this disser-
tation. The design recovery technique applied by Sub Gemini [91] to identify high-level
modules is based directly on the structural formal veriﬁcation techniques ﬁrst used

in Gemini.

Relation to Module Identiﬁcation Problem

There is one primary difference between formal veriﬁcation of circuit designs and
reverse engineering of circuits. In formal veriﬁcation, the modules being matched, the
mapping between devices, and the overall functionality of the circuit are all known a
priori or can be easily derived.

The equivalence checking process can be eﬂ'ectively guided with this information.

53

 

In addition, the correspondences between input and output variables are generally
available (unlike reverse engineering), which simpliﬁes the semantic comparison pro-
cess. Therefore, formal veriﬁcation of circuits is a much simpler task than reverse

engineering an unknown device.

The solution to the Module Identiﬁcation Problem presented in this thesis is
similar to the functional approaches to logic veriﬁcation. However, formal veriﬁcation,
because it is working with two essentially known circuits, does not need to perform the
ﬁrst step of the Module Identiﬁcation Problem, Candidate Subcircuit Enumeration.
Therefore, the unique enumeration of subgraphs is not a technique that should be
applied within formal veriﬁcation. However, the structural identiﬁer presented in
Chapter 4 could prove very useful within structural veriﬁcation techniques for proving

structural equivalence.

2.4.3 Technology Mapping

After the design for a new digital device is complete, it is transformed into a tech-
nology independent logic description and optimized. That speciﬁcation is commonly
stored as a Boolean network. The next step in the design process is to identify logical
modules (standard cells) that provide the best cover (based on design goals such as
footprint, power, and speed) of the functionality represented by the Boolean network.

This step is called technology mapping.

Technology mapping is a particularly important step in the design of a circuit
because the quality of the design is greatly aﬂ'ected by the choice of cells. The cell

54

selection can optimize the realized circuit in many ways, including area, speed, and
power.

There are two steps inherent in technology mapping:

0 Step 1 - matching: locating logical modules that perform the necessary func-

tionality.

0 Step 2 - covering: identifying an optimal cover of the Boolean network with the

logical modules.

The matching step in technology mapping has typically been handled by Boolean
matching [115], although pattern matching of logic functions has also been applied
[81]. Many older techniques used tree matching, restricting matched modules to those
with only one output [91].

A typical technique for solving the technology mapping problem is to represent
the logic function to be performed in a canonical form, such as NAND2 and NOTI
gates exclusively. The cells are also represented by a canonical form, and the goal is
to determine an optimized cover of the canonical function graph with the canonical
primitive cells.

Tree covering approaches are capable of operating in linear time when both the
target graph and the subject graph are trees, by applying dynamic program tech-
niques. When the functions are not restricted to representation by trees, directed
acyclic graph covering techniques [71] may be applied.

Wu, Chen, and Acken present an interesting alternative to NAN D2-N OTI canon-
ical form that involves manipulation of a function’s truth table to develop an input

55

ordering that permits a canonical description [115]. This approach, although designed
for application to single-output functions, has possible applications to multiple-output

functions as well.

Relation to Module Identiﬁcation Problem

Technology mapping involves determining an optimal cell cover of an optimized logic
function. The cells available are generally small ( 20 logic gates) and have only
one output. This simplicity allow the application of more focused techniques to
be applied to solve technology mapping problems. The matching applied within
the Module Identiﬁcation Problem must be able to determine equivalence between

complex modules with multiple inputs and outputs.

2.5 Candidate Enumeration

The ﬁrst part of the Module Identiﬁcation Problem, Candidate Subcircuit Enumera-
tion, can be approached in several ways. The ﬁrst method involves partitioning the
netlist into the candidate subcircuits. This approach to subcircuit enumeration and
details of some research in this area are discussed in Section 2.5.1. The other common
approach is to build clusters of vertices. The details of netlist clustering and some
related approaches are given in Section 2.5.2.

56

2.5.1 Partitioning

Many graph partitioning algorithms have been modiﬁed speciﬁcally to operate on
circuit netlists [3]. The goal of these general algorithms is typically to divide a sys-
tem Speciﬁcation into clusters such that the number of inter—cluster connections is
minimized for use in circuit board layout design. This section will discuss several
partitioning techniques that have been applied within module identiﬁcation to iden-

tify candidate subcircuits.

Although they appear similar, there are several factors which distinguish the Mod-
ule Identiﬁcation Problem from the partitioning problem. Partitioning implies that
the clusters are disjoint, but that assumption cannot be made for the Module Identi-
ﬁcation Problem. Functionality may be shared among several modules, so they may
overlap and share gates. Relying on a strict partitioning algorithm could result in

unidentiﬁed modules.

In solving the Module Identiﬁcation Problem, it is necessary to identify all sub-
circuits that are semantically identical to the high-level module. An approximate
partition will not suﬂice. If a NOT gate is appended to the end of a subcircuit that is
equivalent to a functional module, the function that that subcircuit performs is dras-
tically altered. The subcircuit will no longer be able to be identiﬁed as the functional

module.

Several approaches to the Module Identiﬁcation Problem use partitioning to iden-
tify candidate subcircuits for comparison. Details on the heuristics and techniques
used to guide the partitioning are described in the following sections.

57

Partitioning Heuristics

The design recovery approach by Chisholm et al (Section 2.3) [32] uses three heuristics

to guide the partitioning of the circuit into candidate subcircuits:

0 Logic signal analysis. Two devices that have equivalent signals at their control
ports are more likely to be part of the same module. Clock trees of ﬂip—ﬂops
are developed, and the constituent gates are correlated to determine whether

or not they are likely to belong to the same module.

0 Relative position analysis. Devices that lie close to each other are more likely
to be part of the same module. Designer input can also be used to deﬁne an

area of the netlist that looks (to the human eye) like it may be a module.

0 Graph analysis. Identiﬁcation of articulation points, path length between ver-
tices, and cycles all provide hints regarding functional modules. For example,
two vertices separated by only a few edges are more likely to be a part of a

module than if they were separated by hundreds of edges.

Key Vertex Location

The SubGemini [91] algorithm identiﬁes a key vertex in the library module, then
locates all of the instances of that vertex within the circuit. This set of vertices
is called the candidate vector, and each represents a possible partition of the circuit.
Each of the vertices in the candidate vector are then investigated to determine whether
or not they may match the key vertex.

58

2.5.2 Subgraph Clustering

The goal of netlist clustering in this application is to locate the subcircuits of the gate-
level netlist that may be equivalent to library modules. Netlist clustering has been
applied to many areas of digital design, particularly module placement in physical

design [59, 65, 98].

Typically, the goal of clustering a netlist is to group cells into disjoint clusters.
By clustering a netlist, the complexity of the netlist can be reduced, thus improving

the execution time of partitioning and searching algorithms.

The two basic approaches to clustering are top—down and bottom-up. Top-down
approaches recursively partition the netlist into subclusters by considering global
information about the netlist. When applied to the Module Identiﬁcation Problem,
this paradigm is similar to the partitioning discussed in Section 2.5.1. Top-down
approaches, although they are unlikely to make bad decisions, are computationally

complex and are seldom applied alone.

The bottom-up approach is more similar to the candidate subcircuit enumeration
research presented in this dissertation. Initially, each cell (gate) is assigned to a trivial
cluster. The clusters are then merged together by considering local information about
neighboring clusters, until a satisfactory clustering of the netlist has been achieved
[4, 52, 65, 96, 98].

59

Discussion

Standard clustering approaches develop clusters that are disjoint. In module identi-
ﬁcation, it is possible for the modules to share gates, a situation that often occurs
when the circuit is optimized. The Candidate Subcircuit Enumeration approach pre-
sented in Chapter 3 will produce all gate clusters, so that overlapping modules may
be identiﬁed.

Many clustering concepts may be applied to subcircuit enumeration, such as
heuristics to determine whether or not a gate may be added to a subcircuit. If a
gate is physically located “far” from the current cluster, it is unlikely that it is part
of the same functional module. If a gate is connected to a subcircuit by a single wire,
it is less likely (though obviously not impossible) that it is a part of the module.
These heuristics may be added to the implementation of the general solution to the
candidate subcircuit enumeration problem to improve execution time in real-world
applications.

Similarly, clustering approaches may be improved by applying the vertex indexing

technique and rules described in Chapter 3 to avoid duplication of gate clusters.

2.6 Circuit Representations for Structural Com-
parison

There are several approaches described in the literature that develop a label for the
circuit or graph and then compare these labels via string matching [13,58]. This

60

approach was ﬁrst applied to representing the molecular structure of chemical com-
pounds, which are very similar to circuit graphs in that each vertex has additional
information (atom type, gate type) that may be used to simplify equivalence com-
parisons.

This technique has several advantages. String comparison is much more efﬁcient
than comparing graphs vertex by vertex, and strings serve as an excellent hash index,
allowing isomorphic graphs to be quickly located. The disadvantage of this technique
is that it is often difﬁcult to develop the label itself. However, for module identiﬁcation
via structural isomorphism, circuit labeling can be very useful, because the labels for

the circuits in the library may be calculated before execution begins.
A circuit labeling technique, applying a vertex weighting approach similar to that
described in Section 2.6.1 is applied in Chapter 4 to partition the subgraphs into

equivalence classes.

2.6.1 Structural Labeling

Benecke, Kerber, and Laue present a technique for canonically numbering chemical
molecules to facilitate searches in an on—line chemical identiﬁcation system (MolGen
[51]) [11]. This approach considers structural properties of the molecules to develop
classes to deﬁne the order in which a nested structure can be built to describe the

molecule.

This technique does not elegantly perform canonical element ordering. If two
elements cannot be simply ordered, a backtracking procedure is required to evaluate

61

the resulting nested structures to determine the correct canonical one. Developing a

technique for canonical element ordering is possible, though computationally costly.

2.6.2 K-Formulas

A K-formula is a representation of directed graphs that is deﬁned by a set of linear
formulas. This string preserves the structure of the graph, and the graph may be

recreated from its K-formula representation [10].

K-formulas were ﬁrst proposed by Krider [69] and further explored by Berztiss
[13,14], and have been applied to the problem of graph isomorphism [14]. The K-
formulas for two graphs may be evaluated for equivalence by attempting to determine

a matching between the vertices in the two graphs.

K-formulas are based on representing arcs of the directed graphs as the K-operator
(‘*’) followed by the arc’s endpoints. An arc connecting vertex u to vertex v is
represented as *uv. A K-formula for a vertex v is written with a K-operator for
each arc, then the vertex name, followed by the endpoints of all arcs leaving v. The
K—formula for a vertex u that has neighbors u, x, and y is * * *vuxy. The symbols
for the endpoints of the arcs may be written in any order, so v's K-formula may also
be written * * *vxyu or * * *vyux, for example.

The K-formula for a directed graph may be derived by combining the formulas
of the vertices. Vertex labels for endpoints in the K-formula for the vertex u may
be replaced by the endpoint’s K-formula. Details on this substitution process are
available in [13].

62

 

The problem with using K—formulas to represent directed graphs is that there may
be more than one K-formula for a single graph, because the K-formula is dependent on
the order in which the graph was traversed during the K—formula creation. Multiple
K-formulas for a single graph structure means that determining isomorphism of two
graphs is a complicated process, involving determining the correct matching between
vertices in the graphs.

The technique presented in Chapter 4 provides a method of ordering the vertices of
a directed graph to allow a particular set of K-formulas to be created for a structure.
Thus, each graph structure will result in only a few K-formulas, allowing more efﬁcient

comparison for determining graph isomorphism results.

2.6.3 Circuit Matrix Manipulation

Huang and Overhauser [58] present a circuit labeling technique that uses manipulation
of a circuit matrix to develop a unique circuit label for each subcircuit structure. This
technique was designed to operate on bipolar junction transistors, which have three
different vertex terminals. A circuit matrix is an adjacency matrix for the circuit
graph that also represents the type of terminal connecting the vertices.

Once the circuit matrix is built, it is arranged via partitioning to develop a ﬁnal
fully partitioned matrix. This matrix deﬁnes a unique ordering of the subcircuit

vertices and elements. The partitioning involves four steps:

1. Partition by vertex size of partitioning groups.

2. Partition by types of vertices and elements.

63

3. Partition by connection numbers in the matrix.

4. Arrange the unpartitioned portion to determine maximum matrix code.

The matrix code is generated by concatenating the integer value of each row in
the partitioned circuit matrix. The value of a row is the binary number created when
each element in the row is considered to be 0 if the element value is 0 and 1 otherwise.

The circuit code itself is generated by concatenating the matrix code, the numbers
in the matrix, the ordered list of element types, and the ordered list of vertex types.
This code uniquely deﬁnes each subcircuit, allowing comparison of the codes, rather
than the subcircuits. This technique is designed to operate on graph vertices with
exactly three neighbors (transistors). The research presented in Chapter 4 derives a

similar result for circuits containing vertices with an arbitrary number of neighbors.

2.7 Syntactic Matching

The problem of locating gate clusters syntactically equivalent to a given circuit has
been addressed in several different domains, including reverse engineering [17,72,
91], technology mapping [68,94], regularity extraction [33, 87, 88,96], and validation
[41]. The problem itself is related to the classic graph theoretic problem, subgraph
isomorphism, which is known to be NP-complete [50].

There are two structural isomorphism problems that play a part in this research.

0 graph isomorphism: Given two graphs G1 and G2, determine whether or not
they are structurally identical. In the research presented in this dissertation,

64

this problem arises when the subcircuits are divided into equivalence classes.

0 subgraph isomorphism: Given two graphs G1 and G2, determine whether or
not there is a subgraph within G2 that is structurally identical to G1. This
problem is equivalent to locating a subcircuit within a circuit that is structurally

equivalent to a template subcircuit (structural matching).

Both the graph isomorphism and subgraph isomorphism problems also have ap-
plications in other ﬁelds, most notably pattern recognition [22, 29, 83] and chemical
analysis [12,40,104, 109]. Accordingly, much effort has been expended to develop
eﬂicient techniques for solving these diﬂicult problems.

Although graph and subgraph isomorphism are difﬁcult problems, their applica-
tion in other domains is frequently simpliﬁed by additional information. In digital
circuits, vertices represent logic devices with speciﬁc functionality, which can be used

to facilitate these techniques.

2.7 .1 Graph Isomorphism

“Two ﬁnite graphs are isomorphic if there exists a bijective map between

the vertex sets of the two graphs that preserves adjacency.” [57]

Graph isomorphism has been a standing problem for many years because it holds
interest both for the theory and applications communities. This problem has not
been proven to be NP-Complete, nor has it been proven not NP-Complete [7,57].
Assuming that P 75 NP, it can be considered to be NP — intermediate [57], along

65

with factorization of integers and computing discrete logarithms [7]. This ambigu-
ous status appeals to theoreticians. Graph isomorphism has been solved for many
restricted types of graphs, such as planar graphs, which are linear in the graph size,
and trivalent graphs, which require 0(n4) steps to test isomorphism [57].

There are three primary approaches to graph isomorphism represented in the

current literature: topological, group-theoretic, and algorithmic.

o Topological Approach: The topological approach involves embedding the graphs
to be compared onto a surface of minimum genus 1. Each graph is drawn onto
a surface without any edges crossing. For example, a sphere allows only planar
graphs to be embedded onto it. For non-planar graph, a surface with more
Sides is necessary, such as a torus, so all edges may be drawn without crossing.
Once the graphs are embedded, they may be dissected into planar components,
and those planar components may be compared in time linear to the number of

vertices in one of the graphs.

However, the problem with this approach is that determining the genus of a
graph is a difﬁcult problem in itself, as is the dissection into planar components

[57].

0 Group- Theoretic Approach: The group theoretic approach involves determining
the group of all automorphisms for the graph. An automorphism is any mapping

or function that, when applied to a graph G, produces an identical graph G’.

 

1A surface is a sphere in which a certain (ﬁnite) number of holes have been inserted. Equivalently,
a surface is a sphere on which a number of handles have been placed. The number of holes or handles
is called the genus of the surface. The surface of genus 0 is, therefore, the sphere (or plane) [28].

66

By determining the automorphisms of the two graphs to be compared, the

automorphism groups may instead be compared.

The difficulty with group-theoretic approaches is that there is large overhead
associated with them and therefore they usually have a large constant factor

within their complexity [83].

o Algorithmic Approach: A more practically oriented approach to graph and sub-
graph isomorphism is the algorithmic approach, in which algorithms are de-
signed to investigate the search space, generally applying depth ﬁrst search and

backtracking techniques.

There are several approaches presented in the literature to solve the graph isomor-
phism problem within the speciﬁc domain of circuit netlists, including partitioning,
and canonical circuit labeling (discussed in Section 2.6). Chapter 4 presents a solution

to the circuit isomorphism problem that uses canonical structural circuit identiﬁers.

Partitioning

The circuit validation approaches SubGemini [91], Gemini [42] and GeminiII [41] use
a partitioning approach to solve the circuit isomorphism problem.

Given two circuits C1 and C2, the goal is to prove them isomorphic or non-
isomorphic. The two circuits are partitioned by vertex invariants such as gate type
and degree with the intention of develOping partitions that contain equivalent ver-
tices. Once an initial partitioning has been accomplished, the partitions for each
vertex are reﬁned based on information about neighboring vertices. If the circuits are

67

isomorphic, the partitions developed through this partitioning will be identical.

The partitioning in the Gemini algorithms is done by applying a label to each
vertex that classiﬁes the vertex into a partition. Initially, the label represents the
vertex invariants (device type, net degree). For example, a likely partition after the
ﬁrst partitioning of a gate-level netlist would contain all NAND gates that have 2
inputs and 1 output. As the partitions are reﬁned, the vertices are relabeled with the
previous vertex label and with the labels of the neighboring vertices. An isomorphism
has been found when a matching of singleton partitions has been identiﬁed for the

circuits.

2.7 .2 Subgraph Isomorphism

Subgraph isomorphism is an NP-Complete problem [7, 50] consisting of two subprob-
lems: locating subgraphs that may be equivalent to the pattern subgraph and de-
termining structural equivalence. As with graph isomorphism, the additional infor-
mation obtained from the circuit graph can be used to derive practical solutions to

locating isomorphic subgraphs.

Backtracking

A successful algorithm for determining subgraph isomorphism was proposed by Ull-
man [110], and is still considered to be one of the most efﬁcient approaches to the
problem. It improved upon the original backtracking procedure for graph isomor-
phism presented by Corneil and Gotleib [35]. The algorithm is based on the fact
that subgraph isomorphism may be determined by applying a brute force enumera-

68

tion algorithm to perform a tree search simultaneously on the pattern subgraph and

the target graph, in the same way as [14].

The difﬁculty with this technique, in general, is the large number of vertices that
must be searched to identify an isomorphism. Ullman prunes the number of vertices
that must be considered by applying a reﬁnement step after each vertex has been
added to the potentially isomorphic subgraph. This step eliminates successor nodes
that will not lead to discovery of an isomorphism, improving the average execution

time to a time proportional to n3, where n is the number of vertices in the pattern

graph.

Decision Trees for Multiple Pattern Graphs

Bunke and Messmer [83] present an approach to subgraph isomorphism that can
simultaneously search for multiple pattern graphs within the target graphs. Multiple
pattern graphs arise in many applications, including syntactic matching for gate- to

register- level transformation, so this technique has practical appeal.

The technique involves the generation of all permutations of the adjacency matrix
of a pattern graph, and then a decision tree representing these permutations is built.
The adjacency matrix permutations for any number of pattern graphs may be added
to the decision tree. The decision tree is applied to the adjacency matrix of the target
graph to identify all pattern graphs within.

69

 

Multi—place Graph Weighting

Within the ﬁeld of veriﬁcation, Luellau, Hoepken, and Barke [76] present BLEX,
a tool that employs syntactic matching to identify components in a transistor-level
netlist. As deﬁned in this context, multi—place graphs have an additional vertex type:
multi—nets. Edges represent device pins (base, collector, emitter, cathode, anode) and
are distinguishable as such. Multi-nets represent intersections between edges in the

graph.

Each edge in the graph is weighted with a primary number representing its pin
type. To obtain the weight of a vertex or multi—net, the weights of its adjacent edges
are multiplied together. A weighted incidence matrix can be built to Show adjacency
of multi-net vertices and device vertices. The values in the matrix product of all edges
connecting the device with the multi-net. Two circuits are identical if their weighted

incidence matrices are identical.

It would be possible to interchange rows and columns of the adjacency matrix
until they proved to be identical or all possibilities had been tested. However, with
n vertices and m multi-nets, there are n! x m! permutations to test. Instead, BLEX
partitions the incidence matrix by sorting devices and multi-nets by their weights,
and then gradually matching the circuits by locating a starting vertex and growing
from there. It matches pins and vertices adjacent to the subcircuit and gradually
exploring the matrix until a match has been found.

70

2.7.3 Discussion

The disadvantage of all syntactic techniques is that they can identify only the set of
speciﬁc implementations of a functional component that are contained in its library.
Nonstandard or intentionally obfuscated implementations will never be recognized.
Furthermore, any optimization that modiﬁes the implementation of the entity (such
as optimizations for don’t care conditions) renders the entity unﬁt for recognition by
structural techniques. Structural matching cannot reliably recognize all functional

components that exist in a circuit.

Although syntactic techniques are useful in applications such as converting a
transistor-level netlist into a gate-level netlist, techniques that rely on exact struc-
tural matching have limited application to higher levels of design, because high-level
functional components generally have many valid implementations. The execution
and memory costs associated with maintaining a library of many implementations
for each module are not reasonable, so this method is not general enough to reliably

solve the Module Identiﬁcation problem.

Any solution to the general Module Identiﬁcation Problem should include the use
of structural matching to recognize standard implementations of functional compo-
nents within the circuit. When the technique of syntactic matching is applied ﬁrst,
the effective complexity of a circuit can be signiﬁcantly reduced before more complex
approaches are applied.

71

2.8 Semantic Matching

In contrast to syntactic matching, semantic matching bases equivalence on function-
ality rather than structure. Two subcircuits may have completely different structures,
yet perform identical functions. This is a powerful matching technique.

A general solution to the Subcircuit Identiﬁcation Problem requires the identiﬁca-
tion of high-level components that are more complex then those dealt with in Boolean
matching but that lack the input/output correspondences between the logic design
and the library components that veriﬁcation techniques require. Because the func-
tionality of the high-level component may be represented in any number of structural
forms, it is necessary to identify the subcircuit by proving semantic equivalence [43].

Semantic matching provides a structurally independent solution to the problem
of identifying meaningful subcircuits. By using a semantic technique, it is possible to
identify subcircuits that are equivalent to a high-level component in many situations
for which syntactic techniques fail. The structural changes imposed by new imple-
mentations, design optimizations for area and power, and many other complicating
factors cause purely syntactic techniques to fail, but they are amenable to semantic
matching techniques. Details about the semantic matching approach are presented

in [36, 38].

2.8.1 Functional Canonical Form

Canonical representation is a technique that ensures that equivalent instances of a
circuit are represented identically. A canonical semantic representation of a circuit

72

describes the circuit’s functionality in a set of logical formulas. Any functionally
equivalent implementation of that circuit produces an identical set of logical formulas.
Semantic matching can be reduced to a two-step algorithm if canonical represen-

tations of the circuits are used:

0 Convert each of the circuits to be compared to a canonical form.

0 Compare the canonical forms of the circuits.

The difﬁculty, of course, is developing a canonical representation. Binary Decision
Diagrams are the most effective option [32, 115], but they still require that input and
output names of the circuits are matched, which requires trying all permutations of

the input and output ordering.

Binary Decision Diagrams

Since their initial presentation in 1986 [19], Binary Decision Diagrams (BDDS) have
become a widely used representation of Boolean functions. The advantage of BDDS is
that they can be manipulated and generated by efﬁcient graph algorithms. A variant
of BDDS, Reduced Ordered Binary Decision Diagrams (ROBDDS) [20] can represent
Boolean functions in canonical form. A further extension of ROBDDS, partitioned
ROBDDS, can be exponentially more compact than ROBBDS and are more easily
manipulated [85, 86].

BDDS represent functions as directed, acyclic graphs, in which each nonterminal
vertex is labeled by a function variable. Each vertex has two outgoing edges, one

corresponding to the case in which the variable evaluates to 0 (generally represented

73

by a dashed line), the other corresponding to the case in which the variable evaluates
to 1 (generally represented by a solid line). The terminal vertices in the graph are
labeled either 0 or 1, representing the value of the entire function. For any variable
assignment, the function value can be determined by following a path through the
graph, following the appropriate branch for the value of the variable.

Each decision variable must appear at most once on a path from the root vertex
to a terminal vertex. The order in which the variables are visited is determined by a
total ordering on the variables. When this total ordering is applied, a BDD becomes
an ordered binary decision diagram (OBDD). By eliminating and sharing isomorphic
subgraphs within the OBDD, the BDD may be reduced to a compact form, a reduced
ordered binary decision diagram (ROBDD), and can present a canonical form of the
function.

The difﬁculty in deriving this canonical form is the ordering of the decision vari—
ables. Two identical functions will present two different ROBDDs if their variables are
given different orderings. So, when attempting to use this canonical representation
to compare the functionality of two circuits, it is necessary to determine the correct
variable ordering for each so that the OBDD representations of identical circuits are
also identical.

ROBDDS are a valuable tool for function comparison. The representation is com-
pact for the majority of functions that appear in logic circuits, and manipulation and
comparison is simple and fast. The focus, therefore, must be on reducing the variable
ordering permutations that must be tested to determine equivalence.

The difﬁculty when attempting to determine equivalence of two ROBDDS is that

74

the ordering of the variables can result in vastly different ROBDDS. To reduce the
Situations in which all ordering permutations must be tested, it is useful to imme-
diately disregard functions that cannot be equivalent. This non-equivalence can be

determined by deriving and comparing Boolean signatures [36,79].

Boolean Signatures

A Signature of a Boolean function is a representation of a property of the function [36].
Signature functions are applied to a function and return a characteristic signature for
that function. The signature function must be dependent on just the behavior of the
function; variable orders and labels may not be considered by the signature function.

Functions with equivalent signatures are said to belong to the same signature
class. For two functions to be equivalent, they must share a signature class. Typical
signature functions are designed to be very quick to return signatures, so they may

be efﬁciently applied to reduce the possible correspondences that must be tested.

2.8.2 Cones of Logic

Rather than performing multiple-output semantic matching, outputs may be consid-
ered individually by building a logic cone for each output of a subcircuit [32]. This
technique has also been applied in partitioning techniques to improve the mapping of
partitions to FPGAS [18].

A logic cone is built by starting at one of the outputs of the subcircuit and adding
gates that are inputs to gates already within the cone until the primary inputs to the
subcircuit are reached. This cone will contain all of the gates that contribute to the

75

determination of that output value.

The current technique to solve the Module Identiﬁcation Problem with logic cones
involves locating a Single output of a known library module (the target output). Logic
cones are built from gates within the circuit. Each of these cones is semantically

compared to the function of the target output.

Expansion stops when the logic cone is semantically equivalent to the function of
the target output or the primary inputs to the circuit have been reached. Once the
target output has been located in the circuit, an attempt is made to expand the cone
to locate the functionality to produce the other outputs of the known module.

The advantage of this technique is that single-output semantic matching is less
intensive than matching with multiple-outputs. The disadvantage is that the number
of matches that must be performed is larger. The candidate subcircuit enumeration
technique presented in Chapter 3 can enumerate just these logic cones, so logic cone
matching can be performed within module identiﬁcation by providing an appropriate

output functionality library.

2.8.3 Hierarchical Semantic Matching

In an approach to regularity extraction, Chowdhary et al [33] perform tree subgraph
matching. Each tree, or template, is stored as a root node and a set of templates
that represent its children. To view an entire tree, the template for each child may

be expanded.

The technique simultaneously builds a list of templates, and it is against these

76

that the expanding subtrees are matched. The result is a list of logic functions. For
each logic function, there is a list of the subcircuits of the circuit that perform that

function. The details of the logic function matching were not discussed in [33].

The obvious drawback to this technique is that it operates on only a small sub-
set of the subcircuits, subtrees. Subtrees represent single output functions with no
reconvergent fan-in. Some extensions for multiple outputs have been researched, but

the focus is primarily on single-output functions.

2.8.4 Pseudo-Semantic Matching

Methods that fall between the strict deﬁnitions of syntactic and semantic matching

are permutation matching and rule-based structure manipulation.

The approaches that have applied permutation matching effectively have oper-
ated exclusively on transistor-level netlists, which have a Simpler structure and fewer
reorganization options than gate-level netlists. For instance, YNCC [100] uses per-
mutation to iteratively gather information about possible bindings until a match can

be located.

A circuit comparison system presented by Takashima et al [106] uses a rule-based
system to identify small subcircuits corresponding to any of the implementations that
occur in the rule base. That subcircuit may then be replaced by a block or element

representing the appropriate module.

There are several systems that perform “functional matching” of components but
rely upon an extensive library and syntactic techniques to perform matching [93], or

77

use rules or permutations to manipulate simple structures until a match is found [100,
106]. These systems are useful and efﬁcient in circuits for which high-level modules are

known or can be easily identiﬁed, but they do not perform true semantic matching.

2.8.5 Discussion

The techniques discussed in this section are approaches to performing the most difﬁ-
cult type of matching: semantic matching. For module identiﬁcation, it is much more
powerful than syntactic matching because it can locate all instances of a functional
module, regardless of implementation. However, it is a more complex technique,
requiring more computation than syntactic matching. The goal of the research in
this thesis is to reduce the number of applications of semantic matching to make the

Module Identiﬁcation Problem more tractable.

2.9 Software Reverse Engineering

Unlike in hardware, in which a digital design is synthesized, implemented, and man-
ufactured, then left static until it is replaced, software is often subject to constant
extension and modiﬁcation. In fact, the majority of effort applied to software devel-
Opment is not in creation and implementation, but in the extension and modiﬁcation
of existing software systems [112]. This is not surprising, because the reengineering of
software systems is frequently driven by outside inﬂuences. As the available technol-
Ogy of digital hardware improves and drives replacement of the underlying computer

systems, the software must be modiﬁed to operate on the new platform. The ac-

78

cessibility of software leads to greater user expectations with regard to functionality,
performance, and ease of use.

Legacy systems exist in the software domain as well as the hardware domain, and
their reverse engineering is desirable for the same reasons: complete replacement is
too expensive or infeasible due to complex features and the lack of accurate original
speciﬁcations. In some cases, a “wrapper” may be developed that runs on the new
hardware platform, but interfaces with the legacy system on its original platform.
An example of this is developing a front-end client application to allow access to
a mainframe software system. This solution is safe because no change has been
made to the underlying functionality, and can be implemented cheaply and quickly.
The disadvantage is that the system does not take advantage of the performance
improvements gained by the new platform [90].

Support and evolution of systems programmed in older languages like COBOL
are suffering from a lack of knowledgeable software engineers to upgrade, modify, and
troubleshoot these systems. Transforming useful systems that are implemented in
a obsolete language into more current language not only increases the chances that
they can be modiﬁed as needs change, but also allows them to be more easily ported
to newer hardware and operating platforms.

A compromise between replacement of the legacy system and continuing to use
obsolete technology is to reengineer the system by extracting and reusing critical por-
tions of the code. In this way, complex business rules, for instance, can be ported
“as is”, without requiring an in-depth understanding of their operation. This tech-

nique of reengineering incorporates functional module identiﬁcation, also referred to

79

as reusable component recovery [89].

2.9.1 Functional Module Identiﬁcation

Just as in reverse engineering of hardware, one of the focuses of software reverse
engineering is the identiﬁcation of logical blocks of code that perform a well-deﬁned
and potential useful piece of functionality. Once these modules have been identiﬁed,

they may be reused or redesigned.

Modularization

A functional module identiﬁcation technique called modularization [80] was devel-
Oped by a team of engineers at Boeing to improve the maintainability of large payroll
systems implemented in COBOL. A large compilation unit, the source, may be bro-
ken up into a collection of smaller units targets, each containing functionally related
subroutines. In effect, this develops an object-oriented realization of the program,
and the targets represent self-contained libraries of subroutines that are more easily
modiﬁed and may be easily applied within other systems.

The critical tasks of this modularization process include:

o Splitting: the partitioning of the procedure source into the targets according to
a plan designed by a project leader. After modularization, the source contains

calls to the subroutines in the targets.

0 Linkage determination: the determination of how the targets interact with each

other, as well as the proper handling of their data elements.

80

The splitting step is the most similar to the module identiﬁcation problem pre-
sented in this thesis. The library modules correspond to the targets deﬁned by the
project leader. The source must be searched to locate potential subroutines and deter-
mine their proper target placement. This functional identiﬁcation can be determined
by analyzing the control ﬂow graph, call graphs (which represent the subroutine in-

teraction and parameter information), and abstract syntax trees.

Concept Assignment

The concept assignment problem [16] is a formulation of the transformation between
levels of abstraction in software. It is the study of how to abstract away from variables
and keywords to realize a human-level understanding of the software functionality.
The goal is to recognize concepts implemented by the portions of the code and relate
them to each other to understand the system.

Several techniques are combined to develop an approach to this problem. Syntax-
based approaches, which formally consider the structure of the code, are necessary
but not sufﬁcient to perform effective concept recovery. These approaches look for
patterns of features within the code that can be used to form a signature to identify
the function of the code. These approaches may employ a bottom-up approach,
locating elemental concepts and gradually subsuming them within coarser grained
concepts.

The Concept Assignment Problem bears a strong resemblance to the Module Iden-
tiﬁcation Problem presented in this thesis. It uses a semantic module identiﬁcation

approach to identify elemental concepts within the code and then abstracts away from

81

the details by identifying larger modules that include those simple concepts.

Formal Methods

Formal methods provide techniques for specifying software by representing the soft-
ware by a well—deﬁned speciﬁcation language [56]. Inference rules can be applied to
this speciﬁcation to provide correctness and consistency, and also to extract informa-
tion about the software functionality. By deﬁning and applying formal transforma-
tions to the source code, the code may be transformed into a higher-level description

of the code, hiding the programming details to reveal the intent of the program.

2.9.2 Discussion

The conceptual approach to reverse engineering of software and hardware is very
similar. Both involve the abstraction away from the details (e.g., low-level netlist,
source code) to derive a higher-level understanding of the system. However, the
difﬁculty of software reverse engineering surpasses hardware engineering due to the
difﬁculty of accurately representing the concepts implemented in software. Formal
methods may be applied to develop descriptions of the concepts, but the syntax
of software is not as rigidly deﬁned as is that of hardware, which complicates the
reduction and transformation into a syntactic or semantic representation that may
be applied within module identiﬁcation.

Reverse engineering in both the software and hardware domains applies syntactic
and semantic matching techniques to perform transformation between levels of ab-
straction. Syntactic approaches in software involve parsing of source code to extract

82

patterns for comparison and identiﬁcation. In hardware, syntactic approaches involve
pattern matching of structures in a netlist representation of the circuit.

However, although the goals and abstract approaches of reverse engineering of
hardware and software reverse engineering are nearly identical, the methods of per-
forming tasks toward these goals must be guided by the environment in which they
are applied. In software, a semantic technique capable of comparing complex software
routines is necessary, possibly by representation in a theoretical framework, exhaus-
tive simulation, or bottom-up functional extraction. Hardware can more easily be
represented by a canonical functional representation, such as ROBDDS or a logical
formula.

Therefore, the focus of reverse engineering in these two ﬁelds should concentrate on
solving the problems speciﬁc to the domain. This dissertation presents techniques for
approaching the two parts of the Module Identiﬁcation Problem with regard to hard-
ware: candidate subcircuit enumeration (Chapter 3) and semantic matching (Section

2.8).

2.10 Summary

This chapter has presented background information on reverse engineering and cur-
rent approaches to reverse engineering. In addition, an overview was presented of
areas in which the research presented here may also be applied. Each of these appli-
cation areas can be reduced to the same problem: the Module Identiﬁcation Problem.
Due to the disparity of the application areas, many different approaches have been

83

proposed for the solution of this problem. Those techniques have been categorized
and presented, along with their relation to the solutions proposed in this dissertation.
The next chapter, Chapter 3, presents a solution to the ﬁrst half of the Module

Identiﬁcation Problem: Candidate Subcircuit Enumeration.

84

Chapter 3

Candidate Subcircuit Enumeration

Raising the level of abstraction of a circuit speciﬁcation by identifying all of the high-
level modules within the circuit requires that all of the potential high—level modules
within the circuit are identiﬁed. This is a challenging task because a section of the
circuit that functionally corresponds to a library module will not necessarily appear
to be similar in structure, as measured by size, order, gate types, connectivity ratio,
or physical layout. Additionally, a cluster of gates that is functionally “close to” being
equivalent to a library module may not exhibit any traits that would indicate this
near-equivalence. This lack of a relation between structural features and functionality
impedes any attempt to guide the search for library modules with simple heuristics.

A straightforward approach to the Module Identiﬁcation Problem enumerates all
of the gate clusters within the circuit, then checks each for functional equivalence
to known functional modules. This chapter presents a technique to enumerate all
interesting gate clusters to ensure that all library modules can be located within the

target circuit. Interesting gate clusters are those that perform a well-deﬁned function;

85

enumerating only these gate clusters reduces the number of clusters to be enumerated.

The netlist of the target circuit can be represented by a directed graph; subgraphs
of this graph represent the gate clusters. Each vertex represents a gate, and each arc
represents a wire. The gate clusters are represented by induced, connected, subgraphs.
Within the context of the Module Identiﬁcation Problem, each of the gates within a
gate cluster must be reachable from every other gate. Put simply, the gate cluster
is represented by a connected subgraph. This is a reasonable restriction on the gate
clusters to be enumerated because each library module is represented by a single
subcircuit, not several disjoint subcircuits (discussed in Section 1.4). Further, the
subgraphs are induced because all of the arcs connecting the vertices in a subgraph
are also considered to be included within that subgraph. All subgraphs referred to

within this thesis are assumed to be induced and connected.

In a completely connected graph 1 of order n, there are n cliques 2 of order 1, C3
cliques of order 2, C; cliques of order 3, and so on. Therefore, the formula 2;, C,” =
2" can be derived to describe the number of (not necessarily disjoint) subgraphs, where
'n. is the order of the graph. Digital circuits are seldom (if ever) completely connected,
so the number of subgraphs in a circuit graph will be signiﬁcantly smaller. However,
the above formula does give an upper bound and succinctly characterizes the enormity

of this problem.

 

1a graph in which there is an edge from every vertex to every other vertex
23. subgraph in which there is an edge from every vertex to every other vertex

86

3.0.1 Uniqueness

When developing subgraphs for enumeration, it is possible to derive a single subgraph
in several different ways, depending on the order in which the vertices are grouped
together. This results in unnecessary duplication of effort. When applied within
the Module Identiﬁcation Problem, not only is extraneous computation expended
to produce duplicate subgraphs, but also in unnecessary applications of Subcircuit
Identiﬁcation or the detection and pruning of duplicate subgraphs. The enumeration
technique presented in this chapter guarantees that each subgraph is produced exactly

ODCB.

3.0.2 Focused Enumeration
Subcircuits

Not all of the subgraphs represent interesting gate clusters within the target circuit.
It is important that the subgraph represent a functional subcircuit of the circuit. No
gate in the subcircuit may be accompanied by some but not all of its inputs; that
Situation leaves the functionality of the gate undeﬁned. For example, a 2-input AND
gate in a subcircuit must be accompanied in the subcircuit by both of its inputs.
A subgraph containing a gate with undeﬁned functionality does not have functional
meaning, so the subgraph does not represent a meaningful subcircuit.
In a subcircuit, any vertex with an input are connecting it to another vertex
in the subcircuit may not have an input arc connecting it to a vertex outside of
the subcircuit. In graph theory terminology, no vertex in the subgraph may be

87

accompanied in the subgraph by a proper subset of its parents.

Contained Subcircuits

To further optimize performance of Module Identiﬁcation, a subset of the subcircuits,
contained subcircuits, may be enumerated. A contained subcircuit is the most likely
subgraph type to match a library module because its logic is entirely self-contained.
Circuit designers often use a library of modules, connecting them together with glue
logic to create the ﬁnal circuit. Each of the modules in a circuit designed in this
manner will be represented by a contained subcircuit.

For example, each of the 1-bit full adders that comprise the 2—bit full adder in
Figure 1.1 on page 16 are contained. No logic is shared with other modules. One of
the outputs from the ﬁrst adder is an input to the second adder, but the two adders
themselves are disjoint.

In a contained subcircuit, any vertex with an are that exits the subcircuit may
not have another arc that is an input to a vertex in the subcircuit. All outputs
from a contained subcircuit are primary outputs of that subcircuit. In graph theory
terminology, no vertex in the subgraph may be accompanied in the subgraph by a
proper subset of its children vertices.

It is not sufficient to enumerate only the contained subcircuits of the target cir-
cuit, but contained subcircuit enumeration can be performed much more quickly than
subcircuit enumeration. Therefore, it can be applied as a preliminary module identi-
ﬁcation step to reduce the complexity of the target circuit, thus improving the overall

performance of module identiﬁcation.

88

Candidate Subcircuits

With regard to the Module Identiﬁcation Problem, a candidate subcircuit may refer
to either a subcircuit or a contained subcircuit, depending on the application. The
candidate subcircuits enumerated by the technique presented in this chapter must be
subcircuits. However, by applying an additional restriction, only contained subcir-
cuits may be enumerated as candidate subcircuits. Enumerating and identifying only
the contained subcircuits is useful as an initial technique for reducing the complexity

of the target circuit. For a more detailed description of subcircuits and contained

subcircuits, see Section 3.1.1.

3.0.3 Overview of Technique

The naive solution to candidate subcircuit enumeration enumerates all subgraphs
of the graph representing the circuit, then simply discard all duplicate subgraphs
and subgraphs that are not subcircuits. A more efﬁcient technique enumerates each
Subgraph exactly once. However, the number of subgraphs in an arbitrary circuit
can be prohibitive (0(2") where n is the number of primitive devices in the circuit
netlist), so a more restrictive technique is necessary. The focused enumeration algo-
rithm presented and discussed in Section 3.2.3 generates unique candidate subcircuits
only, eliminating excess computation effort spent on the enumeration of unnecessary
subgraphs. This algorithm also enumerates each candidate subcircuit exactly once,

guaranteeing that all interesting subcircuits are available for examination while wast-

ing as little effort as possible.

89

Although the effort to generate only the subcircuits or contained subcircuits re-
quires some overhead, the beneﬁt of solely enumerating these subsets far outweighs
this cost. For example, there are 98,922 unique subgraphs in a 3-bit adder, but only
522 subcircuits, and only 66 contained subcircuits. Only the subcircuits need to be
considered for equivalence to a known module during module identiﬁcation, so focus-
ing the enumeration effort on this class of subgraphs provides signiﬁcant reduction of
computational effort. The improvement gained by enumerating just these subsets of

the possible subgraphs is presented in Section 3.3.

Generating all of the subgraphs of a graph cannot be accomplished in polynomial
time because the number of subgraphs within an arbitrary directed graph is exponen-
tial, as described on page 3. However, the algorithms presented here enumerate only
a. speciﬁc subset of the subgraphs, and can generate all of these candidate subcircuits
quickly for small circuits (< 100 gates) by taking advantage of the fact that the target
circuit is represented by a labeled, directed, graph. This information can be used to

focus the efforts on subcircuits that are likely to match library modules.

3.0.4 Heuristics

Due to the inherent difﬁculty of subgraph enumeration, it is necessary to provide
heuristics to allow this technique to be effectively applied to larger circuits (> 100
gates). The most important of these is preliminary partitioning. The number of po-
tential subcircuits may increase exponentially with each additional logic device, so by
intelligently reducing the size of the circuit to be explored, signiﬁcant computational

90

improvement may be gained.

Additionally, within the Speciﬁc domain of the Module Identiﬁcation Problem, it
is reasonable to limit the order of the subgraphs that are created. Although a func-
tional module may be realized by any number of logic devices, design constraints such
as minimization of area and power consumption make it unlikely that the number of
gates in a speciﬁc implementation will greatly exceed that of a standard implementa-
tion of that module. Therefore, the order of the subcircuits that must be enumerated
may be bounded, providing great improvement in the computational complexity of

subgraph enumeration.

There are several other computation reducing techniques that may be applied
within candidate subcircuit enumeration along with preliminary partitioning and sub-

graph order limiting. These heuristics are discussed in Chapter 5.

3.0.5 Chapter Outline

Section 3.1 presents background information necessary for understanding the subcir-
cuit enumeration technique presented in this chapter. The technique itself, including
the algorithm, explanatory examples, proof of correctness, and complexity analysis,
is presented in Section 3.2. Proof of concept results and discussion appear in Section

3.3, and Section 3.4 summarizes and considers the ramiﬁcations of this research.

91

3.1 Background

3. 1.1 Representation

The ﬁrst step in formalizing our approach to the Subcircuit Enumeration Problem is
the representation of the circuit as a directed graph. Graph theory is a well-explored
area, and by working with graphs rather than circuits, advantage can be taken of
previously derived results, algorithms, and tools. These graphs, transformed in a
well-deﬁned manner from target circuits, are referred to as circuit graphs.

There are several graphical representations of digital circuits, including a method
that represents both connections and gates as vertices, resulting in a bipartite graph
[41, 42,91]. A vertex representing a connection contains information about each of its
endpoints as well as any additional information that is available, such as length and
the physical position of its endpoints. The vertex representing a gate includes a list
of the connection vertices that are adjacent to it. An extension of this technique is
multiplace graphs or hypergraphs, in which the edge vertices may connect more than
two gate vertices [26, 27, 66, 76]. This method provides ﬂexibility in the implementa-
tion of procedures on the converted graph. For the purpose of subcircuit enumeration
a Simpler type of circuit graph is suﬂicient.

The circuit graphs used in this thesis represent connections as arcs and gates as
vertices, resulting in a directed graph [33, 58, 71,81,95]. In the elementary case, an
arc begins at the vertex that has a lower depth (distance from the circuit inputs)
and terminates at the vertex that has a higher depth. When sequential logic is
encountered, arcs can connect vertices in either direction. Primary inputs to the

92

circuit are represented as vertices as well. Primary outputs are considered to be
represented by the logic gate that produces them.

A 1-bit adder is shown in Figure 3.1a. To transform this circuit into a circuit
graph, each gate and circuit input is replaced by a vertex. Wires connecting gate
inputs to gates are replaced by arcs pointing from the vertex representing the input
to the vertex representing the gate. The transformed circuit is presented as the circuit

graph in Figure 3.1b. Dashed vertices represent input and output vertices.

Deﬁnition 3.1 A circuit graph G representing a digital circuit C is a directed
graph with V(G) containing a vertex for each logic gate in G. E (G) contains an arc
for each connection within G. A parent of a vertex v is a vertex that represents an
input of the gate represented by v. The child of a vertex is a vertex that represents

an output of the gate represented by v.

To facilitate the development and application of rules and algorithms applied to
the graph, a unique numerical index is applied to each vertex in the circuit graph.
Similarly, each subgraph also has an index associated with it. By exploiting the
Uniqueness of the index, a total ordering is deﬁned for the set of vertices. This

ordering is the key to the technique to enumerate each subgraph exactly once.

Deﬁnition 3.2 The index of a vertex v, vindex, is a unique integer assigned to a
vertex. The index of a subgraph H, H .index, is equivalent to the highest index of its

constituent vertices.

An induced, connected subgraph H may be enumerated by creating a trivial

subgraph H’ containing one of H’s vertices and iteratively adding a vertex adjacent to

93

 

 

 

chi ___;[::> 30

a0
b0

 

 

 

 

 

cout

 

 

 

 

 

Figure 3.1: Circuit to Circuit Graph Transformation Example

 

94

H’ until H’ is identical to H. The order of the vertices added to create the subgraph
is known as its creation path. Each vertex in H occupies a speciﬁc position within the
creation path. Theorem 3.3 proves that when each vertex has a unique integer index
and the rules described in Section 3.2.1 are applied, there is exactly one creation path

for each subgraph.

Deﬁnition 3.3 The creation path P(H) of a subgraph H of order n is a list of the
vertices of H in the order in which they were added to create H. It consists of the
vertices P0, P1, ..., Pn_1, where P0 is the initial vertex of H and Pn_1 is the ﬁnal
vertex of H. The intermediate subgraphs are denoted Hm, where x is the position

within the creation path of the most recently added vertex.

For a given subgraph H, the vertices that are adjacent to H are the vertices that
may be considered for addition to the subgraph during subgraph expansion. Each
of these vertices must be no more than one arc away from some vertex within H.

Formally, these vertices are referred to as the subgraph’s neighborhood.

Deﬁnition 3.4 The neighborhood N (H) of a subgraph H within a graph G con-
sists of all vertices of G adjacent to at least one member of H. Formally, N (H) =

{v | v e? V(H) /\ Elu E V(G) such that uv E E(G) Vvu E E(G)}.

We will be using a simple circuit graph C (Figure 3.2) to clarify the deﬁnitions,
theorems, and rules presented here. Vertices with a bold outline are constituent ver-
tices of the subgraph being displayed and dashed vertices represent vertices available
for expansion; all other vertices and edges represent vertices of the underlying circuit

95

 

 

 

 

Figure 3.2: Circuit Graph G

 

graph and are not part of the subgraph (e.g., Figure 3.3). The number in the vertex
is the index of the vertex, which for simplicity will also serve as its name. If a vertex

belongs to the subgraph, its position x on the creation path is displayed under the

vertex name as PX.

Focused Enumeration

The naive solution to the subcircuit enumeration problem is subject to exponential
explosion depending on the connectivity of the graph. However, the complexity of
the enumeration can be signiﬁcantly reduced by exploiting the fact that only some
subgraphs are of interest in the domain of reverse engineering. Consider a subgraph
H composed of a gate vertex v and two vertices representing two of its three inputs
(either gates or primary inputs). The vertex v does not represent a complete gate in
the context of the subcircuit represented by the subgraph H because it is missing an
input. The functionality that it performs is therefore undeﬁned because it is not fully

96

speciﬁed.

For the purposes of module identiﬁcation the enumeration expense is greatly re-
duced by enumerating only those subgraphs that exclusively contain vertices repre-
senting fully Speciﬁed gates. These subgraphs are referred to as subcircuits. This
deﬁnition has also been independently developed and presented as feasible subgraphs
[33].

A fully speciﬁed vertex represents a gate that is joined within the subgraph by

either all of the vertices representing its inputs or none of those vertices.

Deﬁnition 3.5 In a subgraph H of a circuit graph G, a vertex v E V(H) is a fully
speciﬁed vertex if (Vu | u E V(G) A uv E E(G) ——> u E V(H)) V (Vu | u E

V(G) /\ uv E E(G) —> u ¢ V(H)).

Deﬁnition 3.6 A subgraph H of a circuit graph G is a subcircuit of G if and only

if it is connected and each vertex in H is fully speciﬁed.

In Figure 3.3, the tables below the subgraphs present lists of the vertices in the
subgraph (V), and the neighborhood of the subgraph (N). In Subgraph Ho, Vertex 5
is not fully speciﬁed because one of its two inputs, Vertex 2, is not a member of H0,
so H0 is not a subcircuit. In Subgraph H1, Vertex 5 is fully speciﬁed by the addition
of Vertex 2. However, Vertex 2 is not fully speciﬁed, so H1 is not a subcircuit. In
Subgraph H2, the addition of Vertex 1 has made all four vertices in V(Hg) fully
speciﬁed, so H2 is a subcircuit of the circuit graph G.

A further reﬁnement of this process takes into consideration the fact that most
hardware is designed using CAD synthesis tools that utilize a library of ready-made

97

 

 

Subgraph Ho

 

 

Ho Properties

Subgraph H1

 

 

 

V

5,3

H1 Properties

 

Subgraph H2

 

 

 

 

 

N

 

2

V

5,3,2

H2 Properties

 

 

 

 

 

(a)

N

 

1,4

V

5, 3, 2,1

 

 

 

 

(b)

N

 

4

 

 

(C)

Figure 3.3: Examples of Subcircuits and Fully Speciﬁed Vertices

 

98

modules. To reduce the synthesis effort, these modules are often simply connected
together to provide the desired functionality. In these cases, the gate clusters repre-
senting these modules will be completely contained, with no arcs leaving or entering
the subgraph except for the primary inputs and outputs to the module. The vertices
in these subgraphs represent gates with fully-speciﬁed inputs and contained outputs.

A contained vertex represents a gate that is fully speciﬁed and joined within the

subgraph by either all of the vertices representing its outputs or none of those vertices.

Deﬁnition 3.7 In a subgraph H of a circuit graph G, a vertex v E V(H) is a con-
tained vertex if v is fully speciﬁed and ((Vu | u E V(G) /\ vu E E(G) ——) u 6

V(H)) v (Vu I u e V(G) A vu e E(G) —> u g; V(H))).

Deﬁnition 3.8 A subcircuit H of a circuit graph G is a contained subcircuit of

G if and only if each vertex in H is contained.

The ﬁrst subgraph in Figure 3.4, H0, containing only Vertex 5, is 3 contained
subcircuit. None of the inputs of Vertex 5 are in the subgraph, so it is fully speciﬁed,
and it does not have any outputs, so it is contained. The subgraph H1 is created by
adding Vertices 3, 2, and 1 to H0. It is not a contained subcircuit because although
all of the vertices are fully speciﬁed, Vertex 2 is not a contained vertex because only
one of its two children is present in the subgraph. The only contained subgraph of
the original circuit graph G is the subgraph H2, because after Vertex 4 is added to
contain Vertex 2, Vertex 4 is not fully-speciﬁed, so Vertex 0 must also be added.

There are considerably fewer contained subcircuits than subcircuits. To improve
the module identiﬁcation process, a preliminary search to locate and match only

99

 

Subgraph Ho Subgraph H1 Subgraph H2

"

 

 

 

 

.I. I \ .. 3
0 -. I x: ‘x 3 " 0 a p]
o.E.- .'.. wI/ v.:.
,L(' i 3
I .
, 2 I l ;
\ A’ ' : e
T \ : f /
V ,' \ L(/ l r
\ s I’ i
(a) (b) (C)

Figure 3.4: Examples of Contained Subcircuits and Vertices

 

contained modules can provide a considerable reduction in computation, because
any vertices within identiﬁed modules would no longer be considered for inclusion in

another module, thus reducing the order of the circuit graph being explored.

3.2 Enumeration of Candidate Subcircuits

Developing a naive technique to enumerate the candidate subcircuits is straightfor-
ward. By enumerating all of the subgraphs of the circuit graph, all candidate subcir-
cuits must be enumerated. A simple recursive algorithm can accomplish this, but the
complexity of such an approach makes it applicable to only the smallest circuits (< 20
gates). It is more difﬁcult to enumerate only the candidate subcircuits, but even this
task is relatively easy. Following a few simple rules will focus the recursive algorithm
on the enumeration of only the candidate subcircuits. A challenging problem is to

100

generate only the candidate subcircuits and generate each exactly once.

The unique generation of each subcircuit is very important. Computation time
is wasted by generating duplicate subcircuits. More importantly, this enumeration
technique is designed to solve the Module Identiﬁcation Problem in partnership with a
solution to Subcircuit Identiﬁcation. The current techniques to perform the necessary
semantic matching evaluation are computationally intensive. Performing a semantic
match between two subcircuits is complex and requires signiﬁcant processing time.
Therefore, by enumerating each subcircuit only once, unnecessary semantic matching
can be avoided.

There are three ways to handle the duplicate subcircuit problem within the Mod-
ule Identiﬁcation Problem. First, perform semantic matching for each subcircuit
that is created. This approach is computationally undesirable because the amount
of processing time needed to perform each semantic match is considerable and can
signiﬁcantly impact the overall execution time. Second, maintain a hash table of
subcircuits. This approach allows a preliminary check to be performed to determine
whether or not the current subcircuit has already been generated and evaluated for
equivalence. This approach is highly inefﬁcient because the memory consumption
of such a hash table can be prohibitive, and the time to search it for an identical
subcircuit can be considerable.

The third approach is to avoid the problem entirely. Generation and enumeration
of duplicate subcircuits can be avoided by applying the rules presented in Section
3.2.1. These rules impose a strict ordering on the addition of vertices to subgraphs,
thus guaranteeing that each subgraph will be generated once and only once. This is

101

. ,,___._.,_,

the solution described in Section 3.2.1.

It is possible to enumerate all subgraphs of a graph by creating a subgraph to
contain a single gate of the original graph, recursively adding neighboring vertices
until it is equivalent to the original circuit, and then doing the same for every other
vertex of the original graph. Each intermediate subgraph would be enumerated by
this process. This naive algorithm for subcircuit enumeration is described in Figure
3.5 and diagrammed in Figure 3.6. With this method there are many paths to the
creation of each subgraph, and each will be explored in the procedure. In a com-
pletely connected graph, which contains 2" subgraphs, this algorithm would produce
12.! subgraphs, though only 2" of them would be unique. This naive algorithm provides
a framework for reﬁnement. By applying slight modiﬁcations to the naive algorithm,
unique focused enumeration of candidate subcircuits may be performed.

The research presented in this chapter provides a technique to enumerate each
SUbgraph of a graph exactly once. A set of well-deﬁned rules that govern the addition
0f vertices to subgraphs has been developed. Using these rules, it is possible to
ensure that there is exactly one path to the creation of each subgraph. To permit the
application of these rules, each vertex is assigned a unique integer index (Deﬁnition
3-2). This index provides an ordering between each pair of vertices in relation to the
Subgraph being created. This ordering results in exactly one path of vertex addition
to the creation of any subgraph and, consequently each subgraph is enumerated once

and only once.

In addition, within module identiﬁcation, this enumeration may be focused on
a speciﬁc subset of the subgraphs. Only subgraphs with functional meaning are

102

 

 

Naive Subgraph Enumeration Algorithm

0. Transform circuit into circuit C.

1. foreach v E V(G):
2. Create a subgraph H0 with V(Ho) <— v.
3. Determine N (H0).
4. expandSubgraph(Ho).

5. End foreach.

subroutine expandSubgraph(graph H,)
O. foreach vertex vv+1 E N (H,):
1. Create a subgraph H,“ such that V(H,+1 (— H, + v,+1.
2. Determine N(H,+1).
3. Output Hi“.
4. expandSubgraph( H ,-+1 ) .
5. End foreach.

1F L—_-A .n-q

 

 

Lg

 

Figure 3.5: Algorithm for Naive Subgraph Enumeration.

interesting, so only these subcircuits (Deﬁnition 3.6) are enumerated. The candidate
Slibcircuits enumerated may be further reduced to the subset including only contained
SchircuitS. The next section presents the deﬁnition and rules that are applied to focus

the naive algorithm on unique enumeration of candidate subcircuits.

3.2.1 Rules

The following rules are applied in the algorithm presented in Section 3.2.3 and serve
Several purposes within the enumeration process. Rule 1 prevents any subgraph from
being created more than once. Rule 2 and Rule 3 focus subgraph generation on speciﬁc
tb’pes of subgraphs, subcircuits (Deﬁnition 3.6) and contained subcircuits (Deﬁnition

3.8), respectively.

103

 

 
  

    
    

 

   
 
   

 

 

 
 

 

 

 

  
    

Circuit. Foreach Create ~ Determine
graph G mm a“; ‘ subgraph neighborhood
For each
New subgraph vertex vin
neighborhood
Output Add v to
new subgraph , new subgraph

 

 

(a)
Figure 3.6: Algorithm for Naive Subgraph Enumeration (Diagram).

 

104

Unique Enumeration

T 0 ensure that there is only one creation path for each subgraph created, an order
must be deﬁned for each pair of vertices with respect to their addition to a speciﬁc
subgraph. The naive algorithm in Figure 3.5 iteratively adds each vertex in the neigh-
borhood of a subgraph H,- to Hi, creating H,+1 (Steps 0 and 1 of expandSubgraph),
then expands Hi+1~

The ﬁrst step toward unique enumeration is recognizing that all of the vertices of
the subgraph’s neighborhood are not necessarily viable candidates for expansion. A
Simple step toward eliminating duplicate subgraph generation is to disallow addition
0f vertices that have an index higher than that of the subgraph. This subset of the

neighborhood is referred to as the frontier of the subgraph.

Deﬁnition 3.9 The frontier of a subgraph H, .7 (H ), consists of all v such that

'U 6 N(H) and v.index < H.index.

In Figures 3.7, 3.8, and 3.9, the ﬁnal subgraphs are identical, containing vertices

[5, 3, 2], but the creation paths are different. In Figure 3.7, the creation path is {3,
2, 5}. In Figure 3.8, the creation path is {5, 2, 3}. By restricting viable expansion
Vertices to those on the frontier of a subgraph, the former creation path is disallowed.
Restricting expansion of a subgraph H to members of .77 (H ) does not enforce a
Single creation path for each subgraph. The creation paths in Figures 3.8 and 3.9
both create the subgraph containing vertices [5, 3, 2] when the frontier is considered.
It is therefore necessary to further specify the ordering of vertex addition to ensure
that no duplicate creation paths may exist. This is accomplished by limiting the

105

 

Subgraph Subgraph Subgraph

3 3,2 5,3,2

 

 

(a) (b) (C)
Figure 3.7: Subgraph containing vertices [5, 3, 2] created by P(H) = {3, 2, 5}.

\

Subgraph Subgraph Subgraph
5 5, 2 5, 3, 2

 

   

(a) (b) (C)
Figure 3.8: Subgraph containing vertices [5, 3, 2] created by P(H) = {5, 2, 3}.

g

106

 

Subgraph Subgraph Subgraph
5 5, 3 5, 3, 2
0 l 3 0 I r31 0

 

 

 

(a) (b) (C)
Figure 3.9: Subgraph containing vertices [5, 3, 2] created by P(H) = {5, 3, 2}.

‘

Viable expansion vertices to a subset of the frontier, called the reachable frontier.
The reachable frontier of a subgraph H consists of each vertex of the frontier whose
addition has not yet been forbidden by the addition of some vertex already on the
Creation path for H. There are two reasons why a vertex v may be on the reachable
frontier. First, it may have just become a member of the frontier due to the addition
Of an intermediate vertex that put it into the neighborhood of H. Every vertex on
the frontier is considered for addition to the reachable frontier exactly once. Second,
it may have been on the reachable frontier for a subgraph earlier in the creation path,
but has always had an index lower than the vertices added so has not been removed.
When a vertex v is a member of the reachable frontier of a subgraph Ho, it may
be added to create a new subgraph H1. At that point, the reachable frontier of H1
will contain any vertices that are new to the frontier of H1, as well as any vertices

107

from the reachable frontier of H0 that have an index less than v.index. Thus, a local
ordering for addition with respect to the subgraph H1 exists for each pair of vertices.
The ordering between vertices iS only with respect to the subgraph currently being
expanded. The ordering is affected by the order in which vertices are encountered and
added to the neighborhood. Until a vertex enters the neighborhood of a subgraph,

its ordering with respect to other vertices is not yet deﬁned.

Deﬁnition 3.10 The reachable frontier of a subgraph H, fR(H), consists of all

of the vertices v that may be added to H. For a subgraph H,- = H + H, fR(H,-)

Consists of all v such that:

v E .7:(H,-) and either (v ¢ .7:(H,-_1)) or (v E fR(H,-_1) and v.index < H.index).

In Figure 3.10a, the only vertex in the subgraph is Vertex 4. The vertices in
tIle neighborhood are Vertices 0 and 2, and those also comprise the frontier and the
reachable frontier of H0. In H1, Vertex 2 has been added. All of the vertices in H 1’s
neighborhood are on the frontier and reachable frontier except Vertex 5, because it
violates the v.index < H.index property of the deﬁnition of the frontier, because
5.index > H1.index. Because it is not a member of the frontier, it cannot be a
member of the reachable frontier. When Vertex 0 is added to the subgraph, all of the
Vertices in .FR(H1) with indices greater than the index of Vertex 0 are removed from

the reachable frontier and will never again appear on it.

The ﬁrst rule of vertex addition prevents the creation of any duplicate subgraphs.

It requires that any vertex added to a subgraph H must be a member of fR(H), the

reachable frontier of H.

108

 

Subgraph H1

Subgraph Ho Subgraph H2

 

   

I 0 ‘ I .2 3 0 I: I .2 :. 3 .2
\ l .' P2 '. .‘ ’. ..
‘l’

' i : ..' I
I ,L (' '

I
l ‘ 2 \
| I
I > ~ I...
| I
/ '.
’ ' ”I
. 5 ”

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Ho Properties H1 Properties H2 Properties
V 4 V 4, 2 V 4, 2, 0
N 0,2 N 0,1,3,5 N 1,3,5
F 0, 2 F 0, 1, 3 F 1, 3
FR 0, 2 FR 0, 1, 3 F“ O

(a) (b) (C)

 

 

 

 

 

 

Figure 3.10: Frontier and Reachable Frontier

109

 

Deﬁnition 3.11

Rule 1: Only add vertices to H that are members of fR(H).

 

Unique Subgraph Enumeration Algorithm

0. Transform circuit into circuit C with unique vertex indices.
1. foreach v E V(G):

2. Create a subgraph Hg with V(Ho) (— v.

3. Determine fR(Ho).

4. expandSubgraph(Ho).
5. End foreach.

subroutine expandSubgraph(graph Hg)
0. foreach vertex vv+1 E fR(H,) (Rule 1):
1. Create a subgraph H,-+1 such that V(H,+1 (— V(H,-) + v,-+1.
2. Determine fR(H,-+1).
3. Output Hi“.
4. expandSubgraph(H,+1).
5. End foreach.

 

 

 

Figure 3.11: Algorithm for Unique Subgraph Enumeration.

Figure 3.11 shows the algorithm for unique subgraph enumeration, modiﬁed from
the naive algorithm in Figure 3.5 on page 103 by the application of Rule 1 in Step 0 of
the subgraphExpand subroutine. Figure 3.12 presents a graphical representation

0f the algorithm.

Examples of Rule 1 Application To illustrate the application of Rule 1, consider

the example circuit graph C (Figure 3.2).

Example 1 The subgraphs discussed in this example are shown in Figure 3.13.

110

 

  
     
   

     
    

 

  

 

-
7 9’3th vertexofG SUDgraph frontier

 

For each
vertex vln
reachable frontier

New subgraph

 

 
 
 

 

 

 

    

Add v to
new subgraph

Output 2
new subgraph

Figure 3.12: Algorithm for Unique Subgraph Enumeration (Diagram).

 

111

 

Subgraph H0

0

I

\

 

 

Ho Properties

 

 

 

 

 

 

 

 

V 5
N 2, 3
F 2, 3
FR 2, 3
(a)

Subgraph H1 Subgraph H2

’~

 

H1 Properties
V 5, 2

N 1,
F 1,
FR 1

 

 

 

,4
4

,

 

 

 

 

 

A0300

3

 

(b) (C)
Figure 3.13: Rule 1 (Example 1).

 

112

Subgraph H0 is composed of a single vertex, Vertex 5. The frontier of Ho, .7-"(H0)
(Deﬁnition 3.9), contains all of the vertices that are no more than one arc away and
have an index less than that of the subgraph, in this case, Vertices 2 and 3. The
reachable frontier of H0, .7-‘R(Ho) (Deﬁnition 3.10), is simple to calculate in this case,
because there are no previous subgraphs in the path; the reachable frontier is equal
to the frontier.

Rule 1 states that any vertex v E FR(H0) may be added. The addition of Vertex
2 to Ho creates the subgraph H1. Vertices 1, 3, and 4 are all members of f (H 1) now,
but Vertex 3 is no longer reachable because its index is greater than that of the ﬁnal
vertex of .7(H1), Vertex 2. The index of Vertex 1 is less than that of Vertex 2, but
Vertex 1 is newly reachable (by the addition of Vertex 2), so it is currently reachable.
Thus, .7-"R(H1) contains Vertices 1 and 4.

Vertex 1 is added to H1 to create the subgraph H3. fR(H2) is empty, so this path
has been exhausted. The creation path (Deﬁnition 3.3) for this subgraph, P(H3), is

{5,2,1}.

Example 2 The subgraphs discussed in this example are shown in Figure 3.14.
The example above illustrates that there is an ordering between the vertices with
respect to the current subgraph. The addition of Vertex 2 before Vertex 3 prevented
Vertex 3 from ever appearing in a subgraph based on the expansion of H1 (Figure
3.13b).
H1 (Figure 3.14b) is the result of adding Vertex 3 to Ho, instead of Vertex 2.
Vertex 2 is the only vertex on the reachable frontier of H1, so it is added next to

113

"l

 

Subgraph H0 Subgraph H1 Subgraph H2

"

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

0 1 ' ‘ 3 \ o 1 . Pl
,.'\‘|’/ ’
,L( i ’L('
1 2 l ' : 1’ 2 l
I l /
~ /\\ | - ~ /\\
' \ ' \
' I ..--' r l
4 to
Ho Properties H1 Properties H2 Properties
V 5 V 5, 3 V 5, 3, 2
N 2, 3 N 2 N 1, 4
F 2, 3 F 2 F l, 4
FR 2, 3 15'R 2 FR 1, 4
(a) (b) (C)

Subgraph H3 Subgraph H4 Subgraph H5

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

H3 Properties H4 Properties H5 Properties
V 5,3,2,4 V 5,3,2,4,1 V 5,3,2,4,1,0
N O, 1 N O N 0
F O, 1 F 0 F 0
FT 1 FR 0 7‘“ Q)
(d) (e) (f)

Figure 3.14: Rule 1 (Example 2).

 

114

create H2. Either of the two vertices on the reachable frontier of H2 may be added
next. If Vertex 1 is added then the expansion will end immediately because the only
remaining neighbor, Vertex 4, would not be reachable. Vertex 4 can be added before
Vertex 1 to create H3, allowing the addition of Vertices 1 and 0 in that order. This
effectively creates a subgraph equivalent to the original graph. That creation path,
P(H5) = {5, 3, 2, 4, 1, 0}, is the only path that will reproduce circuit graph G.
Additional rules are needed to ensure that only subcircuits or contained subcircuits
are enumerated. Rule 2 will cause only subcircuits to be produced, while Rule 3 will
cause only contained subcircuits to be generated. Note that Rule 3 requires the

simultaneous application of Rule 2.

Subcircuits

Rule 2 is designed to ensure that only subcircuits (Deﬁnition 3.6) are enumerated.
Each vertex in the subgraph is tested to ensure that it is fully speciﬁed. If a vertex
is only partially speciﬁed, then the vertices necessary to make it fully speciﬁed are
added to the subgraph (if they are members of the reachable frontier).

When all of the vertices are fully-speciﬁed, then the subgraph is a subcircuit. If the
necessary vertices may not be added because they are not members of the reachable

frontier of the subgraph (Rule 1), then the subgraph is discarded.

Deﬁnition 3.12
Rule 2: Output only subcircuits
while (31) I v E V(G)/\(Elu luv 6 E(G) /\ u E V(H)) /\ (310 l mu 6 E(G) A 11) ¢

115

V(H)»

o Ifw E FR(H) then V(H) = V(H) +112.

0 Else discard H.

end while

 

Unique Focused Enumeration Algorithm

0. Transform circuit into circuit G with unique vertex indices.
1. foreach v 6 V(G):

2. Create a subgraph H0 with V(Ho) (— v.

3. Determine fR(Ho).

4. expandSubgraph(Ho).
5. End foreach.

subroutine expandSubgraph(graph H,)
0. foreach vertex vv+1 E .7-"R(H.-) (Rule 1):
1. Create a subgraph H,“ such that V(HiH (— V(H,-) + 1),-+1.
2. Determine fR(H,-+1).
3. Add vertices necessary to make H,“ into a subcircuit (Rule 2).
4. Output Hi“.
5. expandSubgraph(H,+1).
6. End foreach.

 

 

 

Figure 3.15: Algorithm for Unique Focused Enumeration.

The algorithm in Figure 3.15 presents the unique subgraph enumeration algorithm

of Figure 3.11 on page 110 modiﬁed by the addition of Rule 2 in Step 3 within

the subgraphExpand subroutine. A graphical representation of the algorithm is

presented in Figure 3.16.

116

 

   
 
  

     
    

 

   
 
 
   
  

Determine A

 

 

 
   

 

    
    

 

 

 
 

 

 

Circuit Far each Create ,
. reachable
graph G “'19" 0’3 frontier
. I For each
New subcircuit vertex v in
reachable frontier
Add v to
new Subgraph
Output Make new subgraph ‘
new subgraph into a subcircuit

 

(a)

Figure 3.16: Algorithm for Focused Subgraph Enumeration (Diagram).

 

117

0"-

Examples of Rule 2 Application To illustrate the application of Rule 2, its

application will be demonstrated on the example circuit graph G (Figure 3.2).

 

/‘

Subgraph H0

 

 

 

 

 

 

 

 

 

Ho Properties
V 4
N 0, 2
F 0, 2
FR 0, 2
(a)

Subgraph H1

 

 

 

Subgraph H2

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Figure 3.17: Rule 2 (Example 1).

H1 Properties

V 4, 2

N 0, 1, 3, 5

F O, 1, 3

FR 0, 1, 3
(b)

 

 

 

H2 Properties
V 4, 2, O
N 1, 3, 5
F 1, 3
FR 1, 3

(C)

 

Example 1 The subgraphs discussed in this example are shown in Figure 3.17.

The trivial subgraph Ho may be expanded by adding either of its neighbors, Vertex

2 or Vertex O. The addition of Vertex 2 results in H1, in which Vertex 4 is not fully

speciﬁed. Rule 2 requires that any vertices necessary to fully specify each partially

speciﬁed vertex are added immediately. Therefore, Vertex 0 is added, making Vertex

4 fully-speciﬁed and H2 3 subcircuit.

118

 

Subgraph Ho Subgraph H1

(5’2)

0
pg 1

   

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

s .\
5 s a: 5

Ho Properties H1 Properties
V 4,2,0 V 4,2,0,1
N 1, 3, 5 N 3, 5
F 1, 3 F 3
FR 1, 3 FR Q

(a) (b)

Figure 3.18: Rule 2 (Example 2).

 

119

Example 2 The subgraphs discussed in this example are shown in Figure 3.18.
Beginning with H0, either Vertex 1 or Vertex 3 may be added. H1 is the result of

adding Vertex 1. Vertex 2 is only partially speciﬁed in H1 because Vertex 3 is not a

member. Rule 2 dictates that Vertex 3 must be added immediately, but it is not on

the reachable frontier. Therefore, this subgraph is not a subcircuit and is discarded.

Contained Subcircuits

Rule 3 guarantees that only contained subcircuits (Deﬁnition 3.8) are created. It
operates similarly to Rule 2, except that as a vertex is added, Rule 2 is applied, then
the subcircuit is evaluated to ensure that it is contained. If not, the vertices necessary
to fulﬁll that condition are added to the subgraph, if possible.

When all of the vertices are fully speciﬁed and contained, then the subcircuit is
marked as contained. If the necessary vertices may not be added because they are
not members of the reachable frontier of the subgraph (Rule 1), then the subgraph is

discarded.

Deﬁnition 3.13

Rule 3: Output only contained subcircuits.
while (31) I u E V(G) /\ (Elu I uu E E(G) Au 6 V(H)) /\ (3w | uw E E(G) Aw ¢

V(H)»
o Ifw E fR(H) then V(H) = V(H) + w.
o Else discard H.

end while

120

Example of Rule 3 Application

Example 1 This example again illustrates the creation of the subgraph that covers
the entire circuit graph, but within the context of Rule 3 application. Subgraphs Ho

- H5 are shown in Figure 3.19

When Vertex 3 is added to the initial subgraph H0, Vertex 2 must be added
immediately, both to fully specify Vertex 5 and to contain Vertex 3. The result is H2,
in which Vertex 2 is only partially speciﬁed and non-contained. Rule 2 applies ﬁrst,
so Vertex 1 is added to H2 to fully specify Vertex 3 and create H3. The reachable
frontier of H3 is not affected by the addition of Vertex 1 because all of the input
vertices of a vertex are treated as a single unit if the highest non-member vertex is in
the reachable frontier when Rule 2 is applied. Expansion proceeds from H3 by adding
Vertex 4 to create H4. Vertex 0 is added last to reproduce the entire circuit graph as

H5. All vertices are fully speciﬁed and contained, so H5 is a contained subcircuit.

The application of Rule 3 guided the generation along this creation path to most

efﬁciently create the contained subcircuit.

Example 2 The subgraphs for this example are displayed in Figure 3.20. When
Vertex 2 is added to trivial subgraph H0, it is non-contained, because its second
parent (Vertex 5) is not a member of H1. Furthermore, Vertex 5 is not a member of
the frontier or reachable frontier of H1, so Vertex 2 could never become contained.

H1 is therefore discarded.

If contained subcircuit enumeration is desired, Rule 3 may be added to the focused

121

 

 

Subgraph Ho Subgraph H1 Subgraph H2

"

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

1 , 3 l 0
>"/
s s '
' I
l I L ( |
l 2 l : ; l
.-\ ~ "I I :
‘\ l f
,' v ,
o 4
Ho Properties H1 Properties H2 Properties
V 5 V 5, 3 V 5, 3, 2
N 2, 3 N 2 N 1, 4
F 2, 3 F 2 F 1, 4
F 2, 3 7R 2 F 1, 4
(a) (b) (C)
Subgraph H3 Subgraph H4 Subgraph H5

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

,L ( ll
4 [l P50
H3 Properties H4 Properties H5 Properties
V 5,3,2,1 V 5,3,2,1,4 V 5,3,2, 1,4,0
N 4 N 0 N 0
F 4 F 0 F C
FR 4 FR 0 FR 0
(d) (8) (f)

Figure 3.19: Rule 3 (Example 1).

 

122

 

/‘

Subgraph H0 Subgraph H1

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

, o l

, .

,L ‘
i ' 2‘, ;
l >~ i
I , _ 3

23,1...

Ho Properties H1 Properties
V 4 V 4, 2
N 0, 2 N O, 1, 3, 5
F 0, 2 F 0, 1, 3
FR 0, 2 FR 0, 1, 3

(a) (b)

Figure 3.20: Rule 3 (Example 2).

 

123

enumeration algorithm in Figure 3.15 by changing Step 3 of the subgraphExpand

subroutine to apply both Rule 2 and Rule 3.

3.2.2 Correctness of Rules

To establish the correctness of the rules presented in Section 3.2.1, several lemmas
are presented to establish that the following three major objectives have been accom-

plished correctly [113].

0 When Rule 1 is applied, each subgraph has precisely one creation path.
0 When Rule 2 is applied, only subcircuits are enumerated.

c When Rule 3 is applied, only contained subcircuits are enumerated.

1. Poinderr is the highest index in the subgraph H. To begin with, it is
necessary to establish that the initial vertex of a subgraph is the vertex with the
highest index. Rule 1 requires that each vertex 1) added to a subgraph H must be
a member of fR(H). From that, its membership in .77 (H ) may be assumed, which
requires that v.index < H .index. Because no vertex may be added that has an index
higher than that of the subgraph, the initial subgraph must contain the initial vertex
by Deﬁnition 3.3. Therefore the initial vertex must be the vertex with the highest

index. This discussion establishes:

2. 1),-+1 E .7:(H,-). Next, the membership of each vertex v on the creation path of H

must be established. If v is the next vertex to be added to the subgraph, it must be

124

in the neighborhood, because otherwise there is no are between 1) and H and it could
not be accessed. Further, if u will ever be a member of H, it must have an index less
than that of H. v’s membership in the subgraph is known, so v.index < H .index,

and therefore it must be in the frontier of H.

Lemma 3.1 Given a subgraph of H, H,, containing the vertices of P(H),

P0, P1, ..., R, then the next vertex to be added, Pm, is a member of .7-"(H,).

Proof: Assume 1),-+1 ¢ f(H,-). P,“ is a member of V(H) so it must be a member
of fR(Hz) at some position x (by Rule 1). Therefore, Pm E f(Hx) (Deﬁnition 3.10).
So, the vertices between P,- and P3 must be added before 1),-+1 may be added. However,
1),-+1 is by deﬁnition the next vertex added, which contradicts the assumption that
there are additional vertices between P,- and P; that must be added. Because no
other vertices may be added before 1),-+1 and Pi+1 E 77(Hz), a: = i, contradicting the

assumption that 1),-+1 ¢ f(H,-).

3. P,“ E fR(H,-). Lastly, it must be established that when a vertex u was added
along the creation path of the subgraph H, it was a member of 1"”(H). Lemma 3.1
asserted that v E .7: (H ), so either it must be newly reachable or it must not have
been excluded from the reachable frontier by the previous addition of a vertex with
a higher index.

The ﬁrst part is clear: if a vertex is newly reachable, then it must have at least one
opportunity to become a member of the graph, so it is automatically on the reachable
frontier.

125

The second part is more complicated. If a vertex is not newly reachable, then it
must have been on the reachable frontier of the previous subgraph, Hixl. Therefore,
the addition of the vertex that was added to create H,- may have invalidated v’s
membership on fR(H) if its index was less than v’s. However, if that occurred,
then v would never be able to become a member of H. Its membership is already

established, so it must be a member of fR(H,-_1).

Lemma 3.2 Given a subgraph of H, H,, containing the vertices of P(H),

P0, P1, ..., P,, then the next vertex to be added, Pm, is a member of 7R(H,-).

Proof: P,“ E .7:(H,-) by Lemma 3.1

0 Suppose 1),-+1 ¢ f(H.-_1), meaning that 1),-+1 is newly reachable and has not yet

been considered for addition.

Because PHI 6 .7:(H.-) and P,“ ¢ f(H,_1), then 1),-+1 need not meet any criteria
to be a part of fR(H,-) because it has not had a chance to be considered for

addition yet, so therefore P,“ E FR(H,-).

0 Suppose P,“ E f(H,-_1).

— i-l-l 6 fR(Hi—1)
Proof: Assume 1),-+1 ¢ fR(H,-_1). Because 1),-+1 E .7:(H,-_1) and P,“ ¢
fR(H,-_1), P,“ will not appear on .7-‘R(Hx) for anyi 5 x g n (by Deﬁnition
3.10), and will never become a member of V(H). However, P,“ is a
member of V(H), so therefore P,“ e FR(H,_1).

126

— ,-+1.index < H.index
Proof: Assume P,+1.index Z H.index. We have already proven that
P,“ E fR(H,-_1). Hence, when P,- is added to create the subgraph Hi, 1),-+1
will not be a member of fR(H,-), and therefore will never become a member

of V(H). However, 1),-+1 is a member of V(H), so 1),-H.index < H.index.

Therefore, by Deﬁnition 3.10, 1),-+1 E fR(H,-).

Rule 1: Unique Creation Paths

Theorem 3.1 Rule 1 implies that for any induced, connected subgraph H of a circuit

graph G, there is a creation path P(H) that results in the subgraph H.

Proof: Assume that there is no creation path P(H). Therefore, for some Pm,
Rule 2 must forbid its addition to P34, so P; ¢ .7-‘R(Hx_1). Lemma 3.2 states
that P,- E fR(H,-_1), which contradicts the assumption, thus proving that there is a

creation path to H.

Theorem 3.2 Rule 1 guarantees that for any induced, connected subgraph H of a

circuit graph G, there is no more than one creation path P(H).

Proof: Assume that there are two or more creation paths for an arbitrary subgraph
H. Denote the subgraph created along one of these paths as H1, and the vertices
along H 1’s path as u1,u2, ..., u"; denote the subgraph created along the other path
as H 2 and the vertices along H 2’s path as v1, v2, ..., vn. Let x be the ﬁrst position in
which at 5:5 v3. By Lemma 3.2, at E fR(H;_1) and v1, 6 fR(H3_1). .7-‘R(H;_1) must
be equivalent to .FR(H§_1) because u.- = v,- for O S i < x.

127

All indices are unique, and the labels u and v were assigned arbitrarily, so let
ux.index < vx.index. If Pa, = um, then v1. 9E .7-‘R(H§), by Deﬁnition 3.10, and therefore
could not become a member of H. However, we know that v; E V(H), so PI 95 ux.
If P; = vx, then um E fR(H,1,) and it could become a member of H.

Therefore, for any two vertices, there is only one order in which they may be
added to H. u; = v, for 0 S x S n. This contradicts our assumption, thus proving

that there is a single unique creation path P(H) to the subgraph H.

Rule 2: Only Subcircuits Created

When Rule 2 is applied to a subgraph, each vertex in the subcircuit is examined
to determine whether or not it is fully speciﬁed. If a vertex v is found to be only
partially speciﬁed, that is, at least one but not all of v’s input vertices are present,
then the vertices necessary to fully specify v are added to the subgraph if they are
on the reachable frontier.

If all of the input vertices may not be added because they are not members of
the reachable frontier, then the subgraph is discarded. Therefore, the only subgraphs

that may be enumerated when Rule 2 is applied are subcircuits.

Rule 3: Only Contained Subcircuits Created

When Rule 3 is applied to a subgraph, each vertex in the subcircuit is examined to
determine whether or not it is contained. If a vertex v is found to be not contained,
that is, at least one but not all of v’s output vertices are present, then the vertices
necessary to fully specify v are added to the subgraph if they are on the reachable

128

frontier.
If all of the output vertices may not be added because they are not members of
the reachable frontier, then the subgraph is discarded. Therefore, the only subgraphs

that may be enumerated when Rule 3 is applied are contained subcircuits.

3.2.3 The Algorithm

The algorithm in Figure 3.21 (repeated from page 116) performs candidate subcircuit
enumeration and guarantees the three objectives stated at the beginning of Section
3.2.2. Vertex ordering is implemented so that the algorithm may apply the rules
described in Section 3.2.1. Rule 1 prevents the generation of duplicate subgraphs
by ensuring that there is only one expansion path that creates each subgraph. The
application of Rule 2 guarantees that each of the subgraphs created by the algorithm
is a functional subcircuit. Rule 3 may also be applied to guarantee that each of the

subgraphs is also a contained subgraph.

Algorithm Details

0 Step 0: Transform circuit into circuit graph G with vertex indices.

Create a vertex to represent each logic element in the circuit. These vertices
contain necessary information about the logic element, including number of
inputs and outputs, distance from primary inputs, and function performed.

Each vertex is assigned a unique integer index as it is visited.

This operation is 0(n), and will halt because the number of logic elements in

129

 

 

Unique Focused Enumeration Algorithm

0. Transform circuit into circuit G with unique vertex indices.
1. foreach v E V(G):

2. Create a subgraph H0 with V(Ho) (— v.

3. Determine fR(H0).

4. expandSubgraph(Ho).
5. End foreach.

subroutine expandSubgraph(graph H.)
O. foreach vertex vv+1 E fR(H,-) (Rule 1):
1. Create a subgraph H,“ such that V(H,+1 +- H, + v,+1.
2. Determine .7-‘R(H,+1).
3. Add vertices necessary to make H,“ into a subcircuit (Rule 2).
4. Output Hi“.
5. expandSubgraph(H,+1).
6. End foreach.

 

 

Figure 3.21: Algorithm for Unique Focused Enumeration.

any circuit is ﬁnite.

0 Step 1: foreach v E V(G):
This loop is performed n times, once for each vertex in the circuit graph, so the

complexity is O(n).

0 Step 2: Create a subgraph H0 with V(Ho) (— v.
Creating a subgraph containing v is an 0(1) operation. The subgraph informa-

tion (index, order, depth, etc.) is initialized.

0 Step 3: Determine fR(H0).
The reachable frontier of a trivial subgraph is equivalent to the frontier, which

consists of all of the neighboring vertices with index less than that of the sub-

130

graph. All vertices may be neighbors of the subgraph, so this step is 0(a).

0 Step 4: expandSubgraph(Ho).

Send the newly created subgraph H0 to the expandSubgraph subroutine.

0 Step 5: end foreach.
This outer loop is performed exactly n times, once for each vertex in the circuit

graph.

subroutine expandSubgraph(subgraph Hi)

0 Step 0: Rule 1: foreach vertex v,-+1 E fR(H,-):

a Step 1: Create a subgraph H,“ <— H, + v,+1.
The subgraph H,- is copied. All information is copied to the new subgraph,
including the list of vertices, the order, the depth, the index, etc. Copying
the subgraph H,- and adding a vertex to the new subgraph H,“ is an 0(a)
operation because it is possible that all vertices of the circuit graph belong to

the new subgraph.

0 Step 2: Determine fR(H,-+1).
For each vertex x adjacent to H,- + 1: if x is not a member of V(H) or FR(H,-

and has an index less than H .index and has not been previously explored, then

add 3: to fR(H,-+1).

For each vertex x in .7-"R(H,-): if x.index < vi+1.index then add 3: to fR(H).

131

Each vertex in V(G) may be a neighbor of v,-+1, so this step is 0(n) and will

halt when each vertex adjacent to x has been considered.

0 Step 3: Rule 2: Add vertices necessary to make H,+1 a subcircuit or
discard.
The addition of vertex vi+1 can only affect vertices adjacent to it. For each of
those vertices, if a proper subset of its parent vertices are members of V(H,+1),
then add necessary vertices until H,“ is a contained subcircuit. If a necessary

vertex is not a member of fR(H,+1) then discard the subgraph.

Because this step requires a nested loop to check all (possibly 72. — 1) inputs of
internal vertices (possibly n) to determine if they are fully speciﬁed each time

a vertex is added, its complexity is 0(n3).

This step is performed more efﬁciently by sorting the inputs of each vertex to
ensure that the expansion proceeds with the fewest subcircuits discarded due

to Rule 1 enforcement.

0 Step 4. Output Hi“.

Output the newly created subgraph.

0 Step 5. expandSubgraph(H,+1).

Send the newly created subgraph H,“ to be expanded.

0 Step 6. End foreach.

132

Algorithm Complexity

The creation of each unique candidate subcircuit as described in Figure 3.21 has a
worst-case complexity of 0(n3). The algorithm was presented in this manner for
clarity of presentation, though an actual implementation of this technique can com-
bine the determination of the reachable frontier and the application of Rule 2 (and
Rule 3, if desired) and reduce the complexity to 0(n2). Therefore, the computational
complexity of the focused enumeration algorithm is 0(kn2), where k is the number

of candidate subcircuits in the circuit graph (see Section 3.3).

Correctness of Algorithm

To address the correctness of the focused enumeration algorithm, several theorems
are needed. It must be demonstrated that the three rules developed to restrict the
enumeration of subgraphs provide the desired results when applied within the focused

enumeration algorithm (Figure 3.21).

Rule 1: Unique Subgraphs

Theorem 3.3 When Rule 1 is applied, the focused enumeration algorithm produces

each induced, connected subgraph H of a circuit graph G exactly once.

Proof: Theorems 3.1 and 3.2 state that there is one and only one creation path
P(H) for any induced, connected, subgraph H of a connected graph G. Therefore,
by applying Rule 1, for each induced, connected, subgraph H, the enumeration may
follow P(H) to the unique creation of H.

133

Rule 2: Subcircuits

Theorem 3.4 When Rule 2 is applied, all subcircuits are produced by the focused

enumeration algorithm exactly once.

Proof: Based on Rule 1, Theorem 3.3 states that each induced, connected, sub-
graph of a connected graph G will be produced by the algorithm exactly once. Rule
2 does not interfere with the application of Rule 1, so Theorem 3.3 remains true for

focused enumeration with application of Rule 2.

Theorem 3.5 When Rule 2 is applied, only subcircuits are produced by the focused

enumeration algorithm.

Proof: When Rule 2 is applied to a subgraph H, the subgraph is discarded if it
cannot be made into a subcircuit. When the algorithm terminates, only the sub-
circuits have been produced. Therefore, the application of Rule 2 ensures that only
subcircuits are produced by the focused enumeration algorithm.

In essence, these two theorems ensure that when focused enumeration is executed

with the application of Rule 2, it will produce all and only subcircuits.

Rule 3: Contained Subgraphs

Theorem 3.6 When Rule 3 is applied, all contained subcircuits are produced by the

focused enumeration algorithm exactly once.

Proof: Based on Rule 1, Theorem 3.3 states that each induced, connected, sub-
graph of a connected graph G will be produced by the algorithm exactly once. Rule

134

3 does not interfere with the application of Rule 1, so Theorem 3.3 remains true for

focused subcircuit enumeration with the application of Rule 3.

Theorem 3.7 When Rule 3 is applied, only contained subcircuits are produced by the

focused enumeration exactly once.

Proof: When Rule 3 is applied to a subgraph H, the subgraph is discarded if it
cannot be made into a contained subcircuit. When the algorithm terminates, only
the contained subcircuits have been produced. Therefore, the application of Rule
3 ensures that only contained subcircuits are produced by the focused enumeration
algorithm.

In essence, these two theorems ensure that when focused enumeration is executed

with the application of Rule 3, it will produce all and only contained subcircuits.

3.3 Results

Experiments were run on a Sun Ultra 2 Model 2300 (dual 300mhz processors) with
1024MB of memory. As a proof of concept, results are shown in Table 3.1 for several
small circuits from the LogicSynth93 benchmark suite [82]. Times are given in CPU
seconds. “N/A” indicates that the enumeration results could not be obtained in a
reasonable amount of time. To perform subcircuit enumeration on larger circuits, the
techniques discussed in Chapter 5 must be applied. Results for a parallel implemen-
tation are presented in Section 5.1, and results for an implementation that enforces
an order limit are presented in Section 5.3.

135

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Original Unique Candidate Contained

Circuit Gates Subgraphs Subcircuits Subcircuits
number time number I time number time
1-bit Adder 8 114 0.01 18 <0.01 6 <0.01
2-bit Adder 15 3,408 0.14 114 <0.01 24 <0.01
3-bit Adder 22 98,922 4.79 522 0.04 66 <0.01
C17 24 40,729 1.79 3,951 0.16 1,199 0.05
majority 24 147,366 7.41 12,171 0.45 993 0.05
b1 25 1,066,434 56.60 19,980 0.75 143 0.01
cm138a 33 N/ A - 726,032 27.64 26,652 1.05
cm152a 35 N/A - 59,962 2.27 6,158 0.24
z4ml3 40 N/A - 185,196 14.64 281 0.07
x2 54 N / A - 104,178,928 4282.80 98,322 3.93
rd53 89 N / A - N / A - 207,149 23.88

 

 

 

 

 

 

 

Table 3.1: Focused Enumeration Results Demonstrating Reduction of Interesting
Subgraphs.

These results clearly illustrate the advantage of using the focused subcircuit algo-
rithm. The effort spent on enumeration and the number of subgraphs that must be

tested for semantic equivalence is greatly reduced, resulting in signiﬁcant improve-

ments to overall execution time for solutions to the Module Identiﬁcation Problem.

3.3.1 Implementation

The algorithm described in Section 3.2.3 has been implemented in C. Code to read a
circuit from a BLIF file and produce a network of gates is derived from the Colorado

University Decision Diagram package [102].

Correctness

To evaluate the correctness of the focused enumeration algorithm and the rules pre-
sented as the solution to the Candidate Subcircuit Enumeration Problem, the naive

136

algorithm for subgraph enumeration (Figure 3.5) was implemented.

Uniqueness Each subgraph H produced by the implementation of the naive al-
gorithm was placed in a hash table with a hash index deﬁned by a lexicographical
ordering of the names of the constituent vertices. If a collision occurred, a vertex
by vertex comparison was performed for H and each subgraph in that hash table
location. If any were found to be identical, H was discarded. If H was found to be

unique, it was inserted into the hash table at that location.

The subgraphs in the hash table after a full exploration of the circuit graph cor-
respond exactly to the subgraphs produced by the algorithm that included the appli-
cation of Rule 1. The results for the implementations of the naive and the focused
enumeration algorithms were compared for a range of circuit graphs to evaluate the

correctness of the implementation of the unique enumeration algorithm (Rule 1).

Subcircuits and Contained Subcircuits After enumerating all of the unique
subgraphs as described above, each subgraph was tested for membership in the sub-
circuit and contained subcircuit classes of subgraphs. Each vertex in the subgraph
was tested to determine whether or not it was fully speciﬁed. If all vertices were fully
speciﬁed, then the subgraph was marked as a subcircuit and saved. If not, the sub-
graph was discarded. Each vertex in the subcircuit was then tested for containment.
If all vertices were contained, then the subgraph was also marked as a contained

subcircuit.

The saved subcircuits of this trivial solution were compared to the results of the

137

implementation of the focused enumeration algorithm. The subcircuits produced by
the two solutions were identical. This provides a great deal of conﬁdence in the

correctness of the implementation of the focused enumeration algorithm.

3.4 Summary

This chapter has presented a technique for uniquely enumerating all of the subgraphs
of a graph. By applying this approach within the domain of digital circuits, the sub-
graphs that must be enumerated can focus on speciﬁc classes of subgraphs, reducing
the computation cost considerably.

The problem of enumerating each subgraph exactly once was addressed. By as-
signing an index to each vertex in the circuit graph, a local ordering may be enforced
between each pair of vertices with respect to the subgraph currently under expansion
(Rule 1). This ordering ensures that each subgraph is created and enumerated exactly
once.

The enumerated subgraphs were restricted to those subgraphs whose underlying
circuit represented a functional subcircuit of the circuit. Rule 2 guides the creation
of each subgraph so fewer non-subcircuits are explored, and none are produced. The
subcircuits were restricted yet further to those subcircuits that are self-contained.
These contained subcircuits are more likely to match library entities, so they pro-
vide a valuable starting place for module identiﬁcation. Rule 3 produces contained

subcircuits.

The results show the improvement that focused enumeration achieves over the

138

naive enumeration of subgraphs. Only the subcircuits and contained subcircuits need
to be considered for functional equivalence to library modules. This reduces the over-
all execution time of the solution to the Module Identiﬁcation Problem signiﬁcantly.

Chapter 4 describes a technique to avoid performing costly semantic matching on
more than one instance of subgraphs that represent subcircuits that are structurally
identical. Structurally identical subcircuits are functionally identical, so it is only
necessary to perform semantic matching on one member of each of these structural

equivalence classes.

139

Chapter 4

Subcircuit Equivalence Classes

The overall goal of the research presented in this dissertation is to facilitate the
reverse engineering process by identifying all of the subcircuits of a circuit that may
be equivalent to a library module. These subcircuits undergo a semantic matching
process, Subcircuit Identiﬁcation, to determine this equivalence. Currently available
semantic matching techniques [36] are computationally intensive, so reducing the
number of times that subcircuit identiﬁcation must be performed is desirable.

Many of the candidate subcircuits generated by the focused enumeration algo-
rithm described in Chapter 3 are likely to be structurally identical to other candidate
subcircuits. Structurally identical subcircuits are necessarily functionally equivalent,
so the number of subcircuits to which Subcircuit Identiﬁcation must be applied may
be signiﬁcantly reduced by applying it to only a few instances of each unique subcir-
cuit structure.

Despite the large number of uniquely labeled candidate subcircuits, there is a much

smaller number of structurally unique subcircuits. In other words, if arbitrary labels

140

on the vertices, such as names, are removed and replaced with a label that identiﬁes
the type of gate they represent in the original circuit, many of the subcircuits become

equivalent.

Each unique subcircuit structure deﬁnes a structural equivalence class. Each class
contains only subcircuits whose structures, with respect to gates, modules, and inter-
connects, are identical. Because the subcircuits in an equivalence class are identical,
both structurally and functionally, any matching information gained about one in-
stance of that class provides the same information for all other subcircuits in the

class.

To facilitate the partitioning of subcircuits into equivalence classes, a technique
for generating structural identiﬁers has been developed. These identiﬁers describe the
structure of a circuit with respect to structural invariants such as the functionality of

the individual vertices and the interconnection of the vertices.

Each structural identiﬁer represents a single structural equivalence class. The
identiﬁer is created for each subcircuit generated by subcircuit enumeration. All
subcircuits with the same identiﬁer are members of the same equivalence class, and
all subcircuits within an equivalence class are structurally identical. This chapter

focuses on the presentation of the structural identiﬁer generation technique.

4.0.1 Overview of Technique

As with the approach to Candidate Subcircuit Enumeration presented in Chapter
3, structural identiﬁers and subcircuit equivalence classes apply to circuit graphs,

141

graphical representations of logic circuits.

An identiﬁer that uniquely describes the structure of a circuit may be generated
by ﬁrst determining an ordering for the vertices, determined by structural invariants
such as distance from circuit inputs, functionality of the vertex, number of inputs and
outputs, etc. Once the vertices have been sorted, a unique integer label, a weight,
may be assigned to them to represent their position in the ordering.

Information local to the vertices and their neighbors is not always sufficient to
resolve ordering of all vertices. In this case, a second pass through the circuit graph
may be necessary to resolve these conﬂicts. This pass efﬁciently utilizes information
about the surrounding circuit graph to sort the remaining vertices.

Occasionally, two vertices represent identical functionality in identical positions
within the circuit. These vertices are indistinguishable. This situation can occur, for
example, when two circuit inputs are both inputs to a single gate, or when the circuit
is structurally symmetrical. In that case, the vertices are completely interchangeable
and may be ordered arbitrarily. All other vertices are distinguishable, because they
are different with respect to functionality or position within the circuit.

It may not be possible to order two distinguishable vertices based solely on the
information that can be efﬁciently obtained during two passes through the circuit
graph. These vertices will also be arbitrarily ordered, and this allows the possibility
of multiple identiﬁers that describe the same circuit structure. However, the number
of identiﬁers for a single structure will generally be small (one or two).

The identiﬁer is built after the vertex ordering has been determined. The vertex
ordering is applied during identiﬁer construction to ensure that there are only a few

142

identiﬁers for any circuit graph structure. Many forms of identiﬁers may be derived
for a circuit graph in which the vertices have been labeled in this way. The method
presented in this thesis produces logic functions because logic formulae are familiar

to most engineers and provide an easily comprehended representation.

Vertices with more than one output are will appear more than once in the logic
formula identiﬁer, so they must be handled differently than vertices with only one
output if all structural information is to be accurately represented. These vertices
are called nets because they are the interface to a network of gates within the circuit.
To ensure that the interconnection described by the vertex is represented correctly
in the identiﬁer, nets are assigned special labels. When applied within Candidate
Subcircuit Enumeration, these structural identiﬁers provide an equivalence relation

to partition the circuit into structural equivalence classes.

4.0.2 Chapter Outline

Section 4.1 describes the structural identiﬁers and their generation. Vertex weighting
is the ﬁrst step in the process, and it is discussed in Section 4.2 along with related
topics such as vertex functionality labeling, vertex weighting, and nets. The algorithm
for vertex weighting is presented and discussed in Section 4.2.4. The next step is the
generation of the identiﬁer for the weighted circuit graph, which is described along

with an example algorithm in Section 4.3.

Section 4.4 discusses how these structural identiﬁers are used within candidate
subcircuit enumeration to partition the subcircuits into structural equivalence classes.

143

'f-‘ “mu-3,!

Section 4.5 shows experimental results justifying the application of these equivalence
classes to candidate subcircuit enumeration, and also covers implementation details

and complexity. Section 4.6 summarizes the chapter.

4.1 Structural Identiﬁers

Structural identiﬁers describing the structure of a circuit can be built by relying
only on gate function and interconnects. Labels given to devices or wires are not
considered, because they will not be the same between two different circuits, though
the underlying structure may be identical.

A structural identiﬁer may take many forms; the only requirement is that the
identiﬁer represent the interconnects between the devices and accurately describe the
device functionality. The underlying form of the structural identiﬁer can be viewed

as an augmented adjacency matrix.

Deﬁnition 4.1 An augmented adjacency matrix is an adjacency matrix for a cir-
cuit graph with an additional column whose row elements are the functionality label

assigned to the vertex represented by that row.

The generation of a circuit identiﬁer is a two-step process. First, a weighting of
the vertices in the circuit graph is performed, based upon structural invariants such
as functionality, level (distance from inputs), number of inputs and outputs, etc. To
allow effective deﬁnition of structural equivalence classes, the weighting algorithm
must be designed such that any two identical subcircuits result in as few vertex
weightings as possible.

144

 

The second step is the generation of the identiﬁer. Once the vertex ordering has
been correctly performed, this step may be designed to produce many types of struc-
tural identiﬁers, such as an augmented adjacency matrix, a K -formula representation,
or a logic function representation. The algorithm described in Section 4.3 produces
the latter.

The technique presented in this dissertation for vertex weighting may result in
more than one identiﬁer for a single circuit graph structure, though the number
of identiﬁers is typically very small (one or two). It would be possible to extend
the technique to ensure that each circuit graph structure produced a single unique
ordering, but the complexity of that task (which is equivalent to graph isomorphism)
outweighs the computational improvement to the Module Identiﬁcation Problem that

would result from that extension.

4.2 Vertex Weighting

There are two steps necessary to determine an ordering of the vertices so that an

applicable structural identiﬁer may be developed.

1. Establish vertex functionality labels: determine a unique description for each of

the functionalities that are performed by vertices in the circuit graph.

2. Weight vertices: assign a unique weight to each vertex based upon its function-
ality and other structural invariants. Weight values range from 0 to n— 1, where

n is the number of vertices in the circuit graph.

145

 

4.2.1 Vertex Functionality Labels

To ensure that the identiﬁer accurately describes the structure of the circuit, each
vertex functionality must have a canonical description. Considering that a vertex
in the circuit graph may describe a simple logic gate or a complex library module,
the vertices are described and identiﬁed based on the functionality that they per-
form, rather than logic gate name or label. The function may be represented by any
canonical functional representation, such as ROBDDS or truth tables that have been

manipulated by a well-deﬁned procedure to become canonical.

Once canonical descriptions of the vertex functions are available, a unique integer
may be assigned to each function. These integer representations induce an ordering
on the vertex functionalities. This ordering applies to the entire circuit and may be
arbitrarily assigned such that each integer represents exactly one function, and each
function is represented by exactly one integer. By using integers instead of the truth
table or the BDD itself, vertex function comparison is simple and the size of the

eventual identiﬁer is more compact, without loss of structural information.

For the purpose of clarity, gate types will be described in this thesis by natural
language labels, such as “AND2” (2-input AND gate), “adderl” (1-bit adder), etc.
The vertex function ordering that applies throughout this thesis is alphabetical by the
natural language label. For example, “AND3” has higher precedence than “AND2”
and “0R3”. Table 4.1 shows the precedence of all vertex functions appearing in the
examples in this chapter.

146

”Mi—‘1

 

 

 

 

 

 

 

 

 

 

I Gate Type Label I Precedence
AND3 6 highest
ANDZ 5
NOR2 4
NOTl 3
0R2 2
XOR? 1
inputs 0 lowest

 

 

 

 

 

 

Table 4.1: Vertex Functionality and Precedence Example

 

4.2.2 Vertex Precedence

To develop an effective structural description of a circuit, each vertex of the circuit
graph must be given an order of precedence, a weight, that is dependent solely upon
the structure (including vertex functionality) of the circuit.

There are three steps to weight assignment:

1. Level assignment: Determine the level (maximum path length from circuit in-

puts) of each vertex.

2. First pass ( inputs to outputs): sort the vertices at each level based on structural
invariants of the circuit graph and assign each vertex an integer weight based

on that ordering.

3. Second pass ( outputs to inputs): resolve conﬂicts between vertices that could
not be ordered based on local information alone by considering global structural
information.

147

Phi-ma

Level assignment is accomplished by starting at the inputs and assigning them to
Level 0. The procedure traverses up from the inputs in a breadth-ﬁrst manner and
assigns each vertex the level value one higher than the maximum level of its inputs.
If a cycle is encountered, Le, a vertex has already been assigned a weight, then the

weight is not modiﬁed.

During the ﬁrst pass, weight assignment begins at the circuit graph inputs, and
proceeds up through the levels until all vertices have been weighted. By weighting the
levels in ascending order, information about the vertices closer to the circuit inputs

can be used during sorting of vertices on higher levels.

Vertices that cannot be sorted by the ﬁrst pass because not enough information
was available are assigned the same weight, and the weights that must be assigned
to those vertices to ensure a unique weighting are saved. The second pass resolves
these conﬂicts by beginning at the outputs of the circuit graph, and proceeding down

through the levels until all vertices have been assigned a unique weight.

With regard to vertex ordering, each vertex considers only those vertices that
are directly adjacent to it. These vertices are the inputs and outputs to the gate
that the vertex represents. By restricting the ordering comparisons to this local
information, the sorting can be performed quickly and efficiently. Considering only
adjacent vertices does not exclude other information about the rest of the circuit,
because the weights assigned to those vertices encapsulate relevant information about
the surrounding circuit structure. This encapsulated information is referred to as a
vertex’s input cone or output cone.

148

. {Fa-.1: EST-Iii,“

Input Cones

The input cone of a vertex v represents information about the circuit structure from
the Circuit’s primary inputs to v. As vertices are ordered during the ﬁrst pass, if two
vertices have the same functionality, number of inputs, and number of outputs, then
their input weight lists are sorted and compared. In this way, the information about
the circuit between the inputs and the current vertex may be efﬁciently taken into
consideration.

In digital circuits, the input cone is equivalent to the logic cone described in
Section 2.8.2 of this dissertation, so the input cone of a vertex includes all of the
vertices that represent gates that affect the output value of the gate. Figure 4.1

presents a conceptual diagram of an input cone.

 

 

 

 

 

 

 

Figure 4.1: Conceptual Description of Input Cone

 

149

Output Cones

The output cone of a vertex v is deﬁned by all of the vertices between v and the
Circuit’s primary outputs. By ordering vertices based on their sorted output weight
lists, the output cone is efﬁciently considered when ordering two vertices. If the
output cones of the vertices differ in any way, that will be reﬂected in the weight that
is assigned to the vertices.

Figure 4.2 presents a conceptual diagram of an output cone. Note that the output
cone actually considers the input cones of each of the vertices in the output cone.

This provides extensive information for ordering vertices.

 

 

 

 

 

Figure 4.2: Conceptual Description of Output Cone

 

The following ordering measures are applied to determine sorting precedence.

150

 

First Pass Vertex Precedence Rules:

1. vertex information: local information about the vertex, considered in the fol-

lowing order:

(a) vertex functionality
(b) number of inputs

(c) number of outputs

2. input weights: the weights of the inputs to the vertices under comparison. This
information succinctly considers all vertices between the circuit inputs and the
current vertex (the input cone), because all lower-level information is encapsu-

lated in the weight of a vertex.

After the ﬁrst pass of vertex weighting, each vertex has a weight that represents
its place in the circuit. Example 1 demonstrates the weighting of a circuit graph
in which only vertex information is necessary to order and weight all vertices in the
circuit graph. Example 2 involves a circuit graph that requires that input weights
be considered to accomplish the weighting. In cases in which the local information is
not sufficient to sort a set of vertices, they are all assigned the same weight, and the
other weights that need to be assigned to those vertices are “saved” until the second

pass.

Second Pass Vertex Precedence Rules:

1. output weights: the weights that were assigned in the ﬁrst pass are now used to

resolve sorting conﬂicts. This comparison encapsulates the comparison of the

151

output cones of the vertices.

2. arbitrary ordering: vertices that are indistinguishable with respect to global

circuit structure are arbitrarily sorted and weighted.

The two situations that are resolved during the second pass are examined in two
examples. Example 3 illustrates resolving conﬂicts by investigating the output weights
of the unsorted vertices. Example 4 includes two vertices that are precisely identical
within the circuit, so it is necessary to arbitrarily order them to ensure that each
vertex has a unique weight.

Example 5 demonstrates a situation in which two vertices are indistinguishable by
only two passes through the circuit graph, though the vertices are actually different
within the circuit graph. This situation can result in more than one identiﬁer for the

same structure, as discussed in Section 4.1.

Example 1: First Pass Weighting with Local Vertex Information Consid-

eration

The circuit graph in Figure 4.3a has a simple structure with an easily deﬁned vertex
ordering. This example demonstrates the use of vertex functionality to sort the
vertices and the use of the number of outputs to resolve conﬂicts between gates of

the same functionality.

Level 0 First, the inputs (all vertices at Level 0) are assigned weights. The three
vertices, a, b, and c, perform identical functionality; they are all inputs to the circuit.
Therefore, they cannot be sorted by functionality but are instead sorted in ascending

152

 

 

 

Figure 4.3: Example 1: First Pass Weighting with Local Vertex Information Consid-
eration

 

153

order by number of outputs. Weights are assigned to the sorted list of vertices, so
Vertex a is given weight 0, Vertex b weight 1, and Vertex c weight 2. Figure 4.3b

shows the circuit graph after Level 0 weight assignment.

Level 1 At Level 1 of the circuit graph, the three vertices have three different
functionalities, so sorting them is trivial. Table 4.1 shows the vertex functionality
precedence ordering. Vertex or2a has the lowest precedence so it is assigned weight
3. Vertex not] a has the next highest precedence and is assigned weight 4, and Vertex
and2a is given weight 5 because it has the highest precedence. Figure 4.3c shows the

circuit graph after assigning weights to the vertices at Level 1.

Level 2 The top level of the circuit graph contains only one vertex, so sorting is
not necessary. Vertex and3a is assigned the last weight, weight 6. Figure 4.3d shows

the weighting of all of the vertices in the circuit graph.

Example 2: First Pass Weighting with Input Weight Consideration

If the circuit graph in Figure 4.3a is modiﬁed slightly so that two of the Level 1
gates are “0R2” gates, then it is necessary to consider input weights to sort those
vertices. The modiﬁed circuit graph is presented in Figure 4.4a. The vertices on Level
0 are assigned the same weighting as in the previous example, so the result of that

weighting is shown in Figure 4.4b.

Level 1 The functionality of Vertex notIa has a lower precedence than either Vertex
or2a or Vertex or2b, so it is given weight 3.

154

 

 

 

((1)
Figure 4.4: Example 2: Second Pass Weighting With Input Weight Consideration

 

155

When Vertex or2a and Vertex or2b are compared by vertex functionality (includ-
ing number of inputs) and number of outputs, no difference is found that can guide
the sorting of one over the other. The second vertex ordering criterion, input weights,
must be applied.

The inputs of each vertex are sorted by weight, giving Vertex or2a the weight list
[0, 1] and Vertex or2b the weight list [1, 2]. An iterative comparison of these lists
terminates on the ﬁrst element because 1 > 0, so or2a is assigned the next weight,
4, and or2b is assigned weight 5. The result of Level 1 weighting is shown in Figure

4.4c.

Level 2 Again, with only one vertex on Level 2, no sorting is necessary, so Vertex

and3a is assigned weight 6. Figure 4.4d shows the ﬁnal weighting of the circuit graph.

Example 3: Second Pass Weighting with Output Weight Consideration

When the information local to the vertices is not sufficient to order them, it is nec-
essary to consider the structure of the surrounding circuit graph. Starting at the
outputs, unsorted vertices are ordered by sorting and comparing their output weights.

The circuit graph in Figure 4.5a is an example of a graph which requires the
application of this precedence criterion. Figure 4.5b shows the vertex weights after
the ﬁrst pass through the circuit graph. There are two sets of vertices that have

identical weights: a and c, and xor2a and xor2b.

Second Pass: Output Weight Consideration Resolution of conﬂicts begins at
the outputs of the circuit graph, and all conﬂicts at each level are resolved before

156

 

 

 
    

notla andZa
6 7

nor2a
9

 

(d)
Figure 4.5: Example 3: Second Pass Weighting with Output Weight Consideration.

 

157

proceeding any further toward the inputs. In Figure 4.5b, all vertices on Levels 3 and
2 have been sorted by the ﬁrst pass, so the second pass proceeds to Level 1, where

xor2a and xor2b have both been assigned weight 3.

Level 1 Vertices xor2a and xor2b can be ordered by sorting and comparing the
weights of their outputs. xor2a’s sorted output weights are [6, 7] and xor2b’s sorted
output weights are [5, 7], so xor2a is assigned the higher weight of 4. Figure 4.5c

shows the circuit graph with the new Level 1 weights.

' Level 0 The next set of identically-weighted vertices to resolve is on Level 0: a
and c. The weights of the vertices at Level 1 now allow an unambiguous sorting and
vertex weighting, assigning a the higher weight of 1. Figure 4.5d shows the completely

weighted circuit graph after the second pass.

Example 4: Second Pass Weighting with Arbitrary Ordering of Identical

Vertices

Occasionally, it is not possible to sort vertices deterministically because they perform
identical functionality in isomorphic locations within the circuit graph. For instance,
the circuit graph in Figure 4.6a has identical halves connected in the center at xor2b.
As a result of this symmetry, there are several sets of vertices that will have identical
weights after the ﬁrst pass. The weights after the ﬁrst pass can be seen in Figure
4.6b.

158

1

TL“. 2

 

   

Figure 4.6: Example 4: Second Pass Weighting with Arbitrary Vertex Ordering Re-
quired

 

159

Second Pass: Arbitrary Ordering

Resolution of identically weighted vertices begins at the outputs (Level 2) and pro-

ceeds toward the inputs.

Level 2 The two vertices on Level 2 cannot be sorted because they perform identical
functionality and occupy equivalent locations within the circuit graph. Therefore, the
order in which their weights are applied is not signiﬁcant. Arbitrarily, Vertex and2a

may be assigned the higher weight of 8 (Figure 4.6c).

Level 1 and 0 Sorting the vertices at Levels 1 and 0 proceeds as it did in Example
3, because now the vertices may be sorted based on output weight. In this way, the
arbitrary decision made at the highest level of the circuit graph propagates throughout
to maintain consistency. The weighted circuit graphs after the weighting of Levels 1

and 0 are shown in Figures 4.6d and 4.6e, respectively.

Example 5: Second Pass Weighting with Arbitrary Ordering of Distinct

Vertices

The two passes encompass only the information that is “visible” from each of the
vertices, which includes the input cone and the output cone for each of them. If the
circuit graph is very shallow, it may be necessary to arbitrarily order two distinguish-
able vertices.

For instance, the circuit graph in Figure 4.7a demonstrates this difﬁculty. Figure
4.7b shows the circuit graph after weighting by the ﬁrst pass. There are two vertices

160

 

 

“v.
0:.
0,.
0 e

A
99
v

 

‘
I
I

9
go
9
O

A
v
v

‘
I
I

6
go
9
0

A
O
V

G
‘0
:0
)0

0

A
0..
v

Figure 4.7: Example 5: Second Pass Weighting with Arbitrary Ordering of Distin-
guishable Vertices

 

161

(i, j) that currently have the same weight, so those must be resolved during the second

pass.

Second Pass: Arbitrary Ordering

Resolution of identically weighted vertices begins at the outputs (Level 1) and pro-

ceeds toward the inputs.

Level 1 The two unordered vertices on Level 2 cannot be deterministically sorted
because they perform the same functionality. Although the surrounding circuit graph
distinguishes them, the difference cannot be detected by only two passes through the
circuit graph. Therefore, the arbitrary decision will result in one of the weightings

presented in Figure 4.7c or Figure 4.7d.

This will result in two diﬁ'erent identiﬁers that represent the same circuit graph

structure, as discussed in Section 4.1.

4.2.3 Correctness of Vertex Weighting

Case Analysis for Vertex Weighting

If two arbitrary vertices u and v on a level i are distinguished by local (vertex)
or surrounding (circuit) structure, then they may be ordered by the feature that
distinguishes them. This ordering is the basis for the vertex weighting technique for
developing a structural identiﬁer for circuits or unlabeled directed acyclic graphs.

162

 

First Pass The following ﬁve cases are sufﬁcient to order two arbitrary vertices u

and v on a level i for the purpose of weight assignment during the ﬁrst pass. The

relationships between u and v are based on local information about u and v, such

as functionality, number of inputs, number of outputs, and global information about

the structure of the circuit between the inputs to the circuit and Level i.

When two arbitrary vertices u and v are distinguishable by the ﬁrst or second pass,

let u indicate the vertex that is ordered less signiﬁcantly without loss of generality.

The following cases can be seen applied in the vertex weighting Examples 1, 2, 3, 4,

and 5.

First Pass Case Analysis:

Case 1.1: Vertices u and v have different functionality labels and may be sorted

in ascending order by functionality labels. u.weight < v.weight.

Case 1.2: Case 1.1 is false and vertices u and v do not have the same number
of inputs, then they may be sorted in ascending order by number of inputs.

u.weight < v.weight.

Case 1.3: Cases 1.1 and 1.2 are both false and vertices u and v have unequal
numbers of outputs then they may be sorted in ascending order by number of

outputs. u.weight < u.weight.

Case 1.4: Cases 1.1, 1.2, and 1.3 are all false and u and v have diﬂ'erent sorted
input weight lists then they may be sorted in ascending order by the ﬁrst unequal
input weight. u.weight < u.weight.

163

 

0 Case 1.5: Cases 1.1, 1.2, 1.3, and 1.4 are all false. u.weight = v.weight.

If local information (Cases 1.1, 1.2, and 1.3) is not sufﬁcient to order u and v, then
more comprehensive information about the circuit structure must be applied (Case
1.4). The input cone describes all structural information between the inputs (Level
0) of the circuit and the vertex v. Input cones are efﬁciently compared by comparing
the sorted input weight lists of u and v. If the structure of the input cone differs,
then the sorted input weight lists will differ and the vertices will be weighted in Case
1.4. If the input cones are not sufﬁcient to order the vertices (Case 1.5), then they

are assigned the same weight and will be resolved during the second pass.

Second Pass Resolution of vertex weights during the second pass begins at the
vertices of the circuit at the highest level. Case 2.1 considers vertices that have
already been deterministically assigned an order by the ﬁrst pass. Case 2.2 considers
the ordering of the vertices by considering their output cones. The output cone
represents global circuit structure that affects vertex v because the input cone of v is
necessarily included in the input cone of every vertex to which v is an input.

When Level i is being considered during the second pass, all vertices on Level
i+1 have been assigned a unique weight that considers their input and output cones.
Thus, any structural information that affects the vertex v has been encapsulated and
can be used to resolve identical weights remaining after the ﬁrst pass. The following
cases can be seen applied in Examples 3, 4, and 5.

Second Pass Case Analysis:

0 Case 2.1: u.weight < v.weight.

164

0 Case 2.2: Case 2.1 is false and u and v have different sorted output weight lists.

u.weight < v.weight. Consider Figure 4.5.

0 Case 2.3: Cases 2.1 and 2.2 are both false. u.weight < u.weight (arbitrarily).

Consider Figure 4.6.

Unique Vertex Weighting

Theorem 4.1 Each vertex in the circuit graph G is assigned a unique weight based
on the structure of the circuit graph.

Proof by induction:

Base case: the vertices at Level n whose local information and input cones are
identical are indistinguishable. Therefore, their weights are interchangeable without
loss of uniqueness. Thus an arbitrary assignment provides each vertex with a unique
weight.

Inductive Hypothesis: For Level i assume that every vertex from Level i + 1 to
Level n has been assigned a unique weight.

Inductive Step: Each vertex at Level i can be assigned a unique weight based upon
the structure of the circuit.

For any two vertices at Level i which are not already ordered (by the ﬁrst pass),
if the sorted output weight lists are different then the vertices are distinguishable and
their weighting is deterministic. Otherwise, the vertices are indistinguishable with
respect to local information, input cones, and output cones, and are arbitrarily ordered.
Thus, a unique weight is assigned to each vertex.

165

Each vertex in the circuit graph is assigned a unique weight based on the struc-

ture of the circuit graph.

4.2.4 Algorithm

Figure 4.8 describes the vertex weighting algorithm.

Details and Complexity

First Pass for Vertex Weighting

0 Step 0: weight <— 0, curWeight (— 0
Initialize the weight counter (weight) and the current weight counter
(curWeight). The current weight counter allows “saving” of weights for as-

signment during the second pass of the algorithm.

0 Step 1: foreach level i of the circuit graph G
Comparison of vertices is performed on a level-by-level basis to reduce the size

of the set to be ordered. There are at most n levels, so this step is 0(n).

0 Step 2: sort inputs of each vertex on Level i by weight

The second sorting criterion requires that the inputs must be sorted by weight
so the comparison may be done. There may be it inputs to each vertex, so this

step is order O(n log n) using the Quicksort algorithm.

0 Step 3: sort vertices in Level i in increasing order by the following
criteria:

166

 

 

Algorithm for Circuit Graph Vertex Weighting

First Pass for Vertex Weighting
0. weight (— 0, curWeight +— 0
1. foreach level i of the circuit graph G (increasing i):

2.
3.

9

sort inputs of each vertex on level i by weight.
sort vertices in level i in increasing order by the following criteria:

local vertex information (functionality, number of inputs/ outputs)
input weights

. level [i].vertex[0].weight (— weight
. for (j = 1;j < level]i].order;j++)

6. if (level[i].vertex[j] 76 level]i].vertex[j — 1]) then curWeight (— weight
7. levelIi].vertex[j].weight (— curWeight

8. weight (— weight + 1

End for j.

10. curWeight (— weight
11. End foreach level.

Second Pass for Identical Vertex Weighting
0. foreach level i of the circuit graph G (decreasing i):

1.
2.
3.

4.

8.
9. E

sort outputs of each vertex on level i by weight.
weight (— net[0].weight
sort vertices with identical weight in increasing order by the following criterion:
output weights
for (j = 1;j < levelIi].order;j++)
5. if (level[i].vertex[j].weight S level]i].vertex[j — 1].weight)

6. (level[i].verteij].weight (— levelIi].verteij — 1].weight) + 1
7. End if.
End for j.

nd foreach level.

 

Figure 4.8: Algorithm for Circuit Graph Vertex Weighting.

167

 

1. vertex information: functionality, number of inputs/outputs

2. input weights

An encoding of this information can be derived such that a sorting of that

encoding may occur. To prevent excess computation, these criteria may be

applied in order and comparison may stop when the vertices have been ordered.

 

Therefore, the complexity of this step is O(n log it) using the Quicksort algo-

rithm.

0 Step 4: levelIi].vertex[0].weight (— weight

Set the weight of the vertex with lowest precedence to be value of weight.

0 Step 5: for (j = 1;j < level[i].order;j + +)
Assign a weight to each vertex on the level. A vertex may only belong to one

level, so on average, this step will apply to n/ i vertices, but worst case analysis

is 0(n).

0 Step 6: if (levelIi].verteij] 94 levelIi].vertex[j— 1]) then curWeight (— weight
If the vertices are not equivalent (based on the criteria in Step 2), then the
curWeight variable is set to the weight variable, which has been incremented
as each vertex has been visited. If they are equivalent, then curWeight is not

changed, thus saving the weight value for assignment during the second pass.

0 Step 7: levelIi].verteij].weight (— curWeight
Assign the weight of the vertex to be the value of curWeight.

168

0 Step 8: weight (— weight + 1

Increment the value of weight.

0 Step 9: End for loop of vertices on Level i.
The possible number of vertices on any level is bounded by the number of
vertices in the circuit graph, n. The operation performed within this 100p was

0(n), so the complexity of this step is 0(n).

0 Step 10: curWeight (— weight
Reassign curWeight to the current value of weight, so vertices are assigned
the correct weight. At the beginning of each level, curWeight and weight are

always equivalent.

0 Step 11: End foreach level.
The vertices on each level were sorted only within that level. The worst case
situation would occur if all of the vertices were on a single level, so the com-
plexity of each level (possibly 77.) is the same as the complexity of the sorting

step, O(n log n). Overall, the complexity of the ﬁrst pass is (’)(n2 log n).

Second Pass for Identical Vertex Weighting

0 Step 0: foreach level i of the circuit graph G (decreasing i):
To resolve the conﬂicts between vertices that appear to be identical when in-
formation is only available about the vertex itself and its neighbors (during
the ﬁrst pass), it is now possible to resolve these conﬂicts with the weighting
information of the outputs.

169

F‘M‘m‘ﬁ

0 Step 1: sort outputs of each vertex on Level i by weight
To compare the output weights of two vertices, the weights must be sorted. The
possible number of outputs to be considered is potentially n, and their sorting

is O(n log n) using the Quicksort algorithm.

 

0 Step 2: weight 4— net[0].weight

Set the current weight to be the same as the weight of all of the vertices in the

IP54. mam-1’1

set.

0 Step 3: sort vertices with identical weight in increasing order by the

following criterion:

— output weights

The output weights have already been sorted for each vertex, so a sorting of
the vertices by considering these weights is O(n log n) using the Quicksort al-

gorithm.

 

0 Step 4: for (j = 1;j < levelIi].order;j++)
Each vertex v on the level must be assigned a unique weight, according to the

sorted order.

0 Step 5: if (level[i].vertex[j].weight S level[i].vertex[j — 1].weight)
Step 6: (levelIiI.verteij].weight <— levelIi].verteij — 1].weight) + 1
Step 7: End if
If the current vertex has a weight that is less than that of the previous vertex,
then set the weight of the current vertex to be one higher than the previous

170

vertex. This ensures that every vertex has a unique weight. The weight values
that will be assigned were “saved” during the ﬁrst pass, so no duplicate weights

may occur.

0 Step 8: End for j
The for loop started in Step 5 assigns a weight to each vertex on the level (if
necessary). There may be no more than n vertices on the level, so the complexity

of this simple comparison and assignment loop is O(n).

0 Step 9: End foreach level.
The sorting step (Step 3) is the most complex, requiring 0(n log n) time for the

sorting. There may be at most n levels, so the overall complexity is O(n2 log n).

Overall, the complexity of determining an appropriate ordering and weighting the
vertices in a circuit graph is dominated by the cost of sorting the vertices on each
level which must be done one for each of possibly n levels. The complexity of the ﬁrst
and second pass is equivalent and additive, so the overall complexity of the algorithm

is 0(n2 log n).

Correctness This algorithm considers each of the ﬁve distinguishable cases dis-
cussed in the case analysis performed in Section 4.2.3. Step 3 of the ﬁrst pass con-
siders each of Cases 1.1, 1.2, and 1.3, ensuring that vertices are ordered based on the
ﬁrst pass precedence criteria. Step 3 of the second pass sorts the vertices based on
the second pass precedence criteria, Cases 2.1, 2.2, and 2.3. Because this algorithm
implements the cases exactly, it correctly performs vertex weighting.

171

F M “nut-q

4.3 Structural Identiﬁer Generation

The generation of the actual structural identiﬁer is the ﬁnal step in the process. This
step is relatively straightforward because the ordering of the vertices has already
been established. All that is necessary is to recursively build the string identiﬁer. For
the purposes of this dissertation, a preﬁx-form logic formula structural identiﬁer is

deﬁned.

Nets

Unless the circuit graph is described by a pure tree structure, in which each vertex has
only a single output, there are structures whose functionality feeds into two separate
vertices in the circuit graph. These are referred to as nets in this thesis. For instance,
in Figure 4.3a on page 153, the vertices labeled b and c are examples of nets.

To preserve the overall structure of the circuit, it is important that nets are iden-
tiﬁed and represented appropriately. Simply reproducing the entire structure of a net
each time it occurs in the representation ignores the interconnection information and
results in an inaccurate description.

Nets are represented within the identiﬁer as nX, where X is the number of the net.
Nets are weighted as are all other vertices, so their numbering and location in the
identiﬁer are well-deﬁned. Nets are numbered during the second step of structural
identiﬁer generation: the identiﬁer generation stage.

Figure 4.10 on page 176 presents an implementation of a 2-bit full adder. The
structural identiﬁer generated by the example algorithm in Section 4.3.1 for that

172

 

 

particular implementation of a 2-bit adder is:

(or2(and2(n0, n5), and(n6, n7)), xor2(n0, n5), xor2(n1, n4))
n0:or2(and2(n1, n4), and2(n2, n3)) n1:xor2(n2, n3) n2:X n3:X n4zx

n5:xor2(n6, n7) n6zx n7:X

Vertex Identiﬁers

The identiﬁers for the three types of circuit graph vertices and the circuit graph itself

are described below.

0 Nets: If the vertex has more than one output, then it is assigned a label “nX”,

where X is the current net number. Each net is assigned a unique, strictly

increasing net number starting with 0.

Input Identiﬁers: The identiﬁer of a single-output input is simply “X”. That
identiﬁer carries all necessary information about the input vertex v, including
that v is an input and that v has only one output. If an input has more than
a single output, then the vertex is treated as a net and will be assigned the

appropriate net number.

Internal Vertex and Output Identiﬁers: The ﬁrst part of an identiﬁer for a
vertex is the vertex functionality label. If the vertex has any inputs, then the
functionality label is followed by a left parenthesis, then by the identiﬁers from

each of its input vertices (in order of weight), and ﬁnally by a right parenthesis.

Circuit Graph Identiﬁers: The entire circuit graph may be described by deter-
mining the identiﬁer for a null-functionality vertex that has all outputs of the

173

circuit graph as inputs. This identiﬁer is followed by the description of each of

the nets in the following format: “nX: [identiﬁer of nX]”.

4.3.1 Example Algorithm

The algorithm in Figure 4.9 is an example of a simple algorithm that could be used
to generate structural identiﬁers for weighted circuit graphs. The identiﬁer generated

is in the form of a preﬁx-order logic formula.

4.3.2 Structural Identiﬁer Example

This example demonstrates the application of vertex weighting and identiﬁer genera-
tion together on an actual circuit, the 2-bit adder shown in Figure 4.10. The circuit,
transformed into circuit graph form, is presented in Figure 4.11a. The circuit graph
is displayed with the vertices clearly separated by level to clarify the vertex weighting

procedure.

Level 0 The weighting of the Level 0 vertices is complicated by the fact that none
of the vertices may be immediately sorted. These ﬁve inputs (A0, B0, A1, B1, C0)
are all weighted 0, and the weights 1, 2, 3, and 4 are saved to be assigned during the

second pass. Figure 4.11b shows the result of Level 0 weighting.

Level 1 At Level 1, the vertices may be sorted by considering functionality and
output information. There are two XOR2 gates and two AND2 gates, so the vertices
of each pair is assigned the same weight (5 and 7) appropriately. Figure 4.12a shows

174

r 'l
I

 

 

Example of a Canonical Identiﬁer Generation Algorithm for
Weighted Circuit Graphs

0. nNets (— 0
1. foreach output vertex i of G:
2. G.output[i].ident <— getVertexIdent(G.output[i])
3. End foreach.
4. G.ident +—“(G.output[0].ident, G.output[1].ident, , G.output[n].ident)
n0 : n0.modI dent n1 : n1.modI dent nx : nx.modI dent”

subroutine getVertexIdent(vertex v):
0. if vertex v.ident 74 NULL then return v.ident.
1. if vertex v has no inputs then
2. v.ident (— “X”
3. else
4. v.ident (— v.function + “(“
5. sort inputs of v in decreasing order by weight
6. v.ident (— v.ident+ getVertexIdent(v.inputI0]) + “,”
7. foreach input i of v:
8. v.ident (— v.ident+ getVertexIdent(n.input[i])
9. End foreach input i.
10. v.ident (— v.ident + ”)”
11. End else.
12. if vertex v has one output then
13. return v.ident.
14. else
15. v.modI dent (— v.ident
16. v.ident (— “n” + nNets
17. nNets (— nNets +1
18. End else.

 

 

 

Figure 4.9: Example of a Canonical Identiﬁer Generation Algorithm for Weighted
Circuit Graphs.

175

 

 

\

°'" 1 so

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

7
:3 CD “
L—\
_/
l .
II 31

 

 

 

 

 

 

 

g: )1) "D—
———‘} "D— cout

 

 

 

 

 

 

Figure 4.10: Full Equivalence Class Example: 2-bit adder

 

the result of this weighting.

Level 2 Only vertex functionality is necessary to order the vertices on Level 2 for

weighting. AND2 has higher precedence than XOR2 (Table 4.1).

Level 3 Level 3 has only one vertex, so no ordering is necessary before the weight

of 11 is assigned to the OR2 gate.

Level 4 The sorting of the vertices of Level 4 is accomplished by simple functionality

precedence.

Level 5 The single vertex at Level 5 is assigned the highest weight, and the complete
weighting after the ﬁrst pass through the circuit graph is given in Figure 4.12b.

176

"i

r"

 

 

Second Pass It is necessary to make a second pass through the circuit graph to
resolve the conﬂicts between any vertices that had been assigned the same vertex.
After the ﬁrst pass, all vertices have a weight, so that weight may be used to determine

an ordering for the conﬂicting vertices.

All vertices down to Level 1 have a unique weight assigned to them, so the second

pass begins vertex weighting at Level 1.

Level 1 The vertices on Level 1 consists of two pairs of vertices with equal weights.
Within the AND2 pair, the vertex on the left has a higher output weight than the
vertex on the right, so they are assigned weights 8 and 7, respectively. Similarly with
the XOR2 pair, the output of the vertex on the left has a higher weight than the
output of the vertex on the right, so they are assigned weights 6 and 5, respectively

(Figure 4.13a).

Level 0 The vertex C0 has the highest sorted output weight lists, so is assigned the

highest weight on the level, 4.

It is clear that the vertices A0 and B0 have lower sorted output weights ([5,7])
than the vertices A1 and B1 ([6,8]), so A1 and B1 may both assigned the weight value
2. However, A1 and B1 are identical within the structure of the circuit, so they are
arbitrarily ordered, with A1 assigned the value 3, and B1 assigned the value 2. A0
and B0 are also identical, so they are also arbitrarily ordered. The ﬁnal weighting of
the circuit graph is presented in Figure 4.13b.

177

 

 

 

 

 

 

 

 

 

29:9“ 393°
...2 ...,. “3° ...2 ...2 “3°
9 ...2 G ...a

 

(a) (b)

Figure 4.11: Vertex weighting for the circuit graph of the 2-bit full adder in Figure
4.10.

 

178

 

 

 

 

 

BO C0 BO (1)
0 0 0 0
and2 xon 30112 and2 xor2 and2
7 5 7 7 5 7
. I ' ll
and2 x012
""2 IO 9

 

 

 

 

 

 

(a) (b)

Figure 4.12: Vertex weighting for the circuit graph of the 2-bit full adder in Figure
4.10.

 

179

 

 

 

 

 

 

 

 

BO (1) C0
0 0 4
and2 xor’Z and2 and2 xor’z and2
8 6 7 8 6 7
' ‘I ' w
and2 xorz and2 1:012
10 9 10 9

 

 

 

 

 

 

 

(a) (b)

Figure 4.13: Vertex weighting for the circuit graph of the 2-bit full adder in Figure
4.10.

 

180

 

 

 

Identiﬁer Generation

Identiﬁer generation for this example is performed by the example algorithm in Figure

4.9. The results of this generation are displayed in Table 4.2.

Outputs The 2-bit full adder in Figure 4.10 has three outputs, represented by the
vertices weighted 14, 12, and 9. Determination of the identiﬁer begins with these
vertices, in descending order of weight. These vertices will be referred to by their
weight to allow them to be indicated clearly.

Vertex 14 has two input vertices, so those are sorted by weight, giving the result of
[13, 8]. The identiﬁer for Vertex 13 must be generated ﬁrst. The recursive procedure
considers the inputs of Vertex 13 and orders those [11, 6]. Vertex 11 is a net, so it is
assigned the ﬁrst net number, 0, and given the identiﬁer “n0”.

The module identiﬁer for net n0 must be generated, so the traversal continues
through the circuit graph. Vertex 10 is considered next, and the identiﬁer “and2(n1,
n2)” is generated for Vertex 10. Vertex 5 was given the net label “n1”, and Vertex 4
was given the net label “n2” because that is the order in which they were encountered.
The identiﬁer eventually created for net n0 is “or2(and2(n1, n4), and2(n1, n2))”.

During the consideration of the output Vertex 14, Vertices 11, 5, B0, A0, and C0
are all identiﬁed as nets, and assigned identiﬁers n0, n1, n2, n3, and n4, respectively.

This depth-ﬁrst procedure continues until each vertex has been assigned an iden-
tiﬁer. Each of the outputs is processed in this way. If a vertex is encountered that
already has an identiﬁer assigned, the identiﬁer does not need to be recalculated and
is simply returned. The results of the vertex identiﬁer generation are presented in

181

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

I Weight I Vertex rLevel II Inputs I Identiﬁer I Net Label I
0 A0 0 X n3
1 B0 0 X n2
2 B1 0 X n7
3 A1 0 X n6
4 C0 0 X n4
5 xor2 1 1, 0 xor2(n2, n3) n1
6 xor2 1 3, 2 xor2(n6, n7) n5
7 and2 1 1, 0 and2(n2, n3)

8 and2 1 3, 2 and2(n6, n7)
9 xor2 2 5, 4 xor2(n1, n4)
10 and2 2 5, 4 and2(n1, n4)
11 or2 3 10, 7 or2(and2(n1, n4), and2(n2, n3)) n0
12 xor2 4 11, 6 xor2(n0, n5)
13 and2 4 11, 6 and2(n0, n5)
14 or2 5 13, 8 or2(and2(n0, n5), and2(n6, n7))
top 6 14, 12, 9 (or2(and2(n0, n5), and2(n6, n7)),
xor2(n0, n5), xor2(n1, n4))
Table 4.2: Vertex Identiﬁers for Circuit Graph in Figure 4.10
Table 4.2.

4.4 Subcircuit Equivalence Classes

The structural identiﬁer can be used to determine an equivalence relation to parti-

tion the subcircuits generated by candidate subcircuit enumeration into structural

equivalence classes. There are several reasons why ﬁnding these equivalence classes

is desirable.

Equivalence classes allow syntactic matching to be performed transparently as part

of the candidate subcircuit enumeration process. Each time a subcircuit is identiﬁed

as belonging to a structural equivalence class, it is identiﬁed to be syntactically, and

therefore semantically, equivalent to every other subcircuit in that equivalence class.

182

 

5"“?

A subcircuit inherits all matching information from its equivalence class. Only
one member of each equivalence class must be tested for functional equivalence to
library modules. This saves time during module identiﬁcation. After one instance
has been tested, whenever another member of that equivalence class is located in the

circuit, its equivalence to a functional module is already known.

The set of structural equivalence classes generated by the structural identiﬁers
created using the ordering presented in this chapter is not necessarily minimal. There
may be more than one identiﬁer that represents a single circuit structure. However,
all of the subcircuits within an equivalence class will be identical, and the number of
equivalence classes that hold a single circuit structure is generally small (one or two),
so equivalence classes provide a useful technique for improving the performance of

module identiﬁcation.

4.4.1 Local Information

To apply structural identiﬁers to the subcircuits generated by candidate subcircuit
enumeration, a few preliminary steps are necessary to prepare the subcircuit for
identiﬁer generation. The information used to build the identiﬁer is derived entirely
from the structure of the circuit being examined, so the information 'in a subcircuit

must be local with respect to that subcircuit, not the entire circuit graph.

If the circuit to be explored, H, is a subcircuit of another circuit G, only the
information local to the subcircuit is considered. For instance, if a gate has two
outputs in G, but only one of those outputs appears in H, then the gate is considered

183

to be a single output gate for the purpose of vertex weighting and identiﬁer generation

Vertex level is an example of local circuit information. Although a vertex may be
Level 5 in the original circuit, it may only be Level 4 in a subcircuit that does not
include the inputs to the original circuit. It is necessary to determine the local level
for each vertex, to ensure that two structurally identical subcircuits are represented
identically as circuit graphs.

In addition, the inputs and outputs of the circuit must be speciﬁed as such. Inputs,
outputs, and levels can be determined by iterating through the constituent vertices
and declaring vertices with no local inputs as inputs and those with no local outputs
as outputs. Level can be determined by labeling the local inputs as Level 0, the levels
of their outputs as 1, and so on, labeling each vertex with the highest level at which
it is encountered during this traversal.

All circuits considered for identiﬁer generation are expected to contain only fully-
speciﬁed vertices. That is, all inputs of internal gates must be present. The set of
outputs considered during vertex weighting contains only those outputs that are also

within the subgraph.

4.4.2 Correctness of Structural Equivalence Classes

Two graphs G 1 and G2 with adjacency matrices A1 and A2 are isomorphic if there
exists some permutation P for which P(Al) = A2.

Two circuits Cl and C2 with augmented adjacency matrices (Deﬁnition 4.1) A1
and A2 are structurally identical (isomorphic) if there exists some permutation P for

184

 

 

 

which P(Al) = A2.

For clarity, consider the structural identiﬁer of a circuit to be an augmented ad-
jacency matrix that describes the graph. The unique weights assigned to each vertex
during vertex weighting deﬁne a speciﬁc permutation of the vertex ordering for the
augmented adjacency matrix. Any legal permutation of an adjacency matrix still
represents the same graph; a single adjacency matrix can only describe a single graph

structure, by deﬁnition.

Non-isomorphic Circuits Must Have Different Identiﬁers

Theorem 4.2 If two augmented adjacency matrices A1 and A2 are identical then
the circuits CI and C] that they represent are structurally identical.

Proof: Assume that two circuits CI and 02 that are not structurally identical exist
for which the corresponding augmented adjacency matrices A1 and A2 are identical.
As O] and C2 are not structurally identical, there exists no permutation P of A2 such
that P(AI) = A2. Thus, A1 cannot be equal to A2, which contradicts the assumption
that A1 and A2 are identical. Hence, by contradiction, if two augmented adjacency

matrices are identical then the circuits that they represent are structurally identical.

For the purpose of deﬁning subcircuit equivalence classes, it is vital that each
equivalence class contain only those graphs whose structure is isomorphic. Theorem
4.2 states that if the identiﬁers (adjacency matrices) are identical, then the graphs
must be identical. Therefore, two non-isomorphic graphs cannot be placed in the
same equivalence class (have the same identiﬁer).

185

 

 

It is not possible for two non-isomorphic graphs to be represented by the same
adjacency matrix, so they may not be placed in the same equivalence class. This
is important, because it is necessary that no subcircuit be placed into an incorrect
equivalence class. That situation would result in incorrect identiﬁcation of that sub-

circuit.

Each Subcircuit Structure is Generally Represented by Exactly One Iden-

tiﬁer

Although not strictly necessary for the reverse engineering of digital circuits, it is
interesting to note that the vertex weighting assigned by the weighting algorithm
usually identiﬁes a particular permutation of the vertices based on the structure of

the graph, allowing fast isomorphism testing.

The weighting proposed in this chapter restricts the number of vertex permuta-
tions represent isomorphic graphs, so the process provides a signiﬁcant improvement
in the context of module identiﬁcation. The number of vertex permutations that may
be deﬁned for a subcircuit is less than or equal to the number of arbitrary decisions
that are made. Although arbitrary decisions may arise frequently in general directed
acyclic graphs, they are uncommon in this domain due to vertex functionality labels
and the irregular nature of subcircuits.

186

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Original Candidate Contained

Circuit Gates Subcircuits Subcircuits
number I classes time number [classes I time
1-1m Adder 8 18 15 0.01 6 6 <0.01
2-bit Adder 15 114 79 0.04 24 17 0.01
3-bit Adder 22 522 339 0.20 66 41 0.04
z4ml3 39 185,196 164,172 177.11 281 275 0.24
C17 24 3,951 2,937 1.4 1199 1021 0.47
majority 24 12,171 12,171 5.25 993 463 0.61
b1 25 19,980 10,290 8.90 143 105 0.08
cm138a 33 726,032 124,174 516.56 26,652 6402 25.66
cm152a 35 59,962 15,692 44.90 6,484 6158 6.49

 

 

 

 

 

 

 

 

 

 

Table 4.3: Structural Equivalence Classes Applied to Subcircuits and Contained Sub-
circuits

4.5 Results

Experiments were run on a Sun Sparc Ultra-2 server with 1024MB of memory. Results
are shown in Table 4.3 for several small circuits from the LogicSynth93 benchmark
suite [82] (see Chapter 5 for discussion application on larger circuits). Times are
given in CPU seconds. Refer to Table 3.1 on page 136 to see the execution effort of

performing subcircuit enumeration without structural equivalence classes.

 

These results demonstrate the reduction in the number of subcircuits that must
be tested for semantic equivalence to known modules. By applying a solution to
the Subcircuit Identiﬁcation problem to only one instance of each structural equiv-
alence class, the overall performance of the Module Identiﬁcation can be improved
signiﬁcantly, depending on the degree of regularity present in the circuit.

187

 

4.5.1 Implementation

To investigate the application of the structural equivalence classes within the Module
Identiﬁcation Problem, the two steps of the structural identiﬁer generation, vertex
weighting and identiﬁer generation, have been implemented in the C programming
language. The implementation of the vertex weighting procedure follows the algo-
rithm presented in Section 4.2.4, and the identiﬁer generation implements the example

algorithm in 4.3.

The vertex weighting and structural identiﬁer generation techniques are applied
to the subcircuits generated by the code that implements the candidate subcircuit
enumeration technique described in Chapter 3. This allows the subcircuits to be
placed into equivalence classes, which are stored in a hash table with the structural

identiﬁer as the key value.

Correctness

Vertex Weighting A case analysis of the vertex weighting technique has been per-
formed in Section 4.2.3. The implementation of the algorithm performs the sorting
based on the precedence criteria described in Section 4.2.2 by using case statements
to implement each individual criterion. Vertex weighting, based on the technique pre-
sented in Section 4.2 was performed by hand for many small circuits and these results
were determined to be equivalent to the results of executing the implementation on
the same circuits.

188

 

Structural Identiﬁer Generation Again, many small experiments were per-
formed by hand to ensure that correct structural identiﬁers were generated for the
subcircuits.

A testing phase was conducted after the implementation of the two algorithms.
Once a subcircuit had been placed into an equivalence class, it was compared via a
backtracking breadth-ﬁrst traversal, against all other members of the equivalence class
to ensure that they are all structurally equivalent. No subcircuits were incorrectly

placed.

Summary Although formal methods were not applied to ensure equivalence be-
tween the implementations and the algorithms performing the tasks necessary for
subcircuit equivalence classes, comparison of results with hand-determined results
and testing and veriﬁcation within the implementations offer a great level of con-
ﬁdence in the correctness of the implementations. These implementations provide

validation of the theory of structural equivalence classes.

4.6 Summary

This chapter has presented a technique for developing a structural identiﬁer for a
digital circuit. By employing a strict precedence framework, the vertices within the
circuit graph that represents the circuit can be sorted so that each may be assigned
a unique weight. With the exception of vertices that perform identical functionality
with respect to the global structure of the circuit, this weighting will usually generate

189

 

identical weightings for structurally isomorphic circuits.

This weighting may then be applied within an identiﬁer generation method to
develop a structural identiﬁer. An identiﬁer represents exactly one circuit structure.
Within the context of the Module Identiﬁcation Problem, these structural identiﬁers
allow the candidate subcircuits generated by the technique in Chapter 3 to be parti-
tioned into structural equivalence classes so semantic matching may be performed on
only one instance of each equivalence class. This process simultaneously performs syn-
tactic matching and allows subcircuits to inherit information from their equivalence
class.

Although structural identiﬁers can become quite large, their application within
Module Identiﬁcation is possible because the subcircuits to which they are applied
are generally small. Both vertex weighting and identiﬁer generation are easily applied
to these subcircuits.

In addition, structural equivalence classes are equivalent to the template libraries
used within regularity extraction, and the members of the classes describe instances
of that template within the target circuit. Thus, by develOping these equivalence

classes, the ﬁrst two steps of regularity extraction (Section 2.4.1) are solved.

190

Chapter 5

Practical Application of Techniques

It is apparent from considering the results presented in Sections 3.3 and 4.5 that
applying these techniques unmodiﬁed to the circuit to be reverse engineered is not a
feasible approach. Circuits of interest may be millions of gates in size, and considering
that the number of subcircuits can potentially suffer from exponential growth with
each additional vertex, circuits of this size cannot be handled by the techniques as
they have been presented thus far. However, this does not preclude the use of these
techniques in practical application. The techniques and guidelines presented in this
chapter make it possible to effectively apply subcircuit enumeration within Module
Identiﬁcation of any size of circuit.

Details of a parallel implementation of the techniques are presented in Section
5.1. Section 5.2 discusses two methods of reducing the circuit graph order: prelimi-
nary partitioning and preliminary syntactic matching. A technique for reducing the
theoretical complexity and execution time of subcircuit enumeration is presented in
Section 5.3. Hierarchical module identiﬁcation is presented in Section 5.4, and heuris-

191

 

tics for reducing the applications of Subcircuit Identiﬁcation are discussed in Section

5.5. Section 5.6 summarizes the chapter.

5.1 Parallel Implementation

The solution to the subcircuit enumeration problem is “embarrassingly parallel”.
Each vertex of the circuit graph may be expanded individually, and this expansion
proceeds without interaction with any other expansion processes. By developing
a parallel implementation, the execution time required by subcircuit enumeration
technique can be vastly improved.

The computation effort can be divided into many smaller parts by expanding
each vertex of the circuit graph on individual processors to enumerate all candidate
subcircuits derived from that vertex. After each subcircuit is created and labeled
with a structural identiﬁer, it can be returned to the server to be placed into the
proper subcircuit equivalence class for subcircuit identiﬁcation.

The results of performing candidate subcircuit enumeration with a parallel im-
plementation of the Focused Enumeration Algorithm (Figure 3.15 on page 116) on
circuits in the LogicSynth Benchmark Suite [82] are presented in Figure 5.1. The
algorithm was implemented in C and uses the parallel capabilities of Parallel Virtual
Machine (PVM) 3.3 [73]. The experiments were run with a Sun Ultra 2 Model 2300
(dual 300mhz processors) with 1024MB of memory. The client machines included
Sun Ultra 103 and Sun Ultra 53.

The CPU execution times are in seconds, and represent the maximum time nec-

192

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Original Candidate Contained

Circuit Gates Subcircuits Subcircuits
single parallel single parallel
number time max time number time max time

1-bit Adder 8 18 <0.01 <0.01 6 <0.01

2—bit Adder 15 114 <0.01 0.01 24 <0.01 0.01
3-bit Adder 22 522 0.04 0.03 66 <0.01 0.01
C17 24 3,951 0.16 0.08 1,199 0.05 0.02
majority 24 12,171 0.45 0.21 993 0.05 0.01
b1 25 19,980 0.75 0.57 143 0.01 0.01
cm138a 33 726,032 27.64 11.14 26,652 1.05 0.41
cm152a 35 59,962 2.27 1.78 6,158 0.24 0.11
z4ml3 40 185,196 14.64 6.89 281 0.07 0.05
x2 54 104,178,928 4282.80 2118.66 98,322 3.93 2.15

 

Table 5.1: Focused Enumeration Results of Parallel Implementation

essary to enumerate all of the subcircuits from any vertex. The execution time for

the smaller circuits suffered slightly due to the message passing overhead, but in the

larger circuits, execution time was signiﬁcantly improved.

Most of the processing time for enumeration of subcircuits is conﬁned to the

expansion of the vertices with the highest indices; these vertices are the ones which

have the most possible vertices to add because of Rule 1 (page 110). By performing

their expansion in parallel, overall execution time can be signiﬁcantly reduced.

5.2 Circuit Graph Order Reduction

Candidate Subcircuit Enumeration performs an exhaustive generation of the interest-

ing subcircuits within the circuit. Therefore, as the order of the circuit graph grows,

the number of potentially interesting subcircuits can grow eXponentially. To allow

193

 

subcircuit enumeration to be applied in practice, the order of the circuit graphs to

be considered must be kept small (< 100 gates).

The following two techniques provide excellent methods for reducing the order of
the circuit graph without interfering with the effectiveness of the Module Identiﬁcation

effort.

5.2.1 Preliminary Partitioning

Circuit partitioning is absolutely vital to applying candidate subcircuit enumeration
to real circuits. Partitioning allows divide—and-conquer approaches to be applied to
otherwise intractably large circuits, and allows many techniques to be applied more
swiftly to any circuit. Partitioning has been used to achieve higher quality results in
smaller time for standard cell placement, or to determine the appropriate partitions

to be designed as custom ASICs.

A considerable amount of research has focused on the develOpment of algorithms
to perform effective circuit partitioning. Most current algorithms are based on one
of three original circuit partitioning algorithms: Kernigan and Lin [67], Fiduccia and
Mattheyses [45], or Krishnamurthy [70] and have resulted in many useful partitioning
techniques that may be applied to reduce the complexity of module identiﬁcation.
Several surveys are available [3,62]. Current research in partitioning techniques for
VLSI focuses on the partitioning of hypergraphs [26, 27,66], so to take advantage of
current techniques it would be necessary to describe the circuit as a hypergraph for
the purpose of preliminary partitioning.

194

Partitioning techniques for digital circuits focus on decomposing a complex sys-
tem into smaller parts, while minimizing the communication between partitions. The
result of this particular optimization is that functional modules, which are gener-
ally more strongly connected within themselves than to other modules, generally are

completely contained within a partition.

The advantage of partitioning the circuit previous to performing Module Identi-
ﬁcation is that it reduces the order of the circuit graph, thus greatly reducing the
number of candidate subcircuits that can exist within the circuit graph, yet it is not

likely to make partitions that would render functional modules unidentiﬁable.

5.2.2 Preliminary Syntactic Matching

It is possible that stock implementations of some basic modules exist within the
circuit. Syntactic matching techniques are very fast, so examining the circuit for
likely implementations of modules that may exist in the circuit is a reasonable ﬁrst
step. Vertices that have been identiﬁed as being a part of a functional module need not
be considered for inclusion in other modules, thus reducing the number of remaining

subcircuits.

This technique may pose a difﬁculty if the circuit has been optimized so that the
functionality of a cluster of gates is shared by two modules. If the cluster has been
identiﬁed as being part of one of them, the second will not be located because those
vertices are not free to be part of a subcircuit. This situation could be detected by
an unreasonably high percentage of unmatched gates after module identiﬁcation, and

195

 

 

would require intelligent conﬂict resolution techniques, most likely the consideration

of a human engineer.

5.3 Subgraph Order Limiting

Basic functional modules are rarely large; the largest modules are generally composed
of smaller modules. It is reasonable, therefore, to limit the order of enumerated
subgraphs to allow faster exploration of the circuit graph. Effort spent building
subcircuits with hundreds of gates is essentially wasted.

Allowing enumeration to include excessively large subcircuits precludes the appli-
cation of this technique to large circuits because the number of potential subcircuits
may grow exponentially with the size of the circuit. A better solution is to restrict
the number of vertices that may be part of a subcircuit.

When the module library is designed, an order limiting guideline may be deter-
mined for each, based on common implementations and designer input. This is a
preprocessing step that can make use of external knowledge. At runtime, the order
limit for the module identiﬁcation may be set to be the highest suggested order limit
of the library modules of interest.

It is certainly possible that a module could have been designed with an excessive
number of gates to achieve obfuscation, redundancy, or some other design goal. In
this case, the module may not be identiﬁed as a functional module if the order limit
is imposed. The system or an engineer may make a decision to raise the order limit
after an exploration of the circuit based on the number of unmatched logic devices

196

 

 

remaining.

This heuristic is applied during steps 1 and 3 of the subgraphExpand subroutine
of the subcircuit enumeration algorithm described in Section 3.2.3. Before a vertex is
added to a subgraph, the subgraph is tested to determine whether the subgraph has
already reached the order or depth limit. These values are calculated as vertices are
added and stored within the subgraph, so only a simple equivalence test is necessary.

The algorithm with order limiting enforced is presented in Figure 5.1.

 

Order Limited Focused Enumeration Algorithm

0. Transform circuit into circuit G with unique vertex indices.
1. foreach v E V(G):

2. Create a subgraph H0 with V(Ho) (— v.

3. Determine f'R(H0).

4. expandSubgraph(Ho).
5. End foreach.

subroutine expandSubgraph(graph H,)
0. foreach vertex v,-+1 E TR(H,-) (Rule 1):
1. Create a subgraph H,“ such that V(H,+1 (— V(H,) + v,+1.
If order of H,“ exceeds order limit, discard H,“ and return.
2. Determine .FR(H,-+1).
3. Add vertices necessary to make H,“ into a subcircuit (Rule 2).
If order of H,“ exceeds order limit at any time, discard H,“ and return.
4. Output Hi“.
5. expandSubgraph(H’).
6. End foreach.

 

 

 

Figure 5.1: Algorithm for Order Limited Focused Enumeration.

Theoretical Complexity Improvement By enforcing an order limit on the sub-
circuits that may be produced, the number of possible subcircuits can be signiﬁcantly

197

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Original Candidate Order Candidate

Circuit Gates Subcircuits Limit Subcircuits
- number time number I time
3-bit Adder 22 522 0.04 10 28 0.01
C17 24 3,951 0.16 10 509 0.02
majority 24 12,171 0.45 10 1492 0.10
b1 25 19,980 0.75 10 4078 0.32
cm138a 33 726,032 27.64 10 2098 0.17
cm152a 35 59,962 2.27 10 1015 0.15
z4m13 40 185,196 14.64 10 823 0.07
decod 48 N / A - 10 7,450 0.97
decod 48 N / A - 20 30,584,750 1230.68
x2 54 104,178,928 4282.80 10 3826 0.71
rd53 89 N/A - 10 55,354 14.44
4b-alu 79 N/A - 10 4,722 0.97
4b-alu 79 N/A - 20 12,261,533 1926.01
cmb 86 N/A - 20 3,616,868 431.79

 

 

 

 

 

 

 

 

 

 

 

 

Table 5.2: Order Limited Focused Enumeration Results

reduced. In fact, the complexity of the subcircuit enumeration algorithm is no longer
exponential, because the size of the subgraphs is a constant value independent of the
number of vertices in the circuit graph. The complexity of the algorithm in Figure

5.1 is 0(2"), where c represents the order limit imposed on subgraph expansion.

The results of executing an implementation of the limited order focused enumer-
ation algorithm (Figure 5.1) on circuits in the LogicSynth Benchmark Suite [82] are
presented in Figure 5.2. The algorithm was implemented in C. The experiments
were run with a Sun Ultra 2 Model 2300 (dual 300mhz processors) with 1024MB of
memory.

198

r-Tﬁi

 

5.4 Hierarchical Module Identiﬁcation

The execution effort to enumerate subcircuits can be greatly reduced by restricting
the order of the subcircuits to be enumerated. Bottom-up module identiﬁcation can
provide a very effective and fast solution to Candidate Subcircuit Enumeration within

the Module Identiﬁcation Problem.

5.4.1 Module Replacement

To dynamically reduce the order of the circuit graph, as subcircuits are identiﬁed as
equivalent to a library module, they may be replaced by a single vertex that encom-
passes all of the functionality contained in the subcircuit. This module replacement
technique would allow hierarchical module identiﬁcation. For instance, two appro-
priately connected vertices representing 1-bit full adders could be subsumed into a
single vertex representing a 2-bit full adder. Replacing a cluster of vertices with a
single vertex representing that cluster is a common technique in clustering [54].

It would be necessary to develop a method of differentiating between the outputs
of a vertex, because they will not all represent the same function, as do multiple
outputs from a logic gate. The outputs of the modules would need to be represented
by vertices that are implicitly connected to the module vertex.

The difﬁculty with this technique is the possibility of premature subsumption
of vertices that should be shared with another library module. When a module is
replaced by a single vertex representing the functionality of the entire module, then
its component vertices are no longer available to be members of another module. This

199

 

means that no shared functionality is permitted with other vertices. For this reason,
it is best to replace only modules that are represented by contained subcircuits, or to

perform ﬁtness evaluation (discussed in Section 5.5 before replacement.

5.4.2 Primitive Functional Modules

The execution of subcircuit enumeration is greatly aﬂ'ected by the order of the subcir-
cuits to be enumerated, which is dependent on the size of the modules to be located.
To perform module identiﬁcation hierarchically, a set of primitive modules should be
developed. These modules would perform common functionalities that could be used
to build larger functional modules. These primitive modules could reasonably be im-
plemented by fewer than 10 gates and have simple functionality to enhance subcircuit

identiﬁcation performance.

By locating these primitive functional modules, gate clusters could be replaced by
a single vertex representing the primitive functionality they perform. Each identiﬁ-
cation and replacement of a primitive module reduces the complexity of the circuit

graph, and thus the performance of the module identiﬁcation process.

This technique would suffer from the same problem that bottom-up clustering
techniques encounter: covering the circuit correctly, so consideration would need to
be given to that problem during the development and application of primitive mod-
ules. The identiﬁcation of a comprehensive and effective set of functional primitives
provides an interesting focus for future research into solutions to the Module Identi-
ﬁcation Problem, and would be an interesting theoretical study as well.

200

 

 

5.5 Subgraph Fitness Evaluation

It is possible to evaluate how likely a subcircuit is to be a functional module by taking

the following information [32,53] into consideration:

0 the gates comprising a module are likely to be physically close together

— Euclidean distance between gates

— number of edges between gates

o modules are generally contained completely within a power/ ground bounding

box

0 all components of a module are generally driven by the same branch of the clock

tree

0 two gates within a module are likely to be connected less strongly than two

gates outside of a module

By using this additional information, a ﬁtness value can be assigned to each sub-
circuit. If it is unlikely that the subcircuit will be a module or a part of a module,
then the expansion of that subcircuit may halt. This prevents the enumeration of
all of the descendent subcircuits that are also unlikely to represent a module. This
approach can provide a signiﬁcant reduction of eﬁ'ort, because only likely subcircuits
will be tested for equivalence with functional modules.

201

 

 

 

5.6 Summary

Despite the inherently difﬁcult problem of enumerating all of the subcircuits of a
circuit so that their equivalence to a known module may be considered, the techniques
presented in this chapter provide sufﬁcient reduction and focus of the enumeration
effort to allow practical use of the theoretical approaches presented in Chapters 3 and

4.

 

The beneﬁt of using semantic matching and subcircuit enumeration to solve the
module identiﬁcation problem is that all modules in the library may be located,
regardless of their implementation. This provides the most comprehensive covering

of an unknown circuit and thus solves the Module Identiﬁcation Problem.

202

 

Chapter 6

Contributions and Future

Directions

Reverse engineering performs the reverse of the synthesis process to obtain a high-level
description of a digital circuit from a low-level speciﬁcation. For instance, a circuit
may be reverse engineered from the transistor level to the gate level, or from the gate
level to the register transfer level. In each case, the details of the speciﬁcation are
encapsulated to produce a more abstract, but still accurate, description of the circuit.

Reverse engineering is necessary for many reasons, including obtaining information
to maintain a competitive edge, or to ensure that one’s intellectual pr0perty has
not been stolen. Reverse engineering is frequently the ﬁrst step in the redesign or
reengineering process, which is very important for replacement of parts to forestall
obsolescence, or to allow reimplementation exploiting improved technology, or to allow
fault repair and replacement.

One of the major focuses of reverse engineering is to raise the level of abstraction

203

 

IFS—"’3

 

from a lower— to higher-level speciﬁcation. We have formulated this problem as follows:
Module Identiﬁcation Problem [37,38,114]. Given a gate-level logic descrip-
tion (netlist) of a target circuit, efficiently identify all gate clusters (subcircuits) that

perform the function of a known library module.

0 Candidate Subcircuit Enumeration Problem. Identiﬁcation of gate clus-
ters (candidate subcircuits) within the netlist that may comprise a functional

module.

0 Subcircuit Identiﬁcation Problem. Proving functional equivalence between

a candidate subcircuit and a known standard library module.

Most existing techniques to perform the transformation from lower to higher lev-
els rely on syntactic matching techniques to locate functional modules by identifying
clusters of gates with identical structure to a known modules. Syntactic matching is
a very rigid matching technique because it will only identify subcircuits as equiva-
lent whose implementation is identical. Within the Module Identiﬁcation Problem,
syntactic matching requires that an extensive library of module implementations is
available. The computation and memory cost of applying syntactic techniques is
frequently prohibitive.

The approach to Module Identiﬁcation presented in this thesis applies semantic
matching instead, a more general matching technique that determines equivalence of
subcircuits based on functionality, regardless of implementation. To locate all gate
clusters that may be equivalent to a known module, it is necessary to enumerate

all subcircuits of the circuit to be reverse engineered. The enumeration of these

204

 

candidate subcircuits for the purpose of module identiﬁcation is the focus of the work
in this thesis. The enumeration of all subcircuit of a circuit is a complex process. By
ensuring that the enumeration is done efﬁciently, with no duplication of effort, the
task is possible. By focusing the enumeration effort on only the gate clusters that

represent functional segments of the circuit, the enumeration effort is further reduced.

Semantic matching can be computationally expensive, so it is beneﬁcial to re-
strict the number of times that it must be applied. Subcircuits that are syntactically
equivalent are necessarily semantically equivalent. Therefore, syntactically equiva-
lent subcircuits may be grouped into structural equivalence classes, so that semantic
matching must only be applied to a single instance of each subcircuit structure. To
partition the enumerated subcircuits into structural equivalence classes, a structural
identiﬁer can be created for each subcircuit. This identiﬁer provides the equivalence
relation to identify the correct equivalence class. To allow the creation of an effective

identiﬁer, the vertices in the subcircuit are ordered based on structural invariants.

Overall, this work has provided several steps within the reverse engineering process
presented in Figure 1.5. Focused subcircuit enumeration allows the application of
semantic techniques to locate modules and raise the level of abstraction. Structural
equivalence classes ensure that each subcircuit structure must only be considered once
by semantic equivalence identiﬁcation. Thus, the necessary applications of semantic
matching to perform subcircuit identiﬁcation are signiﬁcant reduced without lack of
efﬁcacy.

205

 

 

6.1 Contributions

6.1.1 Module Identiﬁcation Problem

We have developed a well-deﬁned formulation of a speciﬁc step in the reverse en-
gineering process: the location of modules to facilitate the transformation from a
gate-level speciﬁcation to a module-level speciﬁcation (Section 1.3). This formulation
deﬁnes two subproblems: Candidate Subcircuit Enumeration and Subcircuit Identi-
ﬁcation. The work presented in this thesis focuses on developing techniques to solve
the Candidate Subcircuit Enumeration problem and reduce the number of necessary

applications of the solution to the Subcircuit Identiﬁcation Problem.

6.1.2 Unique Subgraph Enumeration

We have represented circuits by a circuit graph, a directed graph in which vertices
represent gates or ﬂip—ﬂops and arcs represent wires and interconnects (Section 3.1). A
naive algorithm for subgraph enumeration was presented in Figure 3.5. This algorithm
produces many identical subgraphs and is therefore very inefﬁcient, but it provides
a framework for developing a more focused algorithm. One of the focuses of the
work described in Chapter 3 is on reﬁnement of the algorithm to avoid duplication of
subcircuits.

The duplicate subgraphs created by the naive algorithm were eliminated by de-
veloping a rule to ensure that each subgraph was created exactly once (Section 3.2.1).
A subset of the subgraph neighborhood, the reachable frontier (Deﬁnition 3.10) is
deﬁned. Instead of allowing any neighboring vertex to be added to the subgraph

206

during subgraph expansion, only vertices on the reachable frontier may be added.
The reachable frontier is dependent on the index of the subgraph and the vertices
that have already been added, and thus enforces an ordering in which vertices may be
added to create subgraphs. The unique subgraph enumeration algorithm is presented
in Figure 3.11 and the proof that this algorithm enumerates each subgraph exactly

once is presented in Section 3.2.3.

6.1.3 Restricted Subgraph Types

The number of subgraphs in any directed graph can be prohibitively large. Even
efﬁcient unique enumeration of subgraphs is not an acceptable solution to the Can-
didate Subcircuit Enumeration Problem. Because candidate subcircuit enumeration
is applied within a speciﬁc domain, in which subgraphs represent gate clusters in a
digital circuit, enumeration may be focused on subsets within the class of subgraphs:

subcircuits and contained subcircuits.

Subcircuits

A signiﬁcant restriction on the number of subgraphs that must be enumerated can
be obtained by focusing the enumeration only on subgraphs that may possibly be
equivalent to a known module. A subgraph representing a non-functional cluster
of gates is not interesting with regard to the Module Identiﬁcation Problem. Only
functional gate clusters, subcircuits (Deﬁnition 3.6), need to be enumerated.

A subcircuit is a subgraph in which none of the vertices include a proper subset of
the vertices representing its input gates. It is important to consider these subgraphs

207

because only they have functional meaning. If only one input of a 2-input AND
gate is known, then the functionality of the AND gate is ambiguous. Therefore, the
functionality of the circuit represented by any subgraph that contains one of these
ambiguous vertices is also ambiguous.

We have modiﬁed the unique subgraph enumeration algorithm to focus on enu-
merating only subcircuits; this algorithm is presented in Figure 3.15. As soon as
a vertex is added to a subgraph, all of the vertices that are necessary to make the
subgraph a subcircuit are added (provided that they are on the reachable frontier).
Subgraphs that cannot have functional meaning are not investigated or enumerated.
This provides a signiﬁcant reduction in computational effort, as evidenced by the

results in Section 3.3.

Contained Subcircuits

When circuits are designed using standard cell libraries, each cell or module is placed
into the circuit and connected together with glue logic. The logic gates that comprise
these modules are connected exclusively to each other, without sharing their func-
tionality with other modules. The modules are represented within a circuit graph by
a contained subcircuit (Deﬁnition 3.8).

With regard to the Module Identiﬁcation Problem, a standard cell or module is
represented by a contained subcircuit. A contained subcircuit is a subcircuit in which
none of the vertices have only a partial set of their outputs also in the subcircuit. If
a vertex has an output are that leaves the subcircuit, then all of its output arcs must
also leave the subcircuit, which deﬁnes the vertex as an output of the subcircuit. If

208

all of the output arcs lead to vertices within the subcircuit, then the vertex represents
an internal gate. If all of the output arcs lead to vertices outside of the subcircuit,

then the vertex represents an output of the subcircuit.

The number of contained subcircuits is signiﬁcantly smaller than the number of
subcircuits, so the enumeration of this set can be performed more quickly than the
entire set of subcircuits. Although contained subcircuit enumeration is not sufﬁcient
to locate all modules in a circuit, it provides a useful preliminary matching technique
for reducing the order of the entire circuit graph, thus reducing the execution effort

necessary to locate all remaining modules by applying subcircuit enumeration.

6.1.4 Circuit Structural Identiﬁer

Another objective of the work presented in this thesis is to reduce the number of
times that the solution to the Subcircuit Identiﬁcation subproblem must be applied.
Many of the subcircuits enumerated by Candidate Subcircuit Enumeration will be
structurally identical, and are therefore semantically identical. It is only necessary to
determine equivalence of a single instance of each set of structures. These are called

structural equivalence classes, presented in Chapter 4.

We have developed a method for determining an effective structural identiﬁer
for the structure of a circuit, which deﬁnes an equivalence relation to partition the
subcircuits into a set of structural equivalence classes. To allow the creation of these
identiﬁers, a well-deﬁned vertex ordering based on structural invariants of the vertices
within the circuit graph was developed. The set of equivalence classes is not minimal,

209

 

 

because more than one identiﬁer may represent a single structure, but it is an eﬂ'ective

reduction technique for module identiﬁcation.

This ordering is develOped by sorting the vertices at each level of the circuit graph
using information about the vertices themselves, their input vertices, and their output
vertices, then applying a unique weight that represents the vertex ordering. If that
information is not enough to induce a total ordering on the vertices, then a second
pass through the circuit graph is performed, taking advantage of the information
developed during the ﬁrst pass. After the vertices have been sorted and weighted by
the algorithm presented in Figure 4.2.4, a total ordering has been deﬁned for the set
of vertices in the circuit graph.

An example of a structural identiﬁer generation technique is provided that creates
a preﬁx logic formula that uniquely describes the circuit structure. Interconnection
information must be preserved correctly, so nets are used to represent networks that
appear more than once within the structural identiﬁer. These identiﬁers can be
used to determine appropriate structural equivalence classes for the subcircuits, thus

reducing the required applications of subcircuit identiﬁcation.

6.1.5 Heuristics for Practical Application of Candidate Enu-

meration

The enumeration and structural classiﬁcation of candidate subcircuits provides signif-
icant improvement to the application of module identiﬁcation, but to handle realistic
circuits containing thousands or millions of gates, several heuristic techniques must

210

 

be applied.

The limiting element of candidate subcircuit enumeration is the order of the cir-
cuit graph that represents the target circuit. In Section 5.2, we described several
techniques for reducing target circuit order, such as preliminary partitioning, prelim-
inary syntactic matching coupled with module replacement, and hierarchical module
replacement throughout the module identiﬁcation process. Another factor that af-
fects the complexity of the enumeration process is the order of the subcircuits that
must be enumerated. We described and justify an order limiting heuristic in Section
5.3 that is based on a reasonable order derived from the size of the library modules.

The process of candidate subcircuit enumeration is embarrassingly parallel be-
cause expansion of a vertex into its descendent candidate subcircuits is a discrete
operation for each vertex in the circuit graph, so employing a parallel implementa-
tion of the enumeration algorithm reduces overall execution time for larger circuit

graphs (Section 5.1) without duplication of effort.

6.2 Future Directions

6.2.1 Candidate Subcircuit Enumeration

Future work within the Candidate Subcircuit Enumeration solution includes a possi-

ble improvement to the implementation of the focused enumeration algorithm.

Rules 1 and 2, when producing subcircuits or contained subcircuits, respectively,
occasionally must discard a subgraph because Rule 1 prohibits the addition of a ver-

211

 

tex or vertices necessary to make constituent vertices fully speciﬁed or contained.
Although the number of subgraphs discarded is small, and the computation improve-
ment provided by Rules 2 and 3 far outweighs this cost, an ideal solution would
proceed without any discarded subcircuits. This may be possible by implementing a
look-ahead algorithm to investigate the validity of a creation path before the actual

subgraph duplication and expansion is performed.

6.2.2 Module Identiﬁcation Problem

The techniques described in this thesis have been designed to be applied within the
solution to the Module Identiﬁcation Problem. There are several improvements that
may be developed that would allow module identiﬁcation to be applied to circuits of
any size. The limitation at this time is the order of the circuit graphs that may un-
dergo candidate subcircuit enumeration because the number of subcircuits increases
rapidly with the order of the graphs. By implementing the following techniques de-
scribed in Chapter 5, circuits of any size may be reverse engineered by the application

of Module Identiﬁcation.

0 Preliminary partitioning (Section 5.2.1): The application of a partitioning tech-
nique to effectively reduce the order of the circuit graphs to be explored may
be applied. If effective partitioning techniques are applied, each partition may

be considered individually without interfering with module identiﬁcation.

0 Module replacement (Section 5.4.1): Subcircuits that have been found to repre-
sent functional modules in the circuit may be replaced by a single vertex that

212

 

 

represents the functionality of the subcircuit. This reduces the order of the
overall circuit graph, and thus decreases the number of candidate subcircuits in

the circuit.

0 Hierarchical module identiﬁcation (Section 5.4): The efﬁciency of enumerating
candidate subcircuits is strongly dependent on the size of the subcircuits that
must be enumerated. By locating smaller modules ﬁrst, and replacing them

with a single vertex, and then considering larger modules that may contain those

 

modules, module identiﬁcation can be performed hierarchically. This allows the
order limit of the enumerated subcircuits to be kept small, thus allowing faster

identiﬁcation of modules.

0 Subgraph ﬁtness evaluation (Section 5.5): Development of an effective ﬁtness
function to determine the likelihood that a subcircuit could comprise a func-

tional module is a future focus for improving the module identiﬁcation process.

6.2.3 System Integration

To allow these techniques to be applied easily to real circuits, many techniques would
need to be integrated to create an effective reverse engineering tool. The following
systems, presented in Figure 1.5 on page 25 need to be designed and implemented

around the solution to Candidate Subcircuit Enumeration presented in this thesis.

0 Netlist partitioning: There are existing partitioning systems that can be applied
at the beginning of the reverse engineering process to allow a divide-and-conquer
approach to module identiﬁcation.

213

o Combinational logic extraction: The current solutions available for semantic
matching are unable to perform equivalence comparisons for sequential circuits.
To allow module identiﬁcation to be performed, the combinational logic should

be extracted from the circuit.

0 Subcircuit enumeration: Enumeration of candidate subcircuits that may be

equivalent to a higher-level functional module.

 

0 Structural equivalence classes: Syntactically equivalent subcircuits are also se-
mantically equivalent. This step places the candidate subcircuits into structural
equivalence classes to eliminate extraneous applications of semantic matching

to syntactically equivalent subcircuits.

0 Semantic matching: Functional equivalence comparison of an instance of a sub-

circuit equivalence class with known functional modules.

0 Circuit covering: If desired, an engineer can view the high-level modules and
determine which ones comprise the best covering of the circuit. In general,
modules are likely to share logic, so this step is not necessary. Modules may
overlap in the module-level representation of the circuit and still present a useful

and accurate view of the circuit.

6.3 Conclusion

The work presented in this thesis provides techniques to solve the Candidate Sub-
circuit Enumeration Problem within the Module Identiﬁcation Problem. Efﬁcient

214

 

enumeration of only those subgraphs of the circuit graph that may be equivalent to
a known module, candidate subcircuits, is the focus of this work. To provide further
improvement, the candidate subcircuits are placed into structural equivalence classes
by creation of a structural identiﬁer. One instance of each class must be tested for
equivalence to known modules. All other members of the class inherit matching re-
sults.

These techniques, when applied in conjunction with a semantic matching solution
to the Subcircuit Identiﬁcation Problem, provide the basis of a tool to perform the

reverse engineering of digital circuits.

215

BIBLIOGRAPHY

216

Bibliography

[1] Alfred V. Aho and Neil J. A. Sloane. Some doubly exponential sequences.
Fibonacci Quarterly, 11:429—437, 1973.

[2] Sherri Al-Ashari. System veriﬁcation from the ground up. Integrated System
Design, January 1999.

[3] Charles J. Alpert and Andrew B. Kahng. Recent directions in netlist partition-
ing: A survey. Integration: The VLSI Journal (1.995), 19:1—81, 1995.

[4] Charles J. Alpert and Andrew B. Kahng. A general framework for vertex
orderings with applications to netlist clustering. IEEE Transactions of VLSI
Systems, 4(2):240 — 246, 1996.

[5] Srinivasa R. Arikati and Ravi Varadarajan. A signature based approach to
regularity extraction. In Proceedings of the IEEE International Conference on
Computer-Aided Design, pages 542 — 545, 1997.

[6] Peter J. Ashenden. The Designers Guide to VHDL. Morgan Kauffman Pub-
lishers, 1995.

[7] Mikhail J. Atallah. Algorithms and Theory of Computation. CRC Press LLC,
1998.

[8] E. Augustus. VLSI circuit layer determination by reﬂectance for use in reverse
engineering. Master’s thesis, Air Force Institute of Technology, 1990.

[9] Pradeep Batra and David Cooke. Hcompare: A hierarchical netlist comparison
program. In Proceedings of the ACM/IEEE Design Automation Conference,
pages 299 - 304, 1992.

[10] Carter Bays. A non-recursive technique for recreating a digraph from its K-
formula representation. The Computer Journal, 19(4):326 — 329, 1976.

[11] C. Benecke, Adalbert Kerber, and Reinhard Laue. Canonical numbering of
3D-molecules. In Proceedings of the International Electronic Conference on
Computational Chemistry, November 1995.

217

[12] J. D. Benstock, D. J. Berndt, and K. K. Agarwal. Graph embedding in
synchem2, an expert system for organic synthesis discovery. Discrete Applied
Mathematics, 19:45 — 63, 1988.

[13] Alfs T. Berztiss. Data Structures: Theory and Practice. Academic Press, Inc.,
second edition, 1972.

[14] Alfs T. Berztiss. A backtrack procedure for isomorphism of directed graphs.
Journal of the ACM, 20(3):365 - 372, July 1973.

[15] J. Bhasker. A Verilog HDL Primer. Star Galaxy Publishers, second edition,
1999.

[16] Ted J. Biggerstaff, Bharat G. Mitbander, and Dallas E. Webster. Program
assignment and the concept assignment problem. Communications of the ACM,
37(5):72 — 82, May 1994.

[17] Michael Boehner. LOGEX — an automatic logic extractor from transistor to
gate level for CMOS technology. In Proceedings of the 25th ACM/IEEE Design
Automation Conference, pages 517 — 522, June 1988.

[18] Daniel R. Brasen and Gabriele Saucier. Using cone structures for circuit par-
titioning into fpga packages. In Proceedings of the Computer Aided Design of
Integrated Circuits and Systems, pages 592 - 600, July 1998.

[19] Randal E. Bryant. Graph-based algorithms for boolean function manipulation.
IEEE Transactions on Computers, C-35(8):677 — 691, August 1986.

[20] Randal E. Bryant. Binary decision diagrams and beyond: Enabling technologies
for formal veriﬁcation. In Proceedings of the IEEE International Conference on
Computer-Aided Design, pages 236 — 243, 1995.

[21] Anthony Bumbalough. Electronic parts obsolescence initiative workshop. Tech-
nical report, Department of the Air Force, Air Force Research Laboratory,
Wright-Patterson Air Force Base, Dayton, OH., April 2000.

[22] H. Bunke and Messmer. Recent advances in graph matching. International
Journal of Pattern Recognition and Artiﬁcial Intelligence, 11(1):169 — 203, 1997 .

[23] Jerry R. Burch and Vigyan Singhal. Robust latch mapping for combinational
equivalence checking. In Proceedings of theIEEE International Conference on
Computer-Aided Design, pages 563 — 569, 1998.

[24] Eric J. Byrne. A conceptual foundation for software re—engineering. In Proceed-
ings of the Conference on Software Maintenance, pages 226 — 235, November

1992.

218

[25] Andrew E. Caldwell, Hyun-Jin Choi, Andrew B. Kahng, Stefanus Mantik, Mio-
drag Potkonjak, Gang Qu, and Jennifer L. Wong. Effective iterative techniques
for ﬁngerprinting design IP. In Proceedings of the ACM/IEEE Design Automa-
tion Conference, pages 843 - 848, June 1999.

[26] Andrew E. Caldwell, Andrew B. Kahng, Andrew A. Kennings, and Igor L.
Markov. Hypergraph partitioning for VLSI cad: Methodology for heuristic
development, experimentation and reporting. In Proceedings of the ACM/IEEE
Design Automation Conference, pages 349 - 354, June 1999.

[27] Andrew E. Caldwell, Andrew B. Kahng, and Igor L. Markov. Hypergraph
partitioning with ﬁxed vertices. In Proceedings of the ACM/IEEE Design Au-
tomation Conference, pages 355 — 359, June 1999.

[28] Gary Chartrand and Ortrud R. Oellermann. Applied and Algorithmic Graph
Theory. McGraw-Hill, 1993.

[29] J. K. Cheng and T. S. Huang. A subgraph isomorphism algorithm using reso-
lution. Pattern Recognition, 13:371 — 379, 1981.

I30] Elliot J. Chikofsky and James H. Cross. Reverse engineering and design recov-
ery: A taxonomy. IEEE Software, pages 13 — 17, January 1990.

[31] Chipworks. Circuit analysis. Available on the world wide web at
http: //www.chipworks.com/ .

[32] Gregory H. Chisholm, Steven T. Eckmann, Christopher M. Lain, and Robert L.
Veroff. Understanding integrated circuits. IEEE Design 62 Test of Computers,
16(2):26 - 37, April - June 1999.

[33] Amit Chowdhary, Sudhakar Kale, N aresh Sehgal, and Rajesh Gupta. A general
approach for regularity extraction in datapath circuits. In Proceedings of the
19.98 International Conference on Computer Aided Design, pages 332 — 340,
November 1998.

[34] Nicos Christoﬁdes. Graph Theory: An Algorithmic Approach. Academic Press
Inc, 1975.

[35] D. G. Corneil and C. C. Gotlieb. An eﬂicient algorithm for graph isomorphism.
Journal of the ACM, 17(1):51 — 64, January 1970.

[36] Travis E. Doom. Design Recovery for Combinational Logic Exploiting Boolean
Relationships. PhD thesis, Michigan State University, East Lansing, MI, May
1998.

[37] "Davis E. Doom, Jennifer L. White, Gregory Chisholm, and Anthony S. Wojcik.
Identiﬁcation of functional components in combinational circuits. Technical
Report ANL/DIS/TM—47, Argonne National Laboratory, January 1998.

219

[38]

[39]

[40]

[41]

[42]

[43]

[44]

[45]

[46]

[47]

[48]
[49]

[50]

Travis E. Doom, Jennifer L. White, Anthony S. Wojcik, and Gregory H.
Chisholm. Identifying high—level components in combinational circuits. In Pro-
ceedings of the 19.98 Great Lakes Symposium on VLSI, pages 313—318, February
1998.

M. A. Dukes. A generalized extraction system for VHDL. In Proceedings of the
Conference on Advances in Modeling and Simulation, pages 165 - 171, 1994.

Paul J. Durand, Rohit Pasari, Johnnie W. Baker, and Chun che Tsai. An efﬁ-
cient algorithm for similarity analysis of molecules. Internet Journal of Chem-
istry, 2, June 1999.

Carl Ebeling. GeminiII: A second generation layout validation tool. In Proceed-
ings of the IEEE International Conference on Computer-Aided Design, pages
322—325, 1988.

Carl Ebeling and Ofer Zajicek. Validating VLSI circuit layout by wirelist com-
parison. In Proceedings of the IEEE International Conference on Computer-
Aided Design, pages 172 — 173, September 1983.

Steven T. Eckmann and Gregory H. Chisholm. Assigning functional mean-
ing to digital circuits. Technical Report ANL/DIS/TM-43, Argonne National
Laboratory, July 1997.

Adrian Evans, Allan Silburt, Gary Vrckovnik, Thane Brown, Mario Dufresne,
Geoffrey Hall, Tung Ho, and Ying Liu. Functional veriﬁcation of large ASICs.
In Proceedings of the ACM/IEEE Design Automation Conference, pages 650 —
655, June 1998.

C. Fiduccia and R. Mattheyses. A linear time heuristic for improving network
partitions. In Proceedings of the ACM/IEEE Design Automation Conference,
pages 175 — 181, 1982.

The Center for Software Maintenance. The maintainer’s assistant: A tool for
reverse engineering and re—engineering code using formal methods. Available
on the world wide web at http://www.dur.ac.uk/CSM/projects/ma/.

E. Fretheim. Reverse engineering VLSI using pattern recognition techniques.
Master’s thesis, Air Force Institute of Technology, 1988.

Daniel D. Gajski. Principles of Digital Design. Prentice-Hall, Inc., 1997.

Gerald G. Gannod and Betty H.C. Cheng. Strongest postcondition semantics
as the formal basis for reverse engineering. The Journal of Automated Software
Engineering, 3(1), 1996.

Michael R. Garey and David S. Johnson. Computers and Intractability. W. H.
Freeman and Company, 1979.

220

[51] Thomas Griiner, Adalbert Kerber, Reinhard Laue, and Andri Ruckde-
schel. MOLGEN. Available on the web at http://www.mathe2.uni-
bayreuth.de/molgen/ .

[52] Lars Hagen and Andrew B. Kahng. A new approach to effective circuit cluster-
ing. In Proceedings of the IEEE International Conference on Computer-Aided
Design, pages 422 — 427, 1992.

[53] Mark C. Hansen, Hakan Yalcin, and John P. Hayes. Unveiling the ISCAS-
85 benchmarks: A case study in reverse engineering. IEEE Design 6'2 Test of
Computers, 16(3):72-80, July - September 1999.

[54] Scott Hauck and Gaetano Borriello. An evaluation of bipartitioning techniques.
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Sys-
tems, 16(8):849 - 866, 1997.

[55] R.’ Hayden. Analysis system for reverse engineering VLSI circuits. Master’s
thesis, Air Force Institute of Technology, 1989.

[56] M. G. Hinchey and J. P. Bowen. High-Integrity System Speciﬁcation and Design.
Springer-Verlag, 1999.

[57] Christoph M. Hoﬂ'man. Group Theoretic Algorithms and Graph Isomorphism.
Springer-Verlag, 1982.

[58] Kai-Ti Huang and David Overhauser. A novel graph algorithm for circuit recog-

nition. In Proceedings of the International Symposium on Circuits and Systems,
pages 1695 — 1698, 1995.

[59] Sung-Woo Hur and John Lillis. Relaxation and clustering in a local search
framework: Application to linear placement. In Proceedings of the ACM/IEEE
Design Automation Conference, pages 360 — 366, June 1999.

[60] Masayasu Ito and Yasuhiro Nikaido. Recognition of pattern defects of printed
circuit board using topological information. In Proceedings of the 11th Inter-
national Electronics Manufacturing Technology Symposium, pages 202 — 206,
1991. -

[61] Jawahar Jain, Amit Narayan, M. Fujita, and A. Sangiovanni-Vincentelli. A sur-
vey of techniques of formal veriﬁcation of combinational circuits. In Proceedings
of the IEEE International Conference on Computer Design, pages 445 - 454,
October 1997.

[62] Frank W. Johannes. Partitioning of VLSI circuits and systems. In Proceedings
of the 33rd ACM/IEEE Design Automation Conference, pages 83 - 87, June
1996.

221

[63]

[64]

[651

[66]

[67]

[68]

[69]

[70]

[71]

[72]

[73]

[74]

Andrew B. Kahng, J. Lach, W.H. Mangione-Smith, Stefanus Mantik, Igor L.
Markov, Miodrag Potkonjak, Paul Tucker, Huijuan Wang, and Gregory Wolfe.

Watermarking techniques for intellectual property protection. In Proceedings of
the ACM/IEEE Design Automation Conference, pages 776 - 781, June 1998.

Andrew B. Kahng, Stefanus Mantik, Igor L. Markov, Miodrag Potkonjak, Paul
Tucker, Huijuan Wang, and Gregory Wolfe. Robust IP watermarking method-
ologies for physical design. In Proceedings of the ACM/IEEE Design Automa-
tion Conference, pages 782 — 787, June 1998.

Andrew B. Kahng and Rahul Sharma. Studies of clustering
objectives and heuristics for improved standard cell placement.
http://nexus6.cs.ucla.edu/ abk/papers/report/clu10dist.pdf, January 1997.

George Karypis and Vipin Kumar. Multi-way k-way hypergraph partitioning.
In Proceedings of the ACM/IEEE Design Automation Conference, pages 343 -
348, June 1999.

B. Kernighan and S. Lin. An efﬁcient heuristic procedure for partitioning of
electrical circuits. Bell System Technical Journal, 49(2):291 — 307, January
1970.

Kurt Keutzer. DAGON: Technology binding and local optimization by DAG
matching. In Proceedings of the ACM/IEEE Design Automation Conference,
pages 341 — 347, 1987.

L. Krider. A ﬂow analysis algorithm. Journal of the ACM, 11(4):429 — 436,
October 1964.

B. Krishnamurthy. An improved min-cut algorithm for partitioning VLSI net-
works. IEEE Transactions of Computers, pages 438 — 446, 1984.

Yuji Kukimoto, Robert K. Brayton, and Prashant Sawkar. Delay-Optimal tech-
nology mapping by DAG covering. In Proceedings of the ACM/IEEE Design
Automation Conference, pages 348 — 351, June 1998.

Sandip Kundu. GateMaker: A transistor to gate level model extractor for
simulation, automatic test pattern generation and veriﬁcation. In Proceedings

of the IEEE International Test Conference, pages 372—381, 1998.

Oak Ridge National Laboratory. PVM: Parallel virtual machine.
http://www.csm.ornl.gov/pvm/, 1993.

Yung-Te Lai, Sarma Sastry, and Massoud Pedram. Boolean matching using
binary decision diagrams with applications to logic synthesis and veriﬁcation. In
Proceedings of the IEEE International Conference on Computer Design, pages
452—458, October 1992.

222

 

[75] Richard H. Lathrop, Robert J. Hall, and Robert S. Kirk. Functional abstraction
from structure in VLSI simulation models. In Proceedings of the ACM/IEEE
Design Automation Conference, pages 822 — 828, 1987.

[76] F. Luellau, T. Hoepken, and E. Barke. A technology independent block ex-
traction algorithm. In Proceedings of the 2Ist ACM/IEEE Design Automation
Conference, pages 610 — 615, 1984.

[77] Vijay K. Madisetti. Reengineering digital systems. IEEE Design 82 Test of
Computers, 16(2):15 — 16, April - June 1999.

[78] Vijay K. Madisetti, Yong-Kyu Jung, Moinul H. Khan, Jeongwook Kim, and
Theodore Finnessy. Reengineering legacy embedded systems. IEEE Design 6'2
Test of Computers, 16(2):38 — 47, April - June 1999.

[79] F. Mailhot and Giovanni De Micheli. Algorithms for technology mapping based
on binary decision diagrams and on boolean operations. IEEE Transactions
on Computer-Aided Design of Integrated Circuits and Systems, 12(5):599 - 620,
1993.

[80] Lawrence Markosian, Philip Newcomb, Russell Brand, Scott Burson, and Ted
Kitzmiller. Using an enabling technology. Communications of the ACM,
37(5):58 — 70, May 1994.

[81] Yusuke Matsunaga. On accelerating pattern matching for technology mapping.
In Proceedings of the IEEE International Conference on Computer-Aided De-
sign, pages 118 — 123, 1998.

[82] Ken McElvain. Lgsynth93 benchmark set. Available on the World Wide Web
at http://www.cbl.ncsu.edu/CBL-Docs/lgs93.html, 1993.

[83] RT. Messmer and H. Bunke. Subgraph isomorphism in polynomial time. Tech-
nical Report IAM 95—003, Institut fiir Informatik, 1995.

[84] Giovanni De Micheli. Synthesis and Optimization of Digital Circuits. McGraw-
Hill, Inc., 1994.

[85] Amit Narayan, Adrian J. Isles, Jawahar Jain, and Robert K. Brayton. Reach-
ability analysis using partitioned ROBDDS. In Proceedings of the IEEE Inter-
national Conference on Computer-Aided Design, pages 388 - 393, 1997.

[86] Amit Narayan, Jawahar Jain, M. Fujita, and A. Sangiovanni-Vincentelli. Parti-
tioned ROBDDS - a compact, canonical and efﬁciently manipulable representa-
tion for boolean functions. In Proceedings of theIEEE International Conference
on Computer-Aided Design, pages 547 — 544, 1996.

[87] Raymond X.T. Nijssen and Jochen A.G. Jess. Two-dimensional datapath regu-
larity extraction. In Proceedings of the 5th ACM/IEEE Physical Design Work-
shop, April 1996.

223

 

[88]

[89]

[90]

[91]

[92]

[93]

[94]

[95]

[96]

[97]

[98]

[99]

Raymond X.T. Nijssen and C.A.J. van Eijk. Regular layout generation of logi-
cally optimized datapaths. In Proceedings of the 1.997 ACM/IEEE International
Symposium on Physical Design, pages 42 — 47, April 1997.

Jim Q. Ning, Andre Engberts, and Wojtek Kozaczynski. Recovering reusable
components from legacy systems by program segmentation. In The Working
Conference on Reverse Engineering, 1993.

Jim Q. Ning, Andre Engberts, and \-'\!"ojtek Kozaczynski. Automated support
for legacy code understanding. Connnunications of the ACM, 37(5):50 — 57,
May 1994.

Miles Ohlrich, Carl Ebeling, Eka Ginting, and Lisa Sather. SubGemini: Identi-
fying subcircuits using a fast subgraph isomorphism algorithm. In Proceedings
of the ACM/IEEE Design Automation Conference, pages 31—37, 1993.

Masahiko Ohmura, Hiroto Yasuura, and Keikichi Tamaru. Extraction of func-
tional information from combinational circuits. In Proceedings of the IEEE In-
ternational Conference on Computer-Aided Design, pages 176—179, November
1990.

George Pelz and Uli Roettcher. Circuit comparison by hierarchical pattern
matching. In Proceedings of the IEEE International Conference on Computer-
Aided Design, pages 290 — 293, 1991.

Marek A. Perkowski, Malgorzata Chrzanowska-Jeske, Alan Coppola, and Ed-
mund Pierzchala. An exact algorithm for the technology ﬁtting problem in the
application speciﬁc state machine device. In Proceedings of the International
Symposium on Circuits and Systems, pages 1977 — 1980, 1992.

D. Sreenivasa Rao and Fadi J. Kurdahi. Partitioning by regularity extraction.
In Proceedings of the ACM/IEEE Design Automation Conference, pages 235 —
238, 1992.

D. Srinivasa Rao. On clustering for maximal regularity extraction. IEEE Trans-
actions on Computer Aided Design, 12(8):1198 — 1208, 1993.

M. G. Rekoﬂ, Jr. On reverse engineering. IEEE Transactions on Systems, Man,
and Cybernetics, SMC-15(2):244 — 252, March/April 1985.

Kenneth Rose and David Miller. Constrained clustering for data assignment
problems with examples of module placement. pages 1937 - 1940, 1992.

Patrick Schaumont, Radim Cmar, Serge Vernalde, Marc Engels, and Ivo
Bolsens. Hardware reuse at the behavioral level. In Proceedings of the
ACM/IEEE Design Automation Conference, pages 784 — 789, June 1998.

224

[100] Yehuda Shiran. YNCC: A new algorithm for device-level comparison between
two functionally isomorphic VLSI circuits. In Proceedings of theIEEE Interna-
tional Conference on Computer-Aided Design, pages 298-301, 1986.

[101] T. M. Sobh and J.C. Owen. A sensing strategy for the reverse engineering
of machined parts. Journal of Intelligent and Robotic Systems, pages 1 — 18,
August 1995.

[102] Fabio Somenzi. CUDD: Colm'arlo university decision diagram package.
http://www.bessie.colorado.edu/ f'ahio/CUDD/, 1997.

[103] Sponsored by Argonne National Laboratory. Record of the Argonne/DOD Work-
shop on Reverse Engineering of Digital Systems, Del Mar, California, January
1998.

[104] Ruth V. Spriggs. Identiﬁcation of ,3-sheet motifs in three-dimensional protein
structures, using a subgraph isomorphism algorithm: an update of a 1992 study.
Master’s thesis, University of Sheffield, 1999.

[105] Ronald C. Stogdill. Dealing with obsolete parts. IEEE Design 62 Test of Com-
puters, 16(2):17 — 25, April — June 1999.

[106] Makoto Takashima, Atsuhiko Ikenrhi, Shoichi Kojima, Toshikazu Tanaka,
Tamaki Saitou, and Jun ichi Sakata. A circuit comparison system with rule-
based functional isomorphism checking. In Proceedings of the 25th ACM/IEEE
Design Automation Conference, pages 512 — 516, 1988.

[107] Armistead Techologies. Armistead technologies reverse engineering. Available
on the world wide web at http://\\'\\'w.armtec.net.

[108] W.B. Thompson, S.R. Stark J.C. Owen, H.J. de St. Germain, and TC. Hender-
son. Feature—based reverse engineering of mechanical parts. IEEE Transactions
on Robotics and Automation, February 1999.

[109] Donald G. Truhlar, W. Jeffrey Howe, Anthony J. Hopﬁnger, Jeff Blaney, and
Richard A. Dammkoehler, editors. Rational Drug Design, volume 108 of The
IMA Volumes in Mathematics and its Applications. Springer-Verlag, 1999.

[110] J. R. Ullman. An algorithm for subgraph isomorphism. Journal of the ACM,
23(1):31—42, January 1976.

[111] John F. Wakerly. Digital Design: Principles and Practices. Prentice-Hall, Inc.,
third edition, 2000.

[112] Richard C. Waters and Elliot Chikofsky. Reverse engineering: Progress along
many dimensions. Communications of the ACM, 37(5):23 — 24, May 1994.

[113] Jennifer L. White and Anthony S. \\'ojcik. A technique for unique subgraph
enumeration. Technical Report hlSlT-CSE—99-35, Computer Science and Engi-
neering, Michigan State University. liast Lansing, h'lichigan, October 99.

2'25

[114] Jennifer L. White, Anthony S. \V'ojrik, Moon-Jung Chung, and Travis Doom.
Candidate subcircuits for functional module identiﬁcation in logic circuits. In
Proceedings of the 10th Great Lakes Symposium on VLSI, March 2000.

[115] Qinghong Wu, C. Y. Roger Chen, and John M. Acken. Efﬁcient boolean match-
ing algorithm for cell libraries. In Proceedings of the IEEE International Con-
ference on Computer Design, pages 30 - 39, October 1994.