..
i3?) ,
.5353»... ,
uh 21.3....va hr
3%?»33. he
1., .5... .. .. .
5 E. £159”.
. 2.. z :5.” . 21 h
I”? 4.4.. 5.x

w: Dawn. «.52...

.

“2%?
x 3.... a -

55......533 . . ﬁr.
-muuﬁmmﬁw. a . .
5.539.. a a: sunburn!x 1..

:
g... L
, 2251...: v.
a . . . . a...
.. 3...... . . .

.. z. .
4.0.3 .
Run.
r...

:. 1i. :7
II. ‘7). a)

t
1...: t l. “

£3693.) )I. _
nun! .I). 017. an.

m n. :...w..MH. BACK
at. . .

E.
2..., if...
an.“ we...“

t .51»:
3.1 .3. s 5
3.3.3.4....
k...
1..

SI - .3.

air.

P...
. grit...
:1. .. 2 ..
5!. 3! .. . ”up...
.212; ... ..
.3 s: .. a.
59.53.. . a.-.
59...}... i
.3223: :9 $2
.31.
.9...

. ”zulﬁ35. Ag xx..tlldl‘§
I A .5.“anng 2.....Hv-‘unbuq
h inguiwhﬁpgls Lu...“ .

“.mwumuﬁﬂ .. mzuﬁmﬁgﬁ . a . ., . , ,

 

UNIVERSITY LI BR RSARIE

I90 IIIII IIIIIIII

 

III

 

 

 

 

 

 

 

 

 

 

 

13333 IIIIIIIIIIIIIIIIIIIIIIII

3129301390
(I 5“}57

N

This is to certify that the
dissertation entitled

DESIGN METHODOLOGY IN HIGH LEVEL SYNTHESIS

presented by

Sea Hawon Choi

has been accepted towards fulﬁllment
of the requirements for

Doctor of Philosophy degreein Dept. of Computer Science I

 

 

Dr. Moon-Jung Chung

Major professor

@~m%

 

Date Nay. 15. 1995

MS U is an Afﬁrmative Action/Equal Opportunity Institution 0-12771

 

LIBRARY
Mlchlgan State
Universlty

 

 

 

 

 

 

PLACE ll RETURN BOX to remove thin checkout from your record.

TO AVOID FINES Mum on at bdon dd. duo.

DATE DUE DATE DUE DATE DUE

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

DESIGN METHODOLOGY IN HIGH LEVEL SYNTHESIS
By

Sea H. Choz'

A DISSERTATION

Submitted to
Michigan State University
in partial fulﬁllment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY

Department of Computer Science

1995

ABSTRACT
DESIGN METHODOLOGY IN HIGH LEVEL SYNTHESIS
By
Sea H. 07103

As the complexity of hardware components and the tools involved increases, design
process management becomes the key issue in improving productivity. Modeling the
hardware system to be designed is the first, and one of the most important, steps
of a design process for high level synthesis. In this thesis, new modeling techniques
and a novel design process management scheme for high level synthesis, which help
to increase the productivity and the quality of design, are presented.

The issues involved in the modeling of hardware systems and their integration with
a hardware description language (HDL), especially the ﬁery High Speed Integrated
Circuit (VHSIC) H_DL (VHDL), are addressed. First, several new techniques for
modeling Finite State Machines (FSMS) using VHDL are presented. Then, these
techniques are extended to parallel machines. It is Shown that VHDL is an effective
way of modeling a wide range of systems from a simple FSM to the structure and
behavior of a parallel system. How different modeling techniques affect the quality of

design is also demonstrated.

This thesis also proposes an execution environment which efficiently handles the
design process. The environment evaluates design methodologies and assists design-
ers in selecting and executing appropriate tools and methodologies. It also supports
concurrent exploration of multiple design alternatives in a distributed environment.
A salient feature of the proposed execution environment is that the execution environ-
ment is separated from the specification of methodologies. The proposed execution
environment has been modeled using an Colored Petri-Net model. Both the imple-
mentation of the environment based on this modeling and benchmark results are

presented.

© Copyright 1995 by Sea H. Choi
All Rights Reserved

To my wife and children

ACKNOWLEDGMENTS

First of all, I would like to give my sincere thanks to my wife, Juhee Choi, and
two children, Esther and Eliot Choi, for their patience, endurance, and everlasting
support. I also thank my parents and parents-in-law.

I wish to thank all the individuals who have assisted me in some way during my
years of graduate study. In particular, the professional advice of my thesis advisor,
Dr. Moon J. Chung, has been very beneﬁcial to my development as a researcher and
has helped to sharpen my focus on the many issues involved in hardware modeling

and framework.

vi

TABLE OF CONTENTS

LIST OF TABLES

LIST OF FIGURES

I Introduction

1 Introduction

II Modeling

2 Finite State Machine

2.1 Remote I/O Circuit Functionality ......................
2.2 VHDL Modeling of the Finite State Machine ................
2.2.1 Basic Structure of State Modeling Using VHDL .............
2.2.2 State Transition and Output Generation .................
2.2.3 Subroutine Handling ............................
2.2.4 Interrupt Handling .............................
2.3 Discussion ...................................

3 Modeling of Parallel Architectures and Algorithms
3.1 Examples ...................................

4 Signal Processing in a Custom Computing Machine

4.1 Splash 2 ....................................
4.1.1 Splash 2 Architecture ...........................
4.1.2 Programming Splash 2 ...........................
4.2 Convolution ..................................
4.3 VHDL Implementation ............................
4.3.1 1-D Convolution ..............................
4.3.2 Splash 2 Implementation ..........................
4.4 Experimental Results .............................

5 FSM Modeling Styles and Synthesis Results

6 Discussion

vii

xi

12
13
16
18
20
21
21
24

25
28

36
38
38
41
41
43
44
46
48

53

63

III CAD Framework

7 Previous Work
8 High Level Synthesis

9 Methodology Management

9.1 Design Process Speciﬁcation ...................
9.1.1 Process Flow Graphs ......................
9.1.2 Design Process Grammars ...................
9.1.3 Specification Hierarchy .....................
9.2 Execution Environment .....................
9.3 Tool Encapsulation ........................

10 The CAD Framework: Execution Environment

10.1 Proposed CAD Framework Overview ..............
10.2 Petri Net .............................
10.3 Execution Model .........................
10.3.1 Cockpit .............................
10.3.2 Daemon Processes .......................
10.3.3 Execution Process .......................
10.3.4 Token Semantics Extension ..................
10.4 Backtracking Mechanism .....................
10.5 Multiple Alternatives .......................
10.6 Process Simulation ........................
10.7 Graphical User Interface .....................
10.8 Constraints and Checklist ....................
10.9 Load Balancing ..........................
10.10Version Control ..........................

11 Synthesis Example

11.1 FPGA Synthesis .........................
11.1.1 VHDL Compilation .......................
11.1.2 Placement and Routing ....................
11.1.3 Modified Design ........................
11.1.4 Synthesis Results ........................

IV Conclusion

12 Conclusion

12.1 Contributions ...........................
12.2 Current Implementation Status .................
12.3 Future Work ...........................

000000

64
66
74

77
82
82
84
87
89
89

91

92

93

99
100
104
105
105
111
113
115
116
117
120
122

125
125
126
126
129
129

135

136
138
140
141

APPENDICES 143
A Glossary 143

BIBLIOGRAPHY 149

ix

LIST OF TABLES

5.1 Synthesis Statistics Critical to the Performance and Realization of the
Hardware Subsystem ........................... 54

1.1

2.1
2.2
2.3
2.4
2.5
2.6
2.7

3.1
3.2
3.3
3.4
3.5
3.6
3.7
3.8

4.1
4.2
4.3
4.4
4.5
4.6
4.7
4.8
4.9
4.10
4.11
4.12

5.1

5.2
5.3
5.4

LIST OF FIGURES

Y-Chart .................................... 4
Block Diagram ................................ 13
Remote I/O State Flow Diagram ...................... 15
FSM Implementation ............................. 16
Basic State Implementation ......................... 19
Timing Diagram ............................... 19
State Assign And Output Assign ...................... 20
Stack Operation and State 1000 RETURN ................. 22
General Structure of SIMD Algorithm Model ................ 27
Smallest Number Finding Problem ..................... 29
4-Cube Connection .............................. 30
4-Cube Connection in VHDL ........................ 30
Bit Configuration of Connected PBS in 4-Cube ............... 31
PE Behavior of Batcher’s Sorting Algorithm ................ 33
5X5 Torus Connection ............................ 34
TORUS Connection ............................. 35
Splash 2 Architecture ............................. 39
Processing Element (PE) in Splash 2 .................... 40
Sequential Algorithm of Convolution .................... 42
PE Behavior and Configuration for 1-D Convolution (k = 5) ....... 44
Splash PE Input/ Output Data Paths .................... 45
Behavioral Modeling ............................. 45
PE Entity Description ............................ 46
PE Behavioral Description .......................... 47
Top Module Conﬁguration Description ................... 49
Simulation Result for 1-D Convolution ................... 50
Timing Result Using Memory Lookup ................... 51
Timing Result Using Loop Statement .................... 52
State Transition Diagram for the Splash 2 Implementation of On-The-Fly

Order Algorithm for PE 1 to 16 ..................... 55
Sample VHDL Code for a Waiting States Without Counter (Approach 1) 57
Approach 1 Synthasis Results of PEO for Broadcasting Data using X-Bar 58
Approach 1 Synthesis Results (PEi) for Broadcasting Data using X-Bar . 59

xi

5.5 Sample VHDL Code for a Waiting State Using Counter (Approach 2) . . 60
5.6 Approach 2 Synthesis Result for PEi for Broadcasting Data using X-Bar 61
5.7 Approach 3 Synthesis Results (PEi) for Systolic Modeling Approach . . . 62

8.1 High Level Synthesis Steps .......................... 75
9.1 Block Diagram of CAD Framework ..................... 81
9.2 Production Graph Example ......................... 83
9.3 Production Graph Example 2 ........................ 85
9.4 Expansion Example .............................. 86
9.5 Speciﬁcation Deﬁnition Editor Window ................... 88
9.6 Layout Synthesis Task Hierarchy ...................... 88
10.1 Block Diagram of System .......................... 92
10.2 Petri Net Firing Sequences .......................... 95
10.3 Copy Transition ................................ 98
10.4 Algorithm for Cockpit ............................ 102
10.5 Algorithm for Daemon Process ....................... 105
10.6 Algorithm for Execution Process ...................... 106
10.7 Token Color .................................. 108
10.8 Production Example for Expansion Steps .................. 108
10.9 Expansion Steps ............................... 109
10.10Backtracking Example ............................ 112
10.11Smaller Number of Tokens in the Output .................. 114
10.12Larger Number of Tokens in the Output .................. 115
10.13Production Editor Window ......................... 116
10.14Production Scoring Example ......................... 119
10.15Pre—Evaluation Function Example ...................... 120
10.16Break-Point and Checklist .......................... 121
10.17Version History and Compatibility Checking ................ 124
11.1 Production of FPGA Synthesis ....................... 127
11.2 Decomposition of VHDL Compile ....................... 127
11.3 Initial VHDL Description .......................... 128
11.4 Decomposition of Placement and Route .................. 128
11.5 Modiﬁed VHDL Description ......................... 130
11.6 Decomposition of Bit Generation ..................... 131
11.7 Decomposition of FPGA Synthesis ..................... 132
11.8 PPR and Timing Report Summary ..................... 133
11.9 Graphical Timing Result ........................... 134

xii

Part I

Introduction

Chapter 1

Introduction

The increasing complexity of system design and the emergence of new technologies
make the design process the key issue in the micro-electronics and computer-aided
design (CAD) industries [1]. A complex system design has the following characteris-
tics: hierarchical design, multiple design representations, and a large design space. A
large system design typically includes multiple boards, a variety of implementation
technologies, and interfaces. Design engineers must deal with many issues, such as
partitioning among different boards, interface timing, and packaging.

There are three important issues in system design. The ﬁrst issue is design mod-
eling. Design modeling deﬁnes the functionalities of the design and veriﬁes its cor-
rectness by simulation. These models not only ensure design correctness but also
greatly affect synthesis results. Design methodology or workﬂow management in a
Computer-Aided Design Framework must support a seamless way of carrying out
the design process as well as a suitable way of representing the design process. It
must also support tool encapsulation in order to carry out the design process. Design

2

3

data management deals with storing design data and capturing relationships such as
version control and conﬁguration binding. Design modeling and methodology man-
agement in relation to a novel execution environment and framework developed by
this author (see description below) comprise the main issues discussed in this thesis.
For a detailed analysis of design data management, see Kim [2].

This thesis presents issues involved in the design process in high level synthesis.
First, design modeling techniques in the design process are discussed. Next, method—
ology management is discussed. Issues relating design modeling and methodology
management involve the modeling techniques of systems to be designed and their
integration with Mery High Speed Integrated Circuit (VHSIC) HDL (VHDL). This
thesis emphasizes the system modeling experiences and problems using VHDL (Part
II). Different modeling techniques affecting the design process and the quality of de-
sign will also be shown. Models written in VHDL will be used as input descriptions
to a proposed CAD framework (Part III).

In the relationship proposed by this thesis between modeling and design method-
ology management, modeling the prospective hardware system is the ﬁrst, and one
of the most important, steps of the design process. The relationship between design
process and design hierarchies is represented by the Y-C’hart [3, 4] shown in Fig-
ure 1.1. The design process can be formulated as a methodology traversing through
the design space in a spiral fashion. More detailed information is added when we
traverse toward the center. Typically, complex systems are designed in a hierarchical
fashion using a top-down approach. Systems are modeled by building virtual proto-

types. Desired modules can be gradually replaced by hardware components until the

4

System S thesis

    
      
    
      
  

Structural
Domain

Behavioral

Register-Transfer Synthesis Domain

Flowcharts, Algorithm

   

Process, Memories. Buses Logic Synthesis

  

   

Registers, ALUs, MUXs Register Transfers
Gates, Flip-Flops Boolean Expressions
Tr 3119810“ . Transistor Functions

 

Transistor Layouts
Cells

Chips

Boards. MCMS

Physical
Domain

Figure 1 .1: Y—Chart

ﬁnal hardware is obtained. A top-down, hardware description language (HDL) based
design methodology reduces the overall design time by virtual prototyping, veriﬁca-
tion, and synthesis. The pre—existing HDL model can be reused in the later design of

components.

This thesis has selected an HDL to model hardware systems based on the effect of
the particular language on design quality and time. VHDL was chosen because of its
wide use in modeling the micro-electronic systems at the behavioral and structural
levels. The virtual prototypes based on VHDL can be used in software components of
hardware/software co-design environments. The software model can be veriﬁed using
a software simulation environment or a hardware/ software co—simulation environment.
As this thesis discusses later, standard modeling styles and guidelines are crucial issues

in concurrent design and integration as well as in hardware/software co-design and

integration.

This thesis illustrates the effectiveness of VHDL in modeling a wide range of sys-
tems, from a simple ﬁnite state machine (FSM) to complex parallel systems. Using the
FSM, one of the most ubiquitous, and therefore important, among micro-electronic
control devices, several new techniques of modeling FSMS using VHDL are presented.
These techniques are then used to model the architectures of parallel machines and
applications. The efficacy of these techniques is illustrated by implementing a real
time signal processing application on a Splash 2 computer system. The same ba-
sic technique is used to model and synthesize an application, an on-the-ﬁy sorting
algorithm, in a hardware system. This thesis shows that different synthesis results
are obtained from each different modeling approach and stresses the importance of
modeling style.

As the complexity of computer systems increases, not only the modeling of such
systems but also design methodology management take on critical importance. Ef-
fective design methodology management must support the following functions: spec-
iﬁcation and execution. Speciﬁcation involves representing available methodologies
and encapsulating them so that they can be searched and used. Execution identiﬁes
an appropriate methodology for a given task: this involves decomposing tasks hierar-
chically, selecting tools and invoking them, scheduling tasks with regard to hardware
resources and available data, handling parallel exploration of alternatives, and back-
tracking.

A CAD framework is deﬁned as “a software infrastructure that provides a common

Operating environment for CAD tools [5].” In order to meet the criteria of effective

6

design methodology management, the ideal CAD framework would provide a ﬂexible,
reconﬁgurable, reusable, and controllable design environment to designers. First, the
separation of design methodology speciﬁcation and execution should provide ﬂexibility
to the framework. Second, controllability of the ideal framework would allow the
designer to intervene in the automatic execution of the existing framework. Third,
the execution model of the well-designed framework should allow formality in the
expression of the model itself.

Existing CAD Frameworks, such as ADAM [6, 7, 8], OCT [9], and Monitor [10]
have several shortcomings. Flexibility in these frameworks is lacking since the speci-
ﬁcation of design methodology and execution are not separate and methodologies are
ﬁxed. Secondly, incorporating process metrics in methodology selection is difficult,
representing dependencies between design data and methodologies is impossible, and
systematic search of design space is problematic, all of which reduce controllability.
Finally, since these approaches lack formalism, representing and storing the selected
design process for reuse are not effective. These issues and their effect on high level
synthesis will be addressed later in this thesis.

The CAD framework proposed in this thesis provides unique solutions to the prob-
lematic aspects inherent in other CAD frameworks. First, the proposed framework is
highly flexible since it incorporates the separation of design methodology speciﬁcation
and execution. Second, the execution model of the environment is based on a Colored
Petri-Net model [11, 12, 13], providing formality. Finally, this framework achieves a
high degree of controllability through feasibility of intervention, backtracking, and

simultaneous execution. Thus, this thesis presents a new execution environment for

7

managing the design process in high level synthesis which is able to support multiple
functions.

The thesis is organized as follows: In Part II, various modeling techniques and
examples are presented, showing how VHDL descriptions can be used in the design
process as input ﬁles. In Part III, the proposed CAD framework execution envi-
ronment is explained in detail. Related work and synthesis examples employing the
CAD framework are also presented. Finally, conclusions drawn from the research are

summarized in Part IV.

Part II

Modeling

9

Modeling the micro-electronic system is the important ﬁrst step in the design
process. Proper modeling technique is essential for rapid prototype and hard-
ware/software co—design. Properly modeled components facilitate design reusability.
Efﬁciently developed models can be used in the rapid prototyping of a system to be
designed and can be reused in hardware synthesis. The resulting hardware model is
not required to be synthesizable, although this is desirable. The hardware model can
be implemented quickly and inexpensively using an HDL such as VHDL [14, 15, 16]
or Verilog [17, 18]. The entire system model, both hardware and software, can now
be realistically simulated and tested.

An HDL is necessary for a designer to describe the functionality of micro-electronic
hardware parts. At each stage of the design, rapid prototyping using HDL can be
used to explore design options and ensure that Speciﬁcations meet design function-
ality requirements. Functional veriﬁcation is usually done by simulation of the HDL
speciﬁcation. Initially, the prototype is entirely software, but as the design progresses,
the prototype includes hardware for more and more subsystems until the ﬁnal imple-
mentation is developed. The HDL modeling of such a prototype reduces design errors
and integration risks.

The HDL under consideration is VHDL. Many engineering communities are us-
ing VHDL not only as a design documentation medium but also as a Simulation
medium [19, 20, 21]. VHDL supports a top-down approach by allowing high-level
design abstractions as well as a bottom-up approach.

A single HDL such as VHDL captures the complete speciﬁcation for design as well

as all necessary constraints. Many other design ﬁles may be derived from a single

10

VHDL ﬁle as the design process progresses, but the designer only interacts with and

changes the single VHDL ﬁle. A single HDL ﬁle provides the following advantages:

0 Changing the control for a design involves changing only a single ﬁle rather than

several ﬁles.

a Design ﬁle management is easier because the design speciﬁcation is contained

in one ﬁle and all other secondary design ﬁles are derived from the speciﬁcation.

0 Design consistency is easily maintained for design data within a Single ﬁle.

Since most design ﬁles are derived from a Single VHDL file, different modeling
styles may produce different derived design ﬁles. Hence, the modeling style used is
another important issue in the design process.

In this part, several important issues relating to modeling are presented. New
FSM modeling techniques are proposed. These techniques are especially suitable
for the modeling of large FSMS. By carrying out the actual design process starting
from modeling to synthesis, it will be Shown that different modeling techniques pro-
duce different synthesis results, thus highlighting the critical importance of modeling
style. Benchmark results for 1-D convolution in a FPGA-based custom computing
machine (CCM) and instrumentation system algorithm Show trade-offs between mod-
eling styles and their synthesized results.

Modeling techniques for systems and integration with VHDL are addressed in Part
II, where system modeling experiences and the problems using VHDL are presented.
In Chapter 2, modeling techniques for ﬁnite state machines (FSMS) are discussed.

Parallel algorithms/ architectures can be modeled using VHDL; this technique, along

11

with several examples, is presented in Chapter 3. In Chapter 4, the modeling expe-
rience in a COM is considered. Finally, the relationship between the modeling styles
and the synthesis results are given in Chapter 5. In Chapter 6, conclusions for Part II

are presented.

Chapter 2

Finite State Machine

Almost all micro-electronic circuits are sequential circuits, i.e., they contain memory
or storage elements in the form of ﬂip-ﬂops or latches as well as combinational cir-
cuitry. These sequential circuits are most often modeled using Finite State Machines
(FSMS). A FSM is a mathematical model of a system with discrete inputs, discrete
outputs, and a ﬁnite number of internal states. FSMS are very important in the
micro-electronic devices, especially Since the control circuitry of the system are usu-
ally implemented as FSMs. The choice of modeling technique is important because
different models create different synthesis results.

In this chapter, a modeling technique for FSMS using VHDL is presented. The ap-
plication of this technique to model systems has been demonstrated [22, 23, 24]. The
actual machine selected to model FSM is the Remote I/O Module in the SINCGARS
Radio [25, 26]. Figure 2.1 shows a block diagram of the Remote I/O Module in the
SINC'GARS Radio. Using VHDL, the current system, which uses a microprocessor,
has been retargeted to a behavioral description of an FSM. The modeling technique

12

I .5

13

 

Remote I/O LSI

A A A

 

 

 

 

 

 

 

 

 

VYY

DEE?

 

 

 

 
   

 

 

 

 

 

 

I mm

Figure 2.1: Block Diagram

has been successfully used to implement the SINCGARS circuit [19, 22].

The SINCGARS circuit was chosen for the following reasons. Since the ﬁnal FSM
is large, it provides an appropriate demonstration of the suitability of this approach
to large FSMS. Furthermore, since the existing circuit has already been implemented
and tested, testing the new FSM after replacement of the existing microprocessor is

relatively less problematic.

2.1 Remote I/O Circuit Functionality

The current implementation of the Remote I/O Module consists of the GDP 1802
microprocessor, ROM, RAM, OUTPUT BUFFER, DEC/TIM, and the REMOTE
INTERFACE as shown in Figure 2.1. The microprocessor is used in the SINCG'ARS
Receiver- Tmnsmitter(RT) to control its status. It controls the inputting, decoding,

outputting, and encoding of remote control words sent from the other radio mod-

14

ules. Thus, the radio can be controlled from a distance via a physical wire. The
ROM contains the stored program to perform the function of operating the GDP
1802. The RAM is used for temporary storage during operation of the CDP 1802
system. The OUTPUT BUFFER can be loaded with parallel data which can be
shifted out in a serial data stream. Output clocks and gates are also provided. The
DECODER/ TIMER is designed to decode control signals to the microprocessor and
to provide timing for the system. The REMOTE INTERFACE contains two groups
of latches. One group of latch signals is fed to it by the CDP 1802. These signals are
used for output Signals to the RT. The other group of latches is clocked and continu-
ally latches the latest state of its input lines. Figure 2.2 shows the state diagram of the
Remote 1/ O module. During the microprocessor SCAN routine shown in Figure 2.2,
the latched signals are read by the microprocessor via the data bus. The signals are
analyzed, and appropriate action is taken. The microprocessor also provides direct

memory access to the RA M.

The Remote I/O Module controls the power on and initializes operation of the
RT in the remote conﬁguration. It provides a self-test routine that tests the RAM,
ROM, and remote 2-wire controls. It also provides remote front-panel controls for
the RT: MODE, CHANNEL, RF POWER, FUNCTION, KEYBOARD CONTROL,
and COMSEC. The Remote I/O Module sets and maintains the operating mode
associated with the remote control units as listed above, and sends and receives the
control signal for both control words and baseband information. Control Signals are

provided to modulate or demodulate an F SK carrier for control words.

15

 

 

 

OBKB

COMP

 

IOBFI']

Figure 2.2: Remote 1/ O State Flow Diagram

16

 

state_assign:PROCESS

BEGIN
-- see Figure:State Assign
END PROCESS;
fem:PROCESS(present_state,x) PROCESS -- state 0
BEGIN BEGIN
CASE present-state IS WAIT ON clk;
WHEN 0 => IF (present-state-O
AND elk-’0’) THEN
output <- ’0’; output <- ’0’;
IF(x=’O’) THEN IF(x:’O’) THEN
transition <= 0; transition <- 0;
ELSE ELSE
transition <= 1; transition <2 1;
END IF; END IF;
ELSE
output <-I NULL;
transition <- NULL;
END IF;
END PROCESS;
WHEN 1 => PROCESS -- state 1
(A) Case Statement (B) Process Statement

 

Figure 2.3: FSM Implementation

2.2 VHDL Modeling of the Finite State Machine

In VHDL modeling of FSMS, the case statement is widely used and cited in the
literature [20, 27, 28]. An example is illustrated in Figure 2.3 (A) to model a simple
FSM. However, when the FSM is large, it is useful to decompose the whole FSM
into a number of small communicating FSMS so that each smaller FSM can then be
modeled separately. These smaller FSMS are shown as circles in Figure 2.2. The
modeling approach using case statements cannot handle this kind of decomposition

effectively. In a proposed new approach which is described in this chapter, each small

17

FSM is represented as a single process, as shown in Figure 2.3 (B). There are several
advantages to using the new approach: it is easy to decompose the large FSM into
smaller FSMS; it is easy to manage smaller FSMS instead of handling one huge FSM;
it is easy to update FSMS and maintain FSMS given the greater manageability of
smaller FSMS.

The microcode in the GDP 1802 processor is modeled as a set of FSMS using
VHDL. First, the overall behavioral description of the processor is represented with a
VHDL behavioral description. This description is divided into a number of states in
the ﬁnite state machine. In Figure 2.2, the circles represent main states, and the rect-
angles represent subroutines. These states and subroutines are further divided into
states based on the generation of output Signals, accessing the RAM, and waiting for
the 640KHz clock input. The subroutines in the microprogram are also implemented
as states.

The decomposed states in the original ﬁnite state machine are implemented using
the process statements of VHDL. The FSMS are synchronized to the rising edge of
the 640KHz clock. The interrupt routine is executed when the interrupt signal as
well as the interrupt enable signal are both high.

The major disadvantage of using process is its extensive use of the null as-
signment. If the same signal is driven in more than one process block in VHDL, it
must have a bus resolution function associated with this driver. In the new design,

7

a signal (e.g., “transition,’ which selects the proper next state value) must be driven
from only one process at a time, in other words, the rest of the drivers should be

disconnected at the given simulation time. This resolution function is quite different

18

from those known as “Wired-Or,” “Wired-And,” and “Wired-X” functions, where
all the drivers do not need to be disconnected. Their values are collected and used
to calculate an output signal value of the function. In this design, if all the drivers
except one are successfully disconnected, the resolution function simply returns the
ﬁrst driver’s value because there is only one driver active at that time. In order to

achieve this, a null assignment statement for each driver must be used.

Once the microprocessor is eliminated, the ROM is no longer necessary. The new
block diagram is basically the same as the old one, except that there is no ROM, and

CDP 1802 is replaced by FSM.

2.2.1 Basic Structure of State Modeling Using VHDL

There are several ways of representing the ﬁnite state machine in VHDL. The case
statement is the most frequently used. An example of using a case statement is shown
in Figure 2.3 (A). The other method, recommended here, uses a process statement.

This method is illustrated in Figure 2.3 (B).

Each state has a ﬁxed structure, as shown in Figure 2.4. One process represents
one state. Each process waits for the changing value of clk (wait on statement). The
next if statement controls the execution of the statements in that particular state.
If the conditions are satisﬁed, the sequential statements inside of the if statement
are executed. Each process can have any number of sequential statements. If the
conditions are not satisﬁed, the null assignment statements are executed. At any

given point of Simulation time, only one process among all the processes actively

19

 

PROCESS

-- local variables declarations here if needed

BEGIN
WAIT ON clk;
IF(present-state 8 ? AND elk 8
-- ? represents the current
-- Sequential statements
output <8 new_output; --
transition <= next_state; --
-- Sequential statements
ELSE
output <= null;
transition (8 null;
END IF;
END PROCESS;

’0’) THEN
state number

output assign
state transition

 

Figure 2.4: Basic State Implementation

 

 

1 ..........................
clk ['2
0 ............... g

i

t+ At

v
thachycle
V
t+half_cycle+ A t

Figure 2.5: Timing Diagram

updates the new signal values; the rest of them simply disconnect all the signal

drivers because of the if statement.

The new present state value is assigned at time t, which is the rising edge of the
clock. For VHDL simulation, this value is available after At time interval. This

timing diagram is shown in Figure 2.5. Thus, when the clock is low, the new value is

already available to the process.

I—

20

 

state-assign : PROCESS
BEGIN
WAIT ON elk;
IF(c1k 8 ’1’ AND NOT clk’STABLE) THEN --
IF(interrupt-enable = ’1’ AND interrupt 8 ’1’) THEN --
present_state <8 900; -- interrupt routine --
state-save (8 next-state; --
ELSE
present-state (8 next_state; -- 5
END IF;
-- Output assignments -- 6
END IF;
END PROCESS;

Atom»

 

Figure 2.6: State Assign And Output Assign

2.2.2 State Transition and Output Generation

A state transition occurs at the rising edge of the clock, i.e., at the time t in
Figure 2.5. This condition is checked by statement 1 of Figure 2.6. This new state
value is available after At time delay, speciﬁcally, after the time t+At in Figure 2.5.
Thus, when the clock is ‘0,’ this new value is already available to all the processes in
which the conditions can be checked. This is the case in the normal state assignment
(statement 5) in Figure 2.6. However, if an interrupt request is made (conditions in
statement 2), the next state value is saved (statement 4), and the flow of control goes
to the state “900” (the beginning state of the interrupt handling routine) to serve the

interrupt routine (statement 3).

Any output can be generated at the rising edge of clock at the same time that the

state transition occurs. All the output signals can be assigned at statement 6.

21
2.2.3 Subroutine Handling

Subroutines are also implemented as FSM states. The execution of a subroutine is
accomplished by assigning the beginning state number of the subroutine to transi-
tion and saving the return state number onto the stack. There is an independent
process which handles the stack Operation, shown in Figure 2.7. RETURN can be
implemented by assigning the state “1000” to transition. At the state “1000,” it
simply pops the return address and makes the next transition available, as shown in
Figure 2.7.

Since the procedure in VHDL cannot directly generate the output signal to the
outside of the FSM, it is not possible to use procedures if there is output from the
subroutines. This is one reason why subroutines are implemented by the FSM instead
of using procedures. Each subroutine consists of several states, with the last state
of each subroutine simply restoring the return state to the transition from the stack.
Using the same technique as the main ﬁnite state machine, subroutine calling can be
easily implemented.

The nested subroutine call uses the stack. Since the return address must be saved
somewhere in order to return to the right place, the stack is an effective data structure.

Every time a subroutine is called, the return address is saved onto the stack.

2.2.4 Interrupt Handling

If the interrupt request occurs while the program is executing in any of the states, as

is shown in statement number 2 of Figure 2.6, the next state value is saved and the

22

 

stack-0peration : PROCESS
VARIABLE pointer : INTEGER :8 O;
VARIABLE stack : stack_ty :8 (others 8> 0);
BEGIN
WAIT 0N clk;
IF( clk 8 ’0’) THEN
IF(stack-operation 8 push) THEN
stack(pointer) :8 return_state;
pointer :8 pointer + 1;
next-state <= transition;
ELSIF(stack_operation 8 pop) THEN

pointer :8 pointer - 1;
next_state <8 stack(pointer);
ELSE
next_state <8 transition;
END IF;
ELSE
next_state <8 null;
END IF;
END PROCESS;

PROCESS -- RETURN
BEGIN
WAIT ON clk;
IF(present_state 8 1000 AND clk 8 ’0’) THEN
stack_operation <8 pop;
memory_operation<8 no;
ELSE
stack-operation <8 null;
memory_operation<8 null;
END IF;
END PROCESS;

 

Figure 2.7: Stack Operation and State 1000 RETURN

23

transition is changed to state “900,” the start of the interrupt handling routine. At
the end of the service, the saved state is restored to the signal transition in order to

resume execution.

In the microprocessor, the execution of an instruction is divided into three phases:
fetching, decoding, and executing the instruction. The interrupt request can occur
during any of these three phases. When this happens, the current status is saved, the
interrupt handling routine is invoked, and upon completion of the interrupt handling
routine, the saved status is restored and the control is returned to the point where it

was halted.

In VHDL simulation, no suitable way of implementing this situation can be found.
Interruption can only occur when an instruction is ﬁnished executing. Although
conditions are restricted so that interruption can occur after executing an instruction,
this is not suitable for FSM modeling. Since several microinstructions were grouped
into one state instead of allowing a single instruction per state, interruption should

be allowed after ﬁnishing all instructions in a state.

The interrupt handler is implemented as is a ﬁnite state machine and can be viewed
as a subroutine. When the interrupt request is given and when intermpLenable = ’1, ’
the next state transition is saved onto the Signal variable called state.save, the new
present state then becomes 900, or the beginning of the interrupt handling routine.
Upon completion of the interrupt handling routine, the next state is 1000, which pops
the saved state and restores the right transition information onto the signal variable

present.state.

24
2.3 Discussion

How VHDL can be used to model the FSM and retargeting the current design into a
new design have been shown. Since each state is modeled as one process in VHDL,
this method has several advantages over use of a case statement. Using process
statements gives better readability: each state may contain complicated sequential
statements, decomposition can be easily performed, and only some states can be
compiled and simulated independently within the whole system. Since this approach
is modular, it is easy to manage smaller FSMS instead of handling one large FSM,
and it is easy to update and maintain FSMS because only partial re-compilation is
needed. This style, however, has the limitation of leading to excessive use of NULL
assignment statements, which needs a bus-resolution function [22].

This FSM model is used to implement a hardware/software co—design system [24],
which is explained in Chapter 5. Different synthesis results from several different

FSM models are also summarized in Chapter 5.

Chapter 3

Modeling of Parallel Architectures

and Algorithms

High-performance computing requires both parallel architectures and algorithms that
work well on those architectures. To design a high-performance system, architectures
and their suitable algorithms must also be designed. VHDL can be used to model
parallel architectures and algorithms. For parallel algorithms, actual hardware with
the desired architecture is usually not available and must be Simulated. The tight
coupling between architecture and algorithm requires a language that can describe
both effectively.

For parallel architectures, two computing models are commonly used. In the single
instruction, multiple data (SIMD) model, all of the processing elements (PEs) execute
the same instruction synchronously. In multiple instruction, multiple data (MIMD)
model, each PE has its own program and PBS only synchronize periodically. In either

model, memory may be globally shared so that any PE can access any variable, or

25

26

memory may be local to each PE. In the latter case, PES communicate by passing
messages.

In general, a parallel algorithm on a Speciﬁc architecture can be modeled as follows.
At the top level, a structural description is used to create the processing elements and
connect them using the chosen topology or architecture. A behavioral description in
the architecture of each PE describes the activities of individual nodes. The memory
of each node is modeled by variables within this architectural body of the VHDL
description of the PES.

For regular topologies, such as LINEAR ARRAY, RING, STAR, TREE, COM-
PLETELY CONNECTED, MESH, TORUS, and HYPER-CUBE [29, 30], VHDL generate
statements work well. However, it may take some practice to correct the port maps
using only the indices of the generate statements. Once a topology is developed, it
usually can be easily extended to a larger system.

In the MIMD algorithm, individual PEs may be programmed independently. One
way to handle this is to deﬁne different entities for each. This may preclude the use
of the generate statement to describe the topology. Another solution is to lump the
programs into one architecture and select a program using the if or case statement.

Figure 3.1 illustrates the general structure of a VHDL model for a SIMD parallel
algorithm. The control unit for SIMD algorithms is modeled using a process at
the top level. To be completely faithful to the SIMD model, most of the algorithm
would have to reside in the control unit, and the PE architecture should only deﬁne
the handling of individual instructions. However, individual instructions are of too

ﬁne a granularity. Instead, the algorithm must be divided into meta-instructions-

27

 

-- Top Level

FOR 1 IN 0 T0 ... GENERATE
pe PORT HAP(1ocal-clk, command,
..);
END GENERATE;

contrl: process

end contrl;

-- Individual Node
ENTITY pe IS
PORT(loca1-c1k : IN BIT;
command : IN INTEGER;
..);
END pe;

ARCHITECTURE arch-pe 0F pe IS
BEGIN

PROCESS(loca1_clk)

VARIABLE k: INTEGER:=O;

BEGIN

CASE command IS
WHEN ... =>
WHEN ... 8)

END;
END;

 

Figure 3.1: General Structure of SIMD Algorithm Model

28

which handle sections of the code between necessary synchronization points. The
process for the control unit issues these meta-instructions and the process for each

PE executes the corresponding section of code.

3.1 Examples

In order to solve a given problem, ﬁnding a suitable algorithm and the best parallel
architecture associated with the algorithm is needed. This is the driving program of
each processing element. In MIMD, each PE executes different programs, while in
SIMD, each PE executes the same instructions one at a time.

The ﬁrst problem is ﬁnding the minimum among n different numbers, each of
which is assigned to one node in a HYPERCUBE. The algorithm is a typical example
of reduction. In each cycle It, every PE compares its data with its neighbor’s data
along the k — th dimension, then saves the smaller number. After log(n) cycles, every
PE has the smallest number that is the solution. The VHDL code in Figure 3.2
implements the algorithm when n = 8.

The implementation of the algorithm in VHDL requires the modeling of a HY-
PERCUBE. A diagram of a HYPERCUBE with dimension 4 (called 4-Cube) is shown in
Figure 3.3, and the VHDL code for the 4-Cube interconnection is Shown in Figure 3.4.
Here, dim0 is used as the ﬁrst dimensional connection such as X direction, dim] is
used Y direction, dim? is used Z direction, and ﬁnally, dim3 is used for the fourth
dimension. The individual PE’S id number is set by using id(i), and inidata(i) is used

to initialize the data for simulation if necessary.

29

 

LIBRARY WORK;

USE WORK.ALL;

ENTITY pe IS

PORT(00,01,02: OUT INTEGER;

iO,i1,i2: IN INTEGER;
id: IN INTEGER;
loca1_clk: IN BIT;
1: IN INTEGER);

END pe;

ARCHITECTURE arch_pe OF pe IS
BEGIN
PROCESS (local-clk)
VARIABLE k: INTEGER:80;
VARIABLE data: INTEGER;
TYPE int_array IS ARRAY(0 T0 2) OF INTEGER;
VARIABLE in_temp, out-temp: int_array;
BEGIN
in_temp(0) :8 i0;
in_temp(1) :8 i1;
in-temp(2) :8 12;
IF ((local,clk’EVENT) AND (loca1_clk 8 ’0’ )) THEN
IF (k80) THEN

data :8 x;
END IF;
out_temp(k) :8 data;
k :8 k+1;
END IF;

IF ((local_clk’EVENT) AND (local_c1k 8 ’1’ )) THEN
IF ( in_temp (k-l) < data ) THEN
data :8 in_temp(k-1);
END IF;
END IF;
00 <8 out_temp(0);
01 <8 out_temp(1);
02 <8 out-temp(2);
END PROCESS;
END arch_pe;

 

Figure 3.2: Smallest Number Finding Problem

30

 

Figure 3.3: 4-Cube Connection

 

3: FOR 1 IN 0 T0 15 GENERATE
id(i) <8 1;
p0: pec PORT MAP(dim0(i), dim1(i), dim2(i), dim3(i) “-(1) OUT’s
dim0(i+(1-2*(i mod 2)», "(2) IN’B
dim1(i+(1-2*((i/2) mod 2))*2 ),
dim2(i+(1-2*((i/4) mod 2))84 ).
dim3(i+(1-2*((i/8) mod 2))* 8 ).
id(i), clk, inidata(i));
END GENERATE;

 

Figure 3.4: 4-Cube Connection in VHDL

 

IO ‘1 a...

 

 

31

0010

 

 

 

Figure 3.5: Bit Conﬁguration of Connected PEs in 4-Cube

Since each PE has the same hardware conﬁguration and executes the same pro-
gram, there is only one PE description, which is saved in the design library. The
network of PBS can be instantiated using PORT NAP statements. In addition, since
the structure of the connection is regular, the GENERATE statement is used to instan—
tiate all PES.’ Each PE is connected to 4 PEs whose id differs only in one bit position.
Figure 3.5 illustrates the connections of PE 0. The formula, i+(1-2*((i/ 281m) mod
2))8288n, in (2) of Figure 3.4, is used to ﬁnd such a bit conﬁguration. Here i rep-
resents the relative id number of each PE, and n represents the n-th bit position of
id.

The next example implements Butcher’s Sorting Algorithm on a TORUS network.
The VHDL code segment shown in Figure 3.6 is an implementation of the algorithm
based on Nassimi’s paper [31]. Each PE receives commands from the top module,
which may be regarded as an SIMD controller, and responds to those commands.

The commands are very simple, such as shift.data_left, ..right, -up, and -down, or

32

change_data between different components within the same PE. Operations related
to a single PE are implemented by sending corresponding commands and parameters
to each PE.

A 5-by-5 TORUS is shown in Figure 3.7, and its VHDL code is shown in Figure 3.8.
Again, the regularity of TORUS permits the use of a generate statement to instantiate
all the components. In the VHDL code, the signals conO, con1, con2, and con3
are used to connect each PE in all four directions. The wrap-around is implemented
using integer mod functions.

The second program shown in Figure 3.6 describes the implementation of an actual
SIMD PE. A PE receives a command, such as left.shift or changeOandZ. The execution
of this instruction is synchronized at the control signal locaLclk.

Note that the VHDL implementation of the second problem more closely model
the behavior of SIMD machines than the implementation of the ﬁrst problem. In
SIMD machines, a PE has no program to run. Each PE only responds to instructions
sent from the controller or control signals such as clock. However, this implementation
in the second example still uses some meta command characteristics. For example,
the command changeOand2 contains more than one instruction. In the ﬁrst example
(Figure 3.2), however, each PE contains most of the algorithmic code. The controller
sends only synchronization Signals as opposed to instructions.

Describing MIMD machines is not an easy task, especially when the machine is
dynamically conﬁgured. Each PE may have a different program (instruction) to run;
thus, it may not be feasible to use the generate statement. Each PE must be coded

independently and connected separately.

33

 

-- ... deleted
ARCHITECTURE arch-pe OF pe IS
BEGIN
PROCESS(local_clk)
-- ... deleted
rowin_right_temp :8 rowin-right;

CASE command IS
WHEN left-shift 8>
IF(local-c1k 8 ’0’ AND local_clk’EVENT) THEN
IF(colid MOD (2*par1) + par2 <8 2*par1) THEN
rowout-1eft_temp :8 pe1;
END IF;
END IF;
IF(local-clk 8 ’1’ AND local-clk’EVENT) THEN
IF(par2 + colid MOD (2*par1) - par1 >8 0) THEN
pe1 :8 rowin_right_temp;
END IF;
END IF;
-- ... deleted
WHEN changeOand2 =>
IF(local_clk 8 ’0’ AND local-clk’EVENT) THEN
IF(rowid MOD (2*par1) < parl) THEN
IF(sort_order 8 1) THEN
IF(pe2 < peO) THEN

temp :8 peO; peO :8 pe2; pe2 :8 temp;
END IF;
END IF;
IF(sort_order 8 -1) THEN
IF(pe2 > peO) THEN
temp :8 pe0; pe0 :8 pe2; pe2 :8 temp;
END IF;
END IF;
END IF;
END IF;
-- ... deleted
WHEN OTHERS 8> NULL;
END CASE;
-- ... deleted

result <8 p60;

rowout-left :8 rowout_left_temp;
END PROCESS;
END arch_pe;

 

Figure 3.6: PE Behavior of Batcher’s Sorting Algorithm

34

 
   
 
     

con3(3)

 

  
 

1)
eon1(l)

con2(2
con 1(2)

Figure 3.7: 5X5 Torus Connection

35

 

g:FOR i IN 0 TO 4 GENERATE
31: FOR j IN 0 TO 4 GENERATE
id(i*5+j) <8 1*5+j;

p0: pec PORT MAP(con0(i*5+j), --(1) EAST out
con1(i*5+j), --(2) SOUTH out
con2(i*5+j), --(3) WEST out
con3(i*5+j), --(4) NORTH out
con0(((i*5+j-1) mod 5) + i85), --(5) WEST in
con1(((i-1)*5+j) mod 25). --(6) NORTH in
con2(((i*5+j+1) mod 5) + 185), --(7) EAST in
con3(((i+1)*5+j) mod 25), --(8) SOUTH in
id(i*5+j), Clk); --(9) PE id and Clock
END GENERATE;
END GENERATE;

 

Figure 3.8: TORUS Connection

VHDL has been shown to be powerful enough to describe various parallel architec-
tures and their timing behaviors. Using VHDL, parallel algorithms can be simulated
and the performance of parallel architectures measured. To model a speciﬁc parallel
application and synthesize to a speciﬁc parallel hardware, Splash 2 was used as a

signal processing application. This application is covered in the next chapter.

Chapter 4

Signal Processing in a Custom

Computing Machine

Signal processing applications are computation-intensive, primarily because of the
large amount of data to be handled in a very short time. Single processor machines
cannot attain the desired performance, parallel machines and application-speciﬁc in-
tegrated circuits (ASICs), therefore, are used to obtain the desired speed. However,
these options are costly, moreover, once the ASIC is built, the design is very difﬁcult
to be changed.

Field Programmable Gate Arrays (FPGAS) have gained considerable attention
recently because of their reconﬁgurability. FPGAs allow a new form of computing
where the architecture of a computer may evolve over time, changing to ﬁt the needs
of each application it executes. With an FPGA-based machine, architecture can be
tailored to meet the desired performance for a given application.

The reconﬁgurable architecture, which has been adopted, is Splash 2, developed

36

37

and built by the Supercomputing Research Center (SRC). Splash 2 is a special pur-
pose attached parallel processor having processing elements (PES) based on user pro-
grammable Xilinx 4010 FPGA chips. The Splash 2 system consists of a Sun SPARC-
station as a host, an interface board, and Splash array boards ranging from one to
sixteen boards. Each array board consists of 16 PBS with linear connections as well

as a reconﬁgurable crossbar interconnection between PES.

The Splash 2 system can be used to enhance existing applications by using its mul-
tiple PBS. The applications Should be modeled appropriately to the Splash 2 system.
The Splash 2 system does not have any ﬁxed instructions; any sets of instructions
can be designed in this system. Thus, modeling is very important step in the design

process when using the Splash 2 system.

VHDL is used to model computational algorithms as well as architectures. Sim-
ulation is used to verify the parallel model. Then, from the VHDL description, the
compiler retargets the algorithms and architectures into FPGAS. The role of VHDL
in this system is multifaceted: it is used as both the speciﬁcation (modeling) lan-
guage and the implementation language. VHDL describes both algorithms within
user applications and the necessary hardware to realize them (including processing

units, memories, and interconnections) [32].

In this chapter, research work to model and synthesize a signal processing appli-
cation using VHDL targeted to a parallel custom computing machine, Splash 2, is
described. Different modeling approaches are greatly affected in the ﬁnal synthesized

results: this aspect is summarized in Section 4.4.

38
4.1 Splash 2

Splash 2 [33] is an attached special purpose parallel processor where each processing
element is a user programmable FPGA chip. The architecture of Splash 2 can easily
support parallel applications, such as systolic or data-parallel computations. Splash
2 has been developed and modiﬁed from the Splash 1 system [34], which consisted
of a ﬁxed size linear array of Xilinx 3090 FPGA chips. Splash 2 incorporates several
improvements over Splash 1. Splash 2 is based on newer hardware technology, the
Xilinx XC4010 FPGA. A crossbar has been added to connect PES on a board in
any pattern. The linear path was the only conﬁguration in Splash 1. The program-
ming environment of Splash 2 centers on VHDL, in place of SRC’S Logic Description

Generator (LDG) [35, 36, 37].

4.1.1 Splash 2 Architecture

Splash 2 is attached to a Sun SPARCstation host. Figure 4.1 shows the Splash 2
architecture [38, 39]. The host is connected to Splash 2 via an interface board. The
host can read from and write to memories on the Splash processing boards via this
interface board.

Each Splash 2 processing board has 16 processing elements, X1 through X16, with
one special PE, X0, controlling the data ﬂow into the processor board. Each X,- is a
PE built around a Xilinx 4010 FPGA chip. A crossbar connection can be programmed
by X0. The processing element organization is Shown in Figure 4.2. Each PE has

512 KB of attached memory, which has a 16-bit word Size. The host can access this

 

.H J ..
tint.

 

 

39

Splash Boards

 
         
 
   
 
   
 
    
   

  

X1X2X3X4X5X6X7

Crossbar

X16 X15 X14 X13 X12 X11 X10

Interface Board

    

    

X1 X2 X3 X4

  

X5 X6X7

      
   
 
   
 

XL
Crossbar

X16 X15 X14 X13 X12 X11 X10

X1X2X3X4X5X6X7

Crossbar

X16 X15 X14 X13 X12 X11 X10

 

Figure 4.1: Splash 2 Architecture

40

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

SBus Read 256k By 16
RD Memory RD
SBus Wnte WR WR
SB
Addfess 74> Addresil/ Data
32 / /
/ 18 / 16

SBus %\/
Data 32 M T

Processor 0 Processing

“nub" Element (PE)

To Left / / To Right
Neighbor / / Neighbor
36 36
/
/ 36

 

To Crossbar

Figure 4.2: Processing Element (PE) in Splash 2

memory via the SBus.

Each PE can communicate using the SIMD Bus (left-right neighbor data paths)
or the crossbar. Data can be broadcast using the crossbar. The Splash 2 architecture
supports systolic and pipeline modes of computation, SIMD, or data-parallel mode
(PES execute the same instructions on different data streams), or even MIMD mode
(PES execute different instructions on different data streams). The Splash 2 system
can run at a maximum clock speed of 40 MHz where the maximum is limited by the
FPGA technology. The actual operating speed is determined when the FPGA logic

is synthesized.

41

4.1.2 Programming Splash 2

The programming environment for Splash 2 is based on VHDL [14, 40]. The be-
havioral description is analyzed, simulated, and synthesized onto Xilinx FPGAS. An
application for Splash 2 is developed by writing its behavioral description in VHDL,
and the description is iteratively reﬁned and debugged with the Splash 2 system sim-
ulator. After the description is veriﬁed to be functionally correct by simulation, it is
translated into a Xilinx net list form. The net list is then mapped onto the FPGA
architecture by an automatic partition, placement, and routing tool to form a load-
able FPGA object module. A static timing analysis tool is then applied to the object

module to determine the maximum operating speed.

To program Splash 2, each of the PES should be programmed, i.e., X 0 through X16,
and the host interface. The host interface is responsible for data transfers between
host and the Splash 2 board. For this purpose, the system provides a C-language

interface and C programs are written for the host’s tasks.

4.2 Convolution

An important class of signal and image processing algorithms is based on the convolu-

tion of two signals. For analog one-dimensional signals f (t) and g(t), the convolution

h(t) is deﬁned as

h(t) = /+°°g(z)f(x — t>dz (4.1)

-CXD

42

1. Input: A 1-dimensional vector f(t), a mask vector g(t).
2. Output: A l-dimensional result vector h(t).

3' Begin Assume 1: PES are available.

The it" PE holds the g(i) mask value.
On each PE:
Receive from left neighbor: signal value f(i), partial sum S(i-l).
Compute new partial sum S(i) = S(i—l) + f(i) * g(i).
Send Signal value f(i) and partial sum S(i) to right neighbor.
End.

Figure 4.3: Sequential Algorithm of Convolution

 

Discrete-time one-dimensional signals reduce to the following equation:
+oo
h(t) = 2 gene — t) (42)
-00

Further, if the signal g(t) is a ﬁnite-time duration signal (called the mask signal),

then the summation range changes, so that

h(t) = Z g(xlf (1‘ - t) (43)

where k is the mask size.

On a sequential machine, convolution is easily implemented, as shown in Fig-
ure 4.3. However, a sequential computer may not be practical when convolution
needs to be done in real-time for a large number of data points. For parallelizing a
convolution computation, two approaches can be taken [41]: (i) data parallel comput-
ing and (ii) systolic computing. Data parallel computing uses a divide—and-conquer

approach to deal with the large amount of data. Usually f (t) has a large number

43

of data points (spread over a large time domain) compared to the mask signal g(t).
Hence, a set of processors can be used to compute on shorter segments of the data
in parallel. This computational model assumes that each PE is powerful enough to
carry out all computations and that signal values are already available at each PE. If
this latter assumption is not the case, then the data path from host to PBS becomes
a bottleneck, preventing the distribution of data to the PES. This problem can be
overcome by using a systolic approach. This requires only that a single data path
exist between the host and 1: PBS and that the PEs be powerful enough to perform
the add and multiply operations.

A set of PBS are used as a linear array. The basic convolution algorithm translates
into the above systolic algorithm. The input is fed into the PE array at the left input
of the ﬁrst PE with the partial sum initialized to zero. At the output of the last PE,
the ﬁnal result is obtained.

Clearly, the algorithm outputs one result every clock cycle after the initial pipeline
latency. The overall model is schematically described in Figure 4.4. If the number
of PBS available is smaller than the number of mask values, several virtual PBS to a
physical PE should be mapped, in which case, the PEs need wider communication

paths and require more cycles to produce results.

4.3 VHDL Implementation

In this section, the implementation of a 1-D convolution using VHDL is described. In

the implementation of this convolution, the PES are conﬁgured as a linear array and

44

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

PE
left_in -—" right_out <= left_in; r——* right_out
sum_right <= lefLin ‘ mask + sumJeft; ,
sum_left —' —. sum _nght
Memory Memory Memory Memory Memory
left_in —8I —-> right_out
PE PE PE PE PE _
sum_left —H ‘-—‘> sum_nght

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Figure 4.4: PE Behavior and Conﬁguration for 1-D Convolution (k = 5)

a systolic method is used. In a 1-D convolution, each signal value (ranging from 0
through 255 in the applications being targeted) is input into the left-most PE at each
clock cycle, and this input is multiplied by the mask value. For each signal value, the

input value and partial sum are passed to the next PE to the right for accumulation.

4.3.1 1-D Convolution

A Signal value, left.in, is received at each clock cycle and is multiplied by the mask
value, mask. The partial sum, psum, is simultaneously received and is added to this
newly-formed product. The signal value and the new partial sum, sum, are then
passed to the right PE for accumulation. The VHDL program representing this PE
behavior is shown in Figure 4.6. The conﬁguration is shown in Figure 4.5.

In the program, the calculation of the accumulation is shown at line 4 in Figure 4.6.
Synchronization with the clock is achieved by waiting for the clock transition (line 1)

in Figure 4.6.

45

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Memory
Amhaﬁ’ f”“
’18 ’ 16
/ value(7 downto 0) ‘ value(7 downto 0)
1’8 1’8 >
m—i“(35,d°“‘° 0.) ] partial sum(23 downto s) K partial sum(23 downto 8) right_out(35 downto 0)
I I I .1 l _
36 716 Buf + 7 16 r r 36 V
unused(35 downto 24) unused(35 downto 24)
1 I L
\ ’ 12 ’ 12 '

 

 

 

PE

Figure 4.5: Splash PE Input/Output Data Paths

 

WAIT ON valid_clk;

psum :8 sum-left;

IF (left_in>0) AND (psum>80) THEN
sum :8 mask * left_in + psum;

ELSE
sum :8 psum2;

END IF;

right_out <8 1eft_in;

sum_right <8 sum;

1
l
poorer-s

 

Figure 4.6: Behavioral Modeling

46

 

ENTITY Xilinx_Processing_Element IS

GENERIC(
bd-id : INTEGER :8 0; -- Splash Board ID
pe_id : INTEGER :8 0); -- Processing Element ID
PORT (
XP_Left : INOUT DataPath; -- Left Data Bus
XP_Right: INOUT DataPath; -- Right Data Bus
XP_Xbar : INOUT DataPath; -- Crossbar Data Bus

XP_Xbar_EN_L : OUT BIT_VECTOR(4 DOWNTO 0);

-- Crossbar Enable (low-true)
XP,Clk : IN BIT; - Splash System Clock
XP_Int : OUT BIT; - Interrupt Signal
XP-Mem-A: INOUT MemAddr; - Splash Memory Address Bus

XP_Mem_D: INOUT MemData; -- Splash Memory Data Bus
XP_Mem-RD-L : INOUT RBit3; -- Memory Read Signal
XP-Mem_WR_L : INOUT RBit3; -- Memory Write Signal
XP_Mem_Disable: IN BIT; -- Splash Memory Disable
XP_Broadcast : IN BIT; -- Broadcast Signal
XP_Reset : IN BIT; -- Reset Signal

XP_HSO : INOUT RBit3; -- Handshake Signal Zero
XP_HSI : IN BIT; -- Handshake Signal One
XP-GOR-Result : INOUT RBit3; -- Global OR Result Signal
XP_GOR_Valid : INOUT RBit3; -- Global OR Valid Signal
XP_LED : OUT BIT); -- LED Signal

END Xilinx_Processing-Element;

 

Figure 4.7: PE Entity Description

4.3.2 Splash 2 Implementation

The 1-D convolution program is ported to the Splash simulator in order to simulate
and synthesize for the Splash architecture and for Xilinx FPGAS. The entity descrip-
tion [36] of each PE is shown in Figure 4.7. The signals are self-explanatory and are

shown schematically in Figure 4.2.

The behavioral code for a 1-D convolution using VHDL cannot be directly syn-

thesized into a Xilinx FPGA chip, since an FPGA consists of a ﬁnite, and therefore,

47

 

WAIT until XP_Clk’EVENT AND XP_Clk 8 ’1’; -- 1
psum :8 psum1;

psum1 :8 bvtoi(left_in(23 downto 8)); --
-- sum :8 mask8bvtoi(1eft-in(7 dosnto O))+psum; --
Address(7 downto 0) <8 left_in(7 downto 0); --
sum :8 bvtoi(Data)+psum; --
right_out(7 downto 0) <8 left_in(7 downto 0); --
right_out(23 downto 8) <8 itobv(sum,16); --

#0301.th

 

Figure 4.8: PE Behavioral Description

limited number of logic devices. The logic synthesized directly from the VHDL de-
scription includes 8-bit multiplication, which consumes too many gates to ﬁt into the
Xilinx chip. To reduce the logic requirements, the multiplication operation is con-
verted into a table look-up operation using the memory of the PES. The VHDL code
segment for the Splash PE is shown in Figure 4.8. A diagram of the PE input/ output

data paths is shown in Figure 4.5.

Each PE is synchronized by the rising edge of the clock, XP.Clk (line 1) of Figure
4.8. The effect of lines 4 and 5 is Shown in the comment of line 3: a memory look-up is
used in place of multiplication. Since each signal is 8 bits wide, there are 256 distinct
values possible for each data point. The 8-bit input is shown in line 4. It is used
as an address of a 256-word look-up table, which holds the results of the 256 values
times the mask value (a single constant for that PE). The 256 multiplications can
be calculated in advance and load these values into the PE’S memory. The loading
of these values is done from the host of the Splash system before the convolution

execution starts. The Signal value is used as the look-up table address.

48

Once the Address is set to a certain value (line 4), the value in the memory location
pointed to by Address is loaded into a variable Data at the next clock cycle. Thus the
multiplication result is available at the next clock cycle in Data (line 5). One buffer
is used for restoring a temporary result (psum1 at line 2) since Data is available one
clock cycle later. Before the partial sum is calculated, conversion of the signal of the
bit-vector type into an integer type is needed. This is done by using the function
bvtoi. Similarly, itobv is used to convert an integer to a bit-vector. The input signal
value and partial sum are packed and passed to the right neighbor (lines 6 and 7).

Once the PE program is complete, the Splash system can be conﬁgured. This
is done by programming the VHDL top model. A portion of this code is shown
in Figure 4.9. A predeﬁned interface board conﬁguration is used. By assigning the
generic constants such as input. dot, the system can be tailored (line 1). Line 2 speciﬁes
one Splash board. A speciﬁc interconnection between PBS is obtained by loading the
initial conﬁguration from a ﬁle, here named zcrossbar (line 3). Each PE is conﬁgured
by using a predesigned component. Line 4 marks the section of code for PE 1, line 5

for PE 2; the rest of the PEs are deﬁned similarly.

4.4 Experimental Results

Splash simulation indicates the correct behavior of the model, as determined by ana-
lyzing Figure 4.10, the simulation waveform. The synthesized result achieves a clock

rate of 18.5 MHz for the 1-D convolution, as shown in Figure 4.11.

The timing results show that different operations can run at various maximum

49

 

CONFIGURATION top OF Splash_System IS
USE ENTITY Interface.Interface_Board(Structure) -- 1
GENERIC MAP (Input_file1 8> "input.da ",
Output_file1 8> "result.dat",
Fi1e_Type 8) Hex, Clock_Freq 8> 20);
USE ENTITY S2Board.Splash2_Boards(Structure) -- 2
GENERIC MAP (Number_0f_Boards 8> 1);
USE ENTITY S2Board.Splash_Crossbar(Behavior) -- 3
GENERIC MAP (Config_File => "xcrossbar“);
FOR xparts(1) -- PE 1 i -- 4
FOR ALL : Xilinx_Processing_part
USE ENTITY WORK.Xilinx-Processing_Part(conv_1d);
END FOR;
FOR ALL : Memory_Part
USE ENTITY S2Board.Memory_Part(Dynamic)
GENERIC MAP (Load_File 8> "lookup01.dat");
END FOR;
END FOR;
FOR xparts(2) -- PE 2 -- 5
FOR ALL : Xilinx-Processing_part
USE ENTITY WORK.Xilinx_Processing_Part(conv_1d);
END FOR;
FOR ALL : Memory_Part
USE ENTITY S2Board.Memory_Part(Dynamic)
GENERIC MAP (Load_Fi1e 8) "lookup02.dat");
END FOR;
END FOR;

 

Figure 4.9: Top Module Conﬁguration Description

50

 

 

 

 

 

 

 

 

«- mmmmrmm f;

 

 

 

 

 

. > armpurmwn

 

 

/ 0 ..

. . - mrmrrpurrnxm

‘\ I.

f x ,1,

 

" VARMWARWXﬁCLK

fa

 

 

.....- L__.____l-MML___JM~L___J‘M"‘I
1

 

. .- mrsnmmmmnsy --

 

 

 

~ ' mmyamns)

mmmryrnxm ‘
f mrneunwrcznpu) ‘

7-.....”

hwy—r.

 

 

C-....«.~y—-_-....,..x.-......A...'

 

~ - mrmrrpurmm)

H MARTSOWABTMECLK

- warm a)

 

 

 

mnemmmmnn
‘ . “rm-mm” ..........

 

 

 

...—......

 

 

 

3 mmmymm l. . a . ... _

 

an'rmarptrrmqm,
.‘ “’“TWPABTIXP5LK

 

«mammasmm _

 

 

 

[In
MWWOATMB!)
j PMIM.W11 I: .-...-
: Wmmmmn I 3 "

,«-.-.~_.-».«._~—._.~~-.

v-o-nw‘

 

.~~-.-n

.~—---.-.——~~m.—-~u~~.—~~. «a.» ~..-.~..~.-—-~~~~a-.

 

 

ABTMNLommaj
ermnpurmm) - ﬂ
WMTWMTM_¢I.K
stswmmrmsg,
i 'WI-mmmam ‘ .
Tssyxemmsrnmmxz 3) r

...............................

«...—....~~-/

_,,...,,,..-,,.....,.......

K
" 3‘ "
r t r. \‘
‘ . - . . -- ~ '1'» A. . . ~ ‘4 AV ‘-

 

\
/

 

 

..... .... _ '
,. r \ ’ \ -
’ X’ r‘ I
. x , ,.
.- ' ~ ’ \ .
a A ‘~,~-.\.—-.n.~~—- '_.~~“~_. I I ‘ ‘
...-.-.-...-.-.-..l r..................1 [...-.-m.--...-.—l [............. .
a "nuns-..-..- 1.....--.-.. -
1 t
.-. .........._. _ ...-.......J..._... .. ,

 

 

\
.’ \

 

Li- ::
II '
ii :

 

 

WMHWWIM .f

‘5
3 E
:Q:
: 2
j .
I

 

 

 

 

 

J—U ‘

     

.p',‘

‘ia:‘-4.“".I""3 J' - " -"- " C: x. u ‘ ‘-
. “n. -. r- -.— .v‘ v

 

Figure 4.10: Simulation Result for 1-D Convolution

 

‘ . I
. '3‘.
E -.;-.'w.'

'8. «.... .

...; bl. ....

 

Figure 4.11: Timing Result Using Memory Lookup

clock rates. For example, in Figure 4.11 several operations can run at a maximum
clock rate of 18.5 MHz, while other operations can run at a maximum rate of 40 MHz.
The 18.5 MHz clock rate becomes the maximum operating speed. Figure 4.12 shows a
different timing proﬁle obtained by using a different PE implementation. Speciﬁcally,
multiplication is performed as a series of additions. For non-negative mask values, the
signal value is repeatedly added a number of times equal to the mask value. Using
this method, the performance decreases to a maximum clock rate of 9.5 MHz, as

shown in Figure 4.12.

52

., . .. ...)
{...T .3. E...»

...... w::3» #24. .11).; . .vaﬂwswza
A.

s
, r

.L

a

 

ing Loop Statement

ing Result Us

.12: Tim

4

igure

F

Chapter 5

FSM Modeling Styles and

Synthesis Results

A single FSM, which represents a software component of a hardware/software co-
design of an Instrumentation System (IS) [42], is implemented using VHDL in several
different ways. The same FSM modeling style presented in Chapter 2 is used to
implement the software component of an IS. A hardware/software approach using
reconﬁgurable hardware and software modules to design an IS for improving system
performance in a real-time system is presented in the literature [24]. This approach is
different from well-known hardware and hybrid (hardware and software) monitoring
approaches. The Splash 2 custom computing machine (CCM) is used, transferring
the critical parts of the analysis modules of an IS to programmable logic to ensure
that other real-time analysis and optimization tasks are not affected by the latency
of these modules.

Three different modeling styles were implemented and to synthesize each design.

53

54

 

 

 

 

 

 

 

 

# of Clock
Max. Clock Cycles Needed % Utilization
Rate in MHz for each # of States of Available
PE # (Clock Period) Iteration Needed CLBs
0 18.9 (52.8 nsec) 13 21 73
1-16 17.9 (56 nsec) 13 28 50

 

Table 5.1: Synthesis Statistics Critical to the Performance and Realization of the
Hardware Subsystem

In this chapter, the different synthesis results obtained from the different modeling
styles are presented. State diagrams are used to model the behavior of the subsystems
in VHDL. Figure 5.1 shows the state diagram for each PE in Splash 2. Commercial
synthesis tools from Synopsis and Xilinx, integrated into the Splash 2 environment
directly, use this representation to generate logic conﬁgurations for the PBS in Splash
2. This synthesis process provides various statistics regarding the design of the hard-
ware subsystem critical to its performance, for instance, timing information. Different

VHDL representations of the same function may result in different timing behaviors.

The maximum clock rate at which each PE can operate is represented in the
second column of Table 5.1. PE 0 can operate Slightly faster than the rest of the
PEs; however, the overall performance is restricted by the Slower PES, i.e., PES 1 to
16. Thus PES 1 to 16 affect the ﬁnal operating speed, 17.9 MHz (56.0ns), though
PE 0 can operate at a maximum speed of 18.9 MHz. The number of cycles needed
(13 cycles) for handling each event record is the same for all PEs (as shown in the
column three of the table), despite the total number of states in PE 0 and PBS 1 to
16 are 21 and 28, respectively, shown in the column four. The last column shows the

utilization rate of the conﬁgurable logic blocks (CLBs) of Xilinx 4010 chips. PE 0

Memory Read for ID

0000
l

.. .. o
2
4
® Data Read from Xbar
00:10 Assign to Buff
Send Event w vent not related to this PE

0%

 

 

Figure 5.1: State Transition Diagram for the Splash 2 Implementation of On—The-Fly
Order Algorithm for PE 1 to 16

H
i

2' a"!
‘.

56

utilizes more logic than the other PES.

The actual execution latency of the hardware subsystem implementing the on-
the-fly algorithm can be found from the number of states for each iteration of the
algorithm. As shown in Table 5.1, every PE goes through 13 states during each
iteration of the algorithm. A transition occurs at every rising edge of the clock. Since
the feasible operating clock rate is 17.9 MHz (i.e., 56 nsec period), the latency per
iteration of the algorithm is 56nsec * 13 = 728nsec. One iteration of the algorithm
that returns one output (valid or NULL event record) takes the same amount of time,

regardless of the number of states.

Some PES stay in waiting states several times. These states are needed either
for waiting for the completion of Read Operation or for synchronization with other
PES. The actual code of these states is the same, except for their state numbers.
These waiting states can be implemented in two different ways: by using as many
separate waiting states as possible or by using counters to stay in the same state for
the required number of cycles. In the remainder of this chapter, different modeling
approaches and their synthesis results are summarized. Three different modeling
approaches, (1) using Crossbar connection of Splash 2 to broadcast data and no
counter is used, (2) using Crossbar and with counter, and (3) using Serial connection

instead of using crossbar, are described below:

0 Approach 1: Using Crossbar without Counter
This approach utilizes the crossbar connection of Splash 2. Many duplicate

waiting states, e.g., states 3, 4, and 20 through 27 in Figure 5.1, are used in

57

 

PROCESS
BEGIN
WAIT ON elk;
IF (present_state 8 "10100" AND elk 8 ’1’) THEN
transition <8 "10101"; -- state 21
ELSE
transition <8 NULL;
END IF; -- end of state 20

END PROCESS;

PROCESS
BEGIN
WAIT ON elk;
IF (present_state 8 "10101" AND elk 8 ’1’) THEN

transition <8 "10110"; -- state 22
ELSE
transition <8 NULL;
END IF; -- end of state 21
END PROCESS;

 

Figure 5.2: Sample VHDL Code for a Waiting States Without Counter (Approach 1)

this approach. Input data are broadcasted to all PES via the crossbar. Thus,
each datum is processed in 13 cycles. All PEs examine the same input data
concurrently. Sample waiting states are Shown in Figure 5.2. The result of this
approach is summarized in Table 5.1. The corresponding synthesis results for

PEO and the rest of the PES are shown in Figure 5.3 and Figure 5.4, respectively.

0 Approach 2: Using Crossbar with Counter
This is a modiﬁcation of approach 1. Here, the VHDL model uses an integer

counter to repeat the waiting states. Counters, which are mapped to adders,

58

 

Maximum clock speed: 18.9 MHz (52.8ns)
Number of states: 21
Number of cycles needed: 13

No. Used Max Available 2 Used

 

 

 

Occupied CLBs 293 400 73%
Packed CLBs 139 400 34%
Bonded I/O Pins: 115 160 71%
F and G Function Generators: 105 800 13%
H Function Generators: 26 400 6%
CLB Flip Flops: 279 800 34%

IOB Input Flip Flops: 0 160 0%
IOB Output Flip Flops: 0 160 0%
Memory Write Controls: 0 400 0%
3-State Buffers: O 880 0%
3-State Half Longlines: 0 80 0%
Edge Decode Inputs: 0 240 0%
0

Edge Decode Half Longlines: 32 0%

 

Figure 5.3: Approach 1 Synthesis Results of PEO for Broadcasting Data using X-Bar

59

 

Maximum clock speed: 17.9 MHz (56.0ns)
Number of states: 28
Number of cycles needed: 13

No. Used Max Available % Used

Occupied CLBs 200 400 50%
Packed CLBs 95 400 23%
Bonded I/O Pins: 82 160 51%
F and G Function Generators: 80 800 10%
H Function Generators: 18 400 4%
CLB Flip Flops: 191 800 23%
IOB Input Flip Flops: 0 160 0%
IOB Output Flip Flops: 13 160 8%
Memory Write Controls: 0 400 0%
3-State Buffers: 0 880 0%
3-State Half Longlines: 0 80 0%
Edge Decode Inputs: 0 240 0%
Edge Decode Half Longlines: 0 32 0%

 

Figure 5.4: Approach 1 Synthesis Results (PEi) for Broadcasting Data using X—Bar

60

 

PROCESS
VARIABLE counter : INTEGER :8 0;
BEGIN
WAIT ON elk;
IF (present_state 8 "10100" AND elk 8 ’1’) THEN
IF counter 8 5 then
XP_Mem_RD_L <8 ’1’;
XP_Mem_WR_L <8 ’1’;
-- ... other operations

transition <8 "11001"; -- go to state 25
ELSE
counter :8 counter + 1;
transition <8 "10100"; -- same state
END IF;
ELSE
transition <8 NULL;
END IF; -- end of state 20
END PROCESS;

 

Figure 5.5: Sample VHDL Code for a Waiting State Using Counter (Approach 2)

are used to reduce the total number of waiting states. Although the number of
states can be reduced to 21 (compare to approach 1, which requires 28 states),
the synthesis result from this approach operates a slightly more slowly than the
previous approach and occupies more space because of adders and comparators
(50% vs. 58% CLB occupancy rate). The portion of the VHDL code used in

this approach is shown in Figure 5.5; synthesis results are shown in Figure 5.6.

0 Approach 3: Using Splash 2’s Serial Connection
In Splash 2, the system can be used for systolic application by using its left-
to-right serial connections. In this approach, a crossbar connection is not used;

instead, PEs are connected serially. Input data are fed from PE1, and after 8

61

 

Maximum clock speed: 15.0 MHz (66.5ns)
Number of states: 21
Number of cycles needed: 13

No. Used Max Available % Used

 

 

 

Occupied cuss 233 400 58%
Packed CLBs 94 400 23%
Bonded I/O Pins: 82 160 51%
F and G Function Generators: 188 800 23%
H Function Generators: 55 400 13%
CLB Flip Flops: 172 800 21%

IOB Input Flip Flops: 0 160 0%
IOB Output Flip Flops: 1 160 0%
Memory Write Controls: 0 400 0%
3-State Buffers: 0 880 0%
3-State Half Longlines: 0 80 0%
Edge Decode Inputs: 0 240 0%
0

Edge Decode Half Longlines: 32 0%

 

Figure 5.6: Approach 2 Synthesis Result for PEi for Broadcasting Data using X-Bar

62

 

Maximum clock speed: 11.2 MHz (89.4ns)
Number of states: 15
Number of cycles needed: 8816

No. Used Max Available % Used

 

 

 

Occupied CLBs 304 400 76%
Packed CLBs 120 400 30%
Bonded I/D Pins: 118 160 73%
F and G Function Generators: 241 800 30%
H Function Generators: 82 400 20%
CLB Flip Flops: 232 800 29%
IOB Input Flip Flops: 160 0%

O

IOB Output Flip Flops: 13 160 8%
Memory Write Controls: 0 400 0%
3-State Buffers: 0 880 0%
3-State Half Longlines: 0 80 0%
Edge Decode Inputs: 0 240 0%
Edge Decode Half Longlines: 0 32 0%

 

Figure 5.7: Approach 3 Synthesis Results (PEi) for Systolic Modeling Approach

cycles each datum is passed to the right neighbor. An integer counter is used
to reduce the number of waiting states. The synthesis results of this approach

are shown in Figure 5.7.

Chapter 6

Discussion

In Part II, modeling is shown to be an important ﬁrst step in the design process. The
modeling techniques described in the previous chapters are used to implement such
system components, and these models are used in high-level synthesis process.

The importance of modeling issues has been demonstrated in Part II. Several mod-
eling techniques of FSM and the implementing FSM based on VHDL modeling are
presented in Chapter 2. Different modeling styles affect not only design reusability
and simulation results but also the synthesized results. In particular, a rapid pro-
totype of a signal convolution application has been developed, and it is synthesized
and realized using Xilinx FPGAS for the Splash 2 system. It is described in Chapter
4. Another example of modeling styles affecting synthesized results is illustrated in
Chapter 5 with the on-the-fly sorting algorithm in an Instrumentation System.

VHDL has been Shown to be suitable to model a system component, parallel
architectures, and parallel algorithms. Several parallel architectures and algorithms

are modeled using VHDL and these are described in Chapter 2 and 3.

63

Part III

CAD Framework

64

65

Computer-Aided Design Frameworks are design environments consisting of design
tools that aid design activities. The CAD framework’s support for the design process
has three parts: speciﬁcation, execution, and services. Speciﬁcation corresponds to
how tasks can be decomposed, what tools are available, and how they may be used.
Execution is concerned with what methodology or process to select for a given task,
what tool to invoke, and how to invoke it. Services support the coordination of
subprocesses and enforce consistency. Although the concept of the CAD framework
can be applied to many different engineering design disciplines, the discussion here is
concerned with high-level synthesis only.

Part III describes a proposed execution environment for a CAD framework high-
lighting the selection and execution of design methodologies by system guidance, as
well as the separation of the proposed execution environment from design process
speciﬁcation, unique to this system.

Part III is organized as follows: In Chapter 7, existing CAD frameworks are re-
viewed and their strengths and weaknesses summarized. In Chapter 8, high level
synthesis processes are discussed. Chapter 9 presents an overview of methodology
management and summarizes design process speciﬁcation. The proposed CAD frame-
work’s execution environment is presented in Chapter 10. Finally, in Chapter 11, the

synthesis example using the implemented CAD framework is illustrated.

Chapter 7

Previous Work

Research efforts in design methodology management have been made as part of the
CAD framework. One approach to ﬁnding a methodology is to utilize information
about individual tools and to search for a sequence that accomplishes the overall task.
The Design Planning Engine, which is an expert system, is used to build a design
plan. The Design Planning Engine of the ADAM system [6, 7, 8] produces a plan
graph using a forward chaining approach. Acceptable methodologies are Speciﬁed
by listing pre-conditions and post-conditions for each tool in a lisp-like language.
Also, estimation programs are used to guide the chaining. The shortcomings of the
ADAM system are that it is not amenable to simultaneous exploration of multiple
alternatives, and it is difficult to modify with predictable results.

Another system with a Similar approach is Chippe [43, 44, 45], in which Brewer
et al. proposed the concept of “knobs and gauges. ” The evaluator generates design
quality measures (gauges) such as area, delay, performance, component usage fre-
quency, and others. The planner then uses these measures to set the design style

66

67

and strategies (knobs), such as connectivity style, pipelining level, and optimization
order. A design critic uses the “knobs and gauges” paradigm to reach design goals.
In this approach, iterative design procedures are applied at the design hardware com-
ponent level only, since the design process is ﬁxed. Iteration is done by changing
design parameters and/or constraints. The iterative application of different design
methodologies is not supported.

Ulysses [46, 47] and Cadweld [41, 48] use blackboard systems to control design pro-
cesses. A knowledge source, which encapsulates each tool, views the information on
the blackboard and determines when the tool would be appropriate. Ulysses [46, 47]
uses blackboard architecture to construct the methodology. In CADWELD [41, 48],
task requests are posted to a blackboard; CAD Tool Objects (CTOS) then volunteer,
and the requester chooses among them. Here, CTOS describe tool applicability and
invocation mechanisms. This system may also have the same shortcomings as stated
above.

MINERVA [49] and the OCT task manager [9] use hierarchical strategies for plan-
ning the design process. Hierarchical planning strategies take advantage of knowledge
about performing complex tasks involving several subtasks. Hierarchical planning, as
used in MINERVA [49], is more effective because knowledge about how to perform
logical subtasks can be applied more naturally. However, a severe limitation of this
system is that it only allows a manual mode: every selection decision must be made
by the designer.

NELSIS [50, 51, 52] provides Flowmaps for representing sets of methodologies.

Flowmaps can be exponentially large if there are many alternatives. In the NELSIS

68

system, the flow plan is ﬁxed. Rules are used to invoke a particular tool. A procedural
language in which tasks are deﬁned is used. Since tasks are executed manually by
designers in the sequence they choose, this system does not possess the ability to
deﬁne ﬂow dynamically. Furthermore, no parallelism is allowed.

Kleinfeldt et al. review some important concepts in the design process manage-
ment area and give an extensive overview of the existing systems [53].

Recently, Martin-Marietta and Lockheed have developed CAD frameworks to sup-
port the design process of Rapid Prototyping of Application Speciﬁc Signal Processors
(RASSP) [54, 55, 56]. In both systems, a workﬁow shows possible ways a task can be
carried out. Alternative methodologies are represented by OR nodes in a workﬂow.
During the execution time, one of the nodes joined by OR is selected. Therefore, the
actual execution and the speciﬁcation of design methodology are not separate. To
specify methodologies, Lockheed uses its own internal representation. In the RASSP
Design Environment developed at Martin-Mariette, Integration DEFinition (IDEF)
language [57, 58] is used to represent a workflow, and a dataﬁow graph is generated
from the workflow to Show the dependency between the tasks. IDEF includes both a
deﬁnition of a graphical modeling language (syntax and semantics) and a description
of a comprehensive methodology for developing models. Both systems are targeted
mainly for environments in which methodologies are ﬁxed. Both systems have several

shortcomings:

o It is not easy to incorporate process metrics in selecting methodologies.

Methodologies must be selected based on the trade-off determined by process

69

metrics in an ideal system. Examples of the process metric entries are relia-
bility, development time, manufacturing cost, life cycle, system defects, soft-
ware/ hardware reuse, inter-operability, physical constraints such as volume and
weight, and power constraints. These process metrics are usually known as the
design process proceeds and cannot be determined in advance. However, in both

approaches, workﬁow is determined in advance before actual design starts.

Representing dependencies between design data and methodologies is not pos-
sible.

Methodologies selected during the design process depend on what data are avail-
able and how previous tasks have been carried out. For example, when there
are multiple tools for the same tasks which are not interchangeable at the lowest
level, representing this Situation is problematic in either approach. Sometimes,
one tool or methodology selected in the previous step may affect the tools to
be selected later. Since dependency cannot be represented in either system, the
CAD framework cannot help the designer to select methodologies which satisfy

such constraints.

Systematic search of the design space is very difﬁcult.

The CAD framework should help the designer select the design methodology
that best satisﬁes the requirements. If the chosen methodology is not satisfac-
tory, alternative methodologies must be pursued. Also, the designer should be
able to investigate multiple alternatives simultaneously (with different design

parameters or with different versions). However, the lack of formalism in both

70

approaches makes it difficult for the CAD frameworks to search the design space

efficiently.

In [59, 60], a formal approach to design methodology management has been pro-
posed. In contrast to many other approaches, this new approach separates the spec-
iﬁcations of methodologies and the execution environment. In [59, 60], the process
grammar is used to deﬁne all possible methodologies. The details of how method-
ologies are selected and executed are left to the designers. Designers must supply
manager programs to select and execute methodologies.

Several research efforts have used high-level Petri Nets [11] to model the execution
environment of software systems. DECOR [61] and HILDA [62] each use an extension
of Predicate Transition Nets to Specify a design process, such as tool invocation
sequences and their relations. In DECOR, design process monitoring and control is
realized by interpretation of the net. Hierarchical places and transitions are used to
decompose the net into several subnets.

In HILDA, production rules are used to select alternatives, and these rules are
attached to the Petri Net. Static dependencies between tools, scheduling decisions,
conﬂicts, exception handling and backtracking, and tool parameter selection are ex-
amples of knowledge types for making rules. Alternative selections are made by the
user, making it difficult to modify rules after backtracking.

A system called Monitor has been implemented for CAD tools using a Petri
Net [10]. The Monitor uses ﬁxed design methodologies. This system has been very

limited in terms of system functionalities and services. Monitor’s design flow graph

71

is represented as a ﬁxed Petri Net, and the design flow follows this graph. Some of
the main features of the system are (1) graphical display of tools, ﬁles, and their
relationships, (2) automatic invocation of selected tools and report of termination
status, and (3) the tracking of the overall system state.

Development of a Colored Petri Net (CPN) is described in [63], in which CPN is
used to speed up the design and validation of VLSI chips at the register transfer level.
The chip design process is broken down into several pipe-lined stages and each stage
is modeled as a transition in CPN. A similar system can be found in [64], where the
task of modeling and validating the behavior of a VLSI chip using Hierarchical CPN
is presented.

Related works related to design data management can be found extensively in the
literature. In [65, 66], Katz et al. described their data model Version Server, which
provides a complete range of services needed to control the versions and conﬁgurations
of a complex design as it evolves over time. Checking in and checking out design
objects, selecting the preferred design version, and binding dynamic conﬁgurations
are some of the system’s operations.

Kim elaborated further on conﬁguration and version problems in the CAD design
environment in his dissertation [2]. He adopted the object-oriented concept and
applied this concept to his hierarchical data model. This model was applied to VLSI
design data such as VHDL design environments.

For distributed design environments, managing design data is a more challenging
problem since design data have various relations and are shared by several users or

several tools at the same time.

72

By managing persistent data and run time data, the creation of a new version is
detected and saved in the persistent database in a distributed design environment [67].
In order to support the handling of concurrent accesses, a Server-Client scheme is
used to arbitrate the accesses and control versions of design data in the distributed
environment. A server is a process which handles persistent design data and arbitrates
requests from many CAD tools, each of which is a client. Any change is notiﬁed by
the server to all the clients who need to use the new data. In [68], Schettler et al.
also presented a data model for the persistent design data storage and manipulation
in a CAD framework.

Design ﬂow management is another ﬁeld which has been concentrated upon by
many researchers in CAD frameworks. Earlier efforts used a ﬁxed design ﬂow. For
example, Monitor used a predeﬁned design ﬂow which was represented as a Petri
Net [10]. Flowmap was used to control the design flow [51]. Similar approaches can
be found in [69]. In their work, Miyazaki et al. used a ﬂow-chart based design task
flow graph. Most of these ﬂow-based approaches used a ﬁxed design flow, which lacks
flexibility and the capability to dynamically construct design How.

To overcome such limitations, a CAD framework should support the dynamic
construction of design flow. A task graph, a directed acyclic graph with each node in
the graph corresponding to an entity in the task schema and each edge corresponding
to a dependency, is generated dynamically [70]. In their work, Sutton et al. used the
task schema which speciﬁes the dependencies between design entities, both tools and

data. Based on the task schema (the dependency between tools and data) designers

73

can expand the abstract node in the task graph or execute a tool if the node is a

primitive task.

Chapter 8

High Level Synthesis

In this chapter, the steps involved in the high-level synthesis are brieﬂy explained.
The proposed CAD framework helps designers in selecting the right tools and right
methods in each of these steps. High-level synthesis takes as inputs a high-level
description or an abstract behavioral speciﬁcation of a hardware behavior and a set
of constraints, and produces, as a result, a structural description which implements
the behavior satisfying the goals and constraints. The output of high-level synthesis
is usually a register transfer level (RTL) description. Many steps are involved in
high-level synthesis. Typical high level synthesis steps are shown inside of the broken
lined box in Figure 8.1.

The ﬁrst step is usually the compilation of the formal languages into internal
representations (Flow Graph Generation). The next steps comprise the scheduling
of operations, functional unit allocation, and controller generation. These steps are
represented as Architectural Synthesizer in Figure 8.1. In the compilation step, most
approaches use variations of graphs containing both the data and control ﬂow implied

74

VHDL m
Input W

 

 

 

[ Flow Graph Generation J

 

[ Global Operator Mapper l

 

 

 

I

] Architectural Partitioner ]

I

[ Architectural Synthesizer J

 

Execution Environment

 

 

I

I Component Binder ]

I
l

[ External Tool Interface ]

 

 

 

 

I
I
l
l
l
I
I
l
I
I
I
I
l
1
l
I
I
I
I
I
: Framework:
I
I
I
I
I
I
I
I
I
l
I
l
l
l
l
l
I
l
I
l

 

 

Test Vector Waveform Router and Layout
Generator Generator

Vector and Waveform w

Figure 8.1: High Level Synthesis Steps

 

 

 

 

 

 

     

76

by the behavioral speciﬁcation. The data ﬂow graph shows the essential ordering of
operations in the behavioral speciﬁcation. At this step, some initial optimizations,
such as dead—code elimination, inline expansion of procedures, and loop unrolling, are
performed.

Scheduling involves assigning the Operations to speciﬁc control steps, and allo-
cation is actually assigning hardware components to the operations. The goal of
scheduling is to minimize the number of control steps needed for given hardware re-
sources, while the goal of allocation is to minimize the amount of necessary hardware.
Thus, scheduling and allocation are closely interrelated and dependent on each other.
In most systems, optimization of these two steps is done separately, and iterative
reﬁnements are applied until the desired goal is reached.

Controller synthesis is done after the schedule and data path have been chosen.
Most systems produce FSMS for the controllers. The choice between the different
hardware implementations, such as a hardwired FSM and a microcoded controller
with microprogram steps, inﬂuences the ﬁnal product.

There are many existing tools or programs capable of performing the same task;
however, each tool may produce different results in terms of the quality of the output,
time requirements for the task, resource requirements, etc. Thus, some mechanism is
required to optimize tool selection in terms of type, time, and order. Managing such

a task is one of the most important roles in a CAD framework.

Chapter 9

Methodology Management

Methodology is a set of processes or approaches used to solve a given problem.
Methodology management is a technique employed to control these processes or ap-
proaches so that a better solution can be found. In the CAD area, design methodology
management provides “the deﬁnition, presentation, execution, and control of design
methodologies in a ﬂexible, conﬁgurable way” [53]. The goals of design methodology
management are to help the designer reduce the design time and to produce a better
design.

In the last couple of decades, there has been a change in trends in the CAD
community. The main focus of the CAD framework has shifted from managing data
and tools to managing the design process itself.

Design methodology management Should provide speciﬁcation methods, an exe-
cution environment, and miscellaneous services. To choose appropriate speciﬁcation
methods, the following questions must be answered: How can tasks be decomposed?
What tools are available? How will they be used? In the execution environment,

77

78

the management system must provide a means of selecting the appropriate design
methodology or process for a given task and must determine the choice and method
of tool invocation. Miscellaneous services, such as graphical user interface for com-
municating between the user and system, supporting cooperating subprocesses in the
system, and enforcing consistent designs, should also be included in the execution
environment.

The basic building blocks of a design methodology management system or a CAD
framework are tools. In general, a tool cannot be decomposed into any subcompo—
nents; thus, the CAD framework has no way to break a tool down into smaller tools.
Each tool performs a speciﬁc function. A design methodology management system
determines how to use these tools as well as when to use them. The sequence of tool
usage is viewed as a design ﬂow.

A large electronic design has the following characteristics:

0 Hierarchical Design: A large design can itself be hierarchically organized, and
the same is true for the design process. The whole design process can be bro-
ken down into several steps of subprocesses, where each subprocess can again
be decomposed into multiple sub-subprocesses. A large system design can be
partitioned into smaller functional subcomponents. Each of these subcompo—
nents can be composed of other components. Multiple design teams can work

together to produce this design.

0 Multiple Design Representations: A design process can be viewed as a series of

data transformations from one representation to another. Each transformation

79

produces a different type of design data. Furthermore, the same design trans-
formation can be used with different sets of constraints and design parameters.
Consequently, each of these transformations creates a different version of the

data.

Large Design Space: Many alternative processes for a task create many different
versions of data, just as different values of design parameters lead to different
results. As the size of design data increases, the time required to search the de-
sign database increases as well. Thus, a large design space should be maintained

such that an efﬁcient way of searching through the database is possible.

Large Number of Tools: Many tools are involved in the system design process.
These tools should be well maintained so that the right tool is used at the right

time.

These characteristics lead to the use of an integrated design environment or CAD

framework that can manage such issues effectively and guide designers to produce a

viable designs.

In [5], a CAD framework is deﬁned as “a software infrastructure that provides

a common operating environment for CAD tools.” In order for such a software in-

frastructure to provide a good integrated design environment, the CAD framework

should support the following services: Design data management, Design methodology

management, Tool integration/encapsulation, and User interface [71]:

0 Design data management: Design data management deals with the methods

used to store and retrieve design data as well as maintain relationships (such

80

as version, transformation, and conﬁguration) and consistencies between de-
signs. In the CAD design process, many diﬂ'erent data ﬁles (either different
intermediate results or different versions from the same task) must be stored
and retrieved as needed. Thus, an efficient way of managing design data is
necessary. This also includes version and conﬁguration management. Design
data management assists in the use of technology-independent design data. If
the data is technology-independent, these data can be reused in different design

processes without changing them.

Design methodology management: This should provide a formal representation
method of the design process and a. seamless way of carrying out the design pro-
cess. Design methodology management selects the best tool for a given input
and constraints, and guides the designer to produce the best design. In other
words, design methodology management is responsible for selecting and execut-
ing an appropriate sequence of tools to produce a desired design adhering to the
given speciﬁcations. Thus, it guides the user in selecting the right tools in the
correct order. The CAD framework Should also support concurrent engineering
concepts. For these reasons, design methodology management has gained a lot

of attention in the past couple of decades.

Tool integration: CAD tools can be integrated into the design environment by
using a well-deﬁned tool integration method. In order to handle many different
tools which accept different types of input and produce different types of output,

the CAD framework must provide an inter-tool communication mechanism.

81

 

 

Tool 1 Tool 2 ' ' ' Tool n

INPUT > Process Control System > OUTPUT

E Design Data

Figure 9.1: Block Diagram of CAD Framework

 

 

 

 

 

 

 

 

 

 

 

 

 

 

This facility ensures that all tools in the system can communicate with each

other.

0 User interface: The user interface should be easy to use and effective. It should
also hide low-level implementation details as much as possible from the end user

(e.g., tool-invoking sequences and commands).

The block diagram of such a CAD framework is Shown in Figure 9.1. The CAD
framework helps the designer to reduce design time and errors to produce a better

solution.

82
9.1 Design Process Speciﬁcation

Managing the design process necessitates the speciﬁcation of the design process. The
design process consists of a series of transformations, from input to output speci-
ﬁcations. A process ﬂow graph describes which tools and/or subtasks are used to
transform input into output speciﬁcations, and shows the tasks as well as the infor-
mation ﬂow of the design process. A task can be decomposed into several subtasks,
which may in turn be recursively decomposed into several sub—subtasks. For a given
task, a process ﬂow graph (called a workﬂow in other literature) shows the decom-
position and the dependency between subtasks and design data. This hierarchical
decomposition of processes enables a system to capture the design process efficiently.

A graph grammar, also called a process grammar [72, 73], provides a convenient
means of transforming process ﬂow graphs into progressively more detailed process
ﬂow graphs. The user indicates the goals of the design exercise by supplying a start
graph, which indicates what input speciﬁcations are available, what output spec-
iﬁcations are desired, and what logical tasks are to be performed. This graph is
progressively modiﬁed by applying productions of the process grammar. Logical task
nodes are replaced by subgraphs composed of smaller logical tasks and intermediate

speciﬁcations.

9.1.1 Process Flow Graphs

Process flow graphs describe the information ﬂow of a design process. Process ﬂow

graphs describe which tools or subtasks transform input into output speciﬁcations.

 

 

Behav Behav

 

 

 

 

 

 

 

I

CED — q @O

RTL Net List

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Figure 9.2: Production Graph Example

Formally, a process ﬂow graph is a bipartite acyclic directed graph of the form C =
(T, S, E), where T is the set of task nodes (drawn as a ellipse in Figure 9.2), S is the
set of speciﬁcation nodes (drawn as a rectangle in Figure 9.2), and E is the set of

edges indicating which speciﬁcations are used and produced by each task.

There are two types of task nodes: terminal and logical. A terminal task node
represents the execution of an application program, commonly called a tool invoca-
tion. Non-terminal task nodes represent logical tasks, which could potentially be
completed by using several different tools or a combination thereof. A logical task
node is represented by a single-line ellipse, while a terminal task node is represented
by a double-line ellipse. Process ﬂow graphs can describe design processes with vary-
ing levels of detail. A graph containing many logical task nodes indicates what should

be done, in abstract terms, without describing exactly which tools should be used.

84

Conversely, a graph in which all nodes are terminal describes a methodology com-

pletely.

9.1.2 Design Process Grammars

Graph grammars provide a convenient means for transforming process ﬂow graphs
into more detailed process ﬂow graphs. Baldwin and Chung deﬁned design process
grammar [72]. Productions using a design process grammar permit the replacement of
one subgraph by another. A production in a design process grammar can be expressed

as a tuple P = (GLHSIGRHSIO'imUout), where

01,115, G335 are process ﬂow graphs for the left and right side of the production,
respectively, such that GLHS is a single, logical task node representing the logical

task to be replaced.

a,“ is a mapping from input speciﬁcations of GRHS to input speciﬁcations of GLHS.

Each input speciﬁcation must be mapped to one with the same type or subtype.

am is a mapping of output speciﬁcations from the left hand side of the production
to the right hand Side of the production. Each output speciﬁcation must be

mapped to one with the same type or subtype.

A design process can be speciﬁed using design process grammar. In the proposed

CAD framework, the speciﬁcation of the design process is separated from design ﬂow

management, a feature that is absent from all other existing CAD frameworks.
Figure 9.2 illustrates some productions for the Behavioral Synthesis task. As

shown in this ﬁgure, the logical design task BehavSyn is carried out by control

85

 

 

 

Evaluation .--, Bvehmiai _--, Evaluation '8'-
PmetionAl I FmetionAZ ' FunctionA3 .

lcm um: Lcmiim-I

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

@ Sched+Alloe+CTkL
w M [:m "'2 m

 

 

 

 

 

 

Prod. A Prod. Al Prod. A2 Prod. A3

Figure 9.3: Production Graph Example 2

data flow graph (CDFG) generation CDFCgen, followed by module selection ModSel
and the architectural synthesis Arch.Syn. Each of these tasks, such as BehavSyn or

Arch.Syn, may be another logical or terminal task which invokes a tool.

The logical task Arch.Syn, which represents the architectural synthesis process,
can be carried out by one of three alternatives: Prod. A1, Prod. A2, or Prod. A3
shown in Figure 9.3. The vertical bar is a shorthand notation indicating multiple
productions with the same GLHS but different 0335’s. In this example, alternative
1, alternative 2, and alternative 3 explain how Scheduling, Allocation, and Controller

Synthesis can be carried out.

In the production graph 9.2, the task Arch.Syn, enclosed by a broken rectangle,
is replaced by the selected production if alternative 2 (Prod. A2 in Figure 9.3) is
applied. An expansion example of Arch_Syn is shown in Figure 9.4. Process ﬂow

graphs in general are discussed in more detail in [72, 73].

86

 

 

 

 

Mod Sel

 

 

 

 

 

 

 

Behav

 

 

 

 

 

 

RTL Net List

‘

 

 

 

 

 

 

 

 

 

 

 

 

 

CDFG
Sched+ Alloe
SchedT able
CTRL
RTL Net List

 

 

 

Figure 9.4: Expansion Example

87

9.1.3 Speciﬁcation Hierarchy

Task speciﬁcations are deﬁned and organized into a specialization and generalization
class hierarchy. Properties of general task speciﬁcations are also available in a special
task speciﬁcation where the general task Speciﬁcation is a parent of the special task
speciﬁcation. A child speciﬁcation inherits its parent’s speciﬁcation properties. For
example, pre— and post-evaluation conditions can be inherited by children; however,

any child speciﬁcation can have its own condition through specialization.

' It is possible to decompose tasks hierarchically into simpler tasks until the indi-
vidual tasks can be performed by single tool invocations. Methodologies are devised
by hierarchically decomposing logical tasks until all tasks are terminal. A single tool
selection can be considered to be a Special case of decomposition in which the set
of subtasks is a single terminal task. A speciﬁcation node deﬁnition editor window
is shown in Figure 9.5. An example of task hierarchy, the Layout Synthesis Task,
is shown in Figure 9.6. Among many layout design styles, three common layout de—
sign styles are Gate Array, Standard Cell, and Full Custom. These design styles can
be applied to any level of layout synthesis hierarchy. These style conditions can be
passed on to children tasks in the hierarchy. However, MCM Layout should have dif-
ferent conditions to be imposed than the conditions for PCB layout and Chip Layout

because in MCM layout, inter-chip delay can be ignored.

88

. -v _- .' ’ . H“ .. ‘ ,- ~. -. , ._ 7., ., ‘. . . e”... -, ...’.-._- "‘I, .--‘.. . . t, 188'" 'v '.‘ . . -- coy - -u“~‘ “we. 84.8 .‘~.'I.' 'wn'\-.'-:\‘ I'I'l"‘ . 5 -Y'--§~.-.v> .-.:w.~

[ Description Behav VHDL Behav ]
Y K: VeriIog Behav ]

‘L Strum VHDL Structural ]

’1 Verilog Structural ]

[ NetLlst H RTL NetList ]——-[ PRTLNetLIst ]

 

 

 

 

 

 

 

 

 

 

 

 

 

J.
.1"th;>a3iv7/s«brszsm—ﬁmﬁhw. " " ' ' ‘ ‘ H " i "Ir“ .4 '

 

 

 

 

 

“MA‘LJ‘

' L31
. ".

[ CDFG ]——[ Partitioned CDFG ]

 

 

 

 

 

 

 

 

 

 

   

 

 

Figure 9.5: Speciﬁcation Deﬁnition Editor Window

 

 

 

 

MCM ID PCB Layout @
uyout with mac @ Symbolic Layout FPGA Layout

Figure 9.6: Layout Synthesis Task Hierarchy

 

 

 

 

 

 

 

 

89
9.2 Execution Environment

The CAD framework execution environment is a software environment which helps
designers in selecting and executing design methodologies by allowing the systematic
exploration of the design planning space. The execution model within the execution
environment is modeled based on Petri Net. When inputs are available, tasks are
executed and outputs are created.

The execution environment allows backtracking, which occurs when the task can-
not be accomplished. If the execution environment detects unsatisfactory results, the
system is allowed to go back and try different alternatives. Another advantage of
the execution environment is that it allows parallel exploration of the design space.

Details of the execution environment are covered in Chapter 10.

9.3 Tool Encapsulation

Tool integrations can be challenging because every tool vendor has its own batch of
command line options and environment variables which must be correctly set. Tool
invoker programs that invoke corresponding tools must set the environment variables
and command line options appropriately. Electronic CAD vendors and users have
recognized the importance of inter-operable tools and created the CAD Frameworks
Initiative (CFI) to facilitate the development of CAD frameworks which integrate
tools. CFI helps to address this problem, suggesting standards for data interchange,
inter-tool communication, and tool encapsulation [74, 75, 76, 77]. CFI proposed a

Lisp-like Tool Encapsulation Speciﬁcation (TES) language providing a uniform format

90

for vendors to specify how to set environment variables, compose command lines, and
interpret exit codes. CFI standards do not govern the selection of tools.

CFI tries to achieve inter-operability between different tool vendors by proposing
standards, such as the Intertool Communication standard, the Message Dictionary
Speciﬁcation standard, and TES. There are several issues associated with tool en-
capsulation. First of all, the current TES proposal may be changed as CFI further
modiﬁes its requirements and methods. Another difﬁculty is that tool vendors do
not yet provide TES ﬁles. Finally, inter-Operability between different tools is very
problematic to maintain in terms of communication since each tool may produce a
different form of an intermediate ﬁle. If there were a standard intermediate ﬁle, then
at most 0(n) translations instead of 0(n2) translations would be necessary. This is
one reason why standards are recommendable. For example, an EDIF standard for-
mat were used in low level design representation, and any output at this level could

be translated into an EDIF format.

Chapter 10

The CAD Framework: Execution

Environment

In this chapter, the execution environment is discussed. The execution environment
of the proposed CAD framework is modeled based on Petri Nets. A new approach
to the execution environment, which dynamically constructs a process graph, auto-
matically selects design alternatives, and automatically backtracks if the result is not
satisfactory, is presented in this chapter. This chapter is organized as follows: First
an overview of the proposed CAD framework architecture is given in Section 10.1.
Petri Nets are reviewed in Section 10.2. The formal model of the execution model
is explained in Section 10.3. The backtracking mechanism is explained in detail in
Section 10.4, and in Section 10.5, the handling of multiple alternatives is explained.
Other issues, such as constraints, process simulation, and version control, are covered

in subsequent sections.

91

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

WTOOB
-l Cockpit III

are Mode nap Too“
room

in? 0...... m

Represmtntion ‘ hunter

I

Tool a
ﬁrm

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Figure 10.1: Block Diagram of System

10.1 Proposed CAD Framework Overview

The proposed system is composed of several main components: Design Process Rep-
resentation, Constraints, a Design Library, an Execution Environment or Cockpit,
and Graphical User Interface. An overview of the prOposed architecture of the CAD
framework is graphically depicted in Figure 10.1. Design Process Representation
represents design methodologies using the productions of a process grammar. Pro-
ductions codify the possible hierarchical decomposition of tasks, which designers use

to build a process ﬂow graph. The process grammar naturally captures the hierar-

93

chical character of the design process and allows systematic exploration of the design
space.

Design constraints are provided by the user. The system performs a pre—evaluation
in order to select the best production or tool, and also performs a post-evaluation
after a task is ﬁnished. When pre—/post-evaluation processes are carried out, the
system uses the constraints as input parameters. Constraints are items such as area,
critical delay, die size, pin number, power consumption, etc. The Design Library
contains various design data.

The execution environment program, Cockpit, keeps track of the design status and
communicates with the designer via the Graphical User Interface (GUI). The GUI
helps users in several ways. Users can browse the available productions via the GUI
and assign one or more input design data ﬁles together with their control information.
The design progress can be displayed either in the form of a production or a Petri Net
structure. The design path is displayed if the user chooses the history menu. Design

data examination and a display of scoring results are additional features of the GUI.

10.2 Petri Net

Petri Net theory was developed by Carl Petri in 1962 to model a system by using a
mathematical representation of the system [11, 12, 13, 78]. Petri Nets offer a means
to graphically and mathematically model discrete event systems. Moreover, Petri Net
models can be converted into computer control mechanisms that can be interfaced to

discrete event handling processes such as high-level synthesis processes. As a graphical

94

tool, Petri Nets can be used as visual communication aids similar to ﬂowgraphs.

DEFINITION 1 A Petri Net Structure, C, is a four-tuple, C = (P, T, I, O). P =
p1,p2,...,pn is a ﬁnite set of places, n 2 0. T = t1,t2,...,tm is a ﬁnite set of
transitions, m _>_ 0. The set of places and the set of transitions are disjoint, PﬂT =
0. I : T —+ P°° is the input function, a mapping from transitions to bags of places.

0 : T —) P°° is the output function, a mapping from transitions to bags of places [13].

The execution of a Petri Net is handled and controlled by the number and distribu-
tion of its tokens. Tokens are primitive concepts of Petri Nets and have no inherent
meaning. Tokens are assigned to the places of Petri Nets. A transition is enabled
when there are tokens in each of the transition’s input places. A transition is ready
to be ﬁred if all of its input places have tokens in! them. A Petri Net executes by ﬁring
enabled transitions. A transition ﬁres by removing tokens from its input places and
creating new tokens which are distributed to its output places. After a task ﬁnishes
its execution, the input tokens are removed from input places and new tokens are
produced and placed in the output places.

Sequences of ﬁring enabled transitions in an ordinary Petri Net are nondetermin-
istic. An example sequence of ﬁring transitions in an ordinary Petri Net is shown in
Figure 10.2. In this ﬁgure, Graph A Shows an initial graph with initial token distri-
butions. At this moment, transition t3 is the only one which is enabled since its input
place, p4, has one token. After transition t3 is ﬁred, a new graph is created; this is
shown in Graph B. A new token is added in both p2 and p3. Now, two transitions,

t1 and t2, are enabled. In this case, the order of ﬁring transitions t1 and t2 is unde-

95

 

(D) After Q is fired

Figure 10.2: Petri Net Firing Sequences

96

termined. Either transition can be ﬁred ﬁrst, both may also be ﬁred concurrently. If
the transition t] is ﬁred ﬁrst, then Graph C becomes the new graph after transition
t1 is ﬁred. The same situation, where both t1 and t2 are enabled, again occurs. At
this time, if the transition t2 is ﬁred, then Graph D results. The execution continues
until there are no more transitions which can be ﬁred.

Many different system models and applications have used Petri Net. Modeling
of hardware, communication protocols, parallel programs, and distributed data bases
are several examples of such applications [11, 78, 79, 80]. Different extensions have
been made for modeling of different systems which have avoided its ordinary non-
deterministic behavior and allowed the extended semantic meanings of the ordinary
Petri Net. Jensen [63] uses an extended Petri Net to design VLSI chips. In his
extension, transitions correspond to hardware components, places hold design data,
and tokens correspond design data. Firing of transitions cause the execution of the
hardware components using input data. Di Janni [10] describes similar extensions
for the VLSI circuit design system, Monitor. In Monitor, transitions are CAD tools
and tool invocations correspond firing of the transitions. In addition to these exten-
sions, conditional ﬁring of transitions, inhibitors, are also used. Kljaich et al. use an
extended Petri Net called flow nets to describe and verify the fault-tolerance capabil-
ities of digital systems [81]. In their work, additional extensions are made: symbolic
token and user-supplied types. Token has a symbolic value, such as data and control
information, associated with it. Different types are associated with transitions, and
places. Different output tokens are generated depending on the transition type after

the transition is applied.

97

There are several beneﬁts to using a Petri Net model for the execution environ-
ment. The Petri Net model captures design processes naturally. Designers can see
the design progress easily. Moreover, by manipulating tokens, execution control is
much easier than that of ad hoc methods. For example, if the designer assigns three
tokens in an input place of a production, three alternatives will be carried out.

The execution model proposed in this thesis is based on an Colored Petri Net.
The proposed execution model described in this thesis does not make any extension
to Colored Petri Nets, but it has adopted several extensions which had been made
by various people. Contribution of the execution model described in this thesis is the
application of Colored Petri Net for the process control execution model.

The extensions adopted by the proposed execution model will be described in the
subsequent paragraphs. First of all, in the proposed execution model, logical and
terminal tasks are modeled as transitions, and design data are modeled as tokens.
Tokens contain several data ﬁelds, such as a design data name list ﬁeld, a version
history ﬁeld, token color, and other control ﬁelds. Thus, tokens have both data and
control information.

In the proposed model, tokens are not consumed, but their states are changed as
transitions occur. In ordinary Petri Nets, tokens are consumed after transitions are
ﬁred. In [63, 64], data values attached to tokens are referred to as token colors. The
values of token colors are predeﬁned in a color set. In the proposed model, token
color represents the states of token, such as used, not yet used, or used but failed.
Thus, in this Colored Petri Net, semantic meanings are assigned to tokens by their

colors. Semantic meanings of token colors are explained in Section 10.3.4.

98

Copy Transition
0 VH DL

37838818 - — Simulation

_.>

 

Simulation

Figure 10.3: Copy Transition

Two additional transitions are introduced, pre-evaluation and post-evaluation
transitions in this model. These transitions are not a part of the actual design pro-
cess. After ﬁnishing a pre—evaluation transition, the system puts the different-colored
output tokens in the corresponding input places, depending on the pre—evaluation
function result. This transition makes the execution environment deterministic and
helps to select suitable production alternatives. After post-evaluation transitions, the
output tokens are generated conditionally depending on the result of the evaluation.
If the post-evaluation fails (e.g., the result is not satisfactory), the token color of
the starting production is changed in order to allow backtracking. If the transition

succeeds, the new result is placed in the normal output place.

Another transition is needed when an input datum is used in more than one
transition. For a production alternative, only one transition is selected and a single
input token is used for the transition. However, when there are several transitions in
a production which use the same token, the token needs to be copied and distributed
to the these transitions. , The token is duplicated by introducing a new transition

named Copy Transition shown in Figure 10.3.

99
10.3 Execution Model

The execution model provides for dynamic execution of tasks and the representation
of state information. At a minimum, the execution model allows the designer to access
tasks and designs by tracking the information required to invoke tasks. The execution
model constructs a process graph by selecting the proper production for each logical
task. This selection is guided by invoking a pre-evaluation of the alternatives of the
logical task. When an appropriate task is selected, the execution model either expands
the graph or invokes a tool for the execution of a terminal task. After executing the
tool, the execution model post-evaluates the result based on the criteria (constraints).
If the result is not satisfactory, it backtracks to try another alternative.

The execution model is based on an Colored Petri Net and performs the following

functions:

Dynamic Construction of a Process Graph

Pre—Evaluation of the Alternatives

Selection of a Production for each Logical Task

Execution of a Tool

Post-Evaluation of the Result

Backtracking if Needed

The execution model creates design ﬂows by reading the production graph, de-

termines the possible design alternative processes, and invokes the right tool for exe-

100

cution or expands the logical task. In order to choose the right alternative, Cockpit
performs a pre-evaluation of all available logical tasks. An evaluation function is as-
sociated with each logical task. When the pre—/post-evaluation process is carried out,
Cockpit uses the constraints as one of the input parameters.

Expansion is dynamically performed as the design process progresses. When the
production graph is read by Cockpit and is converted into a corresponding Petri Net
internal structure, a pre-evaluation function is called for each alternative and the
results are posted, such as the score of each alternative, in the net. The highest
score enables a corresponding transition. The scheduler in Cockpit now schedules or
chooses which transition is to be ﬁred based on resource availability. After ﬁnishing
one path, the result is checked; this is called post-evaluation. If this does not agree
with the anticipated result, the system backtracks to the selection point and tries

another alternative.

10.3.1 Cockpit

Cockpit is a routine of the execution environment which performs the following func-

tions:

0 Creating Daemon processes (Initial Graph)

0 Keeping track of the design process,

0 Dynamically constructing a process graph,

Scheduling task(s) by pre-evaluation,

101

0 Performing the post-evaluation,

o Interacting with the user, and

o Controlling the GUI.

Cockpit is implemented using the algorithm described in Figure 10.4. Cockpit
initially creates several Daemon processes which maintain task speciﬁc knowledge.
Cockpit’s information about the design process comes entirely from an input ﬁle
indicating a set of possible tasks and those decompositions that should be considered
for each logical task.

The user interacts with Cockpit, which keeps track of the current status of the
design process and informs the user of possible actions. Cockpit’s display indicates
to the user what design tasks have been completed so far and what tasks remain.

To assist the user in choosing an appropriate action, Cockpit invokes several evalu-
ation functions. The evaluation functions provide ratings for the possible task decom-
positions and check the results. The ratings help the system to select tools. Cockpit
determines what decompositions are available for the remaining logical tasks. This
information is then displayed to the user.

Cockpit supports two modes of operation: manual and automatic. The manual
mode is normally used for high-level decisions and stepping through the design pro-
cess. The designer may wish to use the automatic mode for lower level decisions. In
the manual mode, Cockpit waits for the user to select a decomposition or execute a
task. In this mode, the system performs pre-/post-evaluation and the system guides

the user by Showing the results. The user makes the ﬁnal decision for selection of

102

 

Initialization()
{
Start graph is selected;
Create initial Daemon process and place tokens (send message);
}
Wait for message;
IF the message is from Execution process THEN
{
IF the message is WANT-EXPAND THEN
{
Invoke Pre-Evaluation function;
Select Production based on Pre-Evaluation;
Display expanded process flow;
Send EXPAND-THIS or FAILED message to Execution process;
}
IF the message is POST_EVAL THEN
{
Post-evaluation;
Send result POST_EVAL-OK or -FAIL to Execution process;
}
IF the message is FAIL_EXPANSION THEN
Delete useless tokens;
IF the message type is FAILED THEN
Kill the child process;
}
ELSE /* Message from Daemon process */
{
IF the message is FAILED THEN
Kill the child process;
ELSE
{
Create a Daemon for the subsequent task;
Put the output token in the newly created Daemon’s input
place;

END IF;

Figure 10.4: Algorithm for Cockpit

 

103

a production or backtracking based on the suggestions made by the system. When
the user selects a decomposition, Cockpit displays the new subtasks in place of the
original task. When the user requests that a task to be executed, Cockpit sends a
message to the corresponding Daemon process for execution. For terminal tasks, the
tool invoker responds by invoking a tool. The user invokes the automatic mode by
executing a logical task instead of selecting a decomposition. In response to an exe-
cution message for a logical task, the Daemon process uses encoded knowledge from
a process graph to select a decomposition and then executes the subtasks (also in
automatic mode). If necessary, the designer may reverse any decision made by the
Daemon process in the manual mode.

In the automatic mode, the execution model utilizes the Petri Net structure more
naturally since there is no human interaction. Cockpit dynamically creates design
ﬂows by reading the production graph, determining possible design alternative pro-
cesses and invoking the correct tool for execution or expanding the logical task. In
the execution environment, Cockpit uses the evaluation function to determine the
alternative. After ﬁnishing the task execution, Cockpit post-evaluates the result.

When the output of a task is not satisfactory, it is necessary to backtrack. Either
different parameters must be supplied to some of the tools or diﬂerent tools must be
chosen; or alternatively, the task must be decomposed in an entirely different way. The
designer may request that certain task decompositions be reversed. Additionally, if a
decomposition was requested by a Daemon process, that process can direct Cockpit
to reverse it. Cockpit saves the state of the session before backtracking in case the

designer later decides to cancel the reversal.

104

10.3.2 Daemon Processes

Each Daemon process is invoked (created) by Cockpit and Execution process. Dae—
mon processes are dynamic repositories of task-speciﬁc knowledge. Each message
from the Daemon process indicates the task being evaluated or executed and pro-
vides all the inputs and outputs ﬁle names. The constraints may be included in one
of the input ﬁles or may be passed to the Daemon process directly.

Each Daemon process is activated by an event signaling the arrival of a token in
its input or output places. If an input event occurs, the Daemon process creates an
execution process by sending the task name to Cockpit to create the process. For
an output event, the Daemon process checks the output token numbers, which is
assigned by the user. If the number of tokens does not reached the required number,
the Daemon process tries a yet untried alternative by changing the input token color.
The usage of colored tokens is explained in detail later in Section 10.3.4. If more
than enough tokens are generated, the Daemon process selects the best tokens. Each

Daemon process retains the following information:

1. Parent Task Name

2. Name of Input Places

3. Name of Output Places

4. Child Task Names

5. Information about the required input and output token numbers (counters)

The procedure for Daemon process is shown in Figure 10.5.

105

 

Wait for messages;
IF the message is from its parent (Execution) process THEN
{
IF the message is TOKEN_IN THEN
Create a child Execution process;
(Send the production name to the child process)
}
IF the message is from its child (Execution) process THEN
{
IF the message is FAILED THEN
Send the FAILED message to its parent (Execution) process;
IF the message is TOKEN_IN THEN
Send the TOKEN-IN message to parent (Execution) process;

Figure 10.5: Algorithm for Daemon Process

 

10.3.3 Execution Process

Each Execution process is created by Daemon and receives a production name. The
execution process is responsible for invoking a tool, asking for expansion, and asking
Cockpit to do a post-evaluation after ﬁnishing its job. Each Execution process handles
only one token at a time. For multiple tokens, one Execution process is created for
each token. The Execution process contains information about Input Places, Output

Places, and its Task Name. The corresponding algorithm is shown in Figure 10.6.

10.3.4 Token Semantics Extension

Tokens have different semantic meanings from those in the regular Petri Net model.
Tokens contain real design data as well as control information to control the design

process. For example, control information is used by the system to choose appropriate

106

 

Execution()
{
IF Terminal task THEN
Execution of the terminal tool;
(Send TOKEN_IN message to parent Daemon process)
IF Logical task THEN
{
Send WANT-EXPAND message to Cockpit;
Wait reply from Cockpit;
IF the message is EXPAND_THIS THEN
{
Expand the graph by creating one Daemon process for
each task node in the production;
Send TOKEN_IN message to the child Daemon process;
Wait for messages from these children Daemon processes;
IF the message is from child Daemon process THEN
{
IF the message is TOKEN_IN THEN
{
Send POST_EVAL to Cockpit;
Wait for reply from Cockpit;
IF POST_EVAL-OK THEN
Send TOKEN-IN to parent Daemon process;
IF POST_EVAL_FAIL THEN
{
Send FAIL-EXPANSION, FAILED, WANT-EXPAND
to Cockpit;

}
IF the message is FAILED THEN
{

Send message to parent Daemon process;
Exit;

}
}
IF message is FAILED THEN
Send FAILED to parent Damon process and Exit;

Figure 10.6: Algorithm for Execution Process

 

107

alternatives.

Each token has one of several colors as control information. A color is assigned to a
token to indicate the status of the design data and control transitions. As a transition
is ﬁred, the token is not consumed, but its color is changed, signifying a change of
state. After ﬁnishing a task, the output is examined based on given constraints.

The token colors used in the system are as follows:

0 BLUE Token: Each BLUE token indicates that the corresponding alternative
must be tried and that the output token must be generated whether the output

is acceptable or not.

0 YELLOW Token: This token indicates that the corresponding alternative has
not yet been tried. When backtracking occurs, a YELLOW token is changed

to a BLUE token for an alternative trial.

a RED Token: If the output does not meet the constraints, the corresponding

input token is changed to RED, indicating that this production has failed.

In Figure 10.7, alternative 1 with a BLUE token is applied. Here, alternative 2
with a RED token has already been applied and failed, while alternative 3 with a
YELLOW token has not yet been applied.

Figure 10.9 illustrates the steps involved in the dynamic expansion of the Petri Net
using the productions shown in Figure 10.8. Figure 10.9 (A) shows an initial graph
with one Blue token in the input place of a transition, HLS. This token enables the
transition, and since the task is logical, the graph is expanded as shown in Figure 10.9

(B). When a logical task is expanded, pre-evaluation and post-evaluation transitions

 

 

 

 

108

Altl A1 A18

Figure 10.7: Token Color

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

i 81 82 S3
Ll L2 L3
Production A Produciton B Production C

Figure 10.8: Production Example for Expansion Steps

 

 

109

l. A Blue token in place 1 enables the abstract transition HLS.
2. By ﬁring the abstract transition HLS. the net is expanded.

 
  
  
  
  
  
  
 
 
 
 
 
  

3. The Blue token again enables

the Pro-evaluation transition for the
abstract production HLS.

4. The pro-evaluation function determines
that production A has the highest score.

 

5. A Blue token is produced and placed

in the input place of production A;

A Yellow token is placed in the remaining
input places. for productions B and C.

6. At the next cycle. the transition 81
of production A will be enabled because
there is a Blue token in its input place.

" For multiple tokens. each token is
handled separately.

Figure 10.9: Expansion Steps

110

are added to the Petri Net. A Backtracking Path (shown in broken lines) is also
added, as seen in Figure 10.9 (B and C). The Blue token enables a pro-evaluation
transition. In Figure 10.9 (C), the preevaluation transition determines the best
alternative by invoking the evaluation function. If Production A receives the best
score, the transition places a new Blue token into its output place in order to pursue
Production A as a selected alternative. The remaining alternatives receive Yellow

tokens, and these tokens prevent the ﬁring of these alternative transitions.

The user can limit the maximum number of concurrently executed alternatives
through the number of tokens assigned to the production’s input place. Likewise, the
number of output tokens, which limits the number of output results, can be speciﬁed.
The strategy for selection among multiple production outputs is also speciﬁed by the
user. Currently available strategies are ﬁrst-available (FA) or best-result (BC). Under
FA, the post—evaluation transition selects the output ﬁrst generated and ignores all
other outputs; whereas under BC, all outputs are collected, compared and the best
result(s) is selected. For example, in Figure 10.9 (c), assume that two productions,
Production A and Production B, are concurrently applied. Under FA, the post-
evaluation accepts whichever result was generated ﬁrst from either Production A or
B, while under BC, the post-evaluation postpones the comparison until both results

are available.

111
10.4 Backtracking Mechanism

When the output of a task is found to be unsatisfactory during post-evaluation, it
is necessary to backtrack. Different parameters must be supplied for some tools
or different tools must be chosen; alternatively the task must be decomposed in
an entirely different way. In the manual mode, the user controls the backtracking
mechanism by directing the productions to be tried; however, in the automatic mode,

Cockpit itself makes these selections based upon the given constraints.

Automatic backtracking is done by managing the token colors and traversing the
backtrack path. Figure 10.10 (A) shows some logical tasks, and Figure 10.10 (B)
illustrates the Colored Petri Net together with tokens when these tasks are fully
expanded. This ﬁgure shows that the tasks CDFG gen, Partitioning, and MCM Arch
Syn have 3, 2, and 2 production alternatives, respectively. For example, suppose that
Path-3 is ﬁrst selected by the pre—evaluation transition when CDFG-gen is executed.
If the Post-evaluation-l result is not satisfactory, the system tries to backtrack, but all
the alternatives have already been tried, (in this case, there is only one path), and thus
the resulting failure token (the Red token) is deposited in its output place, Place-1.
Thus, in Figure 10.10 the system backtracks via the Back-2 loop and selects Path-2 for
its next attempt. After ﬁnishing Path-2, Post-Evaluation-2 evaluates the new output
from Path-2. If the result is satisfactory, the transition generates a successful token
to its output place. This new token in turn enables the pre-evaluation for Path-4 and

Path-5. The Yellow token in the unvisited path remains the same.

.tlll

O

V
we I.

A...

[ €13?ij

    
   

“*Path-3

 

 

 

 

 

 

P CDFG gen
¢
1 "cums” ] "‘Post-EvaH
P RTL Net List ‘ A; ""Place—l
/

(A) Abstract Net

> Partitioning

F MCM Arch Syn

 

(B) Fully Expanded Net

Figure 10.10: Backtracking Example

113
10.5 Multiple Alternatives

Several alternatives may be Simultaneously explored. This helps the user to Obtain
better results by selecting the best solution among several solutions. There are two
forms of parallel exploration of alternatives in the design process: use of multiple
parameter alternatives and multiple production alternatives. For a given production,
there may be several parameter choices available. If a production does not produce
an output which meets the design constraints, the same production should be tried
with different parameter sets until all possible parameter sets have been exhausted. In
addition, a given logical task may be accomplished in several ways. Each methodology
alternative represents a separate production for the logical task.

Multiple alternatives can be expanded and executed concurrently if the user spec-
iﬁes multiple tokens in the logical production’s input place and/or output place via
the GUI. The system assumes one token is in each place if the user does not place
any token in the input or output places of a production. There are two cases which

should be considered for multiple alternatives:

1. Multiple Tokens in the Input Place:
The number of tokens indicates the number of productions to be simultaneously
executed. If there is a child production which also has multiple tokens, it is
carried out simultaneously. The total number of productions active at any
given time is controlled by a global control variable, Total.Production, and by

resource constraints.

2. Multiple Tokens in the Output Place:

114

2 Tokens for task A generates

only one output.
1 (Each input tokens generate output,
and among two outputs, one is selected.)

 

 

 

 

A
7 However, if tokens are used in other
parallel path nodes, then two tokens
are generated.
(Ignore the user’s request.)

Figure 10.11: Smaller Number of Tokens in the Output

The number of tokens indicates the number of desired acceptable outputs. If
the number of acceptable outputs reaches the token number, the production is
considered a success. If not, the system backtracks and try other productions.
If the desired number has not been reached even after all the productions have

been tried, all acceptable outputs are used for the next step.

Basically, the number of multiple tokens in the output place dictates the number
of alternatives that must be tried, unless the same input token is used as input to
different transitions. These aspects are illustrated in Figure 10.11 and Figure 10.12.

When multiple alternatives are executed simultaneously and several compatible
outputs are produced, the system must select from among the requested number of
outputs. Selection is based on the chosen selection strategy, ﬁrst-available (FA) or
best-choice (BC).

A problem can occur when multiple alternatives handle the output ﬁles. Since

each alternative production creates an output and the ﬁle names are the same for all

115

X Y 2Tokens.labeleduXandY.fortaskAgenerate
tlueeoutputs.EitherXorYisusedseveraltimes

L mtilthenumberot'wpmissatisﬁed.
(Possible outputs can be (X1, X2, Y1) or (X1. Y1, Y2).)

 

A

 

 

 

®
0.

Figure 10.12: Larger Number of Tokens in the Output

alternatives, overwriting to an existing output ﬁle should be prevented. To solve such

a problem, different working directories are used for each token.

10.6 Process Simulation

Based on the evaluation results, the execution environment makes suggestions as to
which production and / or tool is best suited for the given input. Using these functions,
together with all the values assigned to each production, design process simulations

are possible without actual design process invocation.

The input ﬁle type, the ﬁle size, the maturity of tools and productions in the CAD
community, and estimated time to ﬁnish a task given by the input ﬁle comprise several
examples of parameters the evaluation function can use to determine the suitability
of the production. Design process Simulation allows the user to predict or expect

certain results.

116

‘1

‘ n, 2......,...u-.,< a»; "an.-. .K~,.~,,.., . .

.
,
. v "WM-"ﬂrFN-Ja. .

._ i.\:1
"1‘1“"

_ ,h,
"' .\‘;Iﬂ
. 1,:

«now My

\
or ...

w

\.<la)‘.,-—- ._

 

State encoding

State table Logic synthesis

Netllst

 

 

 

 

 

 

State Olagram

 

.. r .8. «1..»u.;.,--.... .-.J- . \. «...»,

 

 

 

 

 

 

Chip synthesis

-4<.-.mn . , .-

..~ ._.,,., WW. .-,-».-.v.-.l, ...._-, . .,.. .
. WWWW-«MH~MV.A#”

 

i

l
1%
.E
i

E.
_
.3
:I
.i
:I
.,
:i

 

 

 

 

 

 

 

Chip description

 

Physical synthesis

WWWMWWWWW ‘
, . .. "(h-w. «Hr-2314, .8-81;.«. .8”. .8 'I-f- z .- -- x .. - - . . , .

 

Chlp description

 

 

 

nag-4,7. wvhhﬁ-ﬂmﬂ'rv’i'.WVJ-fp—.JA —-l-’- 1' p .4..-

4w.

«WM ”WWsomww W III-I Wmmwwm

l

d "“M~-¢»*WW~_ ﬂ . ”umgugmwma—Ho-umu-p»MW,..n....,...-.."and”.....e.......-.--e..v-..~,....,.
- aswno-n~W ~-n.-.~-.—-o-pv->«..-—“.08.;uupmeqw.
. . . . n r n: ,. ,e—‘V. .314“ a}
‘ :- ‘. J3 inc)» u'

 

.'--~---'_——‘-;' ' ~ )

1; .'vf“§\

Figure 10.13: Production Editor Window

10.7 Graphical User Interface

Graphical User Interface (GUI) is used to establish communication between the user
and the execution environment. Through GUI, the user can do several things, such
as set the initial graph, partially expand the process graph, and browse through the
alternatives.

The Production Editor Main Window is used to create, browse, and edit pro-
ductions as well as to edit task node speciﬁcations and specify input/output node
information. A top view of the editor window is shown in Figure 10.13.

In the case of a rollback, the display of the situation is as follows: First, the

117

rollback message is displayed at the bottom of the Message window, ensuring that
the user can see what has happened in the system. Then, the parent production graph
(the graph displayed just before the expansion leading to this task) is redisplayed and
the same pro-evaluation process is invoked. Cockpit must record this history; that
is, when the user requests the execution history, the GUI displays the overall history
using different colors, e.g., a failed path is drawn with RED lines and a current

path/success path is drawn with BLUE lines.

10.8 Constraints and Checklist

Constraints are used to select a proper tool for a given task, to execute the tool,
or to verify the correctness of a design. Constraints must be managed properly so
that the CAD framework can function properly. Area, Maximum/Minimum Delay,
Power Consumption, Pin Number, Operating Condition, Maximum Fanout, Wire
Load, Clock Period, Technology Library, and Testability Requirement are several ex-
amples of such constraints in the computer hardware design.

Kim [2] categorized constraints into four different categories: Performance con-
straints, Environment constraints, Relativity constraints, and Selection constraints.
Some examples of performance constraints are area and delay; operating conditions
are environment constraints. Relativity constraints restrict what other designs can be
used in conjunction with a design when it is instantiated as a component, while selec-
tion constraints restrict what designs can be instantiated for a particular component

of a design. This classiﬁcation of constraints is helpful for analyzing characteristics

118

of the constraints themselves.

Baldwin introduced a new language to express constraints [73]. Although this
language has been claimed to be powerful enough to express any kind of constraint,
it has its drawbacks. Designers must learn the language syntax to express constraints,
not a simple task for hardware designers.

Kim [2] and Baldwin [73] considered constraints associated with design data only.
However, in order to form a good CAD framework, there should be some way of
answering a question like “Which tool ( or program ) produces a better result for a given
input .7” These kind of constraints, tool selection or production selection constraints,
should also be handled. Several examples of such constraints are tool release history,
size of the tool, average execution time, and user’s preference. These constraints are
used by pre-evaluation functions as illustrated in Figure 9.3.

The quality of a design result depends on the selection of tools, design method-
ology, and design data from certain design libraries. Each tool has different qualities
or capabilities, such as maturity of tools, the speed needed to produce output from
given input, and the output quality produced using the given input data. Each de-
signer can deﬁne any variable for a production and assign/ modify a value in an ASCII
format. An example of such deﬁnition is shown in Figure 10.14. When a production
Arch.Syn is applied, the user uses a pre-evaluation function, named preeva11, in
the current working directory. This routine is written by the user and precompiled.
Post-evaluation function routine can also be deﬁned as well. The next three lines
consist of actual variables and values assigned by the designer.

Designers write pre-evaluation functions using these values. For example, a very

119

 

Arch-Syn.PRE ./preeval1
Arch-Syn.POST ./hello

Arch-Syn.0 time 9 pref 3 history 4
Arch-Syn.1 time 11 pref 2 history 5
Areh-Syn.2 time 4 pref 6 history 2

Figure 10.14: Production Scoring Example

 

simple but complete pre-evaluation function is shown in Figure 10.15. Here, if the
designer assigns different weights to the variables t, p, or h, a different evaluation
result is produced, where t represents the time to ﬁnish this production, p represents
the variable which holds the penalty value for converting the input ﬁle type, and h
represents the time the production has been available. The weights are assigned by
the designer based on experience or preference. In this example, the designer prefers
a long history of the production and shows very little concern about the translating

ﬁle type.

Similar functions can be written for post-evaluation functions. After each produc-
tion is completed, the post-evaluation function is invoked and determines whether to

accept the result or not.

In this way, the speciﬁcation and the execution environment can actually be sep-
arated. Different designers can also use different ratings without modifying the pro-

ductions.

The checklist is a utility similar to reminder, in which a checklist can be created
by the designer. When the design process reaches a predeﬁned point, the designer

can browse the contents of the checklist. This feature is not directly related to the

120

 

#include <stdlib.h>
#include <stdio.h>

main(int argc,ehar 8*argv)
{

int t,p,h;

int score;

if (arge < 7 )
exit(-1);
t8atoi(argv[2]);
p8atoi(argv[4]);
h8atoi(argv[6]);

score 8 t*0.2 + p*O.1 + h*0.7;

exit(seore);

}

Figure 10.15: Pre—Evaluation Function Example

 

actual execution environment. The checklist helps the designer remember things that
must be done. The break-point feature can help the system stop at a certain design

point where the checklist can be examined. An example is shown in Figure 10.16.

10.9 Load Balancing

In a distributed environment, load balancing is one of the most important issues
in system performance. All system performance depends on resource contention.
In any computer system, there are three basic resources: CPU, memory, and the
Input/Output (I/O) subsystem. Among these three types of resource usages, CPU

usage is the main concern because most CAD tools are CPU intensive.

 

Figure 10.16: Break-Point and Checklist

Each process (or program) requires a certain number of CPU cycles to execute,
and it is not possible for a single process to use the CPU alone until execution is
ﬁnished. Usually, several processes share the CPU at one time. If loads are assigned
to machines which are already heavily loaded, then the overall system performance is
degraded: processes in a heavily loaded machine take a long time to ﬁnish, and the
remaining tasks may depend on the results of the previous processes.

There are several ways to measure CPU contention. The simplest one is the UNIX
load average, reported by the rap command, which shows the host status of remote
machines. The load average tries to measure the number of active processes at any

given time. A typical result of this command is

pixel up 12 days, 6:38, load average: 0.23, 0.19, 0.01

122

The ﬁrst load average (0.23) is measured over the last minute. The second and the
third load average are measured over the last 5 and 15 minutes, respectively.

In the proposed system, the machine with the smallest load average at the time of
task execution is used. Available machines are listed in the resource ﬁle. This at least
ensures that a particular machine is not overloaded before assigning it a task. The
selection criteria can be extended by examining the second and third load averages,
from which the load trend can be inferred.

Several problems remain associated with the method described above. First, the
command rap does not guarantee the correct result. For example, if the Network File
System (NFS) server crashes while a process is waiting for the disk I/O to complete
across NFS, the process is considered to have been running the entire time although
nothing was actually happening. Another problem is that the load average does not

account for priority. Finally, the load average cannot predict future events.

10.10 Version Control

A design task may produce multiple versions of an output speciﬁcation for several
reasons: (i) the ﬁrst one is not satisfactory, so another iteration is performed with
some changed decisions, (ii) multiple productions are applied in order to pursue alter-
natives, or (iii) there are multiple versions of input speciﬁcations. This thesis limits
itself to the problem of design process related version. In high-level synthesis process,
a version problem occurs when multiple tokens are allowed for an input place. Con-

sider the case of high-level synthesis in which the task is to schedule and allocate from

123

the Optimized CDFG. One way of accomplishing the task is by pre-scheduling and
allocating the CDFG, and merging the two results for the ﬁnal architectural synthe-
sis. Suppose that the previous transition of optimizing the CDFG has generated two
versions of output, namely A and B, as shown in Figure 10.17. If the scheduling uses
token A and generates output AS, and the other path uses B and generates B.L,
these two results cannot be merged because they are descendants of incompatible
data.

In the execution environment, a version number is assigned to each token to
distinguish the design data and solve the incompatibility problem. Consider the case
where the production has one input place and one output place. Suppose that a
token B is generated by applying a production P with input token A. Then, the
version number of B is assigned to VA.P, where VA is the version number of token
A. In general, if a task X has A1, ...,An input places and 81,...,B,, output places,
and production X with parameter a is applied to the task’s inputs having version
numbers VA,,...,VAH, then the version number of output B,- is (VA,,...,VA0).(P,a,
8,). The version number shows the history of productions applied to the input token
throughout the design process.

Data compatibility can be ensured by checking the version history. Consider task
X in a process ﬂow graph. Suppose that a reﬁned process ﬂow graph by applying
production P is obtained. Let A be the token in the input place of X, and Y be a
subtask of X in production P. With token A, the task will be carried out, and Y
will generate a token which has the version number A.Z, where Z is the history of

productions applied to token A. For two tokens A and A’ in the input place of X,

124

Two tokens: A and B
I,’ “““““““““ ‘6; Each token: S and L

’ A and B are Version histories
' up to this point.
.’ S and L are attached to versions.

 

     

I Scheduling Allocation

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Figure 10.17: Version History and Compatibility Checking

the corresponding output tokens generated by Y will have different version numbers.
Thus, if Y has two input places and has tokens B .Z and C .Z’ for each of these places,
where Z and Z’ are the attachments to production histories carried out as a subtask
of X, then B.Z and C.Z’ are compatible if and only if B = C. If candidate tokens
are not compatible, they cannot be used together to make a transition. For example,
in Figure 10.17, the Scheduling-Allocation task is completed and successful only if

compatible tokens are merged, i.e., AS and AL.

Chapter 11

Synthesis Example

11.1 FPGA Synthesis

In this chapter, a synthesis scenario illustrates how the proposed CAD framework can
be used. The tools and decompositions employed are intended to be representative,
however, not exhaustive.

The circuit, being synthesized into a Field Programmable Gate Array (FPGA)
chip, is a convolver for a signal processing application. The primary output is a point
multiplication result of input pixels. The objective is to design an FPGA chip from a
VHDL behavioral description of the convolver. There are constraints on the number
of connections between the FPGA chips and on timing. There is also a constraint on
the area of the chip, the most serious limitation of FPGA design.

The functional behavior of the component can be veriﬁed via simulation. This
simulation and debugging cycle is not part of the synthesis example. Prior to the
beginning of synthesis, Cockpit is running with an input ﬁle indicating the standard

125

126

tools and task decompositions available at our site. The primary task, called FPGA
Synthesis, is initially displayed since this is a goal task. Upon selecting this task,
Cockpit tells the user that it can be decomposed into the subtasks VHDL Compile,
Place and Route, and Bit Generation. The user asks Cockpit to apply this de-
composition and the FPGA Synthesis icon is replaced in the display by the others.

The production is shown in Figure 11.1.

11.1.1 VHDL Compilation

The transformation of the VHDL behavioral description into the Xilinx netlist ﬁle
(XNF) and symbol report ﬁle generation is the ﬁrst step of VHDL compilation. When
the VHDL Compile is expanded, Cockpit uses the production, elaborate, shown in
Figure 11.2. When this production is applied, Elaborate checks the VHDL syntax
and transforms the VHDL description into the proper Xilinx netlist ﬁle. A portion

of the initial VHDL description is shown in Figure 11.3.

11.1.2 Placement and Routing

Once the XNF ﬁle generation has been completed, the next step is Placement and
Routing. In this step, the FPGA logic cells are deﬁned, placed, and routed. Cockpit
invokes two tools sequentially: xnfprep followed by ppr using the production shown
in Figure 11.4. The ﬁrst tool, xnfprep, takes the XNF ﬁle as an input and generates
the FPGA logic cell deﬁnition and PRP report ﬁle. Then the second tool, ppr, is

called to place and route the logic cells onto a FPGA chip.

127

 

VHDL

 

 

 

 

 

 

 

 

 

 

 

VHDL
FPGA Synthesu' .
D Netlist "m
Synbol

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

FPGA
Configuration
Bit File
Logic
Cell Placed & Rained
Report Deﬁnition
Bit Generation
Conﬁguration Report
Bit ﬁle
Figure 11.1: Production of FPGA Synthesis
VHDL VHDL
VHDL Compile 9Q
, Nellist "m
Netllst Symbol Noam SW

 

 

 

 

 

 

 

 

 

 

 

 

Figure 11.2: Decomposition of VHDL Compile

128

 

PROCESS
BEGIN
WAIT until XP_Clk’EVENT AND XP_Clk = ’1’;
multtemp(31 downto 0) <8 itobv(bvtoi(left_in(15 downto 0))
8 bvtoi(left_in(31 downto 16)).32);
addtemp(31 downto 0) <8 itobv(bvtoi(addtemp(31 downto 0))
+ bvtoi(multtemp(31 downto 0)),32);
right_out(31 downto 0) <8 addtemp1(31 downto 0);
right-out(35 donate 32) <8 left_in(35 downto 32);
END PROCESS;

Figure 11.3: Initial VHDL Description

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Nellist N
xnfprep
W
Cell PRP
Deﬁnition ”PM
PRP Placed & Rooted
Report Logic Cell
Deﬁnition

 

 

 

 

 

Placed & Rooted
Logic Cell

 

 

 

 

 

Definition

 

Figure 11.4: Decomposition of Placement and Route

129

At this time, ppr cannot ﬁnish its placement since the whole logic cannot be ﬁtted
into a single Xilinx 4010 FPGA chip. Cockpit detects ppr’s failure after Cockpit per-
forms the post-evaluation function, e.g., by examining the ppr output ﬁle. Cockpit
tries to backtrack to the Place and Route production to see if there is another alter-
native for this production, which it tries if available. For this example, however, there
is no other alternative, therefore, Cockpit now moves up to the previous production,
VHDL Compile. This production also lacks another alternative, whereupon the whole
process fails. Thus, the design should be modiﬁed and the process should also be

retried.

11.1.3 Modiﬁed Design

The original design is changed so that the new design can ﬁt into a single FPGA
chip while maintaining functionality. Since a single l6-bit multiplier takes up a large
amount of space, this multiplier is decomposed into 4 8-bit multipliers and several
adders. A partial description of this decomposition is shown in Figure 11.5. The new

description is used for the same process.

11.1.4 Synthesis Results

Bit Generation decomposition and all its steps are shown in Figure 11.6 and 11.7.
Although the modiﬁed design is used for a new synthesis process, the designer can
try different constraints and parameters using the original design. For example, one of

the constraints subject to change is the operating condition, which is set to WCCOM

130

 

PROCESS
BEGIN
WAIT until XP_Clk’EVENT AND XP_Clk 8 ’1’;
addtemp1(15 donate 0) <8 itobv(bvtoi(left_in(23 downto 16))
8 bvtoi(left-in(7 downto 0)),16);
addtemp2(23 downto 8) <8 itobv(bvtoi(left_in(31 downto 24))
8 bvtoi(left_in(7 downto 0)),16);
addtemp5(23 donate 8) <8 itobv(bvtoi(addtemp1(15 downto 8))
+ bvtoi(addtemp2(23 downto 8)),16);
addtemp5(7 donate 0) <8 addtemp1(7 downto 0);
addtemp3(15 downto 0) <8 itobv(bvtoi(left_in(23 downto 16))
8 bvtoi(left_in(15 downto 8)),16);
addtemp4(23 downto 8) <8 itobv(bvtoi(left_in(31 dounto 24))
8 bvtoi(left_in(15 downto 8)),16);
addtemp6(23 downto 8) <8 itobv(bvtoi(addtemp3(15 downto 8))
+ bvtoi(addtemp4(23 downto 8)),16);
addtemp6(7 downto 0) <8 addtemp1(7 downto 0);

right-out(35 downto 32) <8 left_in(35 donate 32);

END PROCESS ;

Figure 11.5: Modiﬁed VHDL Description

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Placed & Rooted Placed & Rooted
Logic Cell Logic Cell
Deﬁnition Deﬁnition

I
Bit Generation
FPGA . BIT FPGA arr
Conﬁguration Report Conﬁguration Report
Bit File Bit File

 

 

 

 

 

 

 

 

 

 

 

Figure 11.6: Decomposition of Bit Generation

(Worst-Case-Commercial). Examples of parameters are random seed, placer.effort,
and router.eﬁ’ort. AS shown in Figure 11.8, the ﬁnal design occupies about 98% of
CLBS and 68% of function generators. The maximum speed at which this chip can
operate is about 8 MHz, as shown in Figure 11.8 and 11.9. In Figure 11.9, the graph
shows that most of the assignments of the nets are done at about a 10 MHz clock
rate, although a few of them can operate at a 40 MHz rate. The slowest operations

determine the overall rate of the chip’s operation speed.

 

 

 

 

 

 

132

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Logic
Cell PRP
FPGA Synthesis Deﬁnition RPPO“t
...,
Conﬁguration
Bit File
Placed & Rooted
Logic Cell PPR
Deﬁnition W
makebits
c T” B"
on guratlon Report
Bit File

 

 

 

 

 

 

Figure 11.7: Decomposition of FPGA Synthesis

133

 

Partitioned Design Utilization Using Part 4010P6191-6

No. Used Max Available % Used

Occupied CLBs 395 400 98%
Packed CLBS 275 400 63%
Bonded I/O Pins: 77 160 48%
F and G Function Generators: 551 800 68%
H Function Generators: 58 400 14%
CLB Flip Flops: 64 800 8%
IOB Input Flip Flops: 0 160 0%
IOB Output Flip Flops: 36 160 22%
Memory Write Controls: 0 400 0%
3-State Buffers: 0 880 0%
3-State Half Longlines: 0 80 0%
Edge Decode Inputs: 0 240 0%
Edge Decode Half Longlines: 0 32 0%

Minimum Clock Period : 125.9ns

Estimated Maximum Clock Speed : 7.9MHz

Figure 11.8: PPR and Timing Report Summary

 

134

t e.

m ,

s 1.“.wa

 

Figure 11.9: Graphical Timing Result

Part IV

Conclusion

135

Chapter 12

Conclusion

Design process management is an issue of critical importance. More and more virtual
prototyping techniques are used to reduce design time and error. In the process or
virtual prototyping, modeling is one of the key issue. The modeling style greatly
affects not only the quality of ﬁnal design but also the design process.

This thesis has proposed new techniques for FSM modeling using VHDL. The
proposed techniques improve the readability of FSM description and allow a large
FSM to be easily decomposed into smaller FSMS. These techniques are illustrated by
modeling the micro-controller in SINCGARS radio circuit as a set of FSMS. The same
techniques are used to model software component of an Instrumentation System and
a signal processing application. Different synthesis results are obtained from different
FSM models: the quality, such as operating speed and area used, of the design is
greatly affected by the modeling style; a particular modeling style may not produce
a synthesized result because of given constraints; thus, a different model must be
developed, and this new model has to be used in the design process again.

136

137

VHDL is also shown to be a suitable and highly effective hardware description
language to describe parallel architectures and algorithms speciﬁc to these parallel
architectures. A signal processing application is implemented for a parallel machine,
Splash 2. This VHDL model is then synthesized to Xilinx FPGAS in order to run on

Splash 2 hardware.

An execution environment for high-level synthesis is proposed. A Colored Petri
Net is used to model the execution environment. The proposed execution environment
utilizes this formalism to assist designers in selecting and executing appropriate design
processes. The proposed environment is especially applicable to a design environment
where a hardware design is carried out hierarchically and many alternative processes

are possible for the same task.

A prototype of the proposed execution environment has been implemented. The
execution environment has been found to be quite useful and elegant. Several design
methodologies, including design processes for high level synthesis, have been modeled
using design process grammars [73]. Design exercises have been successfully carried
out using these grammars. Design process grammars are shown to be useful methods
to describe design processes. Currently, more tools are being integrated and to im-
prove encapsulated knowledge. This CAD framework will become more practical as
CAD vendors adopt the practice of open software systems and allow for greater tool

inter-operability.

138
12.1 Contributions

Several contributions are made. Specifically,

0 An execution model based on the Colored Petri Net has been proposed. Al-
though the Colored Petri Nets are used in many applications, no existing CAD
frameworks are modeled based on the net. The execution environment has also
been implemented and several design exercises have been performed using the

proposed CAD framework.

0 The execution environment allows specifications of design methodologies and
execution of design methodologies separately, a factor lacking in existing CAD
frameworks. The execution environment does not need to know the details of
the design process. Even if a new design process is introduced to the system,

the existing execution environment does not need to be changed.

0 Flexible means of pre-/post-condition representation and checking mechanisms
have been deﬁned and implemented. The selection of an alternative produc-
tion or a tool is done by examining pre—evaluation results. Users assign ratings
for each alternative, including terminal tasks, when the process graph is built.
Users may also change these ratings as they gain more experience using tools
and productions. Users can write their own evaluation functions and easily
incorporate these functions into the execution environment, which allows back-

tracking if post-condition is not satisfactory.

139

0 Speciﬁcation hierarchies have also been proposed. Task and speciﬁcation are
organized into a class hierarchy. Features of tasks, such as pre-condition and

post-condition, can be inherited.

o Concurrent execution of multiple alternatives is allowed in our execution envi-
ronment. By dynamically creating Daemon processes, all alternatives can be

tried simultaneously.

a The execution environment allows users to construct a partially expanded pro-
cess graph at the beginning of a design exercise. The execution environment
takes this graph as the start graph and dynamically completes it until the de-
sign goal is reached. The execution environment also allows process evaluation
without actually starting the design activity. Users can see the possible design

flow by process simulation.

0 The execution environment allows execution of tools in a distributed environ-
ment. The CAD framework’s scheduler utilizes a resource ﬁle containing neces-
sary information to invoke tools, such as full path information, environment in-
formation, and input/output requirements, and decides which processes should
be run under which domains. Static load balancing is maintained between given

domains.

0 New modeling techniques have been proposed for FSMS using VHDL. These
techniques are successfully used to model an existing circuit and are especially

useful when the target FSM is large and needs to be decomposed into several

140

FSMS. The existing techniques are not suitable for this case.

0 Modeling styles for parallel architectures have been suggested and several algo-

rithms have been implemented for given parallel architectures using VHDL.

o It has also been empirically proved that modeling styles has a great impact on

the quality of the synthesis results.

12.2 Current Implementation Status

The proposed execution environment is implemented using C/C++ under Solaris Op-

erating System. Functionalities completely implemented are summarized as follows:

0 Major components such as Cockpit and GUI

Separation of Speciﬁcation and Execution environment

Automatic selection and execution of methodologies

0 Dynamic expansion of productions

Automatic backtracking

Concurrent execution of multiple alternatives

Independent Pre—evaluation and Post-evaluation functions

Display of design history information

Miscellaneous features, such as Break-point, Checklist, and Resource ﬁle

141

0 Selection of any start graph and partial expansion of the design process graph

at the beginning of the design time

0 Process Simulation

12.3 Future Work

Further research and improvement are needed in the following areas:

0 Tool encapsulation must be improved or redesigned as CFI defines new stan-

dards. Several translation tools should be developed and added to the current

CAD framework.

0 Parameter change representation and its usage should be further investigated
and implemented. Design knowledge gained from design experiences should be
captured and used in the subsequent design. Even when backtracking occurs,
the design knowledge should be consulted and used to modify the parameters

based on the previous results and experiences.

0 Currently, static average load is checked before a task is assigned to the lightly
loaded machine. Dynamic load balancing and redistribution of loads if certain
resources are overloaded should be further studied and implemented. Another
limitation of current implementation is that available machines must be listed
before they can be used. Ability to check all available machines should be

developed and implemented.

142

0 Specification hierarchy through specialization/ generalization has been proposed

and needs to be implemented.

0 Modeling techniques and styles should be further studied for different hard-
ware components, such as memory, bus, and ALU. These component modeling
techniques should be suitable for not only description and simulation but also

synthesis.

APPENDICES

Appendix A

Glossary

CFI : See Section 9.3.
CAD Frameworks Initiative is a consortium of electronic computer aided design
(CAD) users, integrators, and vendors that provides interoperable solutions

through industry standards.

Cockpit : See Section 10.3.1.
The coordinator between the designer, the daemon process, and execution pro-
cesses in the proposed execution environment. Cockpit keeps track of the cur-

rent status of the design process and informs the user of possible actions.

Computer Aided Design Framework: See Part III.
CAD Frameworks are design environments consisting of many different design

tools that aid design activities.

Daemon Process: See Section 10.3.2.

Daemon process of a task is a process that waits for input event (input token)

143

144

and performs its task by creating an execution process and communicating with

the Cockpit.

Design Data Management : See Chapter 9.
Design data management captures relationships between design data, such as
versions and conﬁgurations [2]. Design data management deals with methods
used for storing and retrieving design data, maintaining relationships (such as
version, transformation, and conﬁguration) between designs, and maintaining

consistencies among them.

Design Methodology : See Chapter 9.
Design methodology is a sequence of tool invocations used to transform given

input speciﬁcations into desired output speciﬁcations [72].

Design Methodology (Workﬂow) Management: See Chapter 9.
Design methodology management is the selection and execution of an appropri-

ate sequence of tools to produce a desired design from a given speciﬁcation.

Design Process : See Part III.
Design process is a series of operations to produce micro-electronic hardware

systems.

Execution Environment : See Section 9.2.
The framework execution environment is a software environment which helps
designers in selecting and executing design methodologies by allowing the sys-

tematic exploration of the design planning space. The execution environment

145

helps designers select proper design methodologies and execute them dynami-

cally and automatically.

Execution Model: See Section 10.3.
Model of dynamic execution of process ﬂow is called the execution model. Our

execution model is based on the extended Petri Net.

Execution Process : See Section 10.3.3.
Execution process of a task is a Unix process that either invokes a tool to
perform the terminal task or creates another daemon process for the execution

process’ logical task.

Field Programmable Gate Array (FPGA) : See Chapter 4.
FPGA is an integrated circuit whose conﬁguration can be dynamically deter-

mined not by a mask pattern but by external information.

Hardware/Software Co-design : See Chapter 5.
Hardware/ software co-design is a system designing method in which the system
contains both hardware and software components. Hardware/ software co-design
is concerned with partitioning a system into higher quality hardware and soft-

ware components, in shorter time and at lower cost than existing systems.

High Level Synthesis : See Chapter 8.
High-level synthesis takes an HDL-based behavioral description and automati-

cally translates it into a RTL-level description.

146
IDEF : See Chapter 7.

IDEF is the speciﬁcation language for Integration DEFinition for function mod-
eling. IDEF includes both a deﬁnition of a graphical modeling language (syntax
and semantics) and a description of a comprehensive methodology for develop-

ing models.

Instrumentation System (IS) : See Chapter 5.
An instrumentation system is a collection of those modules that can be used to

collect runtime information from distributed processes.

Petri Net : See Section 10.2.

Petri Net is a mathematical representation of a system to be studied.

Post-condition: See Section 10.8.
Post-condition is a condition which must be satisﬁed after a methodology ﬁn-

ishes its execution.

Pre—condition: See Section 10.8.
Pre-condition is a condition which must be satisﬁed in order for a methodology

to be selected for execution.

Process Flow Graph: See Section 9.1.

Process ﬁow graph is a graphical representation of a design methodology.

Process Grammar: See Section 9.1.
Process grammar is a formal method of representing the transformation of pro-

cess flow graphs into progressively more detailed process ﬂow graphs [73].

147

Process Metric : See Chapter 7.
Process metrics are quantiﬁable measures of either the design process or the
products. thdamental performance measures include design cycle time, ease

of use, reusability, and dependability.

Production : See Section 9.1.2.
Production is a substitution rule that permits the replacement of a logical task

with a graph that represents a possible way of performing the task [72].

Task: See Chapter 1.
Terminal Task: Task in the design methodology representing a tool invoca-
tion.
Logical Task: Abstract task that could be accomplished by different tools or

tool combinations.

Version Control: See Section 10.10.
Version control is a control method that handles different versioning problems

in micro-electronic hardware design process.

VHDL : See Chapter 1.
VHSIC Hardware Description Language
VHSIC: Very High Speed Integrated Circuit

VHDL is a language for hardware design, documentation, simulation, and syn-

thesis. VHDL had been adopted as a standard HDL in 1987 by IEEE.

148

Virtual Prototyping: See Chapter 1.
The concept of virtual prototyping is to develop an effective modeling capability,
including reusable models of proven designs coupled with hardware synthesis to

produce a hardware model.

Workﬂow : See Chapter 1.
A workflow or methodology is a representation of how the design process should

be carried out.

BIBLIOGRAPHY

Bibliography

[1] D. G. Fairbairn, “1994 Keynote Address,” in Proceedings of the 3lst Design
Automation Conference, pp. xvi—xvii, 1994.

[2] S. Kim, Conﬁguration Managment and Version Data Modeling in VLSI Design
Environments. PhD thesis, Michigan State University, 1994.

[3] D. D. Gajski and R. Kuhn, “Guest Editors’ Introduction: New VLSI Tools,”
IEEE Computer, vol. 16, pp. 11—14, December 1983.

[4] D. D. Gajski, High-level Synthesis: introduction to chip and system design.
Kluwer Academic, 1992.

[5] CFI, “CAD Framework Users, Goals, and Objectives,” Tech. Rep. Version 0.91,
CAD Framework Initiative, Inc, Aug. 1990.

[6] R. Jain, K. Kucukcakar, M. J. Mlinar, and A. C. Parker, “Experience with
the ADAM Synthesis System,” in Proceedings of the 26th Design Automation
Conference, pp. 56—61, 1989.

[7] D. Knapp and A. Parker, “The ADAM Design Planning Engine,” in Artiﬁcial
Intelligence in Design, Volume II, pp. 263—285, Academic Press, 1992. reprinted

from IEEE Transactions on Computer Aided Design of Integrated Circuits and
Systems, Vol. 10, No. 7, July 1991.

[8] D. W. Knapp and A. C. Parker, “A Design Utility Manager: The ADAM Plan-
ning Engine,” in Proceedings of the 23rd ACM/IE'E’E Design Automation Con-
ference, pp. 48—54, 1986.

[9] T. Chiueh and R. H. Katz, “A History Model for Managing the VLSI Design
Process,” in International Conference on Computer Aided Design, pp. 358—361,
1990.

[10] A. D. Janni, “A Monitor for Complex CAD Systems,” in Proceedings of the 23rd
Design Automation Conference, pp. 145-151, 1986.

[11] K. Jensen and G. Rozenberg, Eds, High-level Petri Nets. Springer-Verlag, 1991.

[12] T. Murata, “Petri Nets: Properties, Analysis and Applications,” Proceedings of
the IEEE, vol. 77, pp. 541—580, April 1989.

149

150

[13] J. L. Peterson, Petri Net Theory And The Modeling 0f Systems. Prentice-Hall,
Inc., 1981.

[14] IEEE Standard VHDL Language Reference Manual. IEEE Std 10764987, New
York, NY, 1988.

[15] D. R. Coelho, VHDL Handbook. Kluwer Academic Press, 1989.

[16] B. Cohen, VHDL Coding Styles and Methodologies. Kluwer Academic Press,
1995.

[17] E. Sternheim, R. Singh, and Y. Trivedi, Digital Design with Verilog HDL. Au-
tomata Publishing Company, 1990.

[18] D. E. Thomas and P. R. Moorby, The Verilog Hardware Description Language.
Kluwer Academic Publishers, 1995.

[19] M. J. Chung, J. H. Lee, C. Y. Lee, and B. E. Reidenbach, “VHDL Modeling
Benchmark Test: SINCGARS Radio Circuitry,” in Government Microcircuit
Applications Conference, pp. 331-334, 1989.

[20] R. Lipsett, C. Schaefer, and C. Ussery, VHDL: Hardware Description and Design.
Kluwer Academic Publishers, 1989.

[21] J. R. Armstrong, Chip-Level Modeling with VHDL. Prentice Hall, 1989.

[22] S. H. Choi, M. J. Chung, C. Y. Lee, and B. E. Reidenbach, “System Redesign
with VHDL,” in Government Microcircuit Applications Conference, 11 1990.

[23] S. H. Choi and M. J. Chung, “Methodology of System Design using VHDL,” in
Spring VI UF .Qngp 11—1ﬂ8jw5m1ﬂ992gm

[24] A. Waheed, Z. Youssﬁ, S. H. Choi, and D. T. Rover, “Hardware/ Software Code-
sign of an Instrumentation System Module for Monitoring Distributed Com—
puting Systems,” in Submitted to Second International Symposium on High-
Performance Computer Architecture (HPCA-Z), 1996.

[25] ITT, “Development Speciﬁcation for Remote I / 0 Module Firmware, CDRL 2017,
Vol 1, Part 2,” tech. rep., ITT.

[26] ITT, “Operational Firmware Documentation for Remote I/O Module for SINC-
GARS Production, CDRL 2018, Vol. 1, Part 2,” tech. rep., ITT.

[27] S. Carlson, Introduction to HDL-Based Design Using VHDL, ch. 3-4. Synopsys
Inc., 1991.

[28] R. B. Segal and S. Carlson, “VHDL Style Issues in Sequential Design Descrip-
tion,” in Spring 1.9.90 VHDL Users’ Group Meeting, April 1990.

151

[29] K. Hwang and F. A. Briggs, Computer Architecture and Parallel Processing.
McGraw-Hill Book Company, 1984.

[30] K. Hwang, Advanced Computer Architecture with Parallel Processing. McGraw-
Hill Book Company, 1993.

[31] D. Nassimi and S. Sahni, “Bitonic Sort on a Mesh-Connected Parallel Com-
puter,” IEEE Transactions on Computers, vol. c-27, January 1979.

[32] S. Choi, N. Ratha, M. J. Chung, and D. Rover, “Signal Processing Application
using VHDL on Splash 2,” in Fall VIUF .94, pp. 6.11—6.19, 11 1994.

[33] J. M. Arnold, D. A. Buell, and E. G. Davis, “Splash 2,” in Proc. 4th Annual
ACM Symp. on Parallel Algorithms and Architectures, pp. 316-322, 1992.

[34] M. B. Gokhale, B. Holmes, A. Kosper, D. Kunze, D. Lopresti, S. P. Lucas,
R. G. Minnich, and P. Olsen, “SPLASH: A Reconﬁgurable Linear Logic Array,”
in Proceedings of the International Conference on Parallel Processing, August
1990.

[35] J. M. Arnold, “The Splash 2 Software Environment,” in Proc. of IEEE Workshop
on FPGAS as Custom Computing Machines, April 1993.

[36] J. M. Arnold and D. A. Buell, “VHDL Programming on Splash 2,” in Proc. of
the International Workshop on Field-Programmable Logic, 1993.

[37] M. B. Gokhale and R. G. Minnich, “FPGA Computing in a Data Parallel C, SRC-
TR-93-097,” tech. rep., Supercomputing Research Center, Institute for Defense
Analyses, April 1993.

[38] J. M. Arnold and M. A. McGarry, “SPLASH 2 Programmer’s Manual, SRC-
TR—93-107,” tech. rep., Supercomputing Research Center, Institute for Defense
Analyses, September 1993.

[39] D. A. Buell, “A SPLASH 2 Tutorial, SRC-TR—92-087,” tech. rep., Supercomput-
ing Research Center, Institute for Defense Analyses, December 1992.

[40] Synopsys, VHDL Compiler Reference Manual. Synopsys Inc., Mt. View, CA,
1992.

[41] J. Daniel] and S. W. Director, “An Object Oriented Approach to CAD Tool
Control,” IEEE Transactions on Computer-Aided Design, pp. 698-713, June
1991.

[42] A. Waheed and D. T. Rover, “A Structured Approach to Instrumentation System
Development and Evaluation,” tech. rep., Department of Electrical Engineering,
Michigan State University, April 1995.

152

[43] F. Brewer and D. Gajski, “Chippe: A System for Constraint Driven Behavioral
Synthesis,” IEEE Transactions on Computer-Aided Design, vol. 9, pp. 681—695,
July 1990.

[44] F. D. Brewer and D. D. Gajski, “An Expert-System Paradigm for Design,” in
Proceedings of the 23th Design Automation Conference, pp. 62—68, 1986.

[45] F. D. Brewer and D. D. Gajski, “Knowledge Based Control in Micro-Architecture
Design,” in Proceedings of the 24th Design Automation Conference, pp. 203—209,
1987.

[46] M. Bushnell and S. W. Director, “Automated Design Tool Execution in the
Ulysses Design Environment,” IEEE Trans. on Computer-Aided Design, vol. 8,
pp. 279—287, March 1989.

[47] M. L. Bushnell and S. W. Director, “VLSI CAD Tool Integration Using the
Ulysses Environment,” in 23rd ACM/IEEE Design Automation Conference,
pp. 55-61, 1986.

[48] J. Daniell and S. W. Director, “An Object Oriented Approach to CAD Tool Con-
trol Within a Design Framework,” in Proceedings of the 26th Design Automation
Conference, pp. 197—202, 1989.

[49] M. F. Jacome and S. W. Director, “Design Process Management for CAD Frame-
works,” in Proceedings of the 29th Design Automation Conference, pp. 500—505,
1992.

[50] K. Bosch, P. Bingley, and P. van der Wolf, “Design Flow Management in the
NELSIS CAD Framework,” in Proceedings of the 28th Design Automation Con-
ference, pp. 711-716, 1991.

[51] P. van den Hamer and M. A. Treffers, “A Data Flow Based Architecture for CAD
Frameworks,” in Proceedings of the 1990 International Conference on Computer
Aided Design, pp. 482-485, 1990.

[52] P. van der Wolf, G. Sloof, P. Bingley, and P. Dewilde, “Meta Data Manage-
ment in the NELSIS CAD Framework,” in 27th ACM/IEEE Design Automation
Conference, pp. 142-145, 1990.

[53] S. Kleinfeldt, M. Guiney, J. K. Miller, and M. Barnes, “Design Methodology
Management,” Proceedings of the IEEE, vol. 82, pp. 231—250, February 1994.

[54] W. R. Hood and C. Myers, “RASSP: Viewpoint from a Prime Developer,” in
Proceedings of the Ist RASSP Conference, pp. 9—17, 1994.

[55] J. Pridmore and W. Schaming, “RASSP Methodology Overview,” in Proceedings
of the Ist RASSP Conference, pp. 71—85, 1994.

153

[56] J. Saultz, “Martin Marietta RASSP Program Overview,” in Proceedings of the
Ist RASSP Conference, pp. 18—32, 1994.

[57] NIST, “Integration Deﬁnition For Function Modeling (IDEFO), FIPS Pub 183,”
tech. rep., Computer Systems Laboratory, National Institute of Standards and
Technology, December 1993.

[58] NIST, “Integration Deﬁnition For Function Modeling (IDEFIX), FIPS Pub 184,”
tech. rep., Computer Systems Laboratory, National Institute of Standards and
Technology, December 1993.

[59] R. Baldwin, S. H. Choi, and M. J. Chung, “VHDL Synthesis Framework,” in
Spring VI UF 94, May 1994.

[60] R. A. Baldwin and M. J. Chung, “Design Methodology Management Using Graph
Grammars,” in 313t ACM/IEEE Design Automation Conference, pp. 472—478,
1994.

[61] E. Kupitz and J. Tacken, “DECOR - Tightly Integrated Design _Qontrol and
Qbsegvation,” in Proceedings of the 1992 International Conference on Computer
Aided Design, pp. 532—537, 1992.

[62] F. Bretschneider, C. Kopf, H. Lagger, A. Hsu, and E. Wei, “Knowledge Based
Design Flow Management,” in Proceedings of the 1990 International Conference
on Computer Aided Design, pp. 350—353, 1990.

[63] K. Jensen, “Coloured Petri Nets: A High Level Language for System Design
and Analysis,” in High-Level Petri Nets (K. Jensen and G. Rozenberg, Eds),
pp. 44—119, Springer-Verlag, 1991.

[64] R. M. Shapiro, “Validation of VLSI Chip Using Hierarchical Colored Petri Nets,”
in High-Level Petri Nets (K. Jensen and G. Rozenberg, Eds), pp. 667—687,
Springer-Verlag, 1991.

[65] R. H. Katz and E. Chang, “Managing Change in a Computer-Aided Design
Environment,” in Conference on Very Large Databases, 1987.

[66] R. H. Katz, R. Bhateja, E. E.-L. Chang, D. Gedye, and V. Trijanto, “Design
Version Management,” IEEE Design and Test, vol. 4, pp. 12—22, February 1987.

[67] A. Takahara, “Versioning and Concurrency Control in a Distributed Design En-
vironment,” in Proceedings of the 1992 International Conference on Computer
Design, pp. 540—543, 1992.

[68] O. Schettler and A. Bredenfeld, “BEPPO: A Data Model for Design Represen-
tation,” in Proceedings of the 1993 European Design Automation Conference,
pp. 378—382, 1993.

154

[69] T. Miyazaki, T. Hoshino, and M. Endo, “A CAD Process Scheduling Technique,”
in Proceedings of the 1990 International Conference on Computer Aided Design,
pp. 354-357, 1990.

[70] P. R. Sutton, J. B. Brockman, and S. W. Director, “Design Management Using
Dynamically Deﬁned Flows,” in 30th ACM/IEEE Design Automation Confer-
ence, pp. 648—653, 1993.

[71] D. S. Harrison, A. R. Newton, R. L. Spickelmier, and T. J. Barnes, “Electronic
CAD Frameworks,” Proceedings of the IEEE, vol. 78, pp. 393—417, February
1990.

[72] R. A. Baldwin and M. J. Chung, “A Formal Approach to Managing Design
Processes,” IEEE Computer, pp. 54—63, February 1995.

[73] R. A. Baldwin, A Discipline Independent Framework for Engineering Design.
PhD thesis, Michigan State University, 1994.

[74] CFI, “Inter-Tool Communication Message Dictionary,” Tech. Rep. Document
Number ITC-90—G-04, CAD Framework Initiative, Inc, 1991.

[75] CFI, “Tool Encapsulation Speciﬁcation Standard,” Tech. Rep. Document Num-
ber DMM-91-G-1, CAD Framework Initiative, Inc, 1991.

[76] K. W. Fiduk, S. Kleinfeldt, M. Kosarchyn, and E. B. Perez, “Design Methodology
Management - A CAD Framework Initiative perspective,” in 27th ACM/IEEE
Design Automation Conference, pp. 278-283, 1990.

[77] T. J. Scallan, “CAD Framework Initiative - a user perspective,” in 29th
ACM/IEEE Design Automation Conference, pp. 672—675, 1992.

[78] W. Reisig, Ed., A Primer in Petri Net Design. Springer-Verlag, 1992.

[79] L. Ferrarini, “An Incremental Approach to Logic Controller Design with Petri
Nets,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 22, pp. 461-
473, May/June 1992.

[80] M. Zhou, F. DiCesare, and A. A. Desrochers, “A Hybrid Methodology for Syn-
thesis of Petri Net Models for Manufacturing Systems,” IEEE Transactions on
Robotics and Automation, vol. 8, pp. 350—361, June 1992.

[81] J. Kljaich, Jr., B. T. Smith, and A. S. Wojcik, “Formal Veriﬁcation of Fault Tol-
erance Using Theorem-Proving Techniques,” IEEE Transactions on Computers,
vol. 38, pp. 366—376, March 1989.

"lllllllllllllllllf