A Methodology for Behavioral-Level Switching Activity Estimation in CMOS Circuits

By

Ronnie Lee Wright

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Electrical and Computer Engineering

1999

ABSTRACT

A Methodology for Behavioral-Level Switching Activity Estimation in CMOS Circuits

By

Ronnie Lee Wright

The demand for computer-aided design (CAD) tools to accurately perform power analysis on high-level design specifications has become increasingly important. Two major factors are responsible for promoting this research and development: the emergence of low power as a key VLSI design parameter, and the integrated circuit design industry's increased migration from hardware-design methodologies based on schematics or simple programmable logic device (PLD) languages to methodologies based on high-level design specifications described by hardware description languages. Such CAD tools will enable the development of power efficient digital circuits within a process that offers increased design flexibility, design reuse, and lower cost. A CAD tool's ability to accurately compute the internal power dissipated by a circuit heavily depends on how well it estimates switching activity and network capacitance. The switching activity is a key factor used in the calculation of dynamic power dissipation for CMOS circuits; it represents the probability or frequency at which power-consuming gate output transitions take place.

The objective of this research is to develop a more accurate and cost-effective technique for computing the switching activity and dynamic power dissipation of behavioral-level design specifications described in VHDL (VHSIC Hardware Description Language). Unlike some gate- or circuit-level statistical or probabilistic estimation schemes, this new technique operates at the behavioral level of design abstraction without considering technology or statistical circuit characterizations. This new approach accepts a user-specified depth-accuracy parameter that is responsible for controlling the accuracy of the switching activity estimate at the expense of time and memory. The developed techniques and algorithms have been implemented in a program called the Behavioral-Level Activity and Power Estimator (BLAPE). Results and benchmark comparisons with other power analysis CAD tools are given to validate the application and effectiveness of the new approach.

Copyright © by Ronnie Lee Wright, 1999

To my grandmother, Mrs. Lillie Mae Heller

ACKNOWLEDGMENTS

First, giving honor to God, who is the head of my life.
If it were not for God and my savior, Jesus Christ, the completion of this research effort would not have been possible. I thank God for giving me the ability and strength to persist and weather the storm.

I would like to thank my hero and grandmother, Mrs. Lillie Mae Heller, for inspiring me to become an engineer. She was brave and fought in the struggles that made this and many other opportunities possible. I would like to thank my grandfather, Mr. Willard Ervin Heller (expired May 18, 1995), for being a good provider, a great father figure, and a very positive influence in my life. As a young man my grandfather pushed for excellence in all of my schoolwork; his insistence initiated the possibility of undergraduate studies and ensured the completion of advanced graduate studies. I would like to thank my uncle and father figure, Dr. Oscar Lee Wright, for serving as an excellent male role model and demonstrating that hard work and determination are the key elements to success. I would like to thank my mother, Ms. Burnice Marie Wright, for passing on the ability to be academically successful and constantly reminding me that no problem was too difficult for me to solve. I would like to thank my youngest aunt, Mrs. Marneta Lynn Griffin, for her support in xeroxing documents and papers, as well as occasional proofreading of my writings. I would like to thank all my friends and relatives that placed me in their prayers and offered their support. I would even like to thank those who encouraged me to quit when times were difficult. I would like to thank Dr. Christophe Fiorio of the Technische Universitat in Berlin, Germany, for his development and support of the algorithm2e.sty LaTeX style file.

I would like to give a special thanks to my major dissertation advisor, Dr. Michael A. Shanblatt. Dr. Shanblatt demonstrated excellence in advising and assisted in helping me improve my research skills. He believed in me and the research I was conducting. Also, Dr. Shanblatt was always very professional, timely, and an expert at obtaining funding for my graduate studies and research. Additionally, I would like to thank Dr. Shanblatt's family for being understanding; there were many weekends, evenings, and nights in which I telephoned Dr. Shanblatt at home to discuss research.

I would like to thank my dissertation committee: Dr. Michael A. Shanblatt, Dr. Bruce Kim, Dr. Diane T. Rover, Dr. Anthony S. Wojcik, Dr. Richard J. Enbody, and Dr. Byron C. Drachman, for serving on the committee and providing constructive criticisms to improve the quality of my research. I would also like to give a special thanks to Dr. Wojcik for providing papers and other references to assist in the research.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
1 Introduction
  1.1 Background
    1.1.1 Electronic Design Automation Process
    1.1.2 Design Representations
    1.1.3 Design Optimization
    1.1.4 Design Phases
    1.1.5 Steps Within the Design Phase
  1.2 Motivation
  1.3 Low Power Design
    1.3.1 Sources of Power Dissipation
    1.3.2 Low Power Optimization Techniques
      1.3.2.1 Supply Voltage Reduction
      1.3.2.2 Physical Capacitance Reduction
      1.3.2.3 Switching Activity Reduction
      1.3.2.4 Structural Optimization
      1.3.2.5 System and Architectural Level Optimizations
  1.4 HDL-Based Design
  1.5 Problem Statement
  1.6 Dissertation Overview
2 Switching Activity Estimation
  2.1 Calculation of Switching Activity
    2.1.1 Input Pattern Dependence
    2.1.2 Glitch Activity
    2.1.3 Delay Model
    2.1.4 Logic Function
    2.1.5 Circuit Structure
  2.2 Previous Work
    2.2.1 Statistical Methods
    2.2.2 Probabilistic Methods
    2.2.3 High-Level Methods
3 Behavioral Representations of Switching Functions
  3.1 Switching Algebra Background
  3.2 Truth Tables
    3.2.1 Logic Operations
    3.2.2 Output Enumeration
  3.3 Boolean Expressions
  3.4 Binary Decision Diagrams
  3.5 Behavioral VHDL Specifications
4 Structural Representations of Switching Functions
  4.1 Schematics
  4.2 Netlists
  4.3 Structural VHDL Specifications
  4.4 Connective Binary Decision Diagrams
    4.4.1 Motivation
    4.4.2 Overview
    4.4.3 Minimized Scalable BDDs
      4.4.3.1 MSBDD Generation Routines
    4.4.4 CBDD Definitions
    4.4.5 Analysis of Connective-BDDs
    4.4.6 Implementation
    4.4.7 Advantages and Disadvantages
    4.4.8 Results
    4.4.9 Summary
5 Behavioral-Level Switching Activity Estimation
  5.1 Methodology Overview
    5.1.1 Methodology Assumptions
    5.1.2 Methodology Outline
  5.2 Methodology Task Decomposition
    5.2.1 Transformation of VHDL into Boolean Equations
    5.2.2 Decomposition of Boolean Equations
      5.2.2.1 Implicit Structure Representation
      5.2.2.2 Mapped Structure Representation
    5.2.3 Building the CBDD Representation
      5.2.3.1 CBDD Selection Rationale
      5.2.3.2 Generation of CBDD Graph
  5.3 Signal Probability Computation
    5.3.1 Overview
      5.3.1.1 Network Levelization
      5.3.1.2 Disjoint Post-Order Boolean Equation Generation
      5.3.1.3 IPR Cubeset Generation
      5.3.1.4 Computation of Signal Probability
      5.3.1.5 SPCA Component Interconnection
  5.4 Switching Activity and Power Computation
  5.5 Experimental Results
    5.5.1 Experiment 1: 64-Bit Adder Benchmark
    5.5.2 Experiment 2: Arithmetic Benchmarks
    5.5.3 Experiment 3: ISCAS-85 Benchmarks
    5.5.4 Experiment 4: Nonredundant ISCAS-85 Benchmarks
    5.5.5 Experiment 5: MCNC Synth89 Benchmarks
    5.5.6 Remarks
6 Behavioral-Level Visualization of Switching Activity
  6.1 Activity Viewer Tool
    6.1.1 Input Transformation Process
    6.1.2 Activity Viewer Results
  6.2 Future Work and Summary
7 Conclusions
  7.1 Contributions
  7.2 Future Work
  7.3 Impact of Contributions
APPENDICES
A Definitions and Formulas
B Application of BLAPE to 4-bit Booth Multiplier
  B.1 High-Level VHDL Specification Input
  B.2 Boolean Equation Generation
  B.3 Generate Implicit Structural Specification
  B.4 Network Levelization
  B.5 SIS Activity and Power Estimation
  B.6 BLAPE Activity and Power Estimation
BIBLIOGRAPHY

LIST OF TABLES

2.1 Boolean expression signal probabilities
2.2 Effects of uncorrelated inputs
2.3 Effects of spatially correlated inputs
2.4 Effects of temporal input correlations
2.5 Effects of spatio-temporal input correlations
2.6 Two-input AND, OR, XOR truth table
2.7 Two-input AND, OR, XOR signal probability table
2.8 Switching activities for two-input AND, OR, XOR gates
3.1 Huntington's postulates
3.2 Truth tables for NOT, AND, OR logic operations
3.3 Truth table for full-adder carry-out bit
4.1 Benchmark Results
5.1 Post-order Boolean equations for FA carry-out bit
5.2 Output from Generate_IPR_Cubeset procedure
5.3 Signal probability computation for full-adder carry-out bit
5.4 Symbol definitions
5.5 Procedure running times
5.6 64-bit adder size/power estimates
5.7 Power/Accuracy estimates of arithmetic benchmarks
5.8 Time estimates of arithmetic benchmarks
5.9 ISCAS-85 power/accuracy benchmarks
5.10 ISCAS-85 time benchmarks
5.11 Nonredundant ISCAS-85 power/accuracy benchmarks
5.12 Nonredundant ISCAS-85 time benchmarks
5.13 MCNC Synth89 power/accuracy benchmarks
5.14 MCNC Synth89 time benchmarks
5.15 Benchmarks not available in SIS

LIST OF FIGURES

1.1 EDA process
1.2 Design phase
1.3 Short-circuit current
1.4 Capacitive current
1.5 Standby current
1.6 Leakage current
1.7 Path balancing
1.8 Technology decomposition
2.1 Glitch example
2.2 Reconvergent fanout circuit example
2.3 Sequential circuit model
2.4 Symbolic sequential circuit model
2.5 k-unrolling of next state logic
2.6 m-expanded network with m=2
3.1 BDD of full-adder carry-out bit
3.2 BDD with ordering 1
3.3 BDD with ordering 2
3.4 Behavioral VHDL model of FA carry-out bit
4.1 Schematics of basic logic gates
4.2 Logic diagram for FA carry-out bit
4.3 Gate-level BLIF netlist for FA carry-out bit
4.4 Structural VHDL model of FA carry-out bit
4.5 Minimized-scalable BDDs
4.6 Definition of CBDD
4.7 CBDD of FA carry-out bit
5.1 BLAPE methodology diagram
5.2 Behavioral VHDL specification for full-adder
5.3 Full-adder Boolean equations
5.4 Netlist for implicit full-adder structure
5.5 Logic diagram for implicit full-adder structure
5.6 Netlist for mapped full-adder structure
5.7 Logic diagram for mapped full-adder structure
5.8 Implicit structural CBDD full-adder
5.9 Mapped structural CBDD full-adder
5.10 SPCA structural overview
5.11 DAG of FA carry-out bit network
5.12 Carry-out bit adjacency list and levelization
5.13 Expression tree for full-adder carry-out bit
5.14 BLAPE implicit structural FA activity/power estimates
5.15 BLAPE mapped structural FA activity/power estimates
5.16 SIS implicit structural FA activity/power estimates
5.17 SIS mapped structural FA activity/power estimates
5.18 Power v. k (64-bit adder)
5.19 Accuracy v. k (64-bit adder)
5.20 Time v. k (64-bit adder)
5.21 Average accuracy v. k (All benchmarks)
6.1 Activity color mapping
6.2 Bar view
6.3 Level view
6.4 Booth multiplier (4-bit) level view
6.5 Booth multiplier (4-bit) bar view

CHAPTER 1

Introduction

The goal of accurately estimating dynamic power at the behavioral level of design abstraction has recently received increased attention. The demand for accurate behavioral-level computer-aided design (CAD) tools for power analysis is due to the IC industry's large-scale migration to HDL-based hardware design methodologies, and the emergence of power consumption as a key VLSI design parameter.

The estimates generated by many behavioral-level power analysis CAD tools contain inaccuracies stemming from a lack of specific information which is determined or specified at deeper levels (i.e., the circuit and physical levels) within the electronic design automation (EDA) process. Despite these inaccuracies, HDL-based hardware design methodologies are necessary for efficient hardware design. Currently, the benefits of behavioral-level design outweigh the inaccuracies found in the estimates of behavioral-level power analysis CAD tools.
The focus of this research is to develop enhanced tools which provide additional accuracy for behavioral-level design. The research presented in this dissertation outlines a new technique which provides improved accuracy of switching activity and power estimates using behavioral-level design specifications described in VHDL. Before discussing the details of this research, a review of the EDA process and the motivation for this research topic is given in the following sections.

1.1 Background

1.1.1 Electronic Design Automation Process

Today's complex circuit designs require computer support for virtually all aspects of design. Computer-based design automation (DA) tools make the design of very large or complex circuits feasible. Designs which are too large or complex for manual design use such processes for improved quality (performance and reliability), reduced product cost, and shortened design time. Given a specification of an abstract object, a computer-based DA system generates the physical design automatically and verifies that the design satisfies its requirements specification. The design process followed by DA systems can be viewed as a sequence of transformations on the following design representations: behavioral, structural, and physical, at various levels of abstraction.

1.1.2 Design Representations

- Behavioral representations describe a circuit's function. Behavior can be described in a functional or procedural fashion. Emphasis is placed on what the design does, not how it is built. Designs are viewed as one or more black boxes with sets of inputs and outputs, and a set of functions describing the behavior of each output in terms of the inputs over time.

- Structural representations describe the composition of circuits in terms of cells (abstractions of circuit element definitions) and components (abstractions of instances of circuit elements) and the interconnection among these components. Structural descriptions include block diagrams, schematic drawings, and netlists.

- Physical representations are characterized by information used to manufacture and fabricate the physical system. They are concerned with binding the structural design space to silicon. Physical information includes geometric layout data such as transistor location and wire routing.

1.1.3 Design Optimization

When designing complex integrated circuits, removing false design paths is as important as executing the correct design procedures. The removal of false design paths is accomplished by analyzing a number of design variations and choosing the one that best meets the system's requirement specifications. The earlier one optimizes a design or removes an unsatisfactory design variation, the less effort one exerts in pursuing an alternative design that does not meet requirements.

1.1.4 Design Phases

The electronic design automation process is a top-down approach to designing large and complex integrated circuits. The EDA process starts with a system specification and continues through a series of abstraction level transformations, resulting in separate design phases (Figure 1.1). Each succeeding level adds more specific or detailed design information. The process is briefly described below.

- Design Specification considers the following: 1) application of the system, 2) performance requirements, 3) system architecture, 4) external interfaces and protocols, and 5) manufacturing costs.

- Behavioral Design is synthesized to meet specifications.
The result is a behavioral representation such as Boolean expressions, differential equations, instruction set descriptions, algorithms, or flowcharts. Behavioral simulation is the method of analysis.

Figure 1.1. EDA process (design flow from requirements and design specification through behavioral, logic, circuit, and physical design to fabrication, with behavioral simulation, logic simulation, circuit analysis, and extraction/verification at the corresponding phases).

- Logic Design is concerned with the logic structure that implements the behavioral design. The design representation may be a register-transfer-level (RTL) description, schematic description, logic diagram, or netlist of gates. For analysis the representations are simulated at the transistor, gate, and register-transfer levels. Validation takes place by comparing results from the logic-level and behavioral-level simulations.

- Circuit Design is concerned with the electrical laws that govern the detailed behavior of the basic elements such as transistors, resistors, capacitors, and inductors. Transistors are sized to meet signal delay requirements. Analysis is performed using circuit and timing simulations.

- Physical Design is concerned with the transformation of the structural representation from the previous phase into the geometric shapes (and layout cells) representing details of the fabrication process. Additionally, the placement of cells and routing are important concerns of the physical design stage.

1.1.5 Steps Within the Design Phase

The process of removing a false design path is governed by the following operations: synthesis, analysis, and verification, illustrated in Figure 1.2.

- Synthesis derives a new design representation based on the representation of a previous stage.

- Analysis evaluates the correctness of a design representation against its requirements.

- Verification provides a formal process for demonstrating the equivalence of two design representations under specified conditions.

Figure 1.2. Design phase (synthesis, analysis, and verification steps, with accept/reject paths to the upper and lower design levels).

1.2 Motivation

The emergence of low power as a critical VLSI design parameter, combined with the popularity and large-scale movement to HDL-based hardware design methodologies, provides the motivation for this research effort. Additionally, the overwhelming inaccuracies in previous behavioral-level power analysis CAD tools present a significant problem, whose elimination will benefit various aspects of the electronic design automation process. Discussions on low power design and HDL-based design are presented next.

1.3 Low Power Design

Reducing power consumption has emerged as a critical design parameter for digital VLSI systems [1]. The trend has been to develop methodologies and techniques which maintain a circuit's throughput and area constraints while achieving some desired level of power efficiency. The power efficiency revolution was initiated by the introduction of high-throughput portable electronic devices, such as laptop computers, portable televisions, camcorders, and wireless communications systems. For these devices it is desirable to support high-speed computation with complex functionality, while more efficiently using minimal size and minimal weight batteries. Advances in battery technology have also promoted the design of power efficient circuits.
It is anticipated that battery lifetimes will increase to about 90-110 watt-hours/kilogram over the next five years [1]. If low power design techniques are not considered, then high-throughput portable electronic devices will suffer from short operation times due to limited battery lifetimes or become burdened by heavier battery packs. Other factors which motivate the design of power efficient circuits include chip packaging costs, cooling, and reliability.

1.3.1 Sources of Power Dissipation

Power dissipation in digital CMOS circuits can be classified into two categories: static and dynamic dissipation. The static and dynamic power dissipations are determined by the way in which the individual MOS transistors circulate current [2]. There are four main currents which are related to power dissipation in CMOS circuits: short-circuit current, capacitive current, leakage current, and standby current.

The short-circuit current arises when both NMOS and PMOS transistors are simultaneously conducting current from the supply to ground during an input transition. The capacitive current is present when charging and discharging of a capacitance takes place to switch the output state of the logic device. The short-circuit and capacitive currents are illustrated in Figures 1.3 and 1.4 for a CMOS inverter. Short-circuit and capacitive currents are known as dynamic currents because they result from the switching which arises when an input transition takes place. The magnitude of these currents is on the order of microamperes and milliamperes.

Figure 1.3. Short-circuit current.    Figure 1.4. Capacitive current.

The standby current represents current which is continuously drawn from the supply to ground. The leakage current, which is a property of the fabrication technology, consists of reverse bias current in the parasitic diodes, which are formed by source-drain diffusions and p-well or n-well diffusions. The leakage current contributed by the reverse biased parasitic diodes can be described by the diode equation

    i_D = I_0 (e^(qV/kT) - 1),

where I_0 is the reverse saturation current, q is the charge of an electron, V is the voltage across the diode's pn junction, k is Boltzmann's constant, and T is the temperature of the device material in degrees Kelvin.

An additional component of leakage current is due to subthreshold conduction current, which is generated by the inversion charges that are produced for gate voltages below the threshold voltage. The standby and leakage currents, shown in Figures 1.5 and 1.6 for a CMOS inverter, are known as static currents. These currents result only from applying power to the device. The leakage and standby currents are very small, on the order of nanoamperes and microamperes, respectively.

Figure 1.5. Standby current.    Figure 1.6. Leakage current.

The leakage and standby currents are responsible for the static power dissipation component in digital CMOS circuits. The static power dissipation quantity is the result of summing the leakage and standby currents to form a static current, and multiplying this current by the supply voltage (V_dd).
The equations are

    I_Static = I_Leakage + I_Standby
    P_Static = I_Static · V_dd.

The dynamic power dissipation due to short-circuit current is given by Equation 1.1, where α represents the switching activity factor or the probability that a power-consuming transition occurs, Q_SC is defined as the amount of charge transferred per transition, f_CLK is the clock frequency, and V_dd is the supply voltage. The short-circuit power dissipation contributes about 10 percent to the total power dissipation.

    P_Short = α · Q_SC · f_CLK · V_dd                                     (1.1)

The dynamic power dissipation due to charging/discharging of node capacitances is referred to as switching activity power or switched-capacitance power. This power component is given by Equation 1.2, where C_L represents the node capacitance, and the other components are as defined previously. The switched-capacitance power dissipation contributes about 90 percent to the total power dissipation.

    P_Switch = (1/2) · α · C_L · f_CLK · V_dd²                            (1.2)

The switched-capacitance power is much larger than the short-circuit power dissipation due to the low rate of occurrence in which both NMOS and PMOS transistors of the CMOS inverter are simultaneously on. The dynamic power dissipation quantity is the result of summing the short-circuit and switched-capacitance power dissipations, given by

    P_Dynamic = P_Short + P_Switch.                                       (1.3)

When all of the power dissipation components are combined, the total power dissipation is

    P_Total = P_Dynamic + P_Static                                        (1.4)

or

    P_Total = (1/2) · α · C_L · f_CLK · V_dd² + α · Q_SC · f_CLK · V_dd + I_Static · V_dd.   (1.5)

The most dominant component is the switched-capacitance power dissipation, which accounts for 90 percent of the power dissipation in CMOS circuits [3]. Most of the research in the area of low power design is concerned with reducing the switched-capacitance power dissipation component because of its dominance.

1.3.2 Low Power Optimization Techniques

The majority of low power circuit research targets the dynamic power component as a means of reducing power. The switched-capacitance power, which is the power consumed in CMOS circuits caused by switching currents, is the primary focus because it is responsible for 90 percent of the total power dissipation. The switched-capacitance power for a single gate is given by Equation 1.2. The supply voltage (V_dd), physical capacitance (C_L), and switching activity factor (α) are the parameters most targeted for optimization within the switched-capacitance power component. Other methods for switched-capacitance power minimization will also be addressed in this discussion. Reducing the supply voltage is most attractive because of its quadratic relationship to power. A factor of two decrease in the supply voltage will yield a factor of four decrease in the switched-capacitance power. However, as supply voltage is lowered, circuit delays increase, leading to reduced system performance. For V_dd > V_t, delays increase with decreasing voltage. This is expressed by

    Delay ∝ V_dd / (V_dd - V_t)²,

where V_t is the threshold voltage, or the voltage level at which the transistor conducts.

1.3.2.1 Supply Voltage Reduction

One approach taken to scale or reduce supply voltage without sacrificing throughput was reported in [1]. In this approach the threshold voltage (V_t) is reduced, allowing the supply voltage to be scaled down without loss of speed. Reducing both supply and threshold voltages by some small amount maintains the operational behavior of the MOS transistors, without decreases in device currents.
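The trade-off between the quadratic power savings of Equation 1.2 and the delay penalty of the relation above can be illustrated numerically. The following Python sketch is an illustrative aid added here, not part of the original methodology; the activity factor, load capacitance, clock frequency, and voltage pairs are arbitrary example values chosen only to show the trend.

    # Illustrative sketch: relative switched-capacitance power (Equation 1.2)
    # and the delay relation Delay ~ Vdd/(Vdd - Vt)^2 at a few assumed operating
    # points.  All parameter values below are arbitrary examples.

    def p_switch(alpha, c_load, f_clk, vdd):
        """Switched-capacitance power, Equation 1.2."""
        return 0.5 * alpha * c_load * f_clk * vdd ** 2

    def relative_delay(vdd, vt):
        """Delay is proportional to Vdd / (Vdd - Vt)^2 for Vdd > Vt."""
        return vdd / (vdd - vt) ** 2

    alpha, c_load, f_clk = 0.25, 50e-15, 100e6   # assumed activity, 50 fF node, 100 MHz
    for vdd, vt in [(5.0, 0.8), (3.3, 0.8), (3.3, 0.5), (2.5, 0.5)]:
        print("Vdd=%.1f V  Vt=%.1f V  Pswitch=%.2e W  relative delay=%.2f"
              % (vdd, vt, p_switch(alpha, c_load, f_clk, vdd), relative_delay(vdd, vt)))

Lowering V_dd alone cuts the switched-capacitance power quadratically but inflates the relative delay; lowering V_t alongside V_dd recovers most of the speed, which is the essence of the approach in [1].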
The limitations in lowering the threshold voltage are due to the requirement to retain adequate noise margins and control of increased subthreshold leakage currents. In [4], a study was conducted on the effect of reducing the supply voltage for a variety of different logic circuits, containing from 56 to 44,000 transistors. The results of the study indicated that a speed penalty was incurred for all circuits and that delays drastically increased as the supply voltage approached the sum of the threshold voltages of the devices. Shen et al. [5] suggested a method of designing lower power circuits, where the initial task is to build the circuit to be as fast as possible, regardless of the area and power. Finally, the supply voltage is decreased to lower power dissipation.

1.3.2.2 Physical Capacitance Reduction

A technique used for minimizing switched-capacitance power dissipation involves lowering or reducing the circuit's physical capacitance. Power dissipation is dependent upon the physical capacitance associated with each of the individual gates in the circuit. Methods for reducing physical capacitance include using less logic, smaller transistor sizing, and shorter wirelength. Techniques for optimizing logic include resource sharing and improved logic equation minimization methods. The reduction or scaling of transistor size reduces physical capacitances, but also reduces the current drive capability of the transistor, resulting in slower circuits [3]. Lowering physical capacitance by means of using shorter wiring can be achieved by improved placement and routing strategies, which attempt to optimally locate and position logic blocks such that minimal wiring is used for block interconnections. Optimal wiring strategies result in reducing physical capacitance and I²R power losses. Improvements in the placement and routing areas are discussed in [6], where a new implementation of the simulated annealing algorithm is presented. The new algorithm makes use of an improved cost function which includes an enhanced overlap penalty function, combined with the inclusion of a timing path penalty function. These improvements allow for the optimized placement and routing of rectilinearly shaped macro cells and have produced modest reductions in chip area and total wire length.

1.3.2.3 Switching Activity Reduction

Minimizing the switching activity factor is another means for optimizing power dissipation in CMOS circuits. The switching activity is dependent upon the logic function implemented and the primary input signals to the circuit. The switching activity α at node x is defined as the fraction of time the node performs a transition within a clock period. Switching activity may be reduced at the algorithmic and architecture levels.

One method for reducing switching activity involves optimizing the number representation [7]. For certain signal processing applications, a change from the two's complement number representation to the sign-magnitude number representation gave modest improvements. It was found that the two's complement representation was subject to higher switching activity when the sign of input signal values changed. This is due to the fact that sign extension causes many of the most significant bits (MSBs) to toggle. In the case of the sign-magnitude representation, only a single bit toggles when the input signal changes sign, resulting in a reduction of switching activity for some of the MSBs.
A method to reduce the switching activity factor by means of path balancing is illustrated in Figure 1.7. Path balancing reduces glitches or spurious transition activity in combinational logic circuits. Glitching is reduced if paths in the circuit that converge at certain gates all have roughly equal lengths (delays) [5]. Balancing path delays leads to nearly simultaneous switching on the input signals to a gate, and thus eliminates possible hazards at the output of the gate. This equalization of path delays in a circuit results in a reduction of spurious gate output transitions which reduces switching activity, thus lowering power dissipation. Path balancing can take place before technology mapping by selective collapsing and logic decomposition, or after technology mapping by delay insertion and pin reordering.

Figure 1.7. Path balancing.

A switching activity reduction method has been proposed for finite state machines (FSMs), where an encoding of states was the focus of attention [8]. This method uses a hypercube embedding technique and generates state encodings such that the sum of bit toggles between each pair of states multiplied by the encoding affinity between states is minimized. By reducing the number of bit toggles in each state transition, the switching activity in the combinational logic which determines the next state and output functions is reduced.

1.3.2.4 Structural Optimization

Techniques which consider a design's structure and component interconnection also offer reductions in switching activity. Various combinations and arrangements of specific logic gates and subcircuits lead to improved power reduction for certain designs. Sobelman et al. [9] proposed a power dissipation reduction technique for multiplier circuits. This method makes use of a self-timed evaluate signal, such that each carry-save or carry-propagate adder within the array triggers only after all of its inputs have stabilized. This technique avoids spurious switching of internal nodes so that the average power dissipation is minimized. Moshnyaga et al. [10] proposed another multiplier power dissipation improvement. The goal of this method is to lower the switching activity per operation by reducing the number of active elements in the adding array. This method incorporates the use of 4-2 compressors which lower the number of propagation stages in the adding array. The 4-2 compressors have five inputs and three outputs which can compress four partial products into two. The compressor has the same logic function as that of a carry-save adder constructed by two serial full adders, but uses fewer transistors and has 25% less propagation delay.

Tsui et al. [11] discuss a technology decomposition technique which converts a set of Boolean equations, or network, to another network consisting of only AND and INVERTER gates. This technique minimizes the total switching activity in the final two-input AND tree using a zero-delay model. The goal of the procedure is to apply high-activity inputs into the tree composition at the latest possible stage. Consider Figure 1.8 for example, where P(x_i) is the signal probability of a primary input or internal signal and E_sw(g) represents the switching activity of a gate. In this technology decomposition example, the highest activity signal (d) is applied last in the tree decomposition of configuration A, thus yielding a lower switching activity than the equivalent logic in configuration B. The placement of higher activity signals in later stages of the tree decomposition limits the switching activity experienced by internal gates and lowers the number of transitions taking place at the gate's output.

Figure 1.8. Technology decomposition. With P(a) = 0.3, P(b) = 0.4, P(c) = 0.5, and P(d) = 0.7, configuration A gives E_sw(g) = P(ab) + P(abc) + P(abcd) = 0.222, while configuration B gives E_sw(g) = P(ab) + P(cd) + P(abcd) = 0.512.
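The two values quoted in Figure 1.8 can be reproduced with a short calculation. The Python sketch below is an illustrative aid, not part of the cited technique; it simply evaluates the zero-delay activity proxy used in the figure, namely the sum of the internal AND-gate signal probabilities under input independence, using the probabilities given in the figure.

    # Zero-delay activity proxy for a two-input AND tree: sum the signal
    # probabilities of the internal gate outputs, assuming independent inputs.
    # The probabilities are those given in Figure 1.8.

    p = {"a": 0.3, "b": 0.4, "c": 0.5, "d": 0.7}

    def and_prob(*signals):
        prob = 1.0
        for s in signals:
            prob *= p[s]
        return prob

    # Configuration A: ((a AND b) AND c) AND d  ->  internal nodes ab, abc, abcd
    config_a = and_prob("a", "b") + and_prob("a", "b", "c") + and_prob("a", "b", "c", "d")

    # Configuration B: (a AND b) AND (c AND d) ->  internal nodes ab, cd, abcd
    config_b = and_prob("a", "b") + and_prob("c", "d") + and_prob("a", "b", "c", "d")

    print(config_a)   # 0.222 (configuration A)
    print(config_b)   # 0.512 (configuration B)

Because signal d has the highest probability, deferring it to the last AND stage (configuration A) keeps the intermediate products small, which is exactly the ordering rule the technique exploits.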
1.3.2.5 System and Architectural Level Optimizations

Other techniques for power dissipation minimization which have received attention are power-down methods, precomputation, don't care optimization, low-power software optimization, and adiabatic switching. In the case of the power-down approach, blocks of logic not involved in the present computation are automatically turned off to save power. Methods for detecting and disabling unused blocks, as well as scheduling algorithms which maximize the "shut-down" period of execution units, are discussed in [1, 12]. When certain conditions are satisfied, modules are disabled, thus eliminating any switching activity and power dissipation. In the precomputation technique the idea is to selectively precompute the output logic values of the circuits one clock cycle before they are required, and then use the precomputed values to reduce internal switching activity in succeeding cycles [13]. The technique for don't care optimization reported in [14] explores the Boolean space in an effort to identify minterms highly appropriate for influencing switching activity. A partitioning of the don't care set into regions strongly and weakly influential upon switching activity is performed. This variance is exploited to bias area optimization towards reduced power dissipation.

Methods for the measurement of power dissipation for the software component of an embedded system have received some attention. For some time, application-specific software running on dedicated microprocessor- or microcontroller-based systems has been optimized for size and speed. In [15], an approach for estimating the power cost of embedded software is presented. This experimental approach involves the measurement of the amount of current drawn by the microprocessor or microcontroller when instructions are executed. The energy cost of the instruction is the observed average current value multiplied by the number of cycles taken by the instruction. By reordering several sequences of instructions, the average current for the execution of a program was reduced. The goal of this method is to assign energy costs to each instruction and to generate power-efficient software by selecting and ordering instructions such that the overall program energy cost is minimized. Athas et al. [16] explore a technique for low-power CMOS design by means of constructing combinational and sequential adiabatic switching logic circuits. The concept behind adiabatic switching logic is to recycle signal energies stored in circuit capacitances instead of allowing the energies to dissipate in the form of heat.

1.4 HDL-Based Design

Many IC manufacturers are changing their hardware design methodologies to an HDL-based design approach. The primary benefit of the HDL-based design approach is higher productivity [17]. Increased efficiency is achieved when using an HDL by the use of available cores (e.g., UARTs, bus interfaces, microcontrollers) to incorporate functions into the design. Another example of HDL design efficiency is exhibited in its ability to facilitate reuse of already designed components.
HDLs promote hierarchical design, allowing one to build more flexibility into lower level building blocks. The ability to define components at the structural or behavioral level of abstraction is a key feature of an HDL. This results in shorter product design completion times. Additionally, designers have less difficulty re-targeting HDL-based designs to different programmable-logic families than with schematic-based designs. This advantage allows a developer to quickly compare the device cost, performance, and other factors with the product lines of various vendors.

To summarize, HDL-based hardware design methodologies promote design reuse, component definition, design flexibility, and shortened design time. These advantages translate into higher productivity and reduced cost.

1.5 Problem Statement

The research of this dissertation addresses the power dissipation problem. The objective is to find a more accurate and low-cost method of computing switching activity and power dissipation for behavioral-level designs described in VHDL. The main goal is to achieve an accurate or reasonably approximate estimate of switching activity with minimal computational complexity and memory resources. Additionally, the research is concerned with identifying and locating high activity circuit nodes through the development and implementation of new activity visualization tools.

The use of the proposed estimator and visualization tool will assist circuit designers in the development of power-efficient designs. IC designs will greatly benefit from the use of the proposed techniques in terms of design flexibility, time, cost, and reuse. The research is centered on the use of VHDL because of its widespread use in industry and academic research environments for the development, simulation and testing of high-level and implementation-free designs. VHDL also supports behavioral and structural level design. It allows the development of digital systems based on functional descriptions or component interconnections.

A majority of switching activity estimation techniques target gate- and circuit-level design descriptions. Very few techniques exist for accurately computing switching activity at higher levels of design abstraction. On average, most high-level switching activity and power estimates contain about 12% error [18]. The goal of this research is to improve the accuracy of high-level switching activity and power estimators and to be competitive in terms of accuracy, CPU time and memory resources with respect to gate- and circuit-level estimators.

1.6 Dissertation Overview

Chapter 2 describes the switching activity estimation problem. It provides background and fundamental information detailing the factors that complicate the accurate calculation of switching activity for CMOS circuits. Additionally, a survey of current and past switching activity estimation techniques, along with a description of their advantages and disadvantages, is given.

Chapter 3 provides a discussion of the topic of behavioral-level design. Specifically, this chapter addresses some implementation-free or behavioral methods for representing switching functions. Methods such as truth tables, Boolean equations, Binary Decision Diagrams (BDDs), and VHDL behavioral specifications are described.

In Chapter 4, the concept of structural methods for representing switching functions is reviewed.
The most common structural representation, the gate-level netlist, is illustrated, along with an overview of VHDL structural specifications. Additionally, an introduction and overview of the Connective Binary Decision Diagram (CBDD) is presented.

In Chapter 5, an overview of the new methodology for accurately estimating the switching activity of behavioral-level designs described in VHDL is given. The specific assumptions and constraints that make the new technique a success are highlighted. Additionally, the algorithms and the intermediate transformations are discussed. To demonstrate the effectiveness of the new approach, results and benchmark comparisons with other power analysis CAD tools are given.

Chapter 6 presents a new visualization tool that identifies and highlights the power-hungry areas of the circuit design. A description of the tool's input transformation process as well as an overview of the activity views is given. A set of examples is used to highlight the usefulness of the new tools.

Chapter 7 contains conclusions and a discussion of future enhancements that could improve the behavioral-level activity and power estimator methodology and supporting algorithms.

Appendix A contains definitions and theorems used throughout the dissertation. Finally, Appendix B contains a walk-through of the BLAPE algorithm when applied to a 4-bit Booth multiplier, described in VHDL.

CHAPTER 2

Switching Activity Estimation

The switching activity factor, α, is a key component of the switched-capacitance power dissipation model, described by Equation 1.2. Dynamic power is dissipated whenever any internal switching occurs. Thus, switching activity can be determined by evaluating and summing the 0 → 1 and 1 → 0 transition probabilities at the specified node.

2.1 Calculation of Switching Activity

Switching activity, denoted E_sw(x) or α(x), is defined as the probability that the logic signal at node x experiences a change in its logic state. The most common method of calculating switching activity involves the use of the signal probability, P_s(x), which is the probability that the signal at node x is equal to logic 1. The evaluation of signal probabilities for inputs and simple Boolean expressions is described in Table 2.1. Upon determining a node's signal probability, P_s(x), the calculation of the node's switching activity is given by Equation 2.1. A derivation of the switching activity calculation model is provided in Appendix A.

    E_sw(x) = 2 · (1 - P_s(x)) · P_s(x)                                   (2.1)

    Function                 Signal Probability                   Assumptions
    A                        P_s(A)                               --
    A'                       1 - P_s(A)                           --
    A · B                    P_s(A) · P_s(B)                      A independent of B
    A + B                    P_s(A) + P_s(B) - P_s(A · B)         A independent of B
    x1 · x2 · ... · xn       product of P_s(x_i), i = 1..n        all x_i independent
    x1 + x2 + ... + xn       1 - product of (1 - P_s(x_i))        all x_i independent

Table 2.1. Boolean expression signal probabilities.

The calculation of switching activity is difficult because it depends on circuit input streams, various circuit parameters, and technology-dependent factors which may be unavailable or difficult to characterize. Some of these factors include input pattern dependence, glitch activity, delay model, logic function, and circuit structure. The following sections discuss details of switching activity computations which are based on statistical or probabilistic concepts. Appendix A contains definitions of terms and concepts used in the following sections.
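To make the use of Table 2.1 and Equation 2.1 concrete, the short Python sketch below is offered as an illustrative aid (it is not part of the original text). It computes the exact signal probability of the full-adder carry-out function, cout = ab + ac + bc, which is used as a running example elsewhere in this dissertation, by enumerating input combinations weighted by independent input probabilities, and then applies Equation 2.1; the input probability values are arbitrary examples.

    # Illustrative sketch: exact signal probability and switching activity
    # (Equation 2.1) for the full-adder carry-out  cout = a*b + a*c + b*c.
    from itertools import product

    def carry_out(a, b, c):
        return (a and b) or (a and c) or (b and c)

    def signal_probability(func, input_probs):
        """Sum Pr(input vector) over all vectors for which func evaluates to 1."""
        prob = 0.0
        for bits in product([0, 1], repeat=len(input_probs)):
            weight = 1.0
            for bit, p in zip(bits, input_probs):
                weight *= p if bit else (1.0 - p)
            if func(*bits):
                prob += weight
        return prob

    probs = [0.5, 0.5, 0.5]               # independent inputs, P_s = 1/2 each
    ps = signal_probability(carry_out, probs)
    esw = 2.0 * (1.0 - ps) * ps           # Equation 2.1
    print(ps, esw)                        # 0.5 and 0.5 for these inputs

Note that exhaustive enumeration sidesteps the independence assumptions of Table 2.1 for internal expressions; the difficulties that arise when those assumptions are applied blindly are the subject of the following sections.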
2.1.1 Input Pattern Dependence

The switching activity, E_sw(g), at gate output g depends on the input signal probabilities, the gate's logic function, as well as spatial and temporal input pattern dependencies. For example, consider a two-input NAND gate with independent inputs (x1, x2), such that their signal probabilities are each 1/2. From Table 2.2, it is apparent that 6 out of 16 input patterns result in a gate output transition; therefore E_sw(g) = 6/16 = 3/8. When spatial correlation conditions are applied to (x1, x2) such that (0,0) and (1,1) are the only input patterns, the NAND gate output switches twice out of the possible four input arrangements and therefore E_sw(g) = 1/2 (Table 2.3). Given that a temporal correlation governs the inputs, where each 0 applied to input x1 is followed by a 1, while each 1 applied to input x2 is followed by a 0, 4 output transitions out of 9 input patterns take place, with E_sw(g) = 4/9 (Table 2.4). If spatio-temporal correlation conditions are governing the input dependence, such that x2 changes exactly when x1 changes, then E_sw(g) = 1/4 (Table 2.5).

    x1      x2      NAND
    0→0     0→0     1→1
    0→0     0→1     1→1
    0→1     0→0     1→1
    0→1     0→1     1→0
    0→0     1→0     1→1
    0→0     1→1     1→1
    0→1     1→0     1→1
    0→1     1→1     1→0
    1→0     0→0     1→1
    1→0     0→1     1→1
    1→1     0→0     1→1
    1→1     0→1     1→0
    1→0     1→0     0→1
    1→0     1→1     0→1
    1→1     1→0     0→1
    1→1     1→1     0→0

Table 2.2. Effects of uncorrelated inputs.

    x1      x2      NAND
    0→0     0→0     1→1
    0→1     0→1     1→0
    1→0     1→0     0→1
    1→1     1→1     0→0

Table 2.3. Effects of spatially correlated inputs.

    x1      x2      NAND
    0→1     0→0     1→1
    0→1     0→1     1→0
    0→1     1→0     1→1
    1→0     0→0     1→1
    1→0     0→1     1→1
    1→0     1→0     0→1
    1→1     0→0     1→1
    1→1     0→1     1→0
    1→1     1→0     0→1

Table 2.4. Effects of temporal input correlations.

    x1      x2      NAND
    0→0     0→0     1→1
    0→0     1→1     1→1
    0→1     0→1     1→0
    0→1     1→0     1→1
    1→0     1→0     0→1
    1→0     0→1     1→1
    1→1     0→0     1→1
    1→1     1→1     0→0

Table 2.5. Effects of spatio-temporal input correlations.
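The tabulated counts can be verified by enumeration. The Python sketch below is an illustrative aid, not part of the original text; it counts NAND output toggles over the admissible sets of input transitions for the uncorrelated case (Table 2.2) and the spatially correlated case (Table 2.3).

    # Illustrative sketch: count NAND output transitions over a set of
    # admissible (x1, x2) transition pairs, as in Tables 2.2 and 2.3.

    def nand(a, b):
        return 1 - (a & b)

    def switching_counts(transition_pairs):
        """transition_pairs: list of (x1_old, x1_new, x2_old, x2_new) tuples."""
        toggles = sum(nand(x1o, x2o) != nand(x1n, x2n)
                      for x1o, x1n, x2o, x2n in transition_pairs)
        return toggles, len(transition_pairs)

    # Table 2.2: uncorrelated inputs -- all 16 transition combinations occur.
    uncorrelated = [(a, b, c, d) for a in (0, 1) for b in (0, 1)
                                 for c in (0, 1) for d in (0, 1)]
    print(switching_counts(uncorrelated))   # (6, 16)  ->  E_sw = 3/8

    # Table 2.3: spatially correlated inputs -- only patterns (0,0) and (1,1).
    spatial = [(a, b, a, b) for a in (0, 1) for b in (0, 1)]
    print(switching_counts(spatial))        # (2, 4)   ->  E_sw = 1/2

The temporally and spatio-temporally correlated cases of Tables 2.4 and 2.5 follow the same pattern, with the admissible transition list restricted according to the stated correlation rules.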
2.1.2 Glitch Activity

Spurious transitions at node outputs represent an additional component of the switching activity. The spurious transitions, known as glitches or glitch activity, are unwanted transitions that occur before a node settles to its final steady-state value. The simple circuit and waveform shown in Figure 2.1 demonstrate the effects of glitch activity. The output y of the circuit should always remain at logic one, but due to internal delays injected by the inverter (denoted by the inverter's output signal), a transient drop to logic zero occurs at the output. Glitches are apparent at the architectural level in static designs due to finite propagation delays from one logic block to the next block, resulting in a node having multiple transitions in a single clock cycle before settling to the correct logic level [4]. Each of the spurious transitions consumes power and may account for as much as 10 to 40 percent of the switching activity power loss in typical combinational logic circuits [3]. It was reported by Shen et al. [5] that glitches accounted for 20 percent of the power dissipation over a range of circuits, and circuits such as combinational adders were subject to 70 percent or more power dissipation due to glitch activity.

Figure 2.1. Glitch example.

2.1.3 Delay Model

The delay model used for switching activity estimation is an important factor. Models such as zero delay and real delay make different assumptions concerning the propagation of the signals throughout the circuit. In the zero delay model it is assumed that all changes at the circuit inputs reach the internal gates of the circuit instantaneously, which allows for a glitch-free representation of the circuit. In the case of the real delay model, each gate in the circuit is given a delay. This, in turn, may cause internal and output nodes to experience multiple transitions during a single input transition. In many circuits, glitch activity accounts for a large percentage of the switching activity, thus making the calculation of the switching activity more difficult. The difficulties arise in the determination of the glitch locations.

2.1.4 Logic Function

The logic function implemented by a circuit directly influences the switching activity. The logic function of a gate determines the probability that the present value of the gate will differ from the previous value. For example, the truth table (Table 2.6) provides the logic input/output conditions for two-input AND, OR, and XOR gates. To mimic the logic in this truth table from a probability perspective, the signal probability table (Table 2.7) uses p to denote the probability that an input signal equals logic 1, or P(x_i = 1), and uses q, which is 1 - p, to denote the probability that an input signal equals logic 0, or P(x_i = 0). The signal probability for each logic gate output is computed by summing the probability permutation entries for the input combinations that produce a logic 1 at the gate's output in the truth table. The resulting switching activities for each two-input gate (Table 2.8) are computed by applying Equation 2.1 to the signal probabilities.

    x1   x2   AND   OR   XOR
    0    0    0     0    0
    0    1    0     1    1
    1    0    0     1    1
    1    1    1     1    0

Table 2.6. Two-input AND, OR, XOR truth table.

    P_s(x1)   P_s(x2)   AND     OR      XOR
    q         q         q²      q²      q²
    q         p         qp      (qp)    (qp)
    p         q         pq      (pq)    (pq)
    p         p         (p²)    (p²)    p²

Table 2.7. Two-input AND, OR, XOR signal probability table (parenthesized entries correspond to input combinations for which the gate output is logic 1).

    g      P_s(g)     E_sw(g)
    AND    p²         2p²(1 - p²)
    OR     1 - q²     2q²(1 - q²)
    XOR    2pq        4pq(1 - 2pq)

Table 2.8. Switching activities for two-input AND, OR, XOR gates.
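The closed-form entries of Table 2.8 are easy to check numerically. The Python sketch below is an illustrative aid, not part of the original text; the input probability p is an arbitrary example value, and the checks simply confirm that the tabulated E_sw expressions agree with Equation 2.1 applied to the tabulated P_s expressions.

    # Illustrative check of Table 2.8 at an assumed input probability p,
    # for two-input gates with independent, identically distributed inputs.
    p = 0.3
    q = 1.0 - p

    gates = {
        "AND": p * p,            # P_s entries from Table 2.8
        "OR":  1.0 - q * q,
        "XOR": 2.0 * p * q,
    }
    for name, ps in gates.items():
        esw = 2.0 * ps * (1.0 - ps)      # Equation 2.1
        print("%s: Ps=%.4f  Esw=%.4f" % (name, ps, esw))

    # Closed forms from Table 2.8 agree with Equation 2.1:
    assert abs(2 * p**2 * (1 - p**2)  - 2 * gates["AND"] * (1 - gates["AND"])) < 1e-12
    assert abs(2 * q**2 * (1 - q**2)  - 2 * gates["OR"]  * (1 - gates["OR"]))  < 1e-12
    assert abs(4 * p * q * (1 - 2*p*q) - 2 * gates["XOR"] * (1 - gates["XOR"])) < 1e-12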
2.1.5 Circuit Structure

The circuit structure may also cause difficulties in computing the switching activity when reconvergent fanout nodes are considered. This problem is more challenging because the internal signals may be correlated, potentially requiring a large amount of computational effort and memory usage. Some power estimation techniques ignore these fanout correlations. Approximations may be used to improve accuracy while shortening the execution time of the simulation.

The simple circuit depicted in Figure 2.2 contains reconvergent fanout. The gate outputs, M1 and M2, are correlated due to their common dependence on input x2. The correlation results in a signal probability overestimate (Equation 2.4) at output F when assuming M1 and M2 are independent. This common inaccuracy is attributed to a statistical inequivalence of two Boolean expression forms (Equations 2.2 and 2.3) which equivalently represent the output F. The correct signal probability (Equation 2.5) for node F is obtained by minimizing node F's Boolean expression to Equation 2.3, followed by applying the OR signal probability calculation given in Table 2.1. In the reconvergent fanout example below, all input signal probabilities are assumed to be P_s(x_i) = 1/2.

Figure 2.2. Reconvergent fanout circuit example (M1 = x1 + x2, M2 = x2 + x3, F = M1 + M2).

    f1 = M1 + M2 = (x1 + x2) + (x2 + x3)                                  (2.2)

    f2 = x1 + x2 + x3                                                     (2.3)

    P_s(f1) = P_s(M1 + M2)
            = 1 - (1 - P_s(M1)) (1 - P_s(M2))
            = 1 - (1 - P_s(x1 + x2)) (1 - P_s(x2 + x3))
            = 15/16   (Overestimate)                                      (2.4)

    P_s(f2) = 1 - (1 - P_s(x1)) (1 - P_s(x2)) (1 - P_s(x3))
            = 7/8     (Correct)                                           (2.5)

    E_sw(f2) = 2 · P_s(f2) · (1 - P_s(f2))
             = 2 · (7/8) · (1/8)
             = 14/64 = 0.219
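The gap between Equations 2.4 and 2.5 can be reproduced directly. The Python sketch below is an illustrative aid, not part of the original text; it computes the exact signal probability of F by enumeration and compares it with the value obtained when the reconvergent fanout correlation between M1 and M2 is ignored, using the same P_s(x_i) = 1/2 inputs as above.

    # Illustrative sketch: exact vs. independence-assuming signal probability
    # for the reconvergent fanout example of Figure 2.2
    # (M1 = x1 + x2, M2 = x2 + x3, F = M1 + M2), with P_s(xi) = 1/2.
    from itertools import product

    def ps_or(*probs):
        """OR rule from Table 2.1, valid only for independent operands."""
        result = 1.0
        for p in probs:
            result *= (1.0 - p)
        return 1.0 - result

    p = 0.5
    ps_m1 = ps_or(p, p)                  # P_s(x1 + x2) = 3/4
    ps_m2 = ps_or(p, p)                  # P_s(x2 + x3) = 3/4
    wrong = ps_or(ps_m1, ps_m2)          # treats M1, M2 as independent -> 15/16

    # Exact value, obtained by enumerating the eight input combinations.
    exact = sum(1 for x1, x2, x3 in product([0, 1], repeat=3)
                if (x1 or x2) or (x2 or x3)) / 8.0      # 7/8

    esw = 2.0 * exact * (1.0 - exact)
    print(wrong, exact, esw)             # 0.9375  0.875  0.21875

The error comes entirely from reusing x2 in both M1 and M2; once the expression is first minimized to Equation 2.3, the independence assumption of Table 2.1 holds and the OR rule gives the correct 7/8.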
The Monte Carlo simulation method may not converge for circuits which do not have normal power distributions. Non-convergence may also occur when T is too small. Moreover, the Monte Carlo method does not support sequential circuits, since it must wait for a setup time T_MAX, where T_MAX is the longest delay along any path; for sequential circuits T_MAX ≈ ∞. The setup time is the time required before the beginning of a sample interval to guarantee stationarity of the circuit's internal transition process. A process is called stationary if its distribution functions or certain expected values are invariant with respect to time [21]. When this technique was applied to ISCAS-85 benchmark circuits, the maximum error for a 5% accuracy and 99% confidence level was greater than 5% for only 1% of the cases [19].

Stamoulis proposed a Monte Carlo approach for estimating switching activity in sequential circuits based on the analysis of paths in the state transition graph (STG) [22]. The simulation is performed at the gate level and obtains an accuracy within 10% of the actual switching probability value for a particular node with a 95% confidence level. Using the circuit model described by Figure 2.3, the approach focuses on accurately determining the switching probabilities at the flip-flop output nodes.

Figure 2.3. Sequential circuit model (combinational logic with primary inputs and outputs; the next-state lines feed clocked latches whose outputs form the present-state lines).

The following assumptions are made: 1) the circuit can be in any of the states with equal probability upon power-up, 2) the latch outputs are glitch-free, and 3) all latches reach steady state before the next state enters the combinational logic. These assumptions allow for the estimation of the power dissipation of the latches separately from the rest of the circuit (i.e., the combinational part), and constrain the number of latch output transitions to at most one per clock cycle. By the definition of initializable circuits, the stochastic processes which describe the state of the flip-flops in the time domain are stationary. States that are N clock ticks apart become independent as N → ∞. This ensures that the node processes are also mean ergodic. The mean ergodicity property refers to a stationary random process X(t) whose ensemble averages [μ_X]_T satisfy 1) lim_{T→∞} E{[μ_X]_T} = μ_X, and 2) lim_{T→∞} Var{[μ_X]_T} = 0 [21].

By using the "path" notion in the STG, long time-consuming simulations are avoided. Pathwise averages are computed using sample simulations with different initial conditions. The estimation process is performed in two steps. First, the pathwise transition probabilities are estimated using Monte Carlo simulation to determine the minimum number of path samples which will contribute to an error ε with confidence (1 - α) × 100%. Second, the average switching probability estimate is computed over all paths, with the switching probability estimate of each path being one measurement. The results of this technique when applied to ISCAS-89 benchmark circuits indicate switching probabilities can be estimated with 5% accuracy at a 95% confidence level [22].

Xakellis et al. [23] proposed a switching activity estimation approach which efficiently estimates the transition density at all circuit nodes. This technique improves upon the approach proposed in [19] by eliminating the statistical sampling at single gates, which requires a large number of input patterns for convergence. The convergence problem is overcome by classifying nodes into low-density and regular-density categories and applying absolute error bounds on low-density nodes instead of percentage error bounds. This is done by establishing a threshold to classify low- and regular-density nodes.
A node with a transition density value less than the threshold is a low-density node; nodes with a transition density equal to or above the threshold are regular-density nodes. The advantage of this method is that it allows the desired accuracy error bounds to be specified by the user. The user also supplies, for every input node, the transition density and the input probability, the fraction of time the circuit input signal is high. If the circuit input probability is not specified, it is assigned a default value of 1/2. Next, a random number generator is used to generate the logic input waveforms which drive the simulator. For a given period T, the number of transitions at each node is counted. This process is repeated N times to form an average transition density at each node. This approach, just as in [19], requires a normal distribution of the node densities to establish a stopping criterion with confidence (1 - α) × 100%. Additionally, for synchronous-mode simulation, the input signals are assumed to be Markov. The speed of the algorithm is strongly affected by the user-specified threshold, which determines the stopping criterion for the two categories of nodes. Convergence speeds are improved while sacrificing accuracy only at low-density nodes. When tested on a variety of ISCAS-85 benchmark circuits, it was found that over 95% of regular node transition density values have less than 5% error. Low-density nodes performed well; over 95% of the low-density node transition values were found to be less than the specified absolute error [23].
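The two-class error policy of [23] reduces to a small amount of bookkeeping. The sketch below is illustrative only; the node names, threshold, and error bounds are invented for the example. Nodes whose density estimate falls below the threshold are checked against an absolute bound, while the remaining nodes are checked against a percentage bound.

def classify(densities, threshold=0.05):
    """Split node names into low-density and regular-density sets."""
    low = {n for n, d in densities.items() if d < threshold}
    return low, set(densities) - low

def converged(node, estimate, half_width, low_nodes,
              rel_bound=0.05, abs_bound=0.01):
    """Regular nodes use a percentage bound; low-density nodes an absolute one."""
    if node in low_nodes:
        return half_width <= abs_bound
    return half_width <= rel_bound * estimate

# hypothetical running estimates: node -> (density estimate, confidence half-width)
estimates = {"n1": (0.40, 0.015), "n2": (0.02, 0.008), "n3": (0.30, 0.020)}
low, regular = classify({n: e for n, (e, _) in estimates.items()})
for node, (est, hw) in estimates.items():
    print(node, "done" if converged(node, est, hw, low) else "keep sampling")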
Another Monte Carlo simulation technique was proposed by Najm et al. [24], in which a method for estimating the switching activity at the latch outputs of sequential circuits is discussed. Similar to [23], this approach makes use of up-front user-specified accuracy information. The algorithm runs until the specified accuracy is achieved. This technique applies a number of randomly generated input vectors to the circuit and collects statistics at the latch outputs using zero-delay logic simulation. When computing the state line probabilities (signal probabilities of the latch outputs), the Monte Carlo approach is used for estimating N, the number of iterations necessary to achieve the user-specified error tolerance ε and confidence level α [23, 25]. The approach makes two assumptions: 1) the sequential circuit is a non-decomposable FSM, and 2) the state of the machine at time K becomes independent of its initial state as K → ∞. Assumption 2 implies that the FSM is aperiodic in that it does not cycle through a repetitive pattern of states. The method requires that two simulation runs be performed, with each run starting in a different initial state, say (X0, X1). The signal probabilities P_K(x_i | X0) and P_K(x_i | X1) are computed for increasing values of K. Both simulation run estimates should converge to P(x_i), the true signal probability of that state line. When both measures remain within a window of ±ε for three consecutive time instants, the node is assumed to have reached convergence. When all nodes have converged, the simulation is complete, and the average of the last P_K(x_i | X0) and P_K(x_i | X1) values is reported as the signal probability P(x_i) for each x_i. An important feature of this technique is that no assumptions about the FSM behavior (Markov or otherwise) or state line independence are made. Results for this approach on sequential circuits with 1452 flip-flops required 4.6 hours of simulation time (SUN Sparc10) to achieve an error tolerance of 5% with confidence 95% [24].

2.2.2 Probabilistic Methods

Ghosh et al. [26] proposed a probabilistic method for the estimation of average switching activity in combinational and sequential circuits. The method considers temporal correlation at the internal nodes and outputs, but requires that the primary inputs be uncorrelated. The method automatically computes switching rates and correlations among flip-flop outputs of FSMs. The behavior of the primary inputs is described in terms of their transition probabilities. The general delay model is used to correctly compute Boolean conditions by compensating for glitch activity. For the signal x, the transition probabilities are p^x_{ij}, which denote the probability that the signal x will experience a transition from state i to state j; for example, p^x_{10} is the probability that x(k) = 1 and x(k+1) = 0. The signal probability of node x in terms of transition probabilities is P_s(x) = P^x_1 = p^x_{10} + p^x_{11}. These signal probabilities are propagated from the inputs to internal and output nodes by summing the probabilities which disjointly cover the Boolean function in terms of its primary inputs. For example, if the function g = xȳz + x̄yz + x̄ȳz̄, then P(g = 1), or P_g, equals p(x)p(ȳ)p(z) + p(x̄)p(y)p(z) + p(x̄)p(ȳ)p(z̄). The transition probabilities are propagated from the inputs to internal and output nodes by finding the signal probability of the XOR of the Boolean function of each node in two consecutive time frames. For node y, P_s(y) = P_s(y(t) ⊕ y(t+1)). For a two-input AND gate, where y = x1 AND x2, P_s(y) = P_s[x1(t)x2(t) ⊕ x1(t+1)x2(t+1)]; expanding this exclusive-OR as a disjoint cover expresses P_s(y) as a sum of products of the input transition probabilities p^{x1}_{ij} · p^{x2}_{kl}. Binary decision diagrams (BDDs) are used for the calculation of the signal probabilities for each of the Boolean functions [27]. This is simpler than the enumeration of disjoint covers for each Boolean function. The exact signal probabilities can be computed by performing a linear traversal of the BDD representation of a logic function [28, 29]. To account for gate delays, a symbolic simulation method is used to generate a multiple-output function that represents the total switching activity over any possible input vector pair.

The method proposed by Ghosh et al. [26] may also be applied to FSMs. This technique accounts for correlation by transforming the conventional FSM structure (Figure 2.3) into a new structure (Figure 2.4) containing symbolic simulation equations which represent internal and output nodes in the next-state logic. For the present state, the gate output's switching activity can only be determined by primary inputs. To compute the transition probabilities, the static probabilities for the present state are used. The next-state logic generates Boolean equations which model correlation between the present and next states, thus computing the transition probabilities automatically, which accounts for the correlation between transitions. This method assumes that present state lines are uncorrelated and that upon power-up the FSM can be in any of its 2^N states, where N is the number of flip-flops.

Figure 2.4. Symbolic sequential circuit model (next-state logic followed by symbolic simulation equations, producing transition probabilities from I(t), I(t+1), P(t), and P(t+1)).
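The transition-probability view of a gate can be made concrete with a short enumeration. The sketch below is an illustration rather than the procedure of [26]: the input transition probabilities are invented example values, the inputs are assumed independent, and the helper names are ad hoc. The gate's transition probabilities are obtained by combining the consecutive-cycle value pairs of its inputs, after which the switching measure P_s(y(t) ⊕ y(t+1)) and the signal probability p_10 + p_11 fall out directly.

from itertools import product

# hypothetical input transition probabilities: p[(i, j)] = P(x(t) = i, x(t+1) = j)
px1 = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
px2 = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}

def and_transition_probs(p1, p2):
    """Transition probabilities of y = x1 AND x2, assuming independent inputs."""
    py = {(0, 0): 0.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.0}
    for (a0, a1), (b0, b1) in product(p1, p2):
        y0, y1 = a0 & b0, a1 & b1
        py[(y0, y1)] += p1[(a0, a1)] * p2[(b0, b1)]
    return py

py = and_transition_probs(px1, px2)
switching = py[(0, 1)] + py[(1, 0)]        # P_s(y(t) XOR y(t+1))
signal_prob = py[(1, 0)] + py[(1, 1)]      # P_s(y) = p_10 + p_11, as in the text
print(py, switching, signal_prob)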
In [29] a new measure of activity, the transition density, is proposed. The transition density may be defined as the average switching rate at a node. This method uses a stochastic model of the logic signals in which the density values of the primary inputs are propagated to internal and output nodes. The transition density at node x is defined as D(x) = lim_{T→∞} n_x(T)/T, where n_x(T) is the number of transitions of the signal x(t) at node x in the interval (-T/2, T/2]. The propagation algorithm involves a single pass over the circuit and computes the transition density at all nodes. The primary inputs are considered to be spatially independent and strict-sense stationary (SSS). A random process X(t) is called strict-sense stationary if all of the distribution functions describing the process are invariant under a translation of time [21]. The transition density at each node y is defined as

D(y) = Σ_{i=1}^{n} P(∂y/∂x_i) · D(x_i),                                 (2.7)

where ∂y/∂x_i represents the Boolean difference of the function y with respect to signal x_i and P(·) represents the equilibrium probability. Since the input signals are SSS, the output will have the same statistics as its inputs. The zero-delay model is assumed, and the algorithm is limited to combinational circuits only. The algorithm considers the circuit to be an interconnection of logic modules, where each module represents a Boolean function based on the zero-delay model. The drawback to the approach concerns the input independence requirement. If the circuit topology includes reconvergent fanout and feedback, then internal nodes could become correlated, thus possibly destroying the independence property. The propagation of density and probability proceeds on a per-module basis from the primary inputs to the primary outputs. Next, the equilibrium probability P(∂y/∂x_i) is evaluated for each node in a module, on a per-module basis, using the BDD [30]. This approach has a high cost in time and memory usage because the propagation algorithm must interact with the BDD, which may grow exponentially with respect to the number and order of the circuit inputs.

In [31] an improvement in the accuracy of circuit activity measurement is discussed. This technique offers a more efficient mechanism for computing the Boolean difference probabilities at each node of the circuit, which are necessary for the estimation of transition density [24]. In addition, this method allows measurements to be made in a pattern-independent manner. The algorithm partitions the combinational circuit, which is modeled as a directed acyclic graph, with the goal of maximizing the number of correlated nodes within each partition.

The partitioning aspect of the algorithm prevents overestimation and underestimation inaccuracies in the density simulation which may occur from the correlation of fanout inputs in the circuit. To compute the Boolean difference probabilities for each node in a given circuit, this method constructs an ordered BDD (OBDD) by using the APPLY and RESTRICT procedures at each node [27]. The APPLY procedure provides the basic method for creating the representation of a function according to the operators in a Boolean expression or logic gate network. The RESTRICT procedure transforms the graph representing a function f into a graph representing the function f|_{x_i = b} for specified values of i and b.
The efficiency of the computation for producing maximally reduced OBDDs is improved according to the gate operation needed. The development of a new operation, the DIFFERENCE operation, generates Boolean difference functions by applying algebraic operations to other functions. The DIFFERENCE operation is implemented as a sequence of APPLY operations. The partitioning aspect of the technique allows for an improvement in accuracy as the size of the partition grows. However, as the size of the OBDD grows, the algorithm slows down. The OBDD can grow exponentially with the number of inputs. For this reason the number of inputs is chosen as a parameter to determine partition size. Next, a breadth-first search is applied at every primary input such that each partition consists of a single output with k variables, where the variables may be primary outputs or inputs. The circuit partitioning procedure stops when each node belongs to a partition which contains k or fewer input variables. The purpose of the circuit partitioning is to keep each partition small enough to achieve accurate results. The partitions are then placed in a partition set, stored in the order of their formation. The transition density is computed for each partition by computing the Boolean difference probability. The drawback to the technique is the rapid growth of the OBDD, which leads to memory overflow problems. Experimental results report that for 50 combinational circuits, with up to 20 inputs, this technique required 40% fewer nodes than the straightforward way of computing Boolean difference equations.
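Because each partition is limited to k or fewer input variables, the Boolean difference probabilities inside a partition can even be obtained by exhaustive enumeration. The sketch below is a toy stand-in for the OBDD-based DIFFERENCE machinery of [31]; the function, input probabilities, and densities are example values, and the helper names are mine. It computes P(∂f/∂x_i) by comparing the two cofactors of f, then applies Equation 2.7 to propagate the input densities to the partition output.

from itertools import product

def boolean_difference_prob(f, n_inputs, i, input_probs):
    """P(df/dx_i) for f over n_inputs variables, by exhaustive enumeration."""
    prob = 0.0
    for bits in product([0, 1], repeat=n_inputs):
        if bits[i] == 1:
            continue                       # visit each setting of the other inputs once
        lo = list(bits)
        hi = list(bits)
        hi[i] = 1
        if f(*lo) != f(*hi):               # Boolean difference is 1 here
            w = 1.0                        # weight by the other inputs only
            for j, b in enumerate(bits):
                if j != i:
                    w *= input_probs[j] if b else (1 - input_probs[j])
            prob += w
    return prob

def density(f, n_inputs, input_probs, input_densities):
    """Equation 2.7: D(y) = sum_i P(df/dx_i) * D(x_i)."""
    return sum(boolean_difference_prob(f, n_inputs, i, input_probs) * input_densities[i]
               for i in range(n_inputs))

# example partition: the full-adder carry-out, c_out = ab + c_in(a + b)
cout = lambda a, b, c: (a & b) | (c & (a | b))
print(density(cout, 3, [0.5, 0.5, 0.5], [0.2, 0.2, 0.2]))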
The FSM accepts as input data- path status flags and produces control signals which enable register transfers in the datapath. DFG edges that cross control step boundaries denote state transitions in the FSM. Each control step corresponds to at least one state in the FSM. Using the data collected in the profiling phase, the PDSS system determines an estimate for aggregate switching activity (ASA). The ASA is the sum of the switching activities in the datapath and the controller. The switching activity associated with 43 pr- (‘01. Ili-s F n ma Sit; COI‘: it }, Elfin fewer nodes then the straight-forward way of computing Boolean difference equations. In two simultaneous publications by Tsui et al. [32] and Monteiro et al. [33], exact and approximate methods for estimating switching activity in FSM’s are discussed. These techniques also closely model the work presented in [26] and share the same sequential circuit and correlation decomposition structures (Figures 2.3 and 2.4). Both techniques are almost identical with a few minor differences. To begin with, the calculation of switching activity for sequential circuits must consider the probability of the circuit being in any Of its 2N states, where N is the number of flip-flops. Further, to arrive at an accurate estimate of switching activity, the values for the present state line probabilities, steady-state probabilities, and signal probabilities must be known. One method of exactly computing the steady-state probabilities is to find a solu- tion to the 2” linear system of Chapman-Kolmogorov equations. Next, present state line probabilities are computed from the steady-state probabilities and the signal probability is computed by generating the Boolean function’s disjoint covering. The computational cost of computing the steady-state probabilities from the Chapman- Kolmogorov system of equations is very high, but produces the exact state probabili— ties. The exact solution involves the system 7r = P7r or (I — P)1r = 0, where the vector 7r contains the steady-state probabilities. P is referred to as the transition probability matrix, which is derived from the state transition graph (STG). To determine 7r, the steady-state probability vector, the null-space of the system (I — P)7r = 0 must be computed. As the number of states grows exponentially with the number of flip-flOps it becomes impossible to build an STG for large sequential machines and thus the exact method cannot be applied. A more time-efficient but less accurate approach searches for an iterative solution to a nonlinear system of N equations [32, 33]. This system of nonlinear equations represents the next state logic preceding the symbolic combinational logic block in Figure 2.4. The next and present state line probabilities have the following form: 39 Slit] of. ili'i] Ih] nsl = f1(i1..iM,Psl..PsN) (2.8) 71.32 = f2(i1..iM,Psl..PsN) TLSN = fN(l1..lA[,P81..PSN) and P(TlSl) = P(f1(’ll..’l:M,P81..PSN)) (2.9) P(nsg) '2 P(f2(i1..lM,P81..PSN)) a...) = P(f~(i1--iMaPSI--P3N)) where P(nsl) = P(nsl = 1) and P(ps,) = P(ns,) = p,- for 1 < i < N. The present state line probabilities are applied to a nonlinear function g and a nonlinear system of equations is given by yl : Pl _ 91(p1ap27°"apN) : 0 (2'10) P2 — 92(P1,P2, -o-,PN) = 0 312 yN : pN — gN(p13p29°"3[)N) : 0- The nonlinear system of equations may be denoted as Y(P) =2 O or P = G (P) An iterative solution can be obtained by the use of the Newton-Raphson method for the system Y(P) = O [33]. 
Since the nonlinear solutions do not capture correlation between state line probabilities, minor inaccuracies are incurred. To improve accuracy, an unrolling of the next state logic network is performed (Figure 2.5) [32]. The signal probabilities are approximated by unrolling the next state logic k times, where k is a user-specified parameter. Usually an unrolling of the next state logic improves the accuracy of the results. For k = 3 the average error was reported to be only 1.5% [32]. As k increases, the time consumption increases, along with a decrease in average error.

Figure 2.5. k-unrolling of the next state logic (signal probability feedback; k is a user-defined limit).

The correlation accuracy improvement (m-expanded network) involves modifying the next state logic by selecting m-tuples of the present state lines, separated by one clock cycle, and computing probabilities for each combination of the m-tuple pairs (Figure 2.6) [33]. These probability values are fed into the combinational logic block.

Figure 2.6. m-expanded network with m = 2.

Using the ISCAS-85 benchmark circuits, the m-expanded network method with m = 2 obtained an average error less than 4.1%, and for m = 4 the reported average error was less than 3.6% [33].

Cheng et al. [34] proposed a method to increase the speed of power dissipation estimation by applying topological analysis to the circuit using the concept of supergates. The supergate of a node in a combinational circuit is the minimal subcircuit whose inputs are logically independent. The algorithm transforms a circuit to an undirected graph and solves the problem of finding supergates by finding articulation points in the undirected graph. An articulation point is defined as a node whose removal disconnects the graph. The supergate concept allows for the partitioning of BDDs, where each partition boundary is defined by its logically independent inputs. When the supergate concept is applied to most BDD-based power estimation techniques, for select benchmark circuits, CPU time and memory usage are reduced by as much as 86.2 and 94.0 percent, respectively [34].

2.2.3 High-Level Methods

A profile-driven approach to low power behavioral synthesis is presented in Katkoori et al. [35, 18]. The method presented is known as the Profile Driven Synthesis System (PDSS). Given a behavioral design specification, a set of input vectors, a parameterized module library, and user-specified constraints such as area and speed, the switching activity is estimated for the design. Each library module is characterized by its average switching activity per input vector and consists of units such as adders, registers, muxes, etc. The behavioral specification can be written in a hardware description language such as VHDL. The synthesized design consists of interacting datapath and controller components. The datapath is composed of modules from the module library and the controller is a finite state machine (FSM) implemented as a PLA/microprogram.

The PDSS accepts the behavioral VHDL specification as input and extracts a data flow graph (DFG). The DFG is passed through a profiler, where operations and carriers (edges of the DFG) are collected.
In more detail, carriers are defined as data flow edges that cross a control step boundary, which denotes a value that needs to be stored in a register. The goal of the profiling phase is to gather the following: 1) the number of times a node is executed for a given profiling stimulus, 2) the number of times each edge is traversed during execution, and 3) the number of times the edge value changes. Upon the completion of the profiling phase, four additional phases are entered: 1) scheduling and performance estimation, 2) register optimization, 3) interconnect optimization, and 4) controller generation.

During the scheduling and performance estimation phase, operations in the DFG are assigned to control steps and various operation nodes are bound to specific modules selected from the module library. The schedule is acceptable when the estimates of the area and clock period satisfy the user-specified constraints. The register optimization phase groups carriers such that no two carriers in the same group are simultaneously active. The interconnect optimization phase involves the assignment of interconnect paths to each value transfer in the DFG. The controller generation phase produces a finite state machine description. The FSM accepts as input datapath status flags and produces control signals which enable register transfers in the datapath. DFG edges that cross control step boundaries denote state transitions in the FSM. Each control step corresponds to at least one state in the FSM.

Using the data collected in the profiling phase, the PDSS system determines an estimate for aggregate switching activity (ASA). The ASA is the sum of the switching activities in the datapath and the controller. The switching activity associated with the datapath is computed by summing the switching activities determined for the following: combinational operators, registers, and interconnection elements such as buses, multiplexers, and wires. The switching activity contributed by the controller is determined by analyzing the PLA structure used to implement the controller's FSM. The activity concerning the PLA structure is the sum of the PLA's input and output plane switching activities. The experimental results of the method indicate that the estimated switching activity deviates by less than 10% [35].

Landman et al. [36] present techniques for accurately estimating power consumption based on a high-level description of the system architecture. This approach, based on stochastic modeling of bus statistics, achieves the accuracy associated with gate-level estimation tools. Algorithmic and architectural estimation techniques based on high-level statistics such as mean, variance, and autocorrelation are developed using concepts from the gate-level techniques. While gate-level techniques focus on the power consumed by individual Boolean logic gates as a function of their input probabilities, this method considers module (adder, register, multiplier) power consumption in terms of input-word statistics. For a variety of input distributions, the techniques perform very well on real-world signals such as speech, music, and images, typically found in digital signal processing (DSP) applications. The results obtained from these techniques exhibit an estimation accuracy within 9.4% of the gate-level simulations [36].

In [37, 38], Nemani et al. and Najm, in similar publications, presented a digital IC power estimation technique that operates at the register-transfer level (RTL).
The estimator is based on using entropy as a measure of the average activity to be expected in the final circuit-level implementation. Entropy is a characterization of a random variable or random process. If x is a random Boolean variable with signal probability p, then the entropy of x is defined as H(x) = p log2(1/p) + (1 - p) log2(1/(1 - p)). The entropy can be expressed for discrete systems in terms of inputs and outputs. The high-level power estimation methodology for a combinational circuit block that is part of a synchronous sequential circuit involves the following steps: 1) run a structural RTL simulation of the sequential circuit to measure the input/output entropies of the combinational block; 2) from the input/output entropies, estimate the average node density, circuit area, and average power; 3) combine with latch and clock power to compute total average power. The area and average power estimates are computed via the input/output entropy estimates, where the area estimate A is proportional to the output entropy H(Y) and P_avg ∝ A × H̄, such that H(Y) is the output entropy and H̄ is the average input entropy. When this method was tested against a zero-delay model for 56 different ISCAS-85 benchmark circuits with sizes ranging from 100 to 22,000 gates, the error was reported to be less than 9% with a 90% confidence level [37].

In summary, the estimates generated by current high-level power and switching activity computation methods experience between 9% and 12% error on average. For some circuits containing large amounts of reconvergent fanout the error is as high as 80%. The error experienced by current high-level power and switching activity estimation techniques is too high. This dissertation presents a new methodology for improved switching activity estimation of behavioral-level designs described in VHDL.

CHAPTER 3

Behavioral Representations of Switching Functions

The fundamental building blocks of digital systems are switching functions. This chapter specifically addresses the use of Boolean switching functions. A review of terminology and notation used to describe the functional and mathematical models of Boolean switching functions when applied to behavioral design of digital systems is provided. Additionally, a discussion of various behavioral representations for switching functions is provided as well.

3.1 Switching Algebra Background

The mathematics which defines rules and operations for processing the binary set {0, 1} in a combinational digital system is called switching algebra, also known as two-valued Boolean algebra, developed by English mathematician George Boole in 1854. The most primitive operations applied to variables of the binary set (i.e., ∀x_i ∈ {0, 1}) are "·", "+", and "¯", which denote switching algebra operations, also called logic operations. A switching algebra must satisfy Huntington's postulates (Table 3.1), which define the basis of switching algebra theory [39].

Property      | Meaning
Closure       | x, y ∈ {0,1} → (x + y) ∈ {0,1};  x, y ∈ {0,1} → (x · y) ∈ {0,1}
Identity      | x + 0 = x;  x · 0 = 0;  x + 1 = 1;  x · 1 = x
Commutative   | x + y = y + x;  x · y = y · x
Distributive  | x + (y · z) = (x + y)(x + z);  x · (y + z) = (x · y) + (x · z)
Complement    | x + x̄ = 1;  x · x̄ = 0

Table 3.1. Huntington's postulates.

3.2 Truth Tables

The definitions of the three logic operators ("+", "·", "¯") can be deduced from Huntington's postulates using truth tables. A truth table is a mechanism for systematically enumerating the output or result of a logic operation for every possible Boolean input combination.
The truth table can be extended to enumerate the outputs for switching functions consisting of sets of logic operations and Boolean operands. A double vertical bar is used to separate truth table inputs from truth table outputs. The commonly used truth table convention places inputs on the left of the double vertical bar, with outputs placed to the right.

3.2.1 Logic Operations

The "¯" logic operation is called the complement, NOT, or invert operation, of which the term NOT is the most commonly used. The NOT operation is a unary logic operation, meaning that it operates on a single operand. When the NOT operation is applied to the Boolean operand x, the result is x̄. The truth table in Table 3.2(a) shows that the NOT operation changes 0 to 1 and changes 1 to 0.

x | x̄        x  y | x · y        x  y | x + y
0 | 1        0  0 |   0          0  0 |   0
1 | 0        0  1 |   0          0  1 |   1
             1  0 |   0          1  0 |   1
             1  1 |   1          1  1 |   1
(a) NOT      (b) AND             (c) OR

Table 3.2. Truth tables for NOT, AND, OR logic operations.

The "·" logic operation is called the AND or conjunction operation. The term AND is most commonly used. It is a binary operator, meaning that it takes two operands and generates 2^2 input combinations. The result after applying the AND operation to binary operands x and y is 1 only when x = y = 1; otherwise the result is 0. The truth table in Table 3.2(b) depicts the AND logic operation.

The "+" logic operation is called the OR or disjunction operation. The term OR is most commonly used and is classified as a binary operator. The result after applying the OR operation to binary operands x and y is 0 only when x = y = 0; otherwise the result is 1. The truth table in Table 3.2(c) depicts the OR logic operation.

The truth table technique is an implementation-independent method of representing switching functions. Truth tables are considered behavioral representations because no reference to technology, structure, or implementation is made. The truth table completely specifies a switching function's output for all possible (2^N) input combinations, where N is the number of Boolean inputs. Consider the 3-input switching function which represents the carry-out bit for a full-adder (FA). The FA carry-out bit input/output behavior is specified for all 2^3 input sequences (Table 3.3).

a  b  c_in | c_out
0  0   0   |  0
0  0   1   |  0
0  1   0   |  0
0  1   1   |  1
1  0   0   |  0
1  0   1   |  1
1  1   0   |  1
1  1   1   |  1

Table 3.3. Truth table for full-adder carry-out bit.

3.2.2 Output Enumeration

The outputs of switching functions are easily enumerated with truth tables. Each row of a fully specified truth table is referred to as a term or product. The term represents the input combination that determines the output of a switching function. Each term or product is logically an AND-product of input variable literals. A switching function consisting of N input variables has 2^N terms. The enumeration of terms utilizes the base-2 number system because switching algebra is defined over the two-valued domain {0, 1}. The expression for representing a term's decimal equivalent for an ordered n-bit sequence (x_{n-1}, x_{n-2}, ..., x_1, x_0) is

X = x_{n-1}·2^{n-1} + x_{n-2}·2^{n-2} + ··· + x_1·2^1 + x_0·2^0 = Σ_{i=0}^{n-1} x_i·2^i.    (3.1)
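Equation 3.1 is simple enough to exercise directly. The snippet below is illustrative only (the helper names are ad hoc); it numbers the rows of Table 3.3 using Equation 3.1 and collects the rows that produce a logic 1, yielding the list of term numbers that reappears as the minterm list in the next section.

def term_index(bits):
    """Equation 3.1: decimal equivalent of an ordered bit sequence (MSB first)."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value

# Table 3.3: full-adder carry-out, rows ordered (a, b, c_in)
carry_out = lambda a, b, c: (a & b) | (c & (a | b))

minterms = [term_index((a, b, c))
            for a in (0, 1) for b in (0, 1) for c in (0, 1)
            if carry_out(a, b, c) == 1]
print(minterms)            # [3, 5, 6, 7]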
3.3 Boolean Expressions

The full-adder's carry-out bit switching function described by the truth table in Table 3.3 has 2^3 terms, numbered from 0 to 7 using Equation 3.1. The terms which result in a switching function output of logic 1 are known as minterms. The carry-out bit switching function has four minterms and can be expressed in the following forms:

f(a, b, c_in) = 011 + 101 + 110 + 111                                   (3.2)
              = ā·b·c_in + a·b̄·c_in + a·b·c̄_in + a·b·c_in              (3.3)
              = m_3 + m_5 + m_6 + m_7                                   (3.4)
              = Σ m(3, 5, 6, 7)                                         (3.5)

Each of these forms represents what is known as a Boolean expression. The Boolean expression is an n-variable function of Boolean inputs whose output is Boolean, or f(x): {0,1}^n → {0,1}. Equations 3.2 and 3.3 represent the minterms of the carry-out bit switching function in binary and literal form, respectively. Each switching variable is expressed as uncomplemented or complemented for a 1 or 0, respectively, in the specified input combination. Equation 3.4 uses a shorthand notation for minterms, m_x. The subscript x denotes the associated row of the truth table, or the binary value of the input combination. Equation 3.5 is the most commonly used form for representing minterm lists; it is a more compact representation of the form in Equation 3.4. The switching function representations shown in Equations 3.2, 3.4, and 3.5 are known as canonical minterm or sum-of-products (SOP) expressions. After reducing Equation 3.3 using Huntington's postulates, the resulting Boolean expression is

f(a, b, c_in) = ab + c_in(a + b).                                       (3.6)

The reduced Boolean expression (Equation 3.6) is in SOP form, but it is not a canonical minterm expression because the product terms are not minterms. The canonical minterm expressions, as well as Boolean expressions, are implementation-independent methods for representing the input/output behavior of switching functions. Boolean expressions do not consider technology, circuit structure, or implementation of a digital design. The advantage of the canonical minterm expression representation over the truth table representation is size. Canonical minterm expressions are more compact than truth tables. Truth tables represent all 2^N input combinations, whereas canonical minterm expression representations list only the input combinations for which the switching function's output is logic 1.
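A short, purely illustrative check confirms that the reduced expression of Equation 3.6 covers exactly the minterms listed in Equation 3.5:

from itertools import product

reduced = lambda a, b, c: (a & b) | (c & (a | b))        # Equation 3.6
minterms = {3, 5, 6, 7}                                   # Equation 3.5

assert all((reduced(a, b, c) == 1) == ((a * 4 + b * 2 + c) in minterms)
           for a, b, c in product((0, 1), repeat=3))
print("Equations 3.5 and 3.6 agree on all eight input combinations")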
3.4 Binary Decision Diagrams

Probability-based power analysis tools depend heavily on Binary Decision Diagrams (BDDs) to determine signal activity. Historic uses of BDDs have been in the digital circuit design areas of synthesis, verification, and testing. The BDD is not a new concept. As early as 1959, Lee [40] introduced the concept of Binary Decision Programs and a set of rules to transform these programs into switching circuits. Later, in 1978, Akers [41] revisited the concept of BDDs by using the diagram as a means to define, analyze, and test large digital functions from an implementation-free perspective. It was in 1986 that Bryant [27] demonstrated the advantages of BDDs as a canonical representation. Bryant demonstrated that BDDs have two very useful properties. First, BDDs are canonical: given two circuits, they are equivalent if their BDDs are identical. Second, BDDs are effective at representing combinatorially large sets, which is useful in FSM equivalence checking and logic minimization.

BDDs represent a switching function as a directed acyclic graph (DAG). A graph consists of an interconnection of nodes (vertices) and edges (arcs). There are two node types: decision nodes and terminal nodes. Terminal nodes are characterized by not having outgoing edges which lead to children nodes, and they contain fixed values which possibly correspond to the output of a function. Decision nodes are characterized by having outgoing edges which lead to other decision nodes and terminal nodes. The decision node is labeled with a variable identifier and has one outgoing edge for each value this variable can assume. Since Boolean decisions are being made, the decision node variable identifiers can only assume the values of 0 and 1. The terminal node values will be fixed at either 0 or 1, and the possible edges are the 0-edge and the 1-edge. The 0-edge is chosen when the decision node variable assumes the value 0, and the 1-edge is chosen when the decision node variable takes on the value 1. Figure 3.1 is a BDD representing the full-adder (FA) carry-out bit switching function described by Equation 3.6 and Table 3.3. Typically in BDD graphs dotted arcs represent the 0-edges while the solid arcs represent 1-edges.

Figure 3.1. BDD of full-adder carry-out bit.

The BDD size is directly related to the number of nodes in the BDD's graph, which is controlled by the number of input variables and their ordering. One such BDD type, the Ordered Binary Decision Diagram (OBDD), addresses the BDD size issue by considering input variable ordering. The ordering of input variables determines the level at which each input will appear in the BDD's graph. In the OBDD the ordering remains the same for each path taken from the root (lowest order) node to a terminal node. Different input variable orderings lead to different BDDs, with each potentially having a different size. Consider the switching function given by Equation 3.7. An input variable ordering of a < d < b < c leads to the BDD displayed in Figure 3.2, while an input variable ordering of b < c < a < d leads to the BDD representation displayed in Figure 3.3.

f(a, b, c, d) = abc + bd + c̄d                                          (3.7)

Figure 3.2. BDD with ordering 1.

Figure 3.3. BDD with ordering 2.

Clearly the second ordering provides the more compact BDD representation; it has a smaller node count. Hence, a good input variable ordering will yield a more compact BDD representation with reasonable memory usage [42]. A modification to the OBDD is the Reduced-Ordered BDD (ROBDD). An initial ordering is given in the ROBDD, and the iterative identification and removal of isomorphic subgraphs and redundant nodes takes place [39]. The removal results in a BDD which is minimal for the given input variable ordering and canonical in form.

One such ROBDD implementation was developed by Brace et al. [30]. This ROBDD implementation made improvements in the if-then-else (ITE) operator, the hashing technique, and memory garbage collection. The results reported that an amortized memory cost of 22 bytes per node was achieved. Additionally, it was reported that the improvements yielded a faster, more memory-efficient ROBDD implementation than the original implementation presented in [27].
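The sensitivity to variable ordering can be checked mechanically. The sketch below is illustrative only and is not the ROBDD package of [30]; the helper names are ad hoc and the function is the example of Equation 3.7. It reduces an ordered decision tree by merging identical (variable, low, high) nodes and skipping redundant tests, then reports the decision-node count under the two orderings discussed above.

from itertools import product  # not required; shown only if you extend the example

def robdd_size(f, order):
    """Count decision nodes of the reduced, ordered BDD of f under the given order.

    f maps a dict {var: 0/1} to 0/1; nodes with identical (var, low, high) are
    merged, and a node whose two children are equal is skipped entirely.
    """
    unique = {}                      # (var, low, high) -> node id
    TRUE, FALSE = "1", "0"

    def build(level, assignment):
        if level == len(order):
            return TRUE if f(assignment) else FALSE
        var = order[level]
        low = build(level + 1, {**assignment, var: 0})
        high = build(level + 1, {**assignment, var: 1})
        if low == high:              # redundant test: skip the node
            return low
        key = (var, low, high)
        if key not in unique:
            unique[key] = "n%d" % len(unique)
        return unique[key]

    build(0, {})
    return len(unique)

# Equation 3.7: f = abc + bd + (not c)d
f = lambda v: (v["a"] & v["b"] & v["c"]) | (v["b"] & v["d"]) | ((1 - v["c"]) & v["d"])

print(robdd_size(f, ["a", "d", "b", "c"]))   # ordering 1 (larger node count)
print(robdd_size(f, ["b", "c", "a", "d"]))   # ordering 2 (smaller node count)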
3.5 Behavioral VHDL Specifications VHDL, short for VHSIC (Very High Speed IC) Hardware Description Language, is a formal notation for hardware description, standardized by the IEEE in 1987, that allows for, among other descriptions, the behavioral-level design of a digital system [44]. The VHDL behavioral specification supports the modeling of digital hardware using sequential statements similar to programming languages such as Ada, C, or Pascal. VHDL, just like the previously mentioned programming languages, contains loop, if-then-else, and assignment constructs. A VHDL model is comprised of entities and architectures. An entity defines the interface between a system and its environment, such as signals that flow into and out of the system or component. The architecture describes the functional nature of the digital system; it defines how the outputs respond to the inputs. A behavioral VHDL specification describing the carry-out bit of a full-adder (FA) is given in Figure 3.4, where the architecture closely resembles the Boolean 54 function defined by Equation 3.6. Behavioral VHDL specifications are high—level implementation-independent representations which describe the functional nature of a digital system without any reference to circuit structure or technology. ENTITY CARRY is port (A, B, Cin : IN BIT; Cout : OUT BIT); end CARRY; ARCHITECTURE Behavioral of CARRY is BEGIN -- Behavioral FA carry-out bit Cout <= ((A and B) or (Gin and (A or B))) after 10ns; END Behavioral; Figure 3.4. Behavioral VHDL model of FA carry-out bit. 55 -'-..‘1 CHAPTER 4 Structural Representations of Switching Functions The previously discussed behavioral representations describe only a Circuit’s function- ality in terms of truth tables, Boolean expressions, or BDDs. When detailed infor- mation, such as timing, speed, or area are needed, these representations are of little use. To effectively describe the input/output relationships of a circuit, a lower level of design abstraction is necessary. The logic— and circuit- design levels of abstraction within the EDA process (Figure 1.1) provide information which enable designers to obtain or calculate critical timing, speed, or area information. The logic- and circuit- design levels of abstraction yield structural representations. Given a logic- or circuit- level description, structural-level representations efl'ectively describe the design’s in- put/output relationships, as well as the internal connectivity of circuit elements and logic gates. This chapter deals with logic-level structural representations, including the modeling of switching functions in terms of schematics, netlists, structural VHDL specifications, and the Connective Binary Decision Diagram (CBDD). 56 4.1 Schematics Schematics or logic diagrams are a visual means of representing the structure of a digital circuit design. Schematics illustrate connective relationships between a circuit’s primary inputs/outputs, internal signals, and logic elements. Schematics display the logic elements and the wires used to interconnect these elements. The most basic logic elements used to perform the operations necessary to implement switching functions are the NOT-, AND-, and OR- gates. The schematics of the basic logic gates are illustrated in Figures 4.1, and they implement the behavior given in the truth tables of Table 3.2. -Dr- ID- (a) NUT (b) AND (c) 0R Figure 4.1. Schematics of basic logic gates. 
The interconnection and recombination of these basic logic gates support the design of more complex switching functions such as adders, ALUs, muxes, etc. By interconnecting an arrangement of OR and AND gates, a structural representation for the carry-out bit of the full-adder can be realized (Figure 4.2).

Figure 4.2. Logic diagram for FA carry-out bit.

Vendors such as ViewLogic, Xilinx, and Mentor Graphics develop schematic entry software systems which support the design of digital hardware. Schematic entry software systems provide a pre-characterized library of logic elements with the ability to place and wire these components according to a design's connective specification. A limitation of the schematic design approach is its difficulty supporting low-level and detailed views of large and complex designs. Large and complex designs are limited to system-level or hierarchical views.

4.2 Netlists

The structure of a circuit can be represented using a netlist. A netlist is a data structure that describes all components connected to each net, or to each internally produced signal, within a circuit. Industry standard formats exist for describing a netlist, enabling the easy transfer of designs between various design tools and vendors. Design tools may use netlist formats which represent circuit structure at various levels of design abstraction. For example, the SPICE tool considers the circuit level of design abstraction. SPICE netlists model a circuit as an interconnection of analog elements such as capacitors, resistors, and transistors. The Berkeley SIS tool considers the logic level of design abstraction. The widely used SIS netlist format, known as the Berkeley Logic Interchange Format (BLIF), models a digital circuit as an interconnection of predefined or user-defined logic elements, which includes, but is not limited to, N/AND, N/OR, NOT, or XOR gates. Consider Figure 4.3, which is a gate-level netlist that describes the structure of a full-adder carry-out bit. The gate-level netlist example directly corresponds to the logic diagram shown in Figure 4.2 and the Boolean expression given in Equation 3.6.

.inputs c_in a b
.outputs c_out
.gate and2 a=a b=b O=z1
.gate or2 a=a b=b O=z2
.gate and2 a=z2 b=c_in O=z3
.gate or2 a=z1 b=z3 O=c_out
.end

Figure 4.3. Gate-level BLIF netlist for FA carry-out bit.

In general, netlist representations describe the structure and connective relationships visualized by schematic entry systems and may serve as their input. Similar to schematic-based design, netlist-based design is limited by circuit size (number of nets). As the circuit size grows, the difficulty of debugging and isolating problems increases. Isolating circuit design problems given lower level circuit component granularity is a complex task when visually analyzing the netlist. Tools such as analyzers or simulators are needed to locate and assess low-level problems.

4.3 Structural VHDL Specifications

In VHDL, design architectures may be developed using a structural specification. Structural specifications express an architecture as a hierarchical arrangement of interconnected components. The interconnected components may be pre-defined library units or user-defined units. Structural-based VHDL design is powerful because it simplifies the development of complex systems.
Larger or more complex systems are easier to design because the high-level system architecture can be viewed as an interconnection of less complex black boxes or components, which can be modeled behaviorally or structurally. The interconnection of components is made possible through the use of signals, which are analogous to wires. A structural VHDL specification describing the carry-out bit of a full-adder (FA) is depicted in Figure 4.4, where the architecture closely resembles the logic diagram in Figure 4.2 and the gate-level netlist shown in Figure 4.3.

ENTITY CARRY is
  port (A, B, Cin : IN BIT;
        Cout : OUT BIT);
END CARRY;

ARCHITECTURE Structural of CARRY is
  -- declare internal components
  COMPONENT or_gate
    port (X, Y : IN BIT; Z : OUT BIT);
  END COMPONENT;
  COMPONENT and_gate
    port (X, Y : IN BIT; Z : OUT BIT);
  END COMPONENT;
  signal z1, z2, z3 : BIT;
BEGIN
  -- Structural FA carry-out bit
  -- instantiate declared components to form structure
  AND1 : and_gate port map (X=>A, Y=>B, Z=>z1);
  OR1  : or_gate  port map (X=>A, Y=>B, Z=>z2);
  AND2 : and_gate port map (X=>z2, Y=>Cin, Z=>z3);
  OR2  : or_gate  port map (X=>z1, Y=>z3, Z=>Cout);
END Structural;

Figure 4.4. Structural VHDL model of FA carry-out bit.

Structural VHDL specifications are very useful when a design is decomposed into a set of components, allowing a system to be viewed as an interconnection of library or user-defined black boxes. The structural VHDL design approach supports the interchanging of various off-the-shelf cores and other pre-defined components to help a designer meet system requirements.

4.4 Connective Binary Decision Diagrams

The Connective Binary Decision Diagram (CBDD) is a variant of the standard BDD. The CBDD is a major contribution of this research. It is entirely new and has some advantages over traditional BDD implementations. CBDDs model the structure present in a digital circuit's netlist description as well as the circuit's connective relationships.
As a result, for most circuits, CBDDs yield a more compact graph representation when compared to conventional BDD implementations. 61 4.4.3 Minimized Scalable BDDS A significant component used to construct the CBDD’S graph is the minimized- scalable BDD (MSBDD). MSBDDs are subgraphs which represent the behavior of a Circuit’s structural building blocks such as gates, muxes, adders, etc. The MSBDD concept is a contribution made by this research effort. Fundamentally, MSBDDs are Binary Decision Diagrams as discussed in [27], but are minimal in the number of nodes and edges used to construct the decision graph. (a) f(a.h) = NAND(a.b) (b) f(mb) = NOR(a.b) (C) HELD) - XOR(a.b) (d) f(a.h)=AND(a.b) (c) f(a.h)=0R(a.b) (f) f(a)==N()'l‘(a) Figure 4.5. Minimized-scalable BDDS. A revealing feature concerning the MSBDD is its ability to represent the behavior of a variable-input logic block with minimal size (node count). The MSBDD is de- signed to have a small number of decision nodes and only two terminal nodes for each logic block representation. The minimal size is achieved by utilizing unique MSBDD Generation Routines which recognize and exploit the output drive of the logic block being represented by the MSBDD. The output drive is a simplified set of decision 62 node interconnections which serves to minimized the logic block’s output path depth. For example, an AND gate is O-driven. The presence of a logic 0 at any AND input, regardless of fanin size, results in a logic 0 gate output. Likewise, a logic 1 at any OR gate input results in a logic 1 output. Therefore, OR gates are l-driven. The output drive feature considered by the MSBDD supports the construction of minimized de- cision diagrams for certain logic elements such as NOT, N AN D, AND, NOR, and OR gates, Figure 4.5. It is accomplished by increasing the number of direct connections or edges between decision nodes and terminal nodes within the MSBDD’s graph with the goal of minimizing the depth. 4.4.3.1 MSBDD Generation Routines The MSBDD generation routines are unique for each represented logic element. The main goal of each MSBDD generation routine is to produce an interconnection of nodes and edges which yield a decision diagram of limited size and depth. The CBDD implementation utilizes MSBDD generation routines which mimic the behavior of basic logic elements such as NOT, NAND, AND, NOR, and OR gates. Additionally, the CBDD implementation supports the inclusion of more complex logic structures. This is easily accomplished by adding a routine that generates an interconnection of nodes and edges which mimic the intended behavior. The Mk_NAND procedure illustrated in Algorithm 1 generates an MSBDD graph that represents the behavior of an n—input N AN D gate. This procedure interconnects the MSBDD graph with a CBDD graph in a manner that mimics the structure found in the input specification. The procedure utilizes three input parameters: a CBDD, an argument list, and fanin size. The CBDD’s graph models the functionality of the network described by an input netlist. The argument list contains the fanin names, and fanin size is the number of inputs to the NAND gate. 
  Procedure: Mk_NAND
    input  : G    - CBDD of network
    input  : n    - Fanin size
    input  : argv - Argument names
    output : M'   - MSBDD of n-input NAND logic gate
   1 begin
   2   for i ∈ 0..n do
   3     vlist[i] ← Mk_CBDD_Node_Set(G, argv[i])
   4   end
   5   for i ∈ 0..n do
   6     Add_Edge(G, vlist[i], ZERO_TERMINAL, ZERO_EDGE)
   7   end
   8   for i ∈ 0..n-1 do
   9     Add_Edge(G, vlist[i], vlist[i+1], ONE_EDGE)
  10   end
  11   Add_Edge(G, vlist[n-1], ONE_TERMINAL, ONE_EDGE)
  12   Add_Edge(G, ZERO_TERMINAL, OUT_NODE, VALUE_EDGE)
  13   Add_Edge(G, ONE_TERMINAL, OUT_NODE, VALUE_EDGE)
  14   M' ← Connect_Nodes(G, n, vlist, OUTPUT_NODE)
  15 end

Algorithm 1: Mk_NAND_msbdd

Lines 2-4 of the Mk_NAND procedure build an array of decision nodes, one for each argument. Lines 5-7 generate 0-edge connections between the decision nodes and a zero-terminal node. Lines 8-10 perform 1-edge connections between the decision nodes. Line 11 connects the last decision node to a one-terminal node. Using value-edges, lines 12 and 13 connect the zero- and one- terminals to the output node. Line 14 connects the newly created output node to the CBDD's graph and returns an MSBDD reference.

The Mk_CBDD_Node_Set, Add_Edge, and Connect_Nodes operations invoked by the Mk_NAND procedure run in O(1) time. These operations are invoked a maximum of 3n + 4 times, yielding a time complexity of O(3n + 4), where n is the number of inputs. Since a NAND gate is being modeled, n is usually small, resulting in n + 3 nodes and 2n + 2 edges. A similar routine exists for each of the remaining basic logic gates.

The CBDD package supports logic elements other than gates. The behavior of higher-level logic elements such as muxes, adders and encoders can be modeled by the construction of unique MSBDD structures.

4.4.4 CBDD Definitions

An advantage of the CBDD is its ability to maintain the circuit's structural input/output relationships and internal connectivity. The definition of the CBDD is based on the definitions of a DAG and a conventional BDD. Structural and connective relationships are achieved by altering the conventional BDD's definition to support additional node (vertex) and edge (arc) types and properties. The CBDD's formal definition is given and described by Definition 4.1 and Figure 4.6.

Definition 4.1 A Connective Binary Decision Diagram (CBDD) is a directed acyclic graph which is composed of an MSBDD set M, vertex set V and edge set E.

There are differences between the conventional BDD implementation and the CBDD. First, the conventional BDD graph contains only two node types, the decision and terminal node types; the CBDD's graph supports four node types, with each node serving a unique purpose. Second, the CBDD's graph supports an additional edge, the value-edge. Third, the CBDD's graph is an interconnection of subgraphs, known as Minimized-Scalable BDDs (MSBDDs). CBDDs utilize the 0- and 1- edges in the same manner as a conventional BDD. The traversal of a 0- or 1- edge indicates that the predecessor decision node variable is equal to 0 or 1, respectively. The value-edge is new to the CBDD's design; value-edges are used to model the propagation of internally produced signals. The CBDD's graph contains Output, Input, Internal and Terminal node types.
1. ∀ v ∈ V may be of type:
   1.1. Input
        1.1.1. followed by children via 0- and 1- edges
        1.1.2. assumes variable primary input value
        1.1.3. value(v) ∈ {0, 1}
   1.2. Internal
        1.2.1. followed by children via 0-, 1- and value- edges
        1.2.2. assumes variable internal output value
        1.2.3. value(v) ∈ {0, 1}
   1.3. Terminal
        1.3.1. value termination for internal and output nodes
        1.3.2. followed by children via value-edges
        1.3.3. value is fixed, value(v) ∈ {0, 1}
   1.4. Output
        1.4.1. absolute termination, not followed by children
        1.4.2. assumes variable primary output value
        1.4.3. value(v) ∈ {0, 1}
2. ∀ e ∈ E may be of type:
   2.1. 0-edge (1-edge)
        2.1.1. traversed when value(v) = 0 (1)
        2.1.2. outgoing to Input, Internal, and Terminal vertices
   2.2. value-edge
        2.2.1. propagates value(v) to Internal or Output vertices
        2.2.2. incoming only to terminal vertices
3. MSBDD set M elements:
   3.1. ∀ m ∈ M, vertex(m) ∈ V
   3.2. MSBDD terminals are connected to Internal or Output nodes via value-edges
   3.3. all MSBDDs represent a function on vertex v, f_v
   3.4. f_v = x̄_i · f_low(v) + x_i · f_high(v), where x_i is a decision variable

Figure 4.6. Definition of CBDD.

The concept of an Output node is new; it represents or contains the value of a circuit's primary output. Output nodes are preceded by fixed-valued nodes or Terminal nodes connected by value-edges, and are not succeeded by descendant nodes. The Input node is essentially a decision node, as in the conventional BDD; it represents the state of an input variable and may be followed by descendant nodes connected by 0-edges and 1-edges. The Internal node is a special decision node whose binary value results from the evaluation of a logic element represented by an MSBDD. Internal nodes are preceded by fixed-valued nodes or Terminals via value-edges and are succeeded by descendant Input nodes. The Internal node's value is propagated to the Input nodes of other logic element MSBDD representations by the use of value-edges. The Terminal nodes used in the CBDD's graph differ from those used by the conventional BDD. CBDD Terminal nodes carry a fixed binary value but have descendants; they are succeeded by Internal or Output nodes and connected by value-edges.

4.4.5 Analysis of Connective BDDs

Circuit size is a limiting factor for applications utilizing traditional BDD implementations. For large circuits, BDD graphs may become exponentially sized, resulting in CPU-intensive applications. The CBDD is an alternative graph structure which is useful for certain applications. This section provides background information concerning the CBDD's linear growth and functional rationalization. An explanation detailing the CBDD's improved circuit representation when compared to the traditional BDD is given.

Theorem 4.1 The CBDD's graph grows linearly with respect to the number of logic elements within the circuit.

Proof: Let M be the number of logic elements present in a circuit. Let n be the fanin size of a logic element. MSBDD graphs are constructed to grow linearly, with the largest MSBDD graph (the n-input XOR gate) having 2n + 3 nodes. The CBDD's graph is an interconnection of M MSBDD graphs, resulting in linear growth and a maximum size of M × (2n + 3) nodes.

Theorem 4.2 The minimized-scalable BDD (MSBDD) maintains the behavior of an individual logic element.

Proof: The MSBDD's graph contains a single 0-Terminal and a single 1-Terminal. If a post-order traversal from the 1-Terminal to the root node of the MSBDD's graph is performed, then a set of disjoint decision paths is generated. The AND-products of the decision nodes along the disjoint paths are minterms in the on-set of the logic element's Boolean switching function. The OR-sum of these disjoint minterms equals the sum-of-products that describes the switching behavior of the given logic element.
Theorem 4.3 The CBDD, an interconnection of MSBDDs, maintains the behavior of a circuit.

Proof: From Theorem 4.2, it was established that MSBDDs maintain the behavior of individual logic elements. The post-order traversal of an MSBDD's graph results in a Boolean switching function whose switching variables may be primary inputs or internally produced signals. The circuit's behavior, a collection of two-level Boolean switching functions for all circuit nodes, can be produced by performing recursive replacement of non-primary-input variables by their corresponding MSBDD switching functions until all non-primary-input variables have been replaced by primary input variables.

Theorem 4.4 The minimal MSBDD representation for n-input N/AND and N/OR logic functions contains at most n + 3 nodes and 2n + 2 edges.

Proof: The output of every n-input N/AND and N/OR logic function is directly controlled by the logic function's output drive. The output drive of a logic function is a single input which directly controls the function's output. The n-input N/AND logic functions are 0-driven because a single logic zero applied to any of the n inputs leads to immediate output values of 1/0. Likewise, the n-input N/OR logic functions are 1-driven because a single logic one applied to any of the n inputs leads to immediate output values of 0/1. Assuming that an MSBDD graph representation contains a maximum of two terminal vertices, the disjoint terms of the MSBDD's logic function are constructed from the AND-products of decision nodes within the MSBDD's graph. The terms have the form x̄_N · ∏_{i=0}^{N-1} x_i, where the x_i are decision nodes, 0 ≤ N ≤ n − 1, and x̄_N is the output-driven decision node that is directly connected to a terminal vertex. Let L equal the number of literals in a disjoint term of the logic function represented by the MSBDD, such that 1 ≤ L ≤ n. The number of edges necessary to construct a disjoint term in isolation and connect the term's output-driving decision node to one of the terminal vertices within the MSBDD's graph is 1 + (L − 1) = L edges; because consecutive terms share the decision-node chain built by their common leading literals, however, each term adds only its output-driving terminal edge and at most one new chaining edge. Additionally, two value-edges are used to connect the terminal vertices to the MSBDD output node. Given that K is the number of disjoint terms within the logic function, the total number of edges needed to minimally represent the MSBDD of an N/AND or N/OR logic function is

    EdgeCount = 2 + Σ_{i=1}^{K} (1 + (L_i − L_{i−1}))   with L_0 = 0, 1 ≤ L_i ≤ n, 1 ≤ K ≤ n
              = 2 + K + L_K
              = 2 + n + n                                for K = n and L_i = i
              = 2n + 2.

The minimal number of MSBDD nodes required to represent an N/AND or N/OR logic function is n (one decision node per input), plus the number of terminal vertices (usually 2), plus a single output node, for a minimum total of n + 3 nodes.

Assertion 4.1 For large circuits, CBDDs generally result in smaller and more compact representations when compared to traditional BDDs.

Explanation: Applications using traditional BDDs try to make logical sense of the behavior obtained from a given input specification for the purpose of producing an optimized set of Boolean equations and limiting the BDD's size (node count). The BDD's size is controlled by the number of circuit inputs along with the input variable ordering. Circuits containing many internal logic elements and a large number of inputs (N) may result in BDD graphs comprised of 2^N nodes along with N! input variable orderings. For many applications, the time necessary to explore all N!
input variable orderings is not available, so a non-optimal input variable ordering is generally chosen. Benchmark comparisons demonstrate that a non-optimal input variable ordering results in a BDD graph which is larger than the CBDD graph for the same circuit.

The CBDD does not analyze behavior or consider input variable ordering. For the logic elements encountered during the input of the design specification, the CBDD application interconnects the MSBDD subgraphs of the corresponding logic elements. According to Theorems 4.2 and 4.4, MSBDD subgraphs are unique and minimal in size for the given logic elements. From Theorem 4.3, the interconnection of MSBDD subgraphs results in a compact CBDD graph representing a circuit's behavior and connective relationships. For the reasons stated, the CBDD's graph maintains behavior, and for large circuits it is smaller than the graph of traditional BDDs.

4.4.6 Implementation

The application program that generates a CBDD reads a BLIF-formatted netlist file as input. The basic logic gates such as N/AND, N/OR, XOR, and NOT, when encountered during the input phase, are converted to minimized-scalable BDDs (MSBDDs). These MSBDDs represent the most reduced BDDs, in terms of total node count, for the given functional unit (logic gate) and size. Figure 4.5 displays a small selection of the MSBDDs. Once the MSBDDs are generated, their Internal node outputs are interconnected with the Input nodes of other MSBDDs according to the structure present in the netlist. The full-adder carry-out bit, structurally represented in Figures 4.2, 4.3, and 4.4, is represented by the CBDD in Figure 4.7.

Figure 4.7. CBDD of FA carry-out bit.

The CBDD of Figure 4.7 models the structure of the full-adder carry-out bit. The CBDD of the carry-out bit is an interconnection of MSBDDs which model the behavior and structural connectivity of two 2-input AND gates along with two 2-input OR gates, in accordance with the logic diagram of Figure 4.2. The MSBDDs for all logic gates are enclosed by labeled boxes featuring the gate name and size. The circuit's primary inputs, located outside of the MSBDD enclosures, are represented by boxes containing the input name. Primary inputs are connected to the MSBDD graphs by value-edges, depicted by heavy solid lines. Within the MSBDD enclosures, 0- and 1- edges are represented by dotted lines and solid lines, respectively. Input nodes are represented by circles enclosing the input name and are interconnected by 0- and 1- edges. Terminal nodes are represented by boxes containing a 0 or 1 value. Internal nodes are represented by circles containing an internally produced signal name; they are preceded by Terminal nodes connected by value-edges. Primary outputs are represented by double-dotted circles enclosing the output name; they are preceded by Terminal nodes.

4.4.7 Advantages and Disadvantages

In spite of the differences between the conventional BDD implementation and the CBDD, there are some very positive benefits to using the CBDD. First, the CBDD is not affected by input variable ordering because the internal logic elements of a digital circuit can be mapped to pre-defined MSBDDs, which are already minimal in size. Second, CBDD size (number of nodes) is directly related to the number of logic elements present in the circuit, so exponential growth will not occur.
Third, the CBDD's graph represents the entire circuit design, whereas the traditional BDD utilizes multiple graphs, one for each primary output and internally produced signal. Fourth, the CBDD's size grows linearly because its graph utilizes Internal decision nodes representing internally produced signals, instead of expressing the specified node in terms of primary inputs, which could result in exponential growth. The use of Internal nodes increases the sparseness of the CBDD's graph and provides a more compact representation of the circuit's behavior and structure.

The drawbacks of the CBDD include loss of canonicity. However, if needed, the equivalence of two circuits can be determined by performing functional simulations using their CBDDs, followed by a comparison of the results. Additionally, unlike traditional BDDs, the CBDD's graph does not have a single path from the root node to a terminal node. This is a disadvantage because several paths must be considered when determining a primary output's value. In some cases, however, CBDDs benefit from this drawback; they yield more compact graphs that are represented by far fewer nodes and edges than conventional BDDs. Thus, traversing additional paths in the CBDD is not costly because the maximum CBDD graph depth is less than the BDD graph depth. Lastly, in comparison to traditional BDD implementations, CBDDs generate a more compact representation for large circuits, but may produce larger than normal or poor representations for small circuits.

4.4.8 Results

The ISCAS-85 benchmark circuits were chosen for an experiment to compare the CBDD size to the size of traditional BDD implementations. The results after applying the CBDD implementation to the benchmark circuits are summarized in Table 4.1. Additionally, as a means of comparison, Table 4.1 provides the results of the implementations used by Brace and Shen in [30, 43] for the same benchmarks. The main quantity used in comparing the BDD implementations was the BDD node count. The BDD node count is used as a measure of performance for both CPU and memory utilization. The term Unable, used in Table 4.1, indicates that the corresponding decision diagram package was unable to provide a measurement for the given circuit due to memory limitations. In comparison to the BDD implementations by Brace and Shen, for almost every circuit of the benchmark suite, with the exception of Shen's FBD node measurement for circuit c7552, the CBDD resulted in a significant node savings. Given the sizes of the other circuits within the benchmark suite and their corresponding FBD sizes, it is very likely that Shen's reported FBD node count for circuit c7552 is an order of magnitude smaller than it should be.

  Circuit   #Inputs   #Outputs   ROBDD [30]   FBD [43]   CBDD
                                 #Nodes       #Nodes     #Nodes
  c432        36         7         30200        31195      2017
  c499        41        32         49786        33214      2899
  c880        60        26          7655         7761      4276
  c1355       41        32         39858        33214      6411
  c1908       33        25         12463        12734      9387
  c2670      233       140        Unable        57767     12886
  c3540       50        22        208947        88652     17874
  c5315      178       123         32193        26129     26078
  c6288       32        32        Unable       115607     29361
  c7552      207       108        Unable        19187     37774

Table 4.1. Benchmark Results.

The savings in node count exhibited by the CBDD implementation is due to the usage of Internal nodes, which represent the values of internally produced signals. The Internal nodes, along with value-edges, propagate the binary outputs of internal logic elements to subsequent MSBDD input nodes at deeper levels within the circuit structure.
The small size of the CBDD is due to the fact that CBDDs grow with respect to the number of functional units or logic elements present in the circuit's structure, not with the number of inputs or the input variable ordering.

4.4.9 Summary

This section has described the Connective BDD (CBDD), defined as a DAG interconnection of Minimized-Scalable BDDs. CBDDs are very economical in representing large circuits and maintain the circuit's structural and connective relationships. CBDDs represent circuits with far fewer nodes than previous BDD implementations. CBDDs have a few drawbacks, including loss of canonicity, multiple paths from the root to terminal nodes, and poor representation of small circuits. The main advantage of the CBDD is that it does not suffer from exponential growth when the number of inputs and interconnections grows.

CHAPTER 5

Behavioral-Level Switching Activity Estimation

The accurate estimation of switching activity is an important and necessary procedure for probabilistic power analysis tools. Behavioral-level power analysis tools have the disadvantage of not having detailed information concerning structure and technology. This disadvantage is combined with the challenging task of computing dynamic power and switching activity with improved accuracy from an implementation-independent perspective. There are very few design tools which provide power estimation for behavioral VHDL specifications. The average minimum error found in their power estimates is around 10%. Switching activity is a critical parameter needed to compute dynamic power dissipation (Equation 1.2) in CMOS circuits. A large portion of the error experienced by power analysis tools can be attributed to inaccurate switching activity (α) estimates due to factors such as reconvergent fanout and input/internal correlations.

This chapter presents a new technique which accurately performs high-level activity and power analysis. This approach operates at the behavioral level of design abstraction and uses behavioral VHDL specifications as input. The developed technique, and associated algorithms, have been implemented in a program called the Behavioral-Level Activity and Power Estimator (BLAPE). The details concerning the design and implementation are discussed, along with benchmark comparisons to demonstrate the effectiveness and capability of the new approach.

5.1 Methodology Overview

The focus of this section is to provide an overview of the assumptions, system architecture, and components used to implement the Behavioral-Level Activity and Power Estimator (BLAPE) system. A description of all major tasks which contribute to the process of accurately estimating switching activity is presented as well.

5.1.1 Methodology Assumptions

• Circuit designs must be synthesizable.
  Input to the BLAPE system is in the form of behavioral VHDL specifications. The VHDL language supports many constructs, such as for- and while- loops, which are convenient for simulation but may not result in a synthesizable design. The BLAPE system requires implementation-free and platform-unspecific design descriptions which represent real circuits that are mappable to some technology. This assumption supports the easy transfer of various benchmark VHDL specifications between BLAPE and other high-level power analysis tools.

• Circuit types are combinational.
  The circuits targeted for analysis must be combinational, where all circuit outputs are functions of their primary inputs.
This assumption simplifies the conversion of the input behavioral VHDL specification to a set of Boolean equations.

• Zero delay model.
  All gate or logic element operation times are zero. Upon a change to the input stream, it is assumed that steady-state logic results appear at the gate outputs instantaneously. This assumption simplifies the computation of signal probability, but does not support the computation of glitch activity.

• Uncorrelated primary inputs.
  Primary inputs are assumed to be spatially and temporally independent. This assumption allows the signal probability computation for each node to be a sum of disjoint input/internal signal probability products.

5.1.2 Methodology Outline

The BLAPE system is capable of operating on two input sources. It supports the use of behavioral/structural VHDL specifications as high-level input, along with a BLIF-formatted gate-level netlist as low-level input. The output generated by BLAPE consists of the switching activity (α) and node capacitance (C_L) for each net, along with a power estimate for the entire circuit. The system-level flow of the BLAPE implementation is given in Figure 5.1.

Figure 5.1. BLAPE methodology diagram. (Flow: the behavioral VHDL specification is transformed into Boolean equations, the equations are decomposed into a BLIF netlist, a CBDD representation is built, signal probabilities are computed, and switching activity and power estimates are produced; ISCAS and other benchmark circuits enter the flow directly as BLIF netlists.)

5.2 Methodology Task Decomposition

The Behavioral-Level Activity and Power Estimator system is composed of many individual tasks and sub-tasks. In this section a task decomposition of the methodology, together with a brief description of the major tasks, is given.

• Task 1 - Transform behavioral VHDL specification to Boolean equations.
  This task performs syntax checking and determines if the VHDL input specification is synthesizable. A data structure containing primary inputs, primary outputs, and a list of Boolean equations in sum-of-products form is generated when the VHDL input specification is synthesizable.

• Task 2 - Decompose Boolean equations.
  This task applies structure to the Boolean equations generated in Task 1 in two possible forms. First, implicit structure, which directly models a Boolean equation in sum-of-products form using the basic logic operations (NOT, AND, OR) only, is generated. Second, mapped structure, where the Boolean equations are transformed to a structural specification consisting of gates or logic elements defined by a user-specified library, is produced.

• Task 3 - Generate Connective BDD (CBDD).
  This task converts the structural specification generated in Task 2 to a CBDD. The CBDD maintains the design's original behavior and maps the input circuit's individual logic elements to nodes within the CBDD's graph.

• Task 4 - Determine signal probabilities.
  This task traverses the CBDD generated in Task 3 in an effort to collapse a node's Boolean expression, represented by a multi-level Boolean function, to a Boolean expression with a user-specified depth, not less than 2. The final Boolean expression contains disjoint product terms, whose indices are primary input and internal signal probabilities. The result is a node signal probability that is almost independent or free of reconvergent fanout.

• Task 5 - Estimate switching activities.
  This task computes the switching activity for a specified node given the signal probability provided in Task 4.

• Task 6 - Compute dynamic power.
This task determines the dynamic power of the entire circuit, utilizing the switching activity estimate (α) computed in Task 5, a computed estimate for node capacitance (C_L), and default values for supply voltage (V_dd) and clock frequency (f_clk).

5.2.1 Transformation of VHDL into Boolean Equations

The initial stage of the BLAPE methodology requires the use of a VHDL specification as input. The VHDL modeling style is assumed to be behavioral, although a structural VHDL specification is permitted. The focus of this transformation is to provide a systematic and efficient mechanism for transforming a behavioral VHDL specification into a set of Boolean equations.

After careful consideration, it was decided that a VHDL compiler was the best solution to this problem. Like Ada, C, and Pascal programming language compilers, the VHDL compiler would be responsible for syntax analysis, bounds checking, and other normal compiler tasks. The Altera MAX+PLUS II compiler (Version 9.01) was selected. Useful and necessary features supported by the selected compiler include determination of design synthesizability and generation of a data structure containing the circuit's primary inputs and outputs, combined with a list of Boolean equations describing the input/output relationships of all primary outputs and internal nets.

Similar to the compilers of other programming languages, the selected VHDL compiler supports directives and optimizations to control the outcome of the synthesized design. One such optimization is logic reduction, whereby a compact logic representation is generated by means of technology mapping and by the removal of redundant or unused logic. Invoking this logic reduction option will consistently generate the same reduced set of Boolean equations for various behaviorally equivalent designs. For the purpose of this research, the logic reduction feature was disabled. The rationale for this decision was to preserve the design's inherent logical structure, with the understanding that structural modifications to the logic represented in the VHDL specification directly affect the switching activity and the power dissipated by the design. In making this decision, the BLAPE system yields unique power and activity estimates for various designs that are behaviorally equivalent. Disabling logic minimization is accomplished by setting the synthesis style to WYSIWYG and setting the MINIMIZATION primitive to OFF.

Figures 5.2 and 5.3 depict a behavioral VHDL specification for a full-adder (FA) and the corresponding Boolean expressions generated by the Altera compiler. The Altera compiler generates a report file containing the Boolean equations shown in Figure 5.3. The Altera Boolean equation notation makes use of the basic (AND/OR) logic operations only. Each equation is represented by a list of product terms, where an & separates the literals of each term and each term appears on a single line separated by a # sign. All complemented literals are preceded by the ! character. These Boolean equations represent the behavior of the VHDL input specification and are eventually used to compute switching activity in later stages of the BLAPE implementation. Following the compilation of the VHDL specification, the generated Boolean equations are extracted from the report file. Next, the BLAPE implementation converts the Boolean equations to an implicit or mapped structural representation. The result is a gate-level netlist, discussed in the next section.
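As an illustration of the report-file notation just described, the short Python sketch below splits an equation of the Figure 5.3 form into its product terms. The function name and the standalone equation string are illustrative assumptions; the actual extraction code used by BLAPE is not shown in this chapter.

  # Minimal sketch: parse the report-file equation notation described above
  # (& = AND, # = OR, ! = complement) into a list of product terms.
  def parse_altera_equation(text):
      """Return a list of product terms; each term is a list of (signal, phase) pairs."""
      terms = []
      for product in text.split("#"):            # '#' separates product terms
          literals = []
          for lit in product.split("&"):         # '&' separates literals within a term
              lit = lit.strip()
              if not lit:
                  continue
              phase = not lit.startswith("!")    # '!' marks a complemented literal
              literals.append((lit.lstrip("!"), phase))
          if literals:
              terms.append(literals)
      return terms

  if __name__ == "__main__":
      # Carry-out equation of the full-adder, written in the report-file notation.
      c_out = "x & y # c_in & y # c_in & x"
      for term in parse_altera_equation(c_out):
          print(term)
      # [('x', True), ('y', True)]
      # [('c_in', True), ('y', True)]
      # [('c_in', True), ('x', True)]

Each returned term corresponds to one line of the report-file equation, which is the form the later decomposition and cubeset stages operate on.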
  ENTITY FA is
    port (x, y, c_in : IN BIT; sum, c_out : OUT BIT);
  END FA;

  ARCHITECTURE behav of FA is
  BEGIN  -- behav
    sum   <= (x xor y xor c_in);
    c_out <= (c_in and (x or y)) or (x and y);
  END behav;

Figure 5.2. Behavioral VHDL specification for full-adder.

  c_in : INPUT;
  x : INPUT;
  y : INPUT;
  c_out = _LC2_A1;
  sum = _LC1_A1;

  _LC1_A1 = LCELL( _EQ001);
  _EQ001 =  c_in &  x &  y
         #  c_in & !x & !y
         # !c_in & !x &  y
         # !c_in &  x & !y;

  _LC2_A1 = LCELL( _EQ002);
  _EQ002 =  x & y
         #  c_in & y
         #  c_in & x;

Figure 5.3. Full-adder Boolean equations.

5.2.2 Decomposition of Boolean Equations

Applying structure to a design which originates as an algorithm or set of Boolean expressions binds the behavioral representation to a selected set and arrangement of logic elements. Decomposing the circuit specification into various structures enables a designer to select a design which best satisfies the system-level power requirements. The decomposition stage of the BLAPE implementation results in an implicit or mapped structural representation, which is then transformed to a BLIF-formatted gate-level netlist. The BLIF format was selected because of its simplicity and compatibility with the Berkeley SIS design tool.

5.2.2.1 Implicit Structure Representation

The implicit structure representation mimics the two-level sum-of-products form of the Boolean expressions. This representation places connective restrictions on the circuit behavior, using the basic logic gates (NOT, AND, OR) only. Given the full-adder Boolean equations (Figure 5.3), the resulting implicit structure netlist and corresponding logic diagram are illustrated in Figures 5.4 and 5.5. In many instances, the resulting implicit structure representation does not yield the optimal structure because of unrealistic, although logically correct, logic elements which exceed input-count restrictions. For this reason, the implicit structure should be used as an initial or starting structural design solution, which can be improved iteratively by the use of technology mapping procedures as described in the next section [45].

  .inputs c_in x y
  .outputs c_out sum
  .gate not  a=c_in O=NOTc_in
  .gate not  a=y O=NOTy
  .gate not  a=x O=NOTx
  .gate buf1 a=_LC2_B1 O=c_out
  .gate buf1 a=_LC1_B1 O=sum
  .gate and3 a=NOTc_in b=x c=NOTy O=t0
  .gate and3 a=NOTc_in b=NOTx c=y O=t1
  .gate and3 a=c_in b=x c=y O=t2
  .gate and3 a=c_in b=NOTx c=NOTy O=t3
  .gate or4  a=t0 b=t1 c=t2 d=t3 O=_LC1_B1
  .gate and2 a=c_in b=x O=t4
  .gate and2 a=c_in b=y O=t5
  .gate and2 a=x b=y O=t6
  .gate or3  a=t4 b=t5 c=t6 O=_LC2_B1
  .end

Figure 5.4. Netlist for implicit full-adder structure.

Figure 5.5. Logic diagram for implicit full-adder structure.

5.2.2.2 Mapped Structure Representation

The mapped structure representation is a structural circuit specification composed of user-specified gates or logic elements. This specification is generated by an automated process known as technology mapping. The technology mapping process transforms an optimized set of technology-independent logic equations into a feasible circuit which satisfies certain area, delay, and power constraints. The role of technology mapping is neither to radically change the circuit's structure nor to reduce the circuit's depth along the critical path, but to perform the best gate selection for implementing the logic equations, given the system-level area, delay, and power criteria [39].
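To tie the implicit decomposition of Section 5.2.2.1 to the netlists handed to the technology mapper, the Python sketch below emits a Figure 5.4 style implicit-structure BLIF fragment from a list of product terms. The gate and pin naming conventions (not, andN, orN, NOT-prefixed nets, pins a, b, c, ... and O) follow Figure 5.4, but the helper itself is an illustrative sketch and is not part of the BLAPE implementation.

  # Sketch: emit an implicit-structure BLIF netlist (NOT/AND/OR gates only)
  # from a sum-of-products description of one primary output.
  def sop_to_blif(output, terms, inputs):
      """terms: list of product terms, each a list of (signal, phase) pairs."""
      lines = [".inputs " + " ".join(inputs), ".outputs " + output]
      inverted = sorted({sig for term in terms for sig, ph in term if not ph})
      for sig in inverted:                      # one inverter per complemented signal
          lines.append(f".gate not a={sig} O=NOT{sig}")
      term_nets = []
      for i, term in enumerate(terms):          # one AND gate per product term
          net = f"t{i}"
          pins = " ".join(f"{chr(ord('a') + j)}={sig if ph else 'NOT' + sig}"
                          for j, (sig, ph) in enumerate(term))
          lines.append(f".gate and{len(term)} {pins} O={net}")
          term_nets.append(net)
      pins = " ".join(f"{chr(ord('a') + j)}={net}" for j, net in enumerate(term_nets))
      lines.append(f".gate or{len(term_nets)} {pins} O={output}")  # OR of all terms
      lines.append(".end")
      return "\n".join(lines)

  if __name__ == "__main__":
      carry_terms = [[("c_in", True), ("x", True)],
                     [("c_in", True), ("y", True)],
                     [("x", True), ("y", True)]]
      print(sop_to_blif("c_out", carry_terms, ["c_in", "x", "y"]))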
For the purposes of this research, structural mapping is used as a means to generate a realistic circuit specification. As illustrated by Figure 1.8, various circuit structures or decompositions yield different switching activities, and hence different power dissipations. This stage of the BLAPE process supports the computation of improved switching activity estimates by producing a gate-level structural specification which closely resembles the desired circuit. As a result, the activity and power estimates are more meaningful and consistent with eventual circuit- or transistor-level switching activity and power approximations.

The BLAPE implementation utilizes the Berkeley SIS tool's technology mapping procedure. During the mapping stage of the BLAPE process, an implicit structural circuit specification is used as input to the SIS tool, along with a user-specified gate library. The map operator is applied, resulting in a behaviorally-equivalent circuit that can be optimized for area and delay. For example, given a gate library composed of NOT- and NAND- gates only, a technology mapping of the full-adder implicit structural representation, shown in Figure 5.4, results in the mapped structural representation displayed in Figure 5.6. Additionally, the corresponding logic diagram for the full-adder's mapped structure representation is given in Figure 5.7.

  .inputs c_in x y
  .outputs c_out sum
  .gate nand2 a=c_in b=y O=[236]
  .gate nand2 a=x b=y O=[238]
  .gate nand2 a=c_in b=x O=[234]
  .gate nand3 a=[236] b=[238] c=[234] O=c_out
  .gate not   a=y O=[182]
  .gate not   a=c_in O=[181]
  .gate nand3 a=[182] b=x c=[181] O=[226]
  .gate not   a=x O=[183]
  .gate nand3 a=y b=[181] c=[183] O=[228]
  .gate nand3 a=c_in b=x c=y O=[230]
  .gate nand3 a=[182] b=c_in c=[183] O=[232]
  .gate nand4 a=[226] b=[228] c=[230] d=[232] O=sum
  .end

Figure 5.6. Netlist for mapped full-adder structure.

Figure 5.7. Logic diagram for mapped full-adder structure.

5.2.3 Building the CBDD Representation

5.2.3.1 CBDD Selection Rationale

The Connective Binary Decision Diagram (CBDD) is a directed acyclic graph (DAG) which serves as a graph-based behavioral representation capable of modeling a circuit's input/output relationships and internal connectivity. The CBDD was selected due to its linear growth and preservation of circuit structure. Because it uses the CBDD, the BLAPE implementation is capable of processing larger circuits in comparison to the Berkeley SIS tool. Many CAD applications that use traditional BDDs, specifically probabilistic power analysis tools, are limited to modest-sized circuits.

CAD applications generally encompass a class of problems known as nondeterministic polynomial, or NP-complete, problems [39]. For this class of problems there are no known feasible or polynomial-time solutions. The solutions to such problems which use conventional BDDs are computationally expensive and easily capable of exhausting a typical workstation's memory system. The time complexity of algorithms which use decision diagrams is O(f(|V|, |E|)), where f is an operation performed on a graph such that |V| and |E| represent the sizes of the graph's vertex and edge sets, respectively. A reduction in both |V| and |E| is appealing to CAD applications as it reduces time complexity. An additional reason for selecting the CBDD for this research is its improved representation of large circuits. When compared to other conventional BDD implementations, using the ISCAS-85 benchmark circuits, the CBDD's graphs were significantly smaller.
An average size reduction of one order of magnitude was achieved [46].

5.2.3.2 Generation of CBDD Graph

The CBDD's graph is produced during the structural specification input phase. The structural specification is a BLIF-formatted netlist which may be generated by BLAPE or selected from the ISCAS-85 or other benchmark suites. Each logic gate, along with its inputs and output, is transformed into a minimized-scalable BDD (MSBDD). For the basic logic gates (N/AND, N/OR, XOR, NOT), there are routines (Algorithm 1) which construct the behavior of the specified gate. These routines build MSBDDs with the least number of nodes and edges for the specified fanin size. The node and edge data structures are placed into a lookup table of lists for the purpose of maintaining connectivity. During the input phase, an adjacency list is built. The adjacency list contains connection (wire) information, which describes node-to-node reachability and is used for network levelization and other tasks.

The CBDD is an edge-driven DAG rather than a vertex- or node-driven one. Traversals and graph inquiries rely on routines which search for and manipulate sets of edges. The precise modeling of signal flow is made possible by the CBDD's value-edge. Additionally, the value-edge is responsible for the linear growth of the CBDD's graph. The CBDD representations corresponding to the implicit and mapped full-adder structural representations shown in Figures 5.4 and 5.6 are illustrated in Figures 5.8 and 5.9, respectively. The mapped structural CBDD has fewer nodes and edges; it uses an interconnection of NOT and NAND gates only. Later sections of this chapter will demonstrate that these two full-adder structural representations yield different switching activities and different power dissipations.

Figure 5.8. Implicit structural CBDD full-adder.

Figure 5.9. Mapped structural CBDD full-adder.

5.3 Signal Probability Computation

This section presents the most important component of the BLAPE implementation, the Signal Probability Computation Algorithm (SPCA). An overview concerning the assumptions, origin and details of the SPCA is given. The SPCA was inspired by concepts associated with the fault analysis and digital circuit testability fields. The SPCA is an improvement over previously mentioned probabilistic and statistical approaches for computing signal probability because it considers the correlation of internal signals caused by reconvergent fanout.

In the fault analysis and digital circuit testability areas, reconvergent fanout is a large source of error when computing node controllability. The 1-controllability of an edge in the circuit graph is defined as the fraction of 1's in the truth table of its function; its 0-controllability is defined as one minus its 1-controllability [47]. A node's exact controllability computation requires the complete flattening of a multi-level expression to a two-level or sum-of-products expression comprised of primary inputs only. The flattening uncoils the correlated internal signals, resulting in an expression composed of independent primary inputs. This process offers an exact solution, but is hampered by an exponential time complexity, thus limiting the technique's usefulness to small- or modest-sized circuits.
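The cost of the exact approach just described can be made concrete with a small sketch: the Python fragment below computes an exact signal probability by enumerating all 2^N primary-input combinations, which is precisely the exponential flattening cost the SPCA is designed to avoid. The function, the dictionary-based circuit model, and the carry-out lambda are illustrative assumptions, not BLAPE code.

  # Brute-force sketch of exact signal probability: enumerate every input
  # assignment and weight each one by its probability of occurrence.
  from itertools import product

  def exact_signal_probability(fn, input_probs):
      """fn maps a dict of 0/1 input values to 0/1; input_probs maps name -> P(x = 1)."""
      names = sorted(input_probs)
      p_one = 0.0
      for values in product((0, 1), repeat=len(names)):   # 2^N assignments
          assignment = dict(zip(names, values))
          weight = 1.0
          for name, v in assignment.items():
              weight *= input_probs[name] if v else (1.0 - input_probs[name])
          p_one += weight * fn(assignment)
      return p_one

  if __name__ == "__main__":
      carry = lambda s: (s["a"] & s["b"]) | ((s["a"] | s["b"]) & s["c_in"])
      # With equiprobable inputs the carry-out is 1 in 4 of the 8 cases: P = 0.5.
      print(exact_signal_probability(carry, {"a": 0.5, "b": 0.5, "c_in": 0.5}))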
The exact computation of signal probability encounters the same dilemma experienced when computing exact controllability. Exact signal probability or controllability offers the best or most appropriate solution for a node's Boolean equation since each has zero error. When addressing a network as an interconnection of logic gates, error due to reconvergent fanout arises (Figure 2.2). Reconvergent fanout may cause an overestimate (Equation 2.3) or underestimate in the computation of signal probability, due to the statistical dependence of signals arriving at a gate. The SPCA approximates signal probability, which is less accurate than exact computation. The approximation scheme assumes that the dependence resulting from reconvergent fanout is reduced the further the distance between a gate's fanout and the resulting reconvergent fanin gate. This assumption is used by fault analysis heuristics that compute node controllabilities of large circuits [47]. The assumption allows the SPCA to treat reconvergent fanins beyond a certain depth (distance) as almost independent, or very close to statistically independent. As a result, the signal probability approximation approach is much faster and less complex than the exact approach.

5.3.1 Overview

The SPCA is responsible for cost-effectively computing the signal probability at each node in the network. The algorithm has four inputs: 1) node identifier, 2) CBDD, 3) input signal probabilities, and 4) depth-accuracy parameter (k). The input signal probabilities are a user-specified set of real numbers which represent the probability that each input is a logic one, or P(x_i = 1). The depth-accuracy parameter (k) is the depth or level to which the SPCA collapses or flattens the network. A larger depth-accuracy yields a more accurate signal probability estimate at the expense of time and memory resources. A structural overview of the SPCA is illustrated in Figure 5.10.

The SPCA involves the coordination of four main components: 1) Network Levelization, 2) Disjoint Post-Order Boolean Equation Generation, 3) IPR Cubeset Generation, and 4) Signal Probability Computation. The Network Levelization component is responsible for ordering the CBDD nodes according to their degree of logical independence. The Disjoint Post-Order Boolean Equation Generation component is responsible for traversing the CBDD from a given node with the intent of producing a disjoint post-ordered Boolean equation that is flattened to k (depth-accuracy) levels. The IPR Cubeset Generation component makes sense of the post-ordered Boolean equations; it converts the equations to a set of Integer Pair Representation (IPR) cubes, in infix form. The Signal Probability Computation component reads the IPR cubeset and approximates the given node's signal probability.

Figure 5.10. SPCA structural overview.

5.3.1.1 Network Levelization

Before the generation of Boolean equations takes place, each node in the network is leveled. Levelization is the process of ordering the internal/output nodes of the CBDD's graph in terms of their depth or distance from the primary inputs. The nodes at the lowest level of depth, level zero, are the primary input variables. The levelization process is managed by the Network_Levelization procedure (Algorithm 2); this process is necessary because the approximated signal probability of internal nodes is repeatedly used for future signal probability computations involving deeper
The levelization process is managed by the Network_Levelization procedure (Algorithm 2); this process is necessary because the approximated signal probability of internal nodes is repeatedly used for future signal probability computations involving deeper 94 nodes. The Network-Levelization procedure relies on a simple recursive algorithm which performs a depth-first search of the CBDD’s graph. The search runs in 0(n + e) time, where n and e are the number of nodes and edges of the CBDD’s graph, respectively. The depth-first search is a pOpular and well known search algorithm used for traversing graphs. Detailed analysis concerning the complexity of the depth- first search algorithm is reported in [48]. 95 Procedure: Network_Levelization input : v - CBDD node input : lev - Level of node 1 if lev > v.lev then 2 v.lev <— lev 3 end 4 a (— v.adj 5 while a 76 NULL do 6 Networkievelization(a.v, lev + 1) 7 a (— a.nea:t s end Algorithm 2: Levelize network procedure. Correctness of NetworkLevelization procedure Claim 5.1 Network-Levelization(v, lev) produces a dependency ordering of the nodes in a directed acyclic graph. Explanation: Assume that v is initially the DAG root node, and lev is initially set to zero. Let a E DAG. Upon the initial invocation of the Network_Levelization procedure all node levels are zero. Each invocation compares the node level, v.lev, to the stack argument, lev, at line 1. Line 2 updates v.lev (node v’s level) with the larger level value from the stack when the comparison condition of line 1 is satisfied. For every invocation of N etworkievelization, node variable a is assigned to the adjacency list of node variable v on line 4. The adjacency list assigned to variable a contains all descendents of node variable v. The nodes adjacent to node v are visited sequentially by the execution of lines 5—7. Each node having descendents experiences a recursive invocation of the N etwork_Levelization procedure at line 6. Each recursive invocation traverses to a deeper level within the CBDD’s graph. The traversal finds a deeper generation of node v’s lineage and results in an increment of the stack argument lev. If some node y experiences multiple recursive invocations, y.lev is only updated when 96 a larger level is encountered causing each node to be updated with the appropriate dependency level value. Recursive invocations terminate when all paths from the root node to all primary output nodes have been traversed. @ 3’42 . Figure 5.11. DAG of FA carry-out bit network. Node Adj acency List Level root a, b, c_in —1 a 21, 22 0 b 21, 22 0 c_in 23 0 21 c_out 1 22 23 1 23 c_out 2 c_out 3 Figure 5.12. Carry-out bit adjacency list and levelization. The inputs to the Network_Levelization procedure include: 1) CBDD node and 2) node level. The procedure is initially invoked with the root node of the given CBDD, 97 with the node level parameter set to zero. The procedure’s output is the level ordering for each of the CBDD’s internal and output nodes. Figure 5.12 shows an example levelization for the mapped full-adder carry-out bit network. The DAG modeling just the primary input, gate and primary output interconnections maintained by the CBDD is given in Figure 5.11. The network node adjacency list is illustrated in Figure 5.12. The results of the network levelization example demonstrate that primary inputs are the most independent nodes; they appear at level zero. Due to their dependence on other signals, internal signals and primary outputs appear at deeper levels of the network. 
5.3.1.2 Disjoint Post-Order Boolean Equation Generation

The generation of disjoint Boolean equations is a key task necessary for the computation of signal probability. The procedure described in this section produces a disjoint Boolean equation, where all terms are mutually exclusive, similar to the minterms of a truth table. The Post_Order procedure, Algorithm 3, is a simple recursive routine that ascends from some node (v) to a user-selected depth-accuracy (k) within a CBDD's graph. The goal of the Post_Order procedure is to construct a disjoint post-ordered Boolean equation composed only of literals that appear a maximum of k levels above node v's level within the CBDD's graph.

  Procedure: Post_Order
    input  : CBDD - CBDD of network
    input  : v    - Vertex
    input  : d    - Vertex depth
    input  : lcnt - Level counter
    output : POE_Stack - Post-ordered expression stack
   1 begin
   2   Edge_Set ← Get_In_Edges(CBDD, v)
   3   for ∀ E ∈ Edge_Set do
   4     if E.edge_type = ZERO_EDGE or E.edge_type = ONE_EDGE then
   5       tcnt ← tcnt + 1
   6       eV ← Get_Source_Vertex(CBDD, E.vertex)
   7       Sign ← (E.edge_type = ZERO_EDGE ? FALSE : TRUE)
   8       Val ← (E.edge_type = ZERO_EDGE ? -eV.idx : eV.idx)
   9       if eV.vertex_type = INPUT then
  10         Push(POE_Stack, Val)
  11       else
  12         if d < depth_accuracy(k) then
  13           T ← Get_Terminal_Vertex(CBDD, eV, Sign)
  14           Post_Order(CBDD, T, d + 1, 0)
  15         else
  16           Push(POE_Stack, Val)
  17         endif
  18       endif
  19       Post_Order(CBDD, eV, d, lcnt + 1)
  20       if lcnt > 0 then
  21         Push(POE_Stack, *)
  22       end
  23     end
  24     if tcnt > 0 then
  25       Push(POE_Stack, +)
  26     end
  27   end
  28 end

Algorithm 3: Post-order CBDD traversal.

The design of the Post_Order procedure is the result of modifications to the traditional post-order binary tree traversal. The post-order traversal was selected because it is a very effective means of circumnavigating a graph to produce meaningful expressions. The most common implementation of the post-order traversal algorithm is recursive, where all un-traversed node edges are visited first, followed last by a node visit (output). The result is an expression in postfix notation. Given the expression tree of Figure 5.13, describing the full-adder carry-out bit Boolean equation (Equation 3.6), the resulting postfix expression is ab*cab+*+.

Figure 5.13. Expression tree for full-adder carry-out bit.

Correctness of Post_Order procedure

The Post_Order procedure arguments include 1) a CBDD, 2) vertex v, 3) invocation depth d, and 4) node counter lcnt. Argument v is a node within the CBDD's graph, the start point of the traversal. Argument d is the procedure's invocation depth given node v. The lcnt argument tracks the number of Output nodes encountered
For each iteration of the edge set, line 4 checks if the given edge is a 0—edge or 1-edge. If the line 4 condition is satisfied, then lines 5-23 are encountered, otherwise lines 23—26 are encountered. Lines 24-26 Push the Boolean OR symbol to the output stack if at least one pair of post-order arguments have been previously added to the stack. The block between lines 5 and 23 allow the Post_Order procedure to ascend from a given node to its predecessor node. A counter tracking the number of post-order argument pairs per invocation is maintained on line 5. To assist with the ascending traversal from node v to its predecessor, line 6 lo- cates the head vertex of the current edge, by invoking the Get_Source_Vertex routine. Based on the current edge type, the sign variable is updated in line 7. The sign variable contains the Boolean phase of the head source vertex. The index value of head source vertex is determined on line 8. A negative index value indicates that the head vertex literal is complemented. If the head vertex models a primary input, then lines 9-10 add the head vertex index value of line 8 to the stack. Otherwise, the head 101 source vertex models an internal signal and is dealt with in lines 12-17. If the depth of the search is less than the depth-accuracy (k) parameter a recursive invocation takes place on line 14; this allows higher exploration into the CBDD’s graph. Line 13, invokes the Get-Terminal_Vertex routine which directs the upward search from the edge’s source vertex to a Terminal vertex in the preceding MSBDD. The Ter- minal is chosen based on the literal sign value. Lines 15—17 place the source node’s literal value on the output stack. Line 19 performs a second recursive invocation and increments the level counter argument. The second invocation allows for the upward traversal within the subgraph of the new MSBDD resulting from Line 13. Lines 20—22 build the Boolean product terms by pushing a Boolean AND symbol to output stack when the second literal (decision node) is encountered during the recursive invocation. Lines 24-26 push the Boolean OR operator symbol to the output stack just after the pairs of product terms encountered during the upward traversal from the Terminal node. The repeated interaction of lines 2-27 result in an output stack that contains product terms whose literals are positioned k levels or less above the specified input vertex argument within the CBDD’s graph. All terms are described using 2—input Boolean AND operators and each post-order pair of terms is followed by a Boolean OR symbol on the output stack. Analysis of Post_Order procedure Traditional post-order BDD traversals explore only two edges per node because of the binary nature of the BDD’s implementation. The Post-Order procedure for the CBDD must consider two edges per Input node and B edges per Internal node, where B is the maximum fanout of an internally produced signal. Let G be the number of logic elements in the circuit. According to the analysis of the Mk_N AN D procedure, a B—input MSBDD NAND representation has B + 3 nodes and 25 + 2 edges. The 102 fanout of the NAN D MSBDD’S Output node results in a maximum of B value-edges, yielding a maximum of 3B + 2 edges per logic element. The Post_Order procedure is recursively applied on a per node basis. The traversal encounters a maximum of Vmax nodes and Ema, edges which are C x (B + 3) and G x (3B + 2), respectively. 
For each recursive iteration of the Post_Order procedure, assume that the operations at lines 2-8 are always performed. These operations run in constant time, with a total constant time of O(7). The if-then-else block at lines 9-18 runs in O(4) time, by assuming that lines 9, 12, 13, and 14 always run, yielding the longest possible thread through that block. The recursive invocation on line 19 runs in O(1) time, and the if-then blocks at lines 20-22 and lines 24-26 both run in O(2) time, for a total constant time of O(4). The block of lines 3-27 therefore encloses a total constant time of O(7) + O(4) + O(5) = O(16). The search can be viewed as a traversal of all edges with visitation of all nodes in the ascending ancestry of the specified start node. The largest number of invocations of the Post_Order procedure for a given node is V_max + E_max, with each invocation costing a maximum of O(16) time. Thus, the worst case running time of the Post_Order procedure is O(16 × (V_max + E_max)). Given the user-selected depth-accuracy parameter, k, and the maximum logic element fanout, B, the overall worst case running time of the Post_Order procedure can be expressed as O(16 × (4B + 5) × G).

  Post-Order Boolean Equations
  Idx   Node    k=1          k=2                       k=3
  1     c_in    -            -                         -
  2     a       -            -                         -
  3     b       -            -                         -
  4     z1      32*          32*                       32*
  5     z2      23(-2)*+     23(-2)*+                  23(-2)*+
  6     z3      15*          123(-2)*+*                123(-2)*+*
  7     c_out   46(-4)*+     32*15*(-2)(-3)2*+*+       32*123(-2)*+*(-2)(-3)2*+*+

Table 5.1. Post-order Boolean equations for FA carry-out bit.

After invoking the Post_Order procedure with the full-adder carry-out bit CBDD, Figure 4.7, the resulting post-ordered Boolean equations are shown in Table 5.1 for k = 1, 2, and 3. By inspection, as the depth-accuracy parameter increases, the length of the post-order equations increases, as expected. When a deep network flattening (large k) is requested, nodes having large fanouts (many value-edges) contribute to a dramatic increase in the run time of the procedure. Thus, a deeper equation flattening requires longer execution times and, obviously, a greater demand on memory resources.

5.3.1.3 IPR Cubeset Generation

The IPR Cubeset Generation component is responsible for converting the post-ordered Boolean equations, generated by the Post_Order procedure, to a collection of IPR (Integer Pair Representation) cubes in infix form. IPR is a very compact Boolean equation representation system, developed by Diaz and Jimenez in [49]. A cube is similar to a product term; it may represent one or more minterms of a switching function. It consists of a pair of integers: the position and the expansion. The position integer represents the literals of the individual switching variables, and the expansion integer indicates whether a literal is in use or is a don't-care variable [49]. A cubeset is a list or collection of cubes. For the purposes of the SPCA, cubesets are used to efficiently represent the behavior of switching functions.

The Generate_IPR_Cubeset procedure, Algorithm 4, is responsible for generating an infix Boolean expression in terms of IPR cubes. The procedure's input argument is a post-ordered Boolean expression stack that is composed of Boolean operators (AND/OR) and literals only. It is assumed that all operators are binary, requiring two inputs. This assumption simplifies the conversion of postfix to infix notation because delimiters are not necessary.
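Before the procedure's listing, a small Python sketch of the integer-pair idea may help. The bit conventions used here (an expansion bit of 0 means the variable is in use, and a set position bit means an uncomplemented literal) are assumptions chosen to match the way Algorithm 5 later tests the pair; they are not necessarily the exact encoding of [49].

  # Sketch of an integer-pair (position, expansion) cube and its decoding.
  def make_cube(num_vars, literals):
      """literals: dict mapping variable index -> phase (True = uncomplemented)."""
      expansion = (1 << num_vars) - 1          # start with every variable a don't-care
      position = 0
      for idx, phase in literals.items():
          expansion &= ~(1 << idx)             # clear the bit: variable is in use
          if phase:
              position |= 1 << idx
      return position, expansion

  def cube_to_string(cube, names):
      position, expansion = cube
      parts = []
      for idx, name in enumerate(names):
          if not (expansion >> idx) & 1:       # in use when the expansion bit is 0
              parts.append(name if (position >> idx) & 1 else name + "'")
      return "".join(parts) or "1"

  if __name__ == "__main__":
      names = ["c_in", "a", "b"]               # index-to-signal mapping of Table 5.1
      cube = make_cube(3, {1: True, 2: False}) # the product term a.b'
      print(cube, cube_to_string(cube, names)) # -> (2, 1) ab'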
  Procedure: Generate_IPR_Cubeset
    input  : IN_Stack    - Input expression stack
    output : IPR_Cubeset - IPR cubeset
   1 begin
   2   while IN_Stack Not Empty do
   3     repeat
   4       item ← Pop(IN_Stack)
   5       if item.type = OPERATOR then
   6         Operator ← item
   7       else
   8         Push(Eval_Expr_Stack, item)
   9       endif
  10     until item = OPERATOR
  11     item1 ← Pop(Eval_Expr_Stack)
  12     item2 ← Pop(Eval_Expr_Stack)
  13     result ← Perform_IPR_Operation(Operator, item1, item2)
  14     Push(Eval_Expr_Stack, result)
  15   end
  16   IPR_Cubeset ← Pop(Eval_Expr_Stack)
  17 end

Algorithm 4: IPR cubeset generation.

Correctness of Generate_IPR_Cubeset procedure

Claim 5.3 The Generate_IPR_Cubeset procedure generates a Boolean expression in infix notation.

Explanation: The Generate_IPR_Cubeset procedure is invoked with a post-ordered Boolean expression input stack. Line 2 checks the length of the input stack. While the stack is not empty, the procedure enters lines 3-10, where it continuously removes items from the stack until an operator (AND/OR) is encountered. Line 4 performs the Pop command, which removes the next item from the stack. If the item is not an operator, then it must be a literal; the literal is Pushed onto the evaluation expression stack at line 8. If an operator is encountered by the check at line 5, then the item is assigned to the Operator variable at line 6. Following this assignment, the repeat-until loop is exited at line 10, with processing continuing at line 11. The repeat-until loop, lines 3-10, is iterated two times and loads two literals onto the evaluation expression stack, due to the post-order pairing of literals in the Post_Order procedure. Lines 11 and 12 remove the top two items from the evaluation expression stack. Line 13 invokes the Perform_IPR_Operation routine with the two items from the evaluation expression stack and the operator acquired at line 6. The Perform_IPR_Operation routine converts the item arguments to IPR cubes and performs a Boolean (AND/OR) IPR operation on the two item cubes. The result of the operation is Pushed onto the evaluation expression stack (line 14). The procedure returns to line 2 and tests the size of the input stack; while the input stack size remains nonzero, processing continues as previously described. When the input stack becomes empty, processing continues to line 16, where the final IPR cube infix Boolean expression is retrieved from the evaluation expression stack and returned to the invoking procedure.

Analysis of Generate_IPR_Cubeset procedure

The complexity of the Generate_IPR_Cubeset procedure is more difficult to compute than it appears. The duration of the outer while-loop, lines 2-15, must be determined; this duration is actually the size of the post-order input expression stack (in_size). The exact computation of such a value is circuit specific. However, the derivation of a worst case approximation for in_size is presented in the following text. Let B equal the maximum number of logic element inputs. Given the structure of a B-input MSBDD, the post-order expression size is determined by summing the number of decision node literals with the number of Boolean (AND/OR) operations encountered during the post-order traversals. For a single-level (k = 1) flattening of a typical MSBDD (N/AND, N/OR, XOR), a worst case approximation for in_size (the input stack size) given B inputs is computed as:
For multi-level circuits, the Post_Order procedure will expand the CBDD into an input stack expression whose size depends on k, the depth-accuracy parameter. Let L, the literal count for a B-input MSBDD, equal 927%. Let C, the number of Boolean operations for a B—input MSBDD post-order expression, equal fl—Ztg — 1. Let F, equal L+C or B2+B — 1. The resulting input stack expression size can be modeled using the following recurrence relation: in.si2e,c = L x in_si2ek_1 + C. After a few iterations of the recurrence relation the solution was determined to be in.si2e,c = F for k =1 2 L-F+C for 10:2 2 L-(L'F+C)"‘2+C for k>2 2 litre ((WgW’” — 1)'€-2 + 1) — 1 for k > 2 The condition on line 2 runs in constant time, 0(1). The Push/POp Operations at lines 4, 8, 11, 12, 14, and 16 run in constant time, 0(1). Lines 5—9 run in a maximum of 0(2) time. The PerformJPR-Operation routine runs in 0(G) time, where G is the number of circuit inputs plus the number of circuit logic elements. The while-loop, lines 2-15, executes a maximum of 152938 times. The repeat-until block, lines 3-10, always runs twice, with a total time cost of 0(8). Lines 11-14, result in a running time of 0(1) + 0(2) + 0(G) + 0(1) z 0(G + 4). The total running time for each iteration of the while-loop, lines 2-15, is 0(1) + 0(8) + 0(G + 107 4) z 0(G + 13). Given that the while-loop iterates %§L times, the worst case running time of the GenerateJPR_Cubeset procedure is 0 (% x (G + 13)) or o(§[é§t—5((fi‘—+?g‘ifl’i—1)k-2+1)—1]x(G+13)),fors>1andk>2. Step Input Expression Stack Evaluation Expression Stack 0 32*123(—2)t+*(—2)(—3)2#+*+ - l 2t123(—2)*+t(—2)(—3)2¢+*+ 3 2 *123(-2)*+*(—2)(—3)2*+#+ 32 3 123(—2)*+*(-2)(—3)2*+*+ (31:2) 4 23(—2)*+*(—2)(—3)2*+t+ (3*2)l 5 3(—2)*+*(—2)(—3)2t+*+ (3:2)12 6 (—2)*+*(——2)(—3)2*+*+ (3:2)123 7 t+*(—2)(—3)2*+*+ (3*2)123(—2) 8 +*(—2)(—3)2*+*+ (332)12(33(—2)) 9 *(—2)(—3)2 a: + t + (3 at 2) 1 (2 + (3 t (-—2))) 10 (—2)(—3)2*+¥+ (33:2) ((23: l) + (3t(—2)* 1)) ll (—3)2*+t+ (33:2) ((2t1) + (3*(—2)*1))(—2) l2 2e+t+ (31:2) ((231) + (3*(-—2)* 1)) (—2) (—-3) 13 at + 3+ (3 t 2) ((2 #1) + (3 t (—2) at: 1)) (—2) (—3) 2 14 + 3 + (3 t 2) ((2 *1) + (3 t (—2) a: 1)) (—2) ((—3) t 2) 15 2+ (3 t 2) ((2 *1) + (3 t (—2) at 1)) ((—2) + ((—3) It 2)) 16 + (3 at 2) (((—3) a: 2 It 1) + (3 at: (-—2) t 1)) 17 — (332)+((-3)t2*1)+(3t(—2)*1) Table 5.2. Output from GenerateJPR_Cubeset procedure. The GenerateJPR_Cubeset procedure returns an evaluation stack which contains the terms of a disjoint Boolean equation that is in sum-of-produets form. Each term is represented by an IPR cube. The full-adder carry-out bit primary output, c_out, pro- duced by the Post_Order procedure, Table 5.1, is used in an example to demonstrate the conversion of post-order Boolean equations to a sum-of-products equation in infix form. Table 5.2 shows the stages of the input expression stack (c_out) and the eval- uation expression stack as their contents are modified by the GenerateJPR_Cubeset Procedure’s postfix to infix conversion. If a node index to literal mapping, Table 5-1, is performed on the final evaluation expression stack output, the resulting infix expression for c_out is ab + abc + abc. The final expression is composed of disjoint teI‘rns only. 108 5.3.1.4 Computation of Signal Probability The Signal Probability Computation component is responsible for computing the sig- nal probability for a specified node within the CBDD’s graph. This task is im- plemented by the Compute_Signal_Probability procedure (Algorithm 5). 
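Before turning to the signal probability computation itself, the growth implied by the recurrence above is worth making concrete. The short sketch below simply iterates in_size_k = L * in_size_(k-1) + C, with L = (B^2 + B)/2 literals and C = L - 1 operators per expansion (so in_size_1 = B^2 + B - 1). The fanin values chosen are arbitrary; the point is only to show how quickly a deep flattening of wide-fanin logic inflates the input stack.

# Worst-case input stack size from the recurrence in the text:
#   in_size(1) = B^2 + B - 1,    in_size(k) = L * in_size(k-1) + C
# with L = (B^2 + B) / 2 and C = L - 1.  The fanin values below are arbitrary.

def in_size(fanin, k):
    L = (fanin * fanin + fanin) // 2
    C = L - 1
    size = fanin * fanin + fanin - 1          # k = 1
    for _ in range(k - 1):
        size = L * size + C
    return size

for k in range(1, 6):
    print(k, in_size(fanin=2, k=k), in_size(fanin=4, k=k))
# k = 5 already gives 485 tokens for 2-input logic and 199,999 for 4-input logic.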
Using the disjoint Boolean expression IPR cubes produced by the GenerateJPR-Cubeset pro- cedure, the ComputeSignaLProbability procedure determines the signal probability of each cube within the IPR cubeset and sums all IPR cube signal probabilities. The result is the signal probability for the node represented by the disjoint Boolean expression input. Procedure: Compute_Signal_Probability input : I PR.Cubeset - IPR Cubeset input : SP - Signal probability array output : Expr_SP - Expression signal probability 1 begin 2 for V C E I PR.Cubeset do 3 T_prob +— 1.00 4 for idx E 1..IPR.Cube_Length do 5 EXP (— (C.exp[idx] AND mask[idx]) 6 POS (— (C.pos[idx] AND mask[idx]) 7 if EXP = 0 then 8 I_prob (— (POS > 0 ? SP[idx].p : SP[idx].q) 9 T_prob (— T .prob * I .prob 10 end 11 end 12 Expr.SP <— Expr-SP + I_prob 13 end 14 end \ Algorithm 5: Signal probability computation procedure. 109 Correctness of Compute_Signal_Probability procedure Claim 5.4 The Compute_Signal_Probability procedure computes the signal probability for a disjoint Boolean expression. Explanation: The Compute_SignalProbability procedure advances through the list of IPR cubes (terms) within the outer for-loop, lines 2-13. Line 3 initializes the cube signal probability to 1. The inner for-loop, lines 4-11, iterates over the entire signal variable space for the current IPR cube. The goal at this point is to determine the variables used by the cube. Variables in use have a 0 in their corresponding bit position of the cube’s expansion integer. Line 5, using bitwise AND operations, determines if the variable is in use. Lines 6 determines the phase of the variable using bitwise AND operations. The if-block, lines 7-10, is entered if the variable is in use, otherwise processing continues within the inner for-loop for the next variable. Line 8 reads the signal probability for the corresponding variable from signal probability array based on the Boolean variable’s phase. Line 9 performs a multiplication of the variable’s signal probability with the current cube’s intermediate signal probability product. Upon termination of the inner for-loop, line 9 computes the signal probability of a cube in the following manner: P,(eubek) = mg! P,(v), where Vk is the set of signals in use by cubek. Line 12 performs a summatioric of the cube signal probabilities. Upon termination of the outer for-loop, line 12 computes the signal probability for the IPR cubeset as #cubes P,(eubeset)= Z P3(eube,-). i=1 110 Analysis of Compute_Signal_Probability procedure The running time of the Compute_Signal_Probability procedure heavily depends on the size of the IPR cubeset (number of terms). In the worst case the maximum number of terms is 2N , where N is the number of circuit inputs. The system supports a user- specified depth-accuracy (k) parameter which limits the maximum number of terms when k is less than the depth of the input circuit. Given that the circuit is composed of B-input logic elements, the worst case number of terms is approximately equal to Bk“, where B is the fanin size of the logic element. The outer for-loop, lines 2-13, will iterate a maximum of B"+1 times. Line 3 runs in constant time, 0(1). Lines 5—6 run in constant time, 0(2). The if-block, lines 7-10, in the worst case runs in 0(3) time. The duration of the inner for-loop is equal to number of logic elements plus the number of primary inputs within the circuit, designated as R. Therefore, the running time of the inner for-loop is 0(5R). Line 12 runs in constant time, 0(1). 
The worst case running time of the procedure is O((5R + 2) x B^(k+1)). From the analysis it is apparent that an increase in k dramatically affects the running time of the procedure. The more difficult tasks of computing signal probability were accomplished by the Post_Order and Generate_IPR_Cubeset procedures; the tasks performed by the Compute_Signal_Probability procedure are much easier to execute. An example which demonstrates the work performed by the Compute_Signal_Probability procedure is illustrated in Table 5.3. The IPR cubeset for the carry-out bit is used as input in this example and the signal probabilities for primary inputs a, b and c, are %, :4- and %, respectively.

Term Term Product PS (Term) ab :-: — 550 i' i ' i 3% as. iii — P, (cubeset) é
Table 5.3. Signal probability computation for full-adder carry-out bit.

5.3.1.5 SPCA Component Interconnection

The SPCA is an interconnection of procedures which flatten the input circuit's network to a user-selected depth and efficiently approximates signal probability for all internal and output nodes. Previous sections have discussed the operational and time complexity details of the individual SPCA components. The SPCA_Interconnect procedure, Algorithm 6, is responsible for the invocation and coordination of all SPCA component procedures. When invoked, the SPCA_Interconnect procedure coordinates the signal probability computation for an entire CBDD with user-selectable Boolean expression flattening.

Procedure: SPCA_Interconnect
input  : CBDD - DAG of input circuit
input  : PI_SP - Primary input signal probabilities
input  : k - Depth-accuracy
output : SP_Array - Signal probability array
1 begin
2     Network_Levelization(CBDD.root, 0)
3     for all MSBDD in CBDD do
4         Expr <- Post_Order(CBDD, MSBDD.output_vertex, k, 0)
5         IPR_Cubeset <- Generate_IPR_Cubeset(Expr)
6         Sp <- Compute_Signal_Probability(IPR_Cubeset, PI_SP)
7         SP_Array[MSBDD.output_vertex.idx] <- Sp
8     end
9 end
Algorithm 6: SPCA interconnection procedure.

Correctness of SPCA_Interconnect procedure

Claim 5.5 The SPCA_Interconnect procedure generates the signal probabilities for all nodes within a CBDD.

Explanation: Conventional methods for estimating signal probability of a circuit utilize traditional BDDs. The computation involves a post-order traversal of the BDDs for each net within the circuit, resulting in a Boolean expression. Assuming uncorrelated primary inputs and the zero-delay model, the conventional methods for estimating signal probability of a net modeled by a BDD utilize Equations 5.1, 5.2 and 5.3.

    P_s(term_j) = ∏_{i=1..m_j} P(x_i)                 (5.1)

    term_x ∩ term_y = ∅,  for all x ≠ y               (5.2)

    P_s(node_k) = ∑_{j=1..n_k} P_s(term_j)            (5.3)

• Equation 5.1 specifies that the term signal probability is the product of the term's input signal probabilities.
• Equation 5.2 specifies that all terms are disjoint or independent of one another.
• Equation 5.3 specifies that the node signal probability is the sum of the disjoint term signal probabilities.

The signal probability estimation method presented in this dissertation is correct and satisfies Equations 5.1, 5.2 and 5.3. The proposed method utilizes a CBDD with user-selectable Boolean expression flattening as opposed to a BDD. Theorem 4.3 validates the fact that a CBDD maintains a circuit's behavior. Equation 5.2 is satisfied by the application of Theorem 4.2, which validates that all paths from any decision node x_j to any decision node x_k are unique for j ≠ k, within an MSBDD.
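Equations 5.1-5.3 can be exercised directly on a small, hand-built disjoint cover. The sketch below is illustrative only: each term is written as a mapping from a variable to its required phase, the cover x1*x2 + !x1*x3 is disjoint because its two terms differ in x1, and the input probabilities are arbitrary.

# Node signal probability from a disjoint sum-of-products cover (Equations 5.1-5.3).
# Each term maps a variable to its phase (True = x, False = complemented x); the
# terms are assumed pairwise disjoint, as Equation 5.2 requires.

def term_probability(term, p_one):
    prob = 1.0
    for var, phase in term.items():           # Equation 5.1: product over the literals
        prob *= p_one[var] if phase else (1.0 - p_one[var])
    return prob

def node_probability(terms, p_one):
    return sum(term_probability(t, p_one) for t in terms)   # Equation 5.3

p_one = {'x1': 0.5, 'x2': 0.5, 'x3': 0.5}
cover = [{'x1': True, 'x2': True},            # x1 * x2
         {'x1': False, 'x3': True}]           # !x1 * x3
print(node_probability(cover, p_one))         # 0.25 + 0.25 = 0.5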
The generation of Boolean equations as a collection of disjoint IPR cubes is performed by the Post_Order and Generate_IPR_Cubeset procedures. Signal probability computation for each node within the CBDD is performed by the Compute_Signal_Probability procedure, which satisfies Equations 5.1 and 5.3. The SPCA_Interconnect procedure correctly generates the signal probability for all internal and primary output nodes within a CBDD.

The procedures invoked by the SPCA_Interconnect procedure are well defined and produce predictable outputs for their given inputs. The more important procedures, which perform searches/traversals, are based on well known search algorithms that have been proven to be consistent and correct. The overall interconnection of these procedures is logical and has been tested as a system. The system-level testing results are predictable and consistent with the results of conventional BDDs; this verifies that the SPCA_Interconnect procedure correctly computes signal probability for a circuit represented by a CBDD.

Analysis of SPCA_Interconnect procedure

The running time of the SPCA_Interconnect procedure depends on the worst case running times of all procedure invocations. Line 2, the invocation of Network_Levelization, is called once. The for-loop, lines 3-7, accounts for a majority of the SPCA_Interconnect running time. The duration of the for-loop is G. Line 7 runs in constant time, O(1), but is dominated by the time of the procedure invocations of lines 4-6. Using Tables 5.4 and 5.5, for B > 1 and k > 2 the worst case running time for the SPCA_Interconnect procedure is O(P1 + G x (P2 + P3 + P4)). From observation it is apparent that the worst case running times of the SPCA_Interconnect procedures have the following ranking: P1 << P2 << P4 << P3. The ranking indicates that the SPCA_Interconnect procedure can potentially spend most of its time in the invocation of the Generate_IPR_Cubeset procedure. This procedure is computationally expensive because the input expression stack can become very large and the literal to IPR cube conversion has a cost of (G + 13) per cube. The resulting worst case running time for the SPCA_Interconnect procedure is approximately O(G x P3), or

    O( (G/2) x [ ((B² + B)/2) x ( ((B² + B)²/2 - 1)^(k-2) + 1 ) - 1 ] x (G + 13) ).

    Symbol   Meaning
    N        Number of circuit inputs
    G        Number of circuit logic elements
    R        N + G
    B        Maximum logic element fanin
    |V|      CBDD vertex count
    |E|      CBDD edge count
    k        Depth-accuracy parameter

Table 5.4. Symbol definitions.

    Symbol   Procedure                     Worst Case Running Time
    P1       Network_Levelization          O(|V| + |E|)
    P2       Post_Order                    O(16 x (4B + 5) x G)
    P3       Generate_IPR_Cubeset          O( (1/2) x [ ((B² + B)/2) x ( ((B² + B)²/2 - 1)^(k-2) + 1 ) - 1 ] x (G + 13) )
    P4       Compute_Signal_Probability    O( (5R + 2) x B^(k+1) )

Table 5.5. Procedure running times.

5.4 Switching Activity and Power Computation

The final component of the BLAPE implementation involves the computation of switching activity and dynamic power. The complicated task of computing signal probability was performed by the Signal Probability Computation Algorithm (SPCA). Upon completion of the SPCA, a signal probability array is generated. The signal probability array is passed to a routine which computes switching activity using

    α(x) = 2 · P_s(x) · (1 - P_s(x)).

Switching activity is computed for each node in the network. After this computation, dynamic power is computed for each node using

    P_switch(x) = (1/2) · α(x) · C_L(x) · f_CLK · V_dd².

Next, the BLAPE implementation provides a power estimate for the entire circuit using ∑_{i=1..N} P_switch(x_i), where N is the number of nets in the circuit.
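Given a filled signal probability array, the two formulas above reduce the final step of BLAPE to a few operations per node. The sketch below is illustrative: the node names and signal probabilities are invented, and the capacitances assume that one "cap" unit in the listings of the next section corresponds to 0.01 pF, an inference chosen because it reproduces the printed per-node powers and not something the text states.

# Per-node switching activity and dynamic power, following
#   alpha(x)   = 2 * Ps(x) * (1 - Ps(x))
#   Pswitch(x) = 0.5 * alpha(x) * CL(x) * fclk * Vdd^2
# The node data is invented; capacitances assume 1 "cap" unit = 0.01 pF.

F_CLK = 20e6     # 20 MHz clock, as in the example listings
VDD = 5.0        # 5 V supply

def switching_activity(ps):
    return 2.0 * ps * (1.0 - ps)

def node_power(ps, cap_farads):
    return 0.5 * switching_activity(ps) * cap_farads * F_CLK * VDD ** 2

nodes = {                     # node -> (signal probability, load capacitance in F)
    'c_in': (0.500, 11 * 0.01e-12),
    't0':   (0.125, 5 * 0.01e-12),
}
total = 0.0
for name, (ps, cap) in nodes.items():
    p = node_power(ps, cap)
    total += p
    print(name, round(switching_activity(ps), 4), round(p * 1e6, 2), 'uW')
print('total', round(total * 1e6, 2), 'uW')    # 13.75 + 2.73 = 16.48 uW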
— The following examples (Figures 5.14 and 5.15) provide BLAPE generated activity and power output for both implicit and mapped structural representations (Figures 5.4 and 5.6) of the full-adder. As a means of comparison, SIS generated activity and power analysis is given in Figures 5.16 and 5.17 for the same structural specifications. SIS switching activity roundoff to two decimals places results in a 0.26% power estimate difference, when compared to BLAPE. 116 Node Activity Capacitance c_in 0.5000 cap=11 x 0.5000 cap=11 y 0.5000 cap=11 c_out 0.5000 cap=0 sum 0.5000 cap=0 NOTc_in 0.5000 cap=6 NOTy 0.5000 cap=6 NOTx 0.5000 cap=6 to 0.2188 cap=5 t1 0.2188 cap=5 t2 0.2188 cap=5 t3 0.2188 cap=5 _LC1_B1 0.5000 cap=2 t4 0.3750 cap=4 t5 0.3750 cap=4 t6 0.3750 cap=4 _LC2_B1 0.5000 cap=2 Power = 91.94 uW, assuming 20 MHz CLK, Vdd = 5V Figure 5.14. BLAPE implicit structural FA activity/ power estimates. Node Activity Capacitance c_in 0.5000 cap=11 x 0.5000 cap=11 y 0.5000 cap=11 c_out 0.4878 cap=1 sum 0.4851 cap=2 [236] 0.3750 cap=4 [238] 0.3750 cap=4 [234] 0.3750 cap=4 [182] 0.5000 cap=6 [181] 0.5000 cap=6 [226] 0.2188 cap=5 [183] 0.5000 cap=6 [228] 0.2188 cap=5 [230] 0.2188 cap=5 [232] 0.2188 cap=5 Power = 89.58 uW, assuming 20 MHz CLK, Vdd = 5V Figure 5.15. BLAPE mapped structural FA activity/ power estimates. 117 Node c_in Cap. = 11 Switch Prob. 0.50 Power = 13.8 Node 1 Cap. 11 Switch Prob. 0.50 Power = 13.8 Node y Cap. 11 Switch Prob. = 0.50 Power = 13.8 Node [3019] Cap. = 0 Switch Prob. = 0.50 Power = 0.0 Node [3020] Cap. = 0 Switch Prob. 0.50 Power = 0.0 Node NOTc_in Cap. = 6 Switch Prob. = 0.50 Power = 7.5 Node NOTy Cap. = 6 Switch Prob. = 0.50 Power = 7.5 Node NOTx Cap. = 6 Switch Prob. = 0.50 Power = 7.5 Node _LC2_81 Cap. = 2 Switch Prob. = 0.50 Power = 2.5 Node _LC1_B1 Cap. = 3 Switch Prob. = 0.50 Power = 3.8 Node t0 Cap. = 5 Switch Prob. = 0.22 Power = 2.7 Node t1 Cap. = 5 Switch Prob. = 0.22 Power = 2.7 Node t2 Cap. = 5 Switch Prob. = 0.22 Power = 2.7 Node t3 Cap. = 5 Switch Prob. = 0.22 Power = 2.7 Node t4 Cap. = 4 Switch Prob. = 0.38 Power = 3.8 Node t5 Cap. = 4 Switch Prob. = 0.38 Power = 3.8 Node t6 Cap. = 4 Switch Prob. = 0.38 Power = 3.8 Total Power: 92.187500 Figure 5.16. SIS implicit structural FA aetivity/ power estimates. Node c_in Cap. = 11 Switch Prob. 0.50 Power = 13.8 Node 1 Cap. = 11 Switch Prob. = 0.50 Power = 13.8 Node y Cap. = 11 Switch Prob. = 0.50 Power = 13.8 Node [3021] Cap. = 1 Switch Prob. = 0.50 Power 8 1.2 Node [3022] Cap. = 2 Switch Prob. = 0.50 Power = 2.5 Node [236] Cap. 8 4 Switch Prob. = 0.38 Power = 3.8 Node [238] Cap. = 4 Switch Prob. = 0.38 Power = 3.8 Node [234] Cap. = 4 Switch Prob. = 0.38 Power = 3.8 Node [182] Cap. = 6 Switch Prob. = 0.50 Power 8 7.5 Node [181] Cap. = 6 Switch Prob. = 0.50 Power = 7.5 Node [226] Cap. = 5 Switch Prob. = 0.22 Power = 2.7 .Node [183] Cap. = 6 Switch Prob. = 0.50 Power = 7.5 lNOde [228] Cap. = 5 Switch Prob. = 0.22 Power = 2.7 INOde [230] Cap. = 5 Switch Prob. = 0.22 Power = 2.7 lNOde [232] Cap. = 5 Switch Prob. = 0.22 Power = 2.7 Total Power : 89 . 687500 Figure 5.17. SIS mapped structural FA activity/ power estimates. 118 The BLAPE implementation uses a behavioral VHDL specification as its initial input. In an effort to provide realistic activity and power estimates for a given design, structural specifications at the gate-level are considered. Various structural specifi- cations yield various power estimates, as shown above. 
BLAPE’S ability to consider various structural specifications, given an initial VHDL behavioral specification, com- plements an HDL-based hardware design methodology by providing improved high- level power estimates. 5.5 Experimental Results This section presents experimental results generated by the BLAPE program. BLAPE results are compared to the results generated by the Berkeley SIS power estimator tool. The Berkeley SIS tool was chosen for comparison because it is a mature VLSI development system that is considered to be the standard for switching activity and dynamic power measurement. The Berkeley SIS tool computes exact signal proba- bility by traversing a traditional BDD with the intent of generating a 2-level disjoint Boolean equation for each net of the input circuit. Quantities such as power (aw) and time (s) are used as a means of comparison between the two tools. The circuits involved in the tool comparison are a selection of ISCAS-85 benchmarks, MCNC Synth89 benchmarks, and other arithmetic circuits. To simplify the exchange of benchmark circuits between tools, the input circuits are BLIF-formatted gate-level specifications. For BLAPE accuracy comparisons, it is assumed that the Berkeley SIS power estimator generates exact power estimates. Benchmarking was performed on a workstation with the following configuration: Intel Pentium II 350MHz (processor), 64MB (RAM), and Linux (operating system) with 330MB swap partition. Five experiments were performed in which power estimates of the BLAPE pro- 119 gram were compared to the estimates of the Berkeley SIS tool. Each experiment includes one or more benchmark circuits. Tables containing the SIS power estimate and computation time, along with the corresponding BLAPE measurements for vary- ing depth-accuracy are given. 5.5.1 Experiment 1 : 64—Bit Adder Benchmark In the first comparison, the BLAPE and SIS tools were used to perform power esti— mates for a 64—bit adder. The outcome of the comparison is given in Table 5.6. The 64—bit adder was selected because of its depth and high level of isolated reconvergent fanout. - #Inputs #Outputs #Gates #Levels #Nodes 128 65 315 128 4944 Tool uw CPU k %Err SIS 3030.8 1 .208 — — BLAPE 3030.9 1 .138 12 0.003% Table 5.6. 64—bit adder size/ power estimates. Subsequent runs of the BLAPE program were performed using the 64—bit adder as input. For each run, the depth accuracy parameter (k) was increased. The results of the subsequent runs are illustrated in Figures 5.18, 5.19, and 5.20. The results for the 64-bit adder indicate that for a small depth-accuracy (k < 3), execution times are short and BLAPE underestimates power by just 0.019%. For 3 < k < 5, execution times are a little longer and BLAPE overestimates power by 0.089%. As It increases (1‘3 > 5), BLAPE starts to converge to the exact power estimate produced by SIS. 120 For k = 12, a very reasonable power estimate is achieved in about the same time as SIS takes. The time execution (Figure 5.20) reveals an exponential growth pattern, attributed to increased depth-accuracy. Power v. k (644-bit Adder) I I I I I I I T I I I I I I I I I I 3033.6 3033.4 3033.2 3033.0 3032.8 3032.6 3032.4 3032.2 3032.0 3031 .8 Power W) 3031 .6 3031 .4 3031 .2 3031 .0 3030.8 —1 3030.6 --1 3030.4 .1 3030.2 1 1 1 1 1 1 1 1 1 1 1 I I I l 1 1 1 2 3 4 5 6 7 8 9 1011121314151617181920 depth-accuracy (k) Figure 5.18. Power v. k (64—bit adder). 121 Accuracy Time (s) 1 .0000 0.9998 0.9996 0.9994 0.9992 0. 9990 0.9988 0. 9986 Accuracy v. 
k (64-bit Adder) 1 1 1 1 1 1 1 1 1 1 1 1 l 1 1 1 1 1 1 2 3 4 5 6 7 8 91011121314151617181920 depth-accuracy (k) Figure 5.19. Accuracy v. k (64—bit adder). Time v. k (64-btt Adder) 24.0 *- 22.0 — 20.0 - 18.0 '- 16.0 >- 14.0 '- 12.0 - 10.0 - I I I I I I TI I I I I I I T I I I I I ‘ ‘ L L 1 1 1 1 L 1 1 1 I l 2 3 4 5 6 7 8 9 1011 121314151617181920 depth-accuracy (k) Figure 5.20. Time v. k (64-bit adder). 122 5.5.2 Experiment 2 : Arithmetic Benchmarks The arithmetic benchmark suite contains adder and multiplier circuits of various sizes. The arithmetic benchmark comparisons demonstrated the same power/ accuracy and performance trends observed with the 64—bit adder (Tables 5.7 and 5.8). Certain circuits, due to their depth and large internal fanouts, were not tested for k > 5. For the arithmetic class of circuits a depth-accuracy of k = 1 provided an average error of 0.19% along with an average runtime of 0.012 seconds. As It was increased the average power estimate error approached zero. For k = 3 there was a small increase in the average error to about 0.32%, which can be attributed to reconvergent fanout. This error is so small that it can be considered negligible. SIS BLP k: k=2 k=3 k: k=7 Circuit pm pm %Err 14w %Err pw %Err aw %Err uw %Err add8 349 349 0.20 349 0.20 349 0.17 349 0.06 349 0.00 add 16 732 732 0.08 732 0.08 733 0.18 732 0.07 732 0.03 add32 1498 1498 0.02 1498 0.02 1501 0.18 1499 0.09 1498 0.04 mult4 410 410 0.02 410 0.02 411 0.32 410 0.00 410 0.02 mult8 2274 2289 0.65 2289 0.65 2292 0.78 2287 0.58 Average 0.19 0.19 0.32 0.16 0.02 Table 5.7. Power/Accuracy estimates of arithmetic benchmarks. add16 add32 mult4 mult8 Table 5.8. Time estimates of arithmetic benchmarks. 0.10 0.10 0.01 1.70 0.00 0.02 0.00 0.03 123 0.00 0.03 0.00 0.10 0.01 0.03 0.01 0.31 0.01 0.06 0.16 10.46 0.02 0.09 3.80 5.5.3 Experiment 3 : ISCAS-85 Benchmarks The ISCAS-85 benchmark suite contains a variety of ALU, control, and decoder cir- cuits. Due to circuit depth and large internal fanouts, these circuits were not tested beyond a depth—accuracy of three. A few circuits exhibited increased error for an increase in the depth-accuracy parameter. This is attributed to reconvergent fanout occurring below the depth—accuracy level of a node and the propagation of this error to subsequent signal probability computations. The ISCAS-85 benchmark compar- isons demonstrated similar accuracy/ performance trends observed by the arithmetic benchmarks, but with greater average error. Given a depth-accuracy of k = 1 the ISCAS—85 benchmark comparison yields an average error of 3.13% along with an av- erage runtime of 0.22 seconds (Tables 5.9 and 5.10). As It is increased the average power estimate error slowly approaches zero. SIS BLP k = l k = 2 k = 3 Circuit uw pw %Err pw %Err pw %Err C1355 2411 2482 2.96 2375 1.49 2442 1.29 C17 34 34 0.00 34 0.00 34 0.00 c1908 2762 2736 0.94 2732 1.08 2740 0.81 C432 1062 1009 4.94 985 7.21 C499 1820 1811 0.46 1810 0.53 1809 0.60 C5315 12136 11915 1.82 11880 2.11 11871 2.19 C880 1655 1765 6.66 1651 0.25 1608 2.82 C2670 4459 4132 7.33 4402 1.26 Average 3.13 1.74 1.28 Table 5.9. ISCAS-85 power/accuracy benchmarks. SIS BLAPE k = l k = 2 k = 3 Circuit CPU CPU CPU CPU c1355 6.10 0.07 0.17 0.56 C17 0.00 0.00 0.00 0.00 c1908 1.80 0.05 0.23 0.99 c432 2.80 0.01 5.23 c499 2.60 0.03 0.08 0.29 c5315 17.30 1.36 3.71 13.07 c880 1.50 0.03 0.09 0.69 c2670 1.80 0.23 0.61 124 Table 5.10. ISCAS-85 time benchmarks. 
5.5.4 Experiment 4 : Nonredundant ISCAS-85 Benchmarks

The nonredundant ISCAS-85 benchmark suite contains a variety of ALU, control, and decoder circuits. These circuits are functionally equivalent to the regular ISCAS-85 benchmark circuit suite, except all redundant logic has been removed from the designs. A few circuits exhibited increased error for a larger depth-accuracy due to reconvergent fanout. Because of circuit depth and large internal fanouts, the circuits were not tested beyond a depth-accuracy of three. The nonredundant ISCAS-85 benchmark comparisons demonstrated the same power/accuracy and performance trends observed with the ISCAS-85 benchmarks, but the average error for k = 1 was smaller due to less logic and less internal reconvergence. Given a depth-accuracy of k = 1, the nonredundant ISCAS-85 benchmark comparison yields an average error of 2.29% along with an average runtime of 0.43 seconds (Tables 5.11 and 5.12). As the depth-accuracy is increased, the average power estimate error slowly approaches zero.

             SIS        BLAPE k = 1        k = 2              k = 3
    Circuit  µW         µW        %Err     µW        %Err     µW        %Err
    c1355nr  2384.4     2462.7    3.00     2357      1.00     2398      1.00
    c1908nr  3400.6     3365.9    1.00     3363.5    1.00     3369.6    1.00
    c2670nr  3814.3     3772.4    1.00     3759.8    1.00     3757.1    1.00
    c432nr   911.8      857.8     6.00     857.8     6.00     875.4     4.00
    c499nr   1628.8     1639.8    1.00     1639.8    1.00     1635.1    0.00
    c5315nr  11745.4    11510.5   2.00     11484     2.00     11471.3   2.00
    c7552nr  13975      13690.2   2.00     13682.8   2.00     13674.6   2.00
    Average                       2.29               2.00               1.57

Table 5.11. Nonredundant ISCAS-85 power/accuracy benchmarks.

             SIS      BLAPE k = 1   k = 2    k = 3
    Circuit  CPU      CPU           CPU      CPU
    c1355nr  5.60     0.06          0.14     0.40
    c1908nr  2.30     0.12          0.57     13.71
    c2670nr  2.30     0.16          0.37     0.83
    c432nr   2.40     0.01          0.54     1.87
    c499nr   2.40     0.01          0.03     1.58
    c5315nr  26.30    0.81          2.41     7.02
    c7552nr  2.20     1.88          5.39     20.51

Table 5.12. Nonredundant ISCAS-85 time benchmarks.

5.5.5 Experiment 5 : MCNC Synth89 Benchmarks

The Synth89 benchmark suite contains various multi-level circuits which control, compare and decode. The Synth89 benchmark comparisons demonstrated the same power/accuracy and performance trends observed with the previous benchmarks, but the average error and average runtime were smaller due to less internal reconvergence and smaller circuit size. Tables 5.13 and 5.14 show the power/accuracy and performance comparisons with the Berkeley SIS tool.

Table 5.13. MCNC Synth89 power/accuracy benchmarks. (The body of this table is not legible in the source.)
33:3 3:: :«3 :33 :«3 33% :.«3 33% %3 3%... %3 33% 33% «a: %3 «.«S %3 «.«S %3 «.«S «:3 3%: %3 3.2.: «.«S :33 %3 3% %3 33% %3 «.:% %3 «.:% %3 :.:% 3% :28 Emma 31 .mefi 31 Egg 8: knack: a: fig 81 3.1 _ “.28er t I a: I u: 3 I u: « I a: E ma. l: 127 SIS BLAPE k = 1 k = 2 k = 3 k = 5 k = 7 Circuit CPU CPU CPU CPU CPU CPU adr4 0.90 0.00 0.00 0.00 0.01 0.01 alul 0.10 0.00 0.00 0.00 0.00 0.00 alu2 0.10 0.00 0.00 0.01 0.01 0.01 alu3 0.10 0.00 0.00 0.01 0.01 0.02 QSymml 0.10 0.01 0.02 0.06 7.00 alu2—2 0.10 0.03 0.08 0.21 13.69 alu4 0.50 0.09 0.25 1.24 b1 0.00 0.00 0.00 0.00 0.00 0.00 C8 0.10 0.01 0.02 0.02 0.03 0.03 cc 0.00 0.00 0.00 0.00 0.00 0.00 cht 0.10 0.01 0.02 0.03 0.03 0.03 cm138a 0.00 0.00 0.00 0.00 0.00 0.00 cm150a 0.10 0.00 0.01 0.01 0.03 0.08 cm1628. 0.00 0.00 0.00 0.00 0.00 0.00 cm163a 0.00 0.00 0.00 0.00 0.00 0.00 comp 0.00 0.01 0.02 0.03 0.35 lal 0.10 0.01 0.01 0.02 0.03 0.03 pair 2.30 0.46 1.18 terml 0.10 0.03 0.09 0.20 tlarge 0.50 0.10 0.38 0.71 ttt2 0.10 0.01 0.03 0.04 0.07 0.08 X] 0.10 0.03 0.06 0.10 0.24 0.33 x4 1.10 0.05 0.10 0.18 0.57 0.89 z4ml 0.10 0.00 0.00 0.01 0.02 0.02 duke2 0.10 0.06 0.15 0.29 0.91 1.74 e64 0.20 0.08 0.10 0.11 0.11 0.11 064 0.10 0.01 0.06 8.47 vg2 0.01 0.01 0.02 0.04 0.06 0.06 Table 5.14. MCNC Synth89 time benchmarks. 5.5.6 Remarks BLAPE supports a selective or user-specified power/ accuracy. The results of the five experiments indicate that BLAPE provides a very reasonable power estimate for a depth-accuracy of k = 1. As the depth-accuracy is increased the average error in the power estimates decreases. The trends indicate that for large k the average error will approach zero (Equation 5.4), with an increase in runtime for deep circuits containing a large number of internal fanouts. The power/ accuracy trend for all benchmarks is illustrated in Figure 5.21. The BLAPE system provides an approximation of signal probability. BLAPE supports user-selectable accuracy via the depth-accuracy parameter. The behavior of 128 Accuracy v. k 1 .0000 l— -------------------------------------------------------------------------- — 0.9980 0. 9960 Accuracy 0.9940 0.9920 0. 9900 0. 9880 i- 1 l l J 1 l 4 5 6 7 depth—accuracy (k) Figure 5.21. Average accuracy v. k (All benchmarks). BLAPE’S accuracy is rationalized by the facts given below and demonstrated in the previously illustrated power/accuracy and time benchmark graphs. BLAPE’s Signal Probability Computation Accuracy 0 The signal probability of nodes represented by Boolean equations (sum-of- products) equals the sum of their disjoint term signal probabilities, given by Equation 5.3. o Nodes represented by 2-level Boolean expressions, where 2:,- € {PI} such that PI is the set of primary inputs, yield exact signal probability. 0 Nodes represented by n—level Boolean expressions, where x,- 6 {P1, IS} such that I S is the set of internal signals, yield approximate signal probability due to reconvergent fanout. 129 a As nodal equations are reduced in depth, approaching a 2-level expression, the error in their signal probability approximation approaches zero, described by equation 5.4. Error _ Approx -— Exact k —> Lmax _ Exact —+ 0 (5.4) The parameters in Equation 5.4 include 1) Lmam, the circuit depth, 2) k, the depth- accuracy parameter, 3) Approx, BLAPE’S signal probability approximation, and 4) Exact, the exact (SIS) signal probability. An additional feature of the BLAPE imple- mentation is its support of large circuits. 
Circuits which SIS was not able to process, due to memory limitations, were handled by BLAPE in a reasonable time (Table 5.15) zrcmt 5445 1 c3540nr 1 594 5713 0.36 5696 1.28 5685 3.34 c7552 3512 13397 1.02 13462 3.46 13581 16.56 Table 5.15. Benchmarks not available in SIS. BLAPE’S ability to handle larger circuits is due to the CBDD’s compact be- havior/structure representation. Reasonable accuracy is achieved at k = 3. When analyzing all circuits, BLAPE’s average error is just 1.22% to 0.21%, for k = 1 to 7. BLAPE achieves a dramatic improvement when compared to behavioral-level activity and power estimation tools. The error found in most behavioral-level power estima- tors about 10 - 12 percent on average and about 80% in the worst reported case. The approach taken by BLAPE is to analyze the high-level structural components from a logic perspective, by generating the disjoint Boolean equation for each of the Circuit’s internal and output signals. BLAPE allows the user to perform a trade-off of time 130 l . versus accuracy. A less accurate estimate can be computed fairly quick but an exact estimate can take longer, depending on the circuit size and the number of inputs. 131 CHAPTER 6 Behavioral-Level Visualization of Switching Activity This chapter introduces a new tool for the visualization of switching activity in CMOS circuits. The approach presented consists of analyzing post-mortem data collected by the simulation of switching functions. An estimation of the switching activity for each circuit partition is captured in one of two views. This tool serves as a switching activity profiler, which illuminates a Circuit’s activity hot-spots and provides the locations where power dissipation minimization techniques can best be applied. The new tool allows the designer of CMOS circuits to visualize the switching activity On a partitioned basis. The illuminated areas should be considered first in the application of power minimization techniques. 6.1 Activity Viewer Tool The Activity Viewer tool is a new mechanism which provides a partitioned and color- based viewing/profiling of the activity hot-spots within a digital circuit. Previous switching activity estimation tools used the traditional tabular representation form [29, 32, 50, 51] while others used the traditional two-axis graph approach as [19, 24, 37, 132 52]. The presentation of the results in the aforementioned references can be greatly improved where spatial / locality activity inter-relationships can also be captured. The Activity Viewer tool is an improvement to the traditional circuit activity estimate viewing approaches. The Activity Viewer tool is a more effective means of presenting circuit activity estimates. The use of color-based activity coding combined with partitioning provides a more insightful depiction of how the circuit behaves in response to input stimuli. Additionally, the partitioning aspect of the tool gives a clear indication of how the internal logic partitions are inter-related with respect to activity and input. For a mapping of color to switching activity (Figure 6.1). (Color results have been repro- duced in black and white for this dissertation). The Activity Viewer tool provides two Color Activity 0.8 < Esw(g) < 1.0 0.5 < Esw(g) < 0.7 0.3 < Esw(g) < 0.4 0.1 < Esmg) < 0.2 No Activity Figure 6.1. Activity color mapping. views: the Bar and Level Views, illustrated by Figures 6.2 and 6.3 respectively, which were generated with artificial partition and activity data. 
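The color coding of Figure 6.1 amounts to a small lookup from an activity estimate to a bucket. The sketch below is illustrative only: the bucket names are invented (the dissertation reproduces the colors in black and white), and because the printed ranges leave small gaps (for example between 0.4 and 0.5), the sketch simply uses the lower bound of each range as its threshold.

# Activity-to-color bucketing in the spirit of Figure 6.1.  Bucket names are
# invented stand-ins for the colors; thresholds are the lower bounds of the
# ranges listed in the figure.

BUCKETS = [                      # (lower bound on Esw(g), label)
    (0.8, 'level 4 (0.8 < Esw <= 1.0)'),
    (0.5, 'level 3 (0.5 < Esw <= 0.7)'),
    (0.3, 'level 2 (0.3 < Esw <= 0.4)'),
    (0.1, 'level 1 (0.1 < Esw <= 0.2)'),
]

def activity_color(esw):
    for bound, label in BUCKETS:
        if esw > bound:
            return label
    return 'no activity'

for value in (0.0, 0.15, 0.35, 0.6, 0.95):
    print(value, '->', activity_color(value))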
The Bar View provides a color-based two-axis graph of the Circuit’s switching activity on a partition basis. As the input stimulus changes, the Bar View captures the activity of each partition. For 133 5M2) Figure 6.2. Bar view. Level Figure 6.3. Level view. multi-level circuit designs, the Level View, illustrated in Figure 6.3, captures level- based partitioned switching activity behavior. This view allows the circuit designer to isolate and inspect the activity profile of a Circuit’s network on a per level basis. 134 6.1.1 Input Transformation Process The input to the Activity Viewer tool begins with a gate-level circuit netlist described in the Berkeley Logic Interchange Format (BLIF). The netlist is passed through the BLAPE tool where level and partition information is obtained. Next, input vectors are read into BLAPE followed by simulation of the circuit. Following simulation, the outputs of selected internal and primary output nodes are collected and passed to an intermediate file. This intermediate file is passed to the mkactdat tool, which performs switching activity estimation (simulation) assuming the zero delay model and input data independence. Each switching activity estimate is computed by es- timating the average change of the selected nodes’ simulation outputs. The activity estimate is based on a block of ten simulation outputs. This simulation output block is repeatedly updated upon receipt of next input vector. The level and partition information, along with the switching activity measurements, are used as the final input to the Activity Viewer tool. Once the final input data file is read, the view- ing of switching activity can be controlled by using the Play, Stop, Forward, and Reverse buttons located on the control dialog. The menu bar provides options to features which allow the user to toggle between the Bar and Level views, as well as to copy the current view to the Window’s clipboard memory. The latter feature serves as a window snapshot function and was used to generate the figures in this section. 6.1.2 Activity Viewer Results A 4-bit Booth multiplier circuit can be used to illustrate the utility of the Activity Viewer tool. The Booth multiplier is a two’s complement array multiplier which does not require recoding for the final two’s complement result. The Booth multiplier is composed of controlled add/subtract/shift (CASS) subcircuits. Each row of the Booth array is headed by a CTRL subcircuit, which controls the CASS subcircuits. Level 13 26 Figure 6.4. Booth multiplier(4-bit) level view. After transforming the Booth multiplier circuit into a BLIF netlist, the final cir- cuit resulted in a 27 level structure, illustrated by the Level View in Figure 6.4. The selected partitions consists of the basic array elements: the CTRL and CASS subcir- cuits. The above view displays low switching activity due to the small change in the input vectors. The low switching activity is due to minimal changes in the logic of the CTRL and CASS subcircuits caused by infrequent changes in the sign and number of 1’s in the input vectors. The Level View indicates very well the relative size of each partition and the level or distance from the primary inputs. The relative size of the partitions are preserved by extending the width of each partition. The width of each partition is computed by summing the number of gates in the individual sums of products. Figure 6.5 provides a Bar View of the Booth Multiplier, where partition size and location are ignored. 
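The Esw(g) values behind both views come from the block-based estimate described in Section 6.1.1: for each block of ten zero-delay simulation outputs, a node's activity is taken as the average change between consecutive samples. The sketch below is illustrative; the sample waveforms are invented, and the simple averaging of node estimates into one per-partition value is an assumption, since the text does not spell out how the partition figure is aggregated.

# Block-based switching activity estimate in the spirit of Section 6.1.1:
# the average change over the most recent block of ten simulation outputs.
# Sample waveforms and the per-partition averaging are invented for illustration.

BLOCK = 10

def block_activity(samples):
    """Fraction of consecutive sample pairs in the latest block that change value."""
    block = samples[-BLOCK:]
    changes = sum(a != b for a, b in zip(block, block[1:]))
    return changes / (len(block) - 1)

def partition_activity(node_samples):
    """Assumed aggregation: average the node estimates within one partition."""
    return sum(block_activity(s) for s in node_samples) / len(node_samples)

partition = [
    [0, 1, 0, 1, 0, 1, 0, 1, 0, 1],   # a busy node: 9 changes over 9 pairs
    [0, 0, 0, 0, 1, 1, 1, 1, 1, 1],   # a quiet node: 1 change over 9 pairs
]
print(partition_activity(partition))   # (1.0 + 1/9) / 2 ~= 0.556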
Each partition is placed on the horizontal axis and it’s switching activity for the given input block is indicated by color intensities. The color mappings are specified in Figure 6.1. The Bar View of the Booth multiplier is consistent with the Level View, displaying low circuit activities for the specified 136 input vectors. The switching activity for each partition of the level view matches the activity of the corresponding partition in bar view, but the level or distance from the primary inputs is considered. 1.0 0.9 0.8 0.7 Ewe) 0-6 E I 0.5 0.4 0.3 - - ' 0.2 I 0.I I 50 100 150 200 250 300 Partition Figure 6.5. Booth multiplier(4—bit) bar view. 6.2 Future Work and Summary Currently, the Activity Viewer tool is a Microsoft Windows based application program providing only two views. Future improvements may include the addition of new views that provide the display of total circuit switching activity and power, along with zoom in/out capabilities to isolate important areas of large circuits which have lost their viewing definition. Most importantly, the automation of the process of transforming the original Circuit’s file format to the final partition / activity file format can be considered. The Activity Viewer tool is a new visualization tool which provides color-based, partitioned-level viewing of switching activity in CMOS circuits. The tool provides 137 the circuit designer with switching activity profiling capability which illuminates the potential power-hungry hot-spots of the design. 138 CHAPTER 7 Conclusions A methodology for behavioral-level switching activity estimation in CMOS circuits has been presented. This chapter summarizes the contributions and future work of this research. 7.1 Contributions The research presented in this dissertation addresses the problem of computing switch- ing activity at the behavioral-level for CMOS circuits. The accurate computation of switching activity provides improved dynamic power estimates. An extensive liter- ature review was given detailing the background and importance of the research. Additionally, an overview of behavioral and structural representations of switching functions was presented. A formal problem definition along with its solution and associated constraints/restrictions was given. The problem’s solution was decom- posed into a set of tasks. Each task was described by an objective along with its implemented approach. The contributions of this research are the following: o A new decision diagram, the Connective Binary Decision Diagram (CBDD) has been introduced. The CBDD is a graph-based behavioral and structural rep- 139 l: " resentation for digital circuits. The CBDD advantages include maintenance of a Circuit’s structural and connective properties and linear growth. When com- pared to traditional BDD implementations, the CBDD demonstrated an average reduction in size of more than one order of magnitude for certain benchmark circuits. A set of procedures and a systematic approach for visiting each node of the CBDD’s graph and generating a disjoint Boolean equation in infix notation was developed. The procedures were verified using a series of test circuits which generated predictable CBDDs that have known disjoint Boolean equations. A technique for computing signal probability was developed. The techinque results in a signal probability approximation in which estimation error decreases as a depth-accuracy parameter is increased. 
A new methodology was developed for computing the switching activity and dynamic power dissipation of behavioral—level digital circuit designs described in VHDL. The implementation, called BLAPE, transforms a VHDL specification into a set of Boolean expressions. Then, the application of structure supports the computation of realistic switching activity and dynamic power estimates. A series of benchmark experiments were performed to validate the methodology and highlight the accuracy and performance of the BLAPE implementation as compared to the Berkeley SIS power estimator. Experiments were performed using circuits selected from an arithmetic circuit suite and the ISCAS-85, nonre- dundant ISCAS-85, and MCNC Synth89 benchmark suites. A new visualization tool for profiling/viewing the powephungry activity hot- spots within a circuit was developed. This tool is beneficial to circuit designers 140 because it identifies the location and switching activity of circuit nodes using various graphical views. 7 .2 Future Work The BLAPE implementation provides improved accuracy for high-level switching ac- tivity and power estimation. Future improvements and considerations in the following areas will improve BLAPE’S performance, accuracy, and usefulness. o The Connective Binary Decision Diagram (CBDD) should be more intelligent. The CBDD should be able to recognize redundant logic and not replicate this logic each time it is encountered. This improvement will produce a more com- pact graph, yielding smaller post-ordered Boolean equations and improving the performance. 0 BLAPE should provide support for sequential circuits. The modeling of sequen- tial circuits require additional considerations and modifications to the CBDD. This improvement will increase the usefulness of the BLAPE implementation. 0 Unit / Real delay models are needed to give more realistic power estimates. These delay models will yield longer program simulation runtimes, but offer switching activity and power estimates which are consistent with the results of circuit-level simulators. 0 Improved memory management for the BLAPE implementation is necessary. More efficient use of memory resources will allow the analysis of larger circuit designs. 141 7 .3 Impact of Contributions The research presented in this dissertation provides a solution to the problem of accu- rately estimating switching activity and dynamic power for high-level circuit designs described by VHDL behavioral specifications. The technique provides a significant improvement to existing behavioral-level activity and power estimators. The esti- mates generated by existing high—level techniques contain 10 to 12 percent error on average, with some estimates containing as much as 80 percent error. The application of the proposed technique, the BLAPE program, allows user-selectable accuracy of switching activity estimation, at the expense of time and memory resources. During a benchmark comparison containing 49 circuits, with the Berkeley SIS power esti- mator, the BLAPE program power estimates contained 1.22% average error for a depth-accuracy of k = 1 and 0.21% average error for k = 7. The BLAPE program combined with the use of the Activity Viewer tool provides an effective means of activity/power profiling at the behavioral-level. The combi- nation of the two tools identifies high-activity and power-hungry hot-spots within a circuit design. 
The ability to estimate switching activity and dynamic power at the behavioral-level will improve the quality of integrated circuits by providing early warning of power problems and reducing design time and cost.

APPENDICES

APPENDIX A

Definitions and Formulas

Definition A.1 Glitch Activity: The portion of the switching activity due to multiple gate output transitions in response to an input transition.

Definition A.2 Reconvergent fanout nodes: Circuit nodes that receive inputs from two paths that fan out from some other circuit node.

Definition A.3 Non-decomposable FSM: An FSM is said to be non-decomposable when every state of the machine is reachable from every other state in a finite number of cycles.

Definition A.4 Signal Probability P_s(x): The probability that the signal at node x evaluates to logic one [53].

Definition A.5 Transition Probability P_T(x): The probability that the logic signal at node x experiences a change in its logic state [53].

Switching Activity Derivation:

    E_sw(g) = P_{0->1}(g) + P_{1->0}(g)
            = P_s(g') · P_s(g) + P_s(g) · P_s(g')
            = 2 · P_s(g) · P_s(g')
            = 2 · P_s(g) · (1 - P_s(g)),

where g' denotes the complement of g.

Definition A.6 Spatial Correlation: Two logic signals x and y are spatially correlated if in the same time slot the state of one signal depends on the state of the other.

Definition A.7 Temporal Correlation: The dependence of the current state of a signal on its previous logic value.

Definition A.8 Boolean difference: ∂y/∂x = y|_{x=1} ⊕ y|_{x=0} = y(x) ⊕ y(x').

Definition A.9 Equilibrium probability: If x(t) is a logic signal (switching between 0 and 1), then its equilibrium probability is defined as P(x) = lim_{T->∞} (1/T) ∫_{-T/2}^{+T/2} x(t) dt.

Definition A.10 Transition Density: The average number of transitions per unit time experienced by a circuit node [53].

Remark: The density provides an effective measure of switching activity in logic circuits in the presence of any delay model.

Remark: If a logic signal x(t) makes n_x(T) transitions in a time interval of length T, then the transition density of x(t) is defined as D(x) := lim_{T->∞} n_x(T)/T.

Remark: If all correlations are ignored, so that the input signals are independent of one another in both space and time, then the signals are spatio-temporally independent, and the transition density is given by D(y) = ∑_i P(∂y/∂x_i) · D(x_i).

Remark: The relationship between transition density and transition probability is given by D(x) [remainder of formula not legible in the source].

Definition A.11 Articulation Point: A node whose removal disconnects the graph.

Definition A.12 The Boolean operators '+', '*', '-' represent the logic disjunction (OR), conjunction (AND), and inversion (NOT), respectively. By default, '*' may be denoted by a space.

Definition A.13 Primary inputs are Boolean variables of a circuit that depend on no other variables.

Definition A.14 Primary outputs are Boolean variables on which no other variable(s) depends.

Definition A.15 The depth of a circuit is the maximum number of nodes between any primary input and any primary output.

Definition A.16 Flattening or Collapsing is the action of reducing the circuit's depth to a desired level.

APPENDIX B

Application of BLAPE to 4-bit Booth Multiplier

This appendix presents a stage-by-stage view of the BLAPE implementation when applied to a 4-bit Booth multiplier design. The initial input specification is in VHDL, using the behavioral model for some components and the structural VHDL model for the high-level design. The representations for each intermediate stage are given.
B.1 High-Level VHDL Specification Input ENTITY cass IS port (pin, cin, ain, h, d : IN BIT; pout, cout : OUT BIT); END cass; ARCHITECTURE cass_arch 0F cass IS BEGIN pout <= pin XOR (ain AND h) XOR (cin AND h); cout <= ((pin XOR d) AND ((ain 0R cin) 0R (ain AND cin))); END cass_arch; ENTITY ctrl IS port (x2, x1 : IN BIT; h, d : OUT BIT); END ctrl; 146 ARCHITECTURE ctr1_arch OF ctrl IS BEGIN h <= x2 XOR x1; d <= x2 AND ( NOT(x1) ); END ctrl_arch; ENTITY booth4 IS port (a, x : IN BIT_VECTOR(3 DOWNTO O); p : OUT BIT_VECTOR(7 DOHNTO 0)); END booth4; ARCHITECTURE network OF booth4 IS COMPONENT ctrl port (12, 11 : IN BIT; h, d : OUT BIT); END COMPONENT; COMPONENT cass port (pin, cin, ain, h, d : IN BIT; pout, cout : OUT BIT); END COMPONENT; SIGNAL p : BIT_VECTOR(34 DOWNTO 0); SIGNAL c : BIT_VECTOR(34 DOHNTO 0); SIGNAL h : BIT_VECTOR(4 DOWNTO 0); SIGNAL d : BIT_VECTOR(4 DONNTO 0); SIGNAL zero : BIT; BEGIN zero <= ’0’; zero, 1(3), h(4), d(4) x(3), x(2), h(3), d(3) x(2), 1(1), h(2), d(2) 1(1), 1(0), h(l), d(l) x(O), zero, h(O), d(O) ctr1_4 : ctrl port MAP ctr1_3 : ctrl port HAP ctr1_2 : ctrl port MAP ctr1_1 : ctrl port MAP ctr1_0 : ctrl port MAP .0 A A A A A V V V V V no u cass_0 : cass port MAP (zero, zero, a(O), h(4), d(4), p(0), c(O) cass_1 : cass port HAP (zero, c(O), a(1), h(4), d(4), p(l), c(1) cass_2 : cass port HAP (zero, c(1), a(2), h(4), d(4), p(2), c(2) cass_3 : cass port MAP (zero, c(2), a(3), h(4), d(4), p(3), c(3) cass_4 : cass port MAP (zero, c(3), zero, h(4), d(4), p(4), c(4) cass_5 : cass port MAP (zero, zero, a(O), h(3), d(3), p(5), c(5) cass_6 : cass port MAP (p(0), c(5), a(1), h(3), d(3), p(6), c(6) cass_7 : cass port MAP (p(l), c(6), a(2), h(3), d(3), p(7), c(7) 147 VVVVVVVV cass_8 cass_9 cass_31 : cass : cass cass_10 : cass_11 : cass_12 : cass-13 : cass_14 : cass_15 : cass_16 : cass_17 : cass_18 : cass_19 : cass_20 : cass_21 : cass_22 : cass_23 : cass_24 : cass_25 : cass_26 : cass_27 : cass_28 : cass_29 : cass_30 : : cass cass_32 : cass_33 : cass_34 : C888 C888 C888 C888 C888 C888 C888 C888 0888 C888 C888 C888 C888 C888 C888 C888 C888 C888 C888 C888 C888 C888 C888 C888 port port port port port port port port port port port port port port port port port port port port port port port port port port port MAP MAP MAP MAP MAP MAP MAP MAP MAP MAP MAP MAP MAP MAP MAP MAP MAP MAP MAP MAP MAP MAP MAP MAP MAP MAP MAP p <= p(33 DOWNTO 26); END network; its inputs. (p(2), (p(3). (p(4), (zero, (p(5). (p(6). (p(7). (p(8). (p(9). (p(10), C(16), zero, h(2), d(2). p(17), C(17) ); c(7). C(8), C(9), zero, C(11): C(12): C(13): C(14): C(15): a(3). h(3), d(3), p(8). C(8) ); zero, h(3), d(3), p(9), C(9) )3 zero, h(3), d(3), p(10), C(10) ); a(O), h(2), d(2), p(11), C(11) ); a(1), h(2), d(2), p(12). a(2). h(2). d(2). p(13), a(3), h(2). d(2), p(14), zero, h(2), d(2). p(15). zero, h(2), d(2). p(16). C(12) )3 C(13) )3 C(14) ); C(15) ); C(16) ); (zero, zero, a(O), h(1), d(l), p(18), C(18) )3 (p(11). (p(12), (p(13). (p(l4): (p(15)9 (p(16). (p(17). C(18) C(19) C(20) C(21) C(22) C(23) C(24) . a(1), . a(2). . a(3). , zero, , zero, , zero, , zero, h(1), h(1), h(1), h(1), h(1), h(1), h(1), d(1), d(1), d(1). d(l), d(1). d(1), d(l), p(19). P(20): p(21): p(22). P(23)’ p(24). p(25): C(19) ) C(20) ) C(21) ) C(22) ) C(23) ) C(24) ) C(25) ) (zero, zero, a(O), h(O), d(O), p(26), C(26) ); (p(18). (p(19): (p(20)9 (p(21). (p(22), (p(23): (p(24). (P(25): C(26) C(27) C(28) C(29) C(30) C(31) C(32) C(33) 148 . a(1), . a(2). . a(3). , zero, , zero, , zero, , zero, , zero, h(O), h(O), h(O). 
h(O), h(O), h(O), h(O), h(O), B.2 Boolean Equation Generation d(O), d(O). d(O), d(O). d(O), d(O), d(O), d(O), p(27). p(28). P(29): p(30)9 p(31). p(32). p(33): P(34): C(27) ) C(28) ) C(29) ) C(30) ) C(31) ) C(32) ) C(33) ) C(34) ) The Booth multiplier is an array multiplier, composed of CASS and CTRL units. The CASS unit is responsible for performing arithmetic/shifts operations, based on The CTRL unit is responsible for producing the input signals to the CASS units. After compiling the each of the independent units (CASS, CTRL), the multiplier (Booth4) is compiled. The report file generated by the Altera MAX+PLUS II compiler contains input / output information as well as the Boolean equations given below. ** INPUTS ** Pin LC 44 - - 43 - - 42 - - 10 - - 84 - - 2 - .. 1 _ .- 11 - - tat OUTPUTS u: Pin LC 62 - - 29 - - 18 - - 17 - - 19 - - 25 - - 24 - - 23 - - ** EQUATIONS ** a0 : INPUT; a1 : INPUT; a2 : INPUT; a3 : INPUT; x0 : INPUT; xi : INPUT; x2 : INPUT; x3 : INPUT; -- Node name is ’pO’ -- Equation name is ’p0’, _LCl_C14; -- Node name is ’p1’ “- Equation name is ’pl’, type is output _L05_C6; ‘- Node name is ’p2’ Row www>>>aa Col Primitive Col Primitive INPUT INPUT INPUT INPUT INPUT INPUT INPUT INPUT OUTPUT OUTPUT OUTPUT OUTPUT OUTPUT OUTPUT OUTPUT OUTPUT type is output 149 0 0000000 0 0000000 0 0000000 1 HHHHHHH 00000000 00000000 11 11 10 9 14 8 13 10 INP FBK OUT FBK 00000000 INP FBK OUT FBK Name a0 a1 a2 a3 x0 x1 x2 :3 Name p0 p1 p2 p3 p4 p5 p6 p7 -- Equation name is ’p2’, type is output p2 = _LC5_A1; -- Node name is ’p3’ -- Equation name is ’p3’, type is output p3 = _LC3_A1; -- Node name is ’p4’ -- Equation name is ’p4’, type is output p4 = _LC7_A1; -- Node name is ’p5’ -- Equation name is ’p5’, type is output p5 = _LC7_B2; -- Node name is ’p6’ -- Equation name is ’p6’, type is output p6 = _LC5-B2; -- Node name is ’p7’ -- Equation name is ’p7’, type is output p7 = _LC3_B2; -- Node name is ’lcass:cass_6l:14’ -- Equation name is ’_LC4_A10’, type is buried _LC4_A10 = LCELL( -EQOOI); _EQOOI = a1 & x2 & !13 # a0 & x2 & x3 # a1 & !x2 & x3 # a0 & a1 & x3; -- Node name is ’lcasszcass_6|:23’ -- Equation name is ’_LC4_CS’, type is buried _LC4_C3 LCELL( _EQOO2); _EQOO2 = !a0 & a1 & !x2 & x3 # a0 & a1 & x2 & x3; -- Node name is ’lcasszcass_7l:12’ -- Equation name is ’_LC5_C3’, type is buried _LC5_C3 = LCELL( _EQOO3); _EQOOS = a1 & x2 & 13 # a1 & !a2 & 13 # a2 & x2 & !13 # !a1 & a2 & !x2 & x3; -- Node name is ’lcasszcass_7l:14’ -- Equation name is ’-LC1_C3’, type is buried _LC1_C3 = LCELL( _EQOO4); 150 _EQOO4 = _LC4_C3 & !_LC5_C3 & !x2 & # _LC4_C3 & !_LC5_C3 & x2 & # !_LC4_C3 & _LC5_C3 # _LC5_C3 & 12 & x3 # _LC5_C3 & !x2 & !x3; -- Node name is ’lcass:cass_7l:22’ -- Equation name is ’_LC6_CS’, type is _L06_03 = LCELL( _EQOOS); _EQOOS - _LC4_C3 # a2; -- Node name is ’lcasszcass_7|:23’ —- Equation name is ’_LC3_C3’, type is _LC3_C3 3 LCELL( _EQOOG); _EQOOG = a1 & _LC6_C3 & x2 & x3 # !a1 & _LC6_C3 & !x2 & x3; -- Node name is ’lcass:cass_8|:12’ -- Equation name is ’_LC5_A4’, type is _LC5_A4 - LCELL( -E0007); _E0007 8 a2 & x2 & 13 # a2 & !a3 & 13 # a3 A :2 & !13 8 !a2 a a3 & !x2 & x3; -- Node name is ’Icass:Cass_8|:13’ -- Equation name is ’_LC2_CS’, type is _LC2_C3 I LCELL( _EQOO8); _EQOOB 3 !a1 & _LC6_C3 & !12 k 13; -- Node name is ’lcasszcass_8|:22’ -- Equation name is ’_LC2_09’, type is _LC2_C9 = LCELL( _EQOOQ)3 -EQOOQ = _LC3_C3 # a3; -- Node name is ’Icasszcass-9l:13’ -- Equation name is ’_LC3_C9’, type is _LC3_C9 8 LCELL( -EQOlO); _EQOIO 3 !a2 & _LC2_C9 & !x2 & x3; -- Node name is ’lcasszcass_9l:14’ 
-- Equation name is ’_LC1_09’, type is _LC1_C9 = LCELL( _EQOII); _EQOIl 8 _LC3_C9 & !x3 # !a3 & _LC3_09 x3 !x3 buried buried buried buried buried buried buried 151 # a3 & !_LC3_C9 & x3; —- Node name is ’lcasszcass_12|:12’ -- Equation name is ’_LC7_A10’, type is buried _LC7_A10 LCELL( _EQOIZ); _EQOI2 = a0 & !_LCl_A12 & _LC5_A10 # a0 & !a1 & _LC5_A10 # a1 & _LCI_A12 A !_LC5_A10 # !aO & a1 & _LCI_A12; -- Node name is ’lcass:cass_12l:14’ -- Equation name is ’_L03_A10’, type is buried _LC3_A10 = LCELL( _E0013); _EQOl3 = !_LCl_A11 & _LC7_A10 # !a0 & _LC7_A10 # !_LC1_A12 & _LC7_A10 # a0 & _LC1_A11 & _LC1_A12 & !_LC7_A10; -- Node name is ’lcass:cass_12l:23’ -- Equation name is ’_LC6_A10’, type is buried _LC6_A10 = LCELL( -E0014); _EQOl4 = a0 & a1 & !_LC1_A11 & _LC5_A10 # a0 & _LC1_A11 & !_LC5_A10 # a1 & _LCl_A11 & !_LC5_A10 # !aO & a1 & _LC1_A11; -- Node name is ’Icasszcass_13l:14’ -- Equation name is ’_LC2_A10’, type is buried _LC2_A10 = LCELL( _EQOIS); _EQO15 = !_LCI_A12 & _LC4_A10 # !a2 & _LC4_A10 & !_LC6_A10 # a2 A -LC1_A12 & !_LC4_A10 & !_LC6_A10 # a2 & -LC4_A10 & _LC6_A10 # !a2 & _LCl_A12 & !_LC4_A10 & _LC6_A10; -- Node name is ’lcasszcass-13l:23’ -- Equation name is ’_LC1_A10’, type is buried _LC1_A10 = LCELL( _EQOIS); _EQ016 = !_LC1_A11 & _LC4_A10 & _LC6_A10 # a2 & !_LC1_A11 & _LC4_A10 # _LC1_A11 & !_LC4_A10 & _LC6_A10 # a2 & _LC1_A11 & !_LC4_A10; -- Node name is ’Icass:cass_14|:14’ -- Equation name is ’_LC1_A8’, type is buried _LC1_A8 = LCELL( -30017); _EQOI7 = a3 & _LC1_A10 & _LC1_C3 # !a3 & _LC1_A10 & _LC1_A12 & !_LC1_C3 152 # !_LC1_A12 A _LC1_C3 # !a3 A !_LCI_A10 A _LC1_C3 # a3 A !_LC1_A10 A _LC1_A12 A !_LC1_C3; -- Node name is ’lcass:cass_14l:23’ -- Equation name is ’_LC4_A4’, type is buried _LC4_A4 8 LCELL( _EQOIS); _EQ018 = _LC1_A10 A !_LC1_A11 A _LC1_C3 # a3 A !_LC1_A11 A _LC1_C3 # _LC1_A10 A _LC1_A11 A !_LC1_C3 # a3 A _LC1_A11 A !_LC1-C3; -- Node name is ’lcasszcass_15l:14’ -- Equation name is ’_LC2_A4’, type is buried _LC2_A4 LCELL( _E0019); _EQOIQ _LC2_C3 A !_LC4_A4 A !_LC5-A4 !_LC2_C3 A !_LC4_A4 A _LCS_A4 !_LCI_A12 A _LC2_C3 A !_LCS_A4 !_LCI_A12 A !_LC2_03 A _LC5_A4 _LC1_A12 A _LC2_C3 A _LC4_A4 A _LC5_A4 _LC1_A12 A !_LC2_C3 A _LC4_A4 A !_LC5_A4; %fififi‘tl -- Node name is ’lcass:cass_15|:19’ -- Equation name is ’_L03_A4’, type is buried _LC3_A4 = LCELL( _EQ020); _EQO20 = _LC2_C3 A !_LC5_A4 A !12 # !_LC2_C3 A _LC5_A4 A 112 # _LC2_CB A !_LCS_A4 A 11 # !_LC2-C3 A _LC5_A4 A 11 # _LC2_C3 A _LC5_A4 A !xl A 12 # !_LC2_C3 A !_LC5_A4 A !11 A 12; -- Node name is ’lcass:cass_16l:13’ -- Equation name is ’_LC1_A4’, type is buried _LC1_A4 = LCELL( _EQO21); _EQOZI = _LC3_A4 A _LC4_A4 A !11 A x2 # _LC3_A4 A _LC4_A4 A :1 A !x2; -- Node name is ’lcasszcass_19|:12’ -- Equation name is ’_L05_A5’, type is buried _L05_A5 = LCELL( _EQO22); _E0022 = a0 A _L01_A12 A !_LCI_A13 # a0 A !a1 A _LCl_A12 # a1 A !-LCI_A12 A _LC1_A13 # !a0 A a1 A _LC1_A13; -- Node name is ’lcass:cass_19l:14’ -- Equation name is ’_LC4_A5’, type is buried 153 _LC4_A5 _EQ023 LCELL( _EQO23); !_LC1_A13 A _LC5_A5 !_LC1_C7 A _LC5_A5 !aO A _LCS_A5 a0 A _LC1_A13 A _LC1_C7 A !_LC5_A5; %fifill -- Node name is ’lcasszcass_19|:23’ -- Equation name is ’_LC6_A5’, type is buried _LC6_A5 = LCELL( _EQO24); _EQO24 = a0 A a1 A _LCl_A12 A !_LC1_C7 # a0 A !_LC1_A12 A _LC1_C7 # al A !_LC1_A12 A _LCl_C7 # !aO A a1 A _LC1_CT; -- Node name is ’lcasszcass_20l:14’ -- Equation name is ’_LC3_A5’, type is buried _LC3_A5 = LCELL( _EQO25); _EQO25 = !_LC1_A13 A _LC3_A10 # !a2 A _LC3_A10 A !_L06_A5 # a2 A _LCI_A13 A !_LC3_A10 A !_LC6_A5 # a2 A _LC3_A10 A _L06_A5 
# !a2 A _LC1_A13 A !_LC3_A10 A _LC6_A5; -- Node name is ’lcass:Cass_20I:23’ -- Equation name is ’_LC7_A5’, type is buried _LC7_A5 = LCELL( _EQO26); _EQO26 a !_LC1_C7 A -LC3_A10 A -L06_A5 A a2 A !_LCl_C7 A _LC3-A10 # _LCI_C7 A !_LC3_A10 A _LC6_A5 # a2 A -LCl_C7 A !_LC3_A10; -- Node name is ’Icass:cass_21l:12’ -- Equation name is ’_LC2_A1’, type is buried _LC2_A1 = LCELL( _EQO27); _EQO27 = !a3 A _LC2_A10 # !_LC1_A13 A _LC2_A10 # a3 A _LC1_A13 A !_LC2_A10; -- Node name is ’lcasszcass_21|:13’ -- Equation name is ’_LC2_A5’, type is buried _LC2_A5 = LCELL( _EQO28); _EQO28 = _LC1_A13 A _LC7_A5; -- Node name is ’lcass:cass_21l:23’ -- Equation name is ’_LCI_A5’, type is buried _LC1_A5 = LCELL( _EQO29); _EQO29 = !_LC1_C7 A _LC2_A10 A _LC7_A5 154 # a3 A !_LC1_C7 A _LC2_A10 # _LC1_C7 A !_LC2_A10 A _LC7_A5 # a3 A _LC1_C7 A !_LC2_A10; -- Node name is ’lcasszcass_22|:14’ -- Equation name is ’_LC8_B2’, type is buried _LC8_B2 ' LCELL( _EQO30); _EQO3O = !_LC1_A5 A -LC1_A8 # _LC1_A8 A !_LC1_A13 # _LC1_A5 A !_LC1_A8 A _LC1_A13; -- Node name is ’lcass:cass_22l:23’ -- Equation name is ’_L01_B2’, type is buried _LC1_B2 8 LCELL( _EQOSi); _EQO31 - _LC1_A5 A _LC1_A8 A !_LC1_C7 # _LC1_A5 A !_LC1_A8 A _LC1_C7; -- Node name is ’lcasszcass_23|:14’ -- Equation name is ’-LC2_B2’, type is buried _LC2_B2 8 LCELL( _EQO32); _EQ032 8 !_LCl_B2 A _LC2_A4 # !_LC1_A13 A _LC2_A4 # _LC1_A13 A _LCl_B2 A !_LC2-A4; -- Node name is ’lcasszcass_24l:13’ -- Equation name is ’_LC4-B2’, type is buried -LC4_B2 - LCELL( -EQO33); -E0033 8 _LC1_A13 A _LC1_B2 A !_LCI_C7 A -LC2_A4 # _LC1-A13 A _L01-B2 A _LC1_C7 A !_LC2_A4; -- Node name is ’lcasszcass_26l:11’ -- Equation name is ’_LCl_Cl4’, type is buried __LCS-06 = LCELL( -EQO35); _E0035 8 !a0 A al A 10 # a0 A 110 A :1 # a1 A :0 A !xl # a0 A !a1 A 11; -- Node name is ’lcass:cass_27l:23’ -- Equation name is ’_LC4_A1’, type is buried -LC4_A1 = LCELL( -E0036); _EQO36 = a0 A x0 A :1 # a0 A a1 A 11 # a1 A 10 A :1 # !aO A a1 A 10; -- Node name is ’Icass:cass_28|:14’ 155 -- Equation name is ’_LCS_A1’, type is buried _LC5_A1 = LCELL( _EQO37); _EQO37 = _LC4_A5 A !xO # !a2 A !_LC4_A1 A _LC4_A5 # a2 A !_LC4_A1 A !_LC4_A5 A x0 # a2 A _LC4_A1 A _LC4_A5 # !a2 A _LC4_A1 A !_LC4_A5 A x0; -- Node name is ’lcasszcass_28l:23’ -- Equation name is ’_L08_A1’, type is buried _LC8_A1 = LCELL( -E0038); _EQOBB a2 A _LC4_A5 A !xO a2 A !_LC4_A5 A x0 _LC4_A1 A _LC4_A5 A !xO _LC4_A1 A !_LC4_A5 A 10; #%#II -- Node name is ’lcasszcass_29I:14’ -- Equation name is ’_LC3_A1’, type is buried _LC3_A1 = LCELL( _EQO39); _EQO39 = _LC3_A5 A !xO # !a3 A _LC3_A5 A !_LC8_A1 # a3 A !_LC3_A5 A !_LC8_A1 A x0 # a3 A _LC3_A5 A -LC8_A1 # !a3 A !_LC3_A5 A _LC8_A1 A x0; -- Node name is ’lcasszcass_29l:23’ -- Equation name is ’_LC6_A1’, type is buried _L06_A1 - LCELL( _EQO40); _EQO40 = a3 A _L03-A5 A !xO A -LC3_A5 A _LC8-A1 A !xO # a3 A !_L03_A5 A 10 # !_LCS_A5 A _LC8_A1 A 10; -- Node name is ’lcasszcass_30l:14’ -- Equation name is ’_LC7_A1’, type is buried _LC7_A1 = LCELL( _EQO41); _EQO41 = !_LC2_A1 A -LC2_A5 A !_L06_A1 # _LC2_A1 A !_LC2_A5 A !_LC6_A1 # !_LC2_A1 A _LC2_A5 A !x0 # _LC2_A1 A !_LC2_A5 A !xO # _LC2_A1 A _LC2_A5 A _LC6_A1 A :0 # !_LC2_A1 A !_LC2_A5 A _L06_A1 A 10; -- Node name is ’lcass:cass_30l:23’ -- Equation name is ’_LCl_A1’, type is buried _LC1_A1 = LCELL( -EQO42); _EQO42 = !_LC2_A1 A _LC2_A5 A _LC6_A1 A !xO 156 # _L02_A1 A !_LC2_A5 A _LC6_A1 A !xO # _LC2_A1 A _LC2_A5 A _LC6_A1 A x0 # !_LC2_A1 A !_LC2_A5 A _LC6_A1 A x0; -- Node name is ’lcasszcass_31|:14’ -- Equation name is ’_LC7_B2’, type is buried -LC7,B2 = LCELL( _EQO43); 
_EQ043 = !_LC1_A1 & _LC8_B2
       # _LC8_B2 & !x0
       # _LC1_A1 & !_LC8_B2 & x0;

-- Node name is '|cass:cass_32|:14'
-- Equation name is '_LC5_B2', type is buried
_LC5_B2 = LCELL( _EQ044);
_EQ044 = _LC2_B2 & _LC8_B2
       # !_LC1_A1 & _LC2_B2
       # _LC2_B2 & !x0
       # _LC1_A1 & !_LC2_B2 & !_LC8_B2 & x0;

-- Node name is '|cass:cass_33|:13'
-- Equation name is '_LC6_B2', type is buried
_LC6_B2 = LCELL( _EQ045);
_EQ045 = _LC1_A1 & !_LC2_B2 & !_LC8_B2 & x0;

-- Node name is '|cass:cass_33|:14'
-- Equation name is '_LC3_B2', type is buried
_LC3_B2 = LCELL( _EQ046);
_EQ046 = _LC1_A4 & !_LC1_C9 & _LC4_B2 & _LC6_B2
       # !_LC1_A4 & _LC1_C9 & _LC4_B2 & _LC6_B2
       # _LC1_A4 & _LC1_C9 & !_LC4_B2 & _LC6_B2
       # !_LC1_A4 & !_LC1_C9 & !_LC4_B2 & _LC6_B2
       # _LC1_A4 & _LC1_C9 & _LC4_B2 & !_LC6_B2
       # !_LC1_A4 & !_LC1_C9 & _LC4_B2 & !_LC6_B2
       # _LC1_A4 & !_LC1_C9 & !_LC4_B2 & !_LC6_B2
       # !_LC1_A4 & _LC1_C9 & !_LC4_B2 & !_LC6_B2;

-- Node name is '|ctrl:ctrl_1|:5'
-- Equation name is '_LC1_A13', type is buried
_LC1_A13 = LCELL( _EQ047);
_EQ047 = !x0 & x1
       # x0 & !x1;

-- Node name is '|ctrl:ctrl_1|:8'
-- Equation name is '_LC1_C7', type is buried
_LC1_C7 = LCELL( _EQ048);
_EQ048 = !x0 & x1;

-- Node name is '|ctrl:ctrl_2|:5'
-- Equation name is '_LC1_A12', type is buried
_LC1_A12 = LCELL( _EQ049);
_EQ049 = !x1 & x2
       # x1 & !x2;

-- Node name is '|ctrl:ctrl_2|:8'
-- Equation name is '_LC1_A11', type is buried
_LC1_A11 = LCELL( _EQ050);
_EQ050 = !x1 & x2;

-- Node name is '|ctrl:ctrl_3|:5'
-- Equation name is '_LC5_A10', type is buried
_LC5_A10 = LCELL( _EQ051);
_EQ051 = !x2 & x3
       # x2 & !x3;

** COMPILATION SETTINGS & TIMES **

B.3 Generate Implicit Structural Specification

Next, a BLAPE program extracts the Boolean equations from the report file and applies implicit structure. The resulting structural representation is a BLIF-formatted gate-level circuit specification.
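The conversion itself is mechanical: every sum-of-products equation becomes one AND gate per product term (with NOT gates for complemented literals) and one OR gate that drives the equation's output, which is exactly the gate pattern visible in the netlist that follows. The Python sketch below illustrates that step under stated assumptions; it is not the BLAPE source (which is not listed here), and the function name sop_to_blif, the t<n> term counter, and the inline inverter emission are illustrative choices (the generated netlist below instead emits each inverter once, up front).

def sop_to_blif(equation, start_index=0):
    """Turn 'OUT = a & !b # c & d;' into a list of BLIF .gate lines."""
    lhs, rhs = equation.rstrip().rstrip(';').split('=', 1)
    out = lhs.strip()
    terms = [t.strip() for t in rhs.split('#')]     # sum-of-products terms
    lines, term_outputs, idx = [], [], start_index

    for term in terms:
        literals = [lit.strip() for lit in term.split('&')]
        inputs = []
        for lit in literals:
            if lit.startswith('!'):                 # complemented literal -> NOT gate
                sig = lit[1:]
                lines.append(f".gate not a={sig} O=NOT{sig}")
                inputs.append(f"NOT{sig}")
            else:
                inputs.append(lit)
        if len(inputs) == 1:                        # single-literal term -> buffer
            lines.append(f".gate buf1 a={inputs[0]} O=t{idx}")
        else:                                       # product term -> AND gate
            pins = ' '.join(f"{chr(ord('a') + i)}={s}" for i, s in enumerate(inputs))
            lines.append(f".gate and{len(inputs)} {pins} O=t{idx}")
        term_outputs.append(f"t{idx}")
        idx += 1

    if len(term_outputs) == 1:                      # single term -> buffer to the output
        lines.append(f".gate buf1 a={term_outputs[0]} O={out}")
    else:                                           # sum of terms -> OR gate to the output
        pins = ' '.join(f"{chr(ord('a') + i)}={s}" for i, s in enumerate(term_outputs))
        lines.append(f".gate or{len(term_outputs)} {pins} O={out}")
    return lines, idx

if __name__ == "__main__":
    blif_lines, _ = sop_to_blif("_LC1_A13 = !x0 & x1 # x0 & !x1;", start_index=162)
    print("\n".join(blif_lines))

Applied to _EQ047 above with the intermediate counter started at 162, the sketch reproduces the t162/t163/or2 pattern that appears for _LC1_A13 in the generated netlist.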
.inputs a0 a1 a2 a3 10 11 x2 x3 .outputs p0 p1 p2 p3 p4 p5 p6 p7 .gate not a=-LC5_03 O=NOT_LCS_C3 .gate not a=_LC4_C3 O=NOT_LC4_C3 .gate not a=_LC3_CQ O=NOT_LC3_C9 .gate not a=_LCl_A12 O=NOT_L01_A12 .gate not a=_LC5_A10 O=NOT_LC5_A10 .gate not a=_LC1_A11 O=NOT_LC1_A11 .gate not a=_LC6_A10 O=NOT_L06_A10 .gate not a=-LC4_A10 O=NOT_LC4_A10 .gate not a=_LC1_C3 O=NOT_LC1_C3 .gate not a=_LCi_A10 O=NOT-LC1_A10 .gate not a=_LC4_A4 O=NOT_LC4_A4 .gate not a=_L05_A4 O=NOT_LC5_A4 .gate not a=_LC2_03 O=NOT_LC2_C3 158 I?“ _ IL .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate not not not not not not not not not not not not not not not not not not not not not not not not not not not not not not not not not buf1 buf1 buf1 buf1 buf1 buf1 buf1 buf1 and3 and3 and3 a=_LC1_A13 O=NOT_LCI_A13 a=_LC1_C7 a=_LC6_A5 a=_LC3_A10 O=NOT_LC3_A10 a=_LC2_A10 O=NOT_LC2_A10 a=_LCl_A5 a=_LC1_A8 a=_LC1_B2 a=_LC4_A1 a=_LC4_A5 a=_LC8_A1 a=_LC3_A5 a=_LC2_A1 a=_LC6-A1 a=_LC2_A5 a=_LC1_A1 a=_LC8_B2 a=_LC2_B2 a=_LC1-09 a=_LC1_A4 a=_LC4_B2 a=_LC6_B2 a=_LC2_A4 a=_LC5_A5 a=_LC7_A10 O=NOT_LC7_A10 a=a0 a=a1 a=a2 a=a3 a=xO a=x1 a=x2 a=13 O=NOT_LC1_C7 O=NOT_LC6_A5 O=NOT_LC1_A5 O=NOT_LC1-A8 O=NOT_LC1_BZ O=NOT_LC4_A1 O=NOT-LC4-A5 O=NOT_LC8_A1 O=NOT_LC3_A5 O=NOT_LC2_A1 O=NOT_LC6_A1 O=NOT_LC2_A5 O=NOT_LC1_A1 O=NOT_LC8_B2 O=NOT_LC2_B2 O=NOT_LCI-C9 O=NOT_LCI_A4 O=NOT_LC4_B2 O=NOT_LCG_B2 O=NOT_LC2_A4 O=NOT_LCS_A5 O=NOTa0 O=NOTa1 O=NOTa2 O=NOTa3 O=NOTxO =NOTx1 O=NOTx2 O=NOT13 a=_LC1_Cl4 O=p0 a=_LC5_C6 a=_LC5_A1 a=_LC3_A1 a=-LC7_A1 a=_LC7_B2 a=_LC5_B2 a=_LC3_82 a=a1 b=x2 a=a0 b=x2 O=p1 O=p2 O=p3 O=p4 O=p6 O=p7 c=NOTx3 O=t0 c=x3 0=t1 a=a1 b=NOTx2 c=x3 O=t2 .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate and3 a=a0 b=a1 c=x3 O=t3 or4 a=t0 b=t1 c=t2 d=t3 O=_LC4_A10 and4 a=NOTa0 b=a1 c=NOTx2 d=x3 O=t4 and4 a=a0 b=a1 c=x2 d=x3 O=t5 or2 a=t4 b=t5 O=_LC4_C3 and3 a=a1 b=x2 c=x3 O=t6 and3 a=a1 b=NOTa2 c=13 O=t7 and3 a=a2 b=x2 c=NOTx3 O=t8 and4 a=NOTa1 b=a2 c=NOTx2 d=x3 O=t9 or4 a=t6 b=t7 c=t8 d=t9 O=-L05_C3 and4 a=-LC4_C3 b=NOT-L05_C3 c=NOTx2 d=x3 O=t10 and4 a=_LC4_03 b=NOT_LC5_03 c=x2 d=NOTx3 O=t11 and2 a=NOT_LC4_03 b=_LC5_C3 O=t12 and3 a-_LC5_C3 b=x2 c=13 O=t13 and3 a-_LCS_C3 b=NOTx2 c=NOTx3 O=t14 or5 a=t10 b=t11 c=t12 d=t13 e=t14 O=_LC1_C3 or2 a=_LC4_03 b=a2 O=_LC6_CS and4 a=a1 b=_LCG_03 c=x2 d=13 O=t15 and4 a=NOTa1 b=-L06_03 c=NOTx2 d=x3 O=t16 or2 a=t15 b=t16 O=_L03_C3 and3 a=a2 b=x2 c=x3 O=t17 and3 a=a2 b=NOTa3 C813 O=t18 and3 a-a3 b-x2 c=NOTx3 O=t19 and4 a=NOTa2 b=a3 c=NOTx2 d=x3 O=t20 or4 a=t17 b=t18 c=t19 d=t20 O=_LC5_A4 and4 a-NOTal b=_LC6_C3 c=NOTx2 d-xa O=t21 buf1 a=t21 O=_LC2_03 or2 a=_L03_C3 b=a3 0=_LC2_C9 and4 a-NOTa2 b=_LC2_09 c=NOTx2 d=x3 O=t22 bufl a=t22 O=_LC3_09 and2 a=_LC3_C9 b=NOTxS O=t23 and2 a=NOTa3 b=_L03_C9 O=t24 and3 a=a3 b=NOT_LCS_C9 c=13 O=t25 or3 a=t23 b=t24 c=t25 O=_LC1_C9 and3 a=a0 b=NOT-LCl_A12 c=_LCS_A10 O=t26 and3 a=a0 b=NOTa1 c=_LC5_A10 O=t27 and3 a=a1 b=_LC1_A12 c=NOT_L05_A10 O=t28 and3 a=NOTa0 b=a1 c=_LC1_A12 O=t29 or4 a=t26 b=t27 c=t28 d=t29 O=_LC7_A10 and2 a=NOT_LCl_A11 b=_LC7_A10 0=t30 and2 a=NOTaO b=_LC7_A10 O=t31 and2 a=NOT_LC1_A12 b=_LC7_A10 O=t32 and4 a=a0 b=_LCl_A11 
c=_LCI_A12 d=NOT_LC7_A10 O=t33 or4 a=t30 b=t31 c=t32 d=t33 O=_L03_A10 and4 a=a0 b=a1 c=NOT_L01_A11 d=_LC$_A10 O=t34 and3 a=a0 b=_LCl_A11 c=NOT_LC5_A10 O=t35 and3 a=a1 b=_LCl_A11 c=NOT_LC5_A10 O=t36 160 .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate and3 a=NOTa0 b=a1 c=_LCI_A11 O=t37 or4 a=t34 b=t35 c=t36 d=t3? O=_LC6_A10 and2 a=NOT_LC1_A12 b=_LC4_A10 O=t38 and3 a=NOTa2 b=_LC4_A10 c=NOT_LC6_A10 O=t39 and4 a=a2 b=_LCl_A12 c=NOT_LC4_A10 d=NOT_LC6_A10 O=t40 and3 a=a2 b=_LC4_A10 c=_L06_A10 O=t41 and4 a=NOTa2 b=_LC1_A12 c=NOT_LC4_A10 d=-L06_A10 O=t42 or5 a=t38 b=t39 c=t40 d=t41 e=t42 O=_LC2_A10 and3 a=NOT_LCl_A11 b=_LC4_A10 c=_LC6_A10 O=t43 and3 a=a2 b=NOT_LCl_A11 c=_LC4_A10 O=t44 and3 a=_LC1_A11 b=NOT_LC4_A10 c=_LC6_A10 O=t45 and3 a=a2 b=_LCl_A11 c=NOT_LC4-A10 O=t46 or4 a=t43 b=t44 c=t45 d=t46 O=_LC1_A10 and3 a=a3 b=_LCl_A10 c=_LCl_C3 O=t47 and4 a=NOTa3 b=_LCi_A10 c=_LCl-A12 d=NOT_LC1-03 O=t48 and2 a=NOT_LCl_A12 b=_LC1_C3 O=t49 and3 a=NOTa3 b=NOT_LCl_A10 c=_LC1_C3 O=t50 and4 a=a3 b=NOT_LC1-A10 c=_LCI_A12 d=NOT_LCl_C3 O=t51 or5 a=t47 b=t48 c=t49 d=t50 e=t51 O=_LC1_A8 and3 a=_LC1_A10 b=NOT_LCi_A11 c=_LC1_03 O=t52 and3 a=a3 b=NOT_LC1_A11 c=_LC1_C3 O=t53 and3 a=_LCI_A10 b=_LC1_A11 c=NOT_LC1_03 O=t54 and3 a=a3 b=-LC1_A11 c=NOT_L01_C3 O=t55 or4 a=t52 b=t53 c=t54 d=t55 O=_LC4_A4 and3 a=_LC2_C3 b=NOT_LC4_A4 c=NOT_LC5_A4 O=t56 and3 a=NOT_LC2_C3 b=NOT_LC4_A4 c=_L05_A4 0=t57 and3 a=NOT_LC1_A12 b=_LC2_C3 c=NOT_LC5_A4 O=t58 and3 a=NOT_LCl_A12 b=NOT_LC2_03 c=_LC5_A4 O=t59 and4 a=_LC1_A12 b=_LC2_C3 c=_LC4_A4 d=_L05_A4 O=t60 and4 a=_LCl_A12 b=NOT-LC2_C3 c=_LC4_A4 d=NOT_LC5_A4 O=t61 or6 a=t56 b=t57 c=t58 d=t59 e=t60 f=t61 O=_LC2_A4 and3 a--LC2_C3 b=NOT_L05_A4 c=NOTx2 O=t62 and3 a=NOT_LC2-C3 b=_LCS_A4 c=NOTx2 0=t63 and3 a=_LC2_C3 b=NOT_LC5_A4 c=x1 O=t64 and3 a=NOT_LC2-C3 b=_LC5_A4 c=xl O=t65 and4 a=_LC2_C3 b=_L05_A4 c=NOTxl d=x2 O=t66 and4 a=NOT_LC2_C3 b=NOT_L05_A4 c=NOTx1 d=x2 O=t67 or6 a=t62 b=t63 c=t64 d=t65 e=t66 f=t67 O=_LC3_A4 and4 a-_LCB_A4 b=_LC4_A4 c=NOTxl d=x2 O=t68 and4 a=_LC3_A4 b=_LC4_A4 c=x1 d=NOTx2 O=t69 or2 a=t68 b=t69 O=_LCI_A4 and3 a=a0 b=_LCl_A12 c=NOT_LCI_A13 O=t70 and3 a=a0 b=NOTa1 c=-LC1_A12 O=t71 and3 a=a1 b=NOT_LCl_A12 c=_L01_A13 O=t72 and3 a=NOTa0 b=a1 c=_L01_A13 O=t73 or4 a=t70 b=t71 c=t72 d=t73 O=_LCS_A5 and2 a=NOT-LCl-A13 b=_LCS_A5 O=t74 161 .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate and2 a=NOT_LC1_C7 b=_LC5_A5 O=t75 and2 a=NOTa0 b=_LCS_A5 O=t76 and4 a=a0 b=_LC1_A13 c=_LC1_C7 d=NOT_LC5_A5 O=t77 or4 a=t74 b=t75 c=t76 d=t77 O=_LC4_A5 and4 a=a0 b=a1 c=_LC1_A12 d=NOT_LC1_C7 O=t78 and3 a=a0 b=NOT_LCl_A12 c=_LCl_C7 O=t79 and3 a=a1 b=NOT_LCl_A12 c=_LC1_C7 O=t80 and3 a=NOTa0 b=a1 c=_LC1_C7 O=t81 or4 a=t78 b=t79 c=t80 d=t81 O=_LC6_A5 and2 a=NOT_LC1-A13 b=_LC3_A10 O=t82 and3 a=NOTa2 b=_LC3_A10 c=NOT_L06_A5 O=t83 and4 =a2 b=_LCl_A13 c=NOT_LC3_A10 d=NOT_LC6_A5 O=t84 and3 a=a2 b=_LC3_A10 c=_L06_A5 O=t85 and4 a=NOTa2 b=_LC1_A13 c=NOT_LC3_A10 d=_LCG_A5 0=t86 or5 a-t82 b=t83 c=t84 d=t85 e=t86 O=_LC3_A5 and3 a=NOT_LC1_C7 b--LC3_A10 c=_L06_A5 0=t87 and3 a=a2 b=NOT_LC1_C7 c=_LC3_A10 O=t88 and3 a=_LCl_C7 b=NOT_LC3-A10 c=_LC6_A5 O=t89 and3 
a-a2 b=_LCl_C7 c=NOT_LC3_A10 O=t90 or4 a=t87 b=t88 c=t89 d=t90 O=-LC7_A5 and2 a=NOTa3 b=_LC2_A10 O=t91 and2 a=NOT_LC1_A13 b=_LC2_A10 O=t92 and3 a=a3 b=_LCl_A13 c=NOT_LC2_A10 O=t93 or3 a=t91 b-t92 c=t93 O=_LC2_A1 and2 a-_LC1_A13 b=-LC7_A5 O=t94 bufl a=t94 O=_LC2_A5 and3 a=NOT_LCl-C7 b-_LC2_A10 c=_LC7_A5 O=t95 and3 a=a3 b-NOT_L01_C7 c=-LC2_A10 O=t96 and3 a=_LCl-CT b=NOT-LC2_A10 c=_LC7_A5 O=t97 and3 a=a3 b-_LCI_C7 c=NOT_LC2_A10 O=t98 or4 a=t95 b=t96 c=t97 d=t98 O=_LC1_A5 and2 a=NOT_LCl-A5 b=_LC1_A8 O=t99 and2 a=_LC1_A8 b=NOT_LCI_A13 0=t100 and3 a=_LC1_A5 b=NOT_LC1_A8 c=_LC1_A13 O=t101 or3 a=t99 b=t100 C-t101 O=_L08_B2 and3 a=_LC1_A5 b=_LC1_A8 c=NOT_LC1_C7 O=t102 and3 a=_LC1_A5 b=NOT_LC1_A8 c=_LC1_C7 O=t103 or2 a=t102 b=t103 O=_LCI_B2 and2 a=NOT_LC1_B2 b=_LC2_A4 O=t104 and2 a=NOT-LCl_A13 b=_LC2_A4 O=t105 and3 a=_LC1_A13 b=_LCl_B2 c=NOT_LC2_A4 O=t106 or3 a=t104 b=t105 c=t106 O=_LC2_B2 and4 a=_LC1_A13 b=_LCl_B2 c=NOT_LCI_C7 d=_LC2_A4 O=t107 and4 a=-LC1_A13 b=_LC1_B2 c=_LC1_C7 d=NOT-LC2_A4 O=t108 or2 a=t107 b=t108 O=-LC4_B2 and2 a=a0 b=xO O=t109 buf1 a=t109 O=_LC1_Ci4 162 .L' .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate and3 a=NOTa0 b=a1 c=x0 O=t110 and3 a=a0 b=NOTx0 c=xl O=t111 and3 a=a1 b=x0 c=NOTxl O=t112 and3 a=a0 b=NOTa1 c=xl O=t113 or4 a=t110 b=t111 c=t112 d=t113 O=_LC5_C6 and3 a=a0 b=xO c=x1 O=t114 and3 a=a0 b=a1 c=x1 O=t115 and3 a=a1 b=xO c=x1 O=t116 and3 a=NOTa0 b=a1 c=xo O=t117 or4 a=t114 b=t115 c=t116 d=t117 O=_LC4_A1 and2 a=_LC4_A5 b=NOTxO O=t118 and3 a=NOTa2 b=NOT_LC4_A1 c=_LC4_A5 O=t119 and4 a=a2 b=NOT-LC4_A1 c=NOT_LC4_A5 d=x0 O=t120 and3 a=a2 b=_LC4_A1 c=_LC4_A5 O=t121 and4 a=NOTa2 b=_LC4_A1 c=NOT_LC4_A5 d=xO O=t122 or5 a=t118 b=t119 c=t120 d=t121 e=t122 O=_LCS_A1 and3 a=a2 b=_LC4_A5 c=NOTxO O=t123 and3 a=a2 b=NOT_LC4_A5 c=xO 0=t124 and3 a=_LC4_A1 b=_LC4_A5 c=NOTxO O=t125 and3 a=_LC4_A1 b=NOT_LC4_A5 C310 O=t126 or4 a=t123 b=t124 c=t125 d=t126 O=_LC8_A1 and2 a=_LC3_A5 b=NOTxO O=t127 and3 a=NOTa3 b=_L03_A5 C8NOT_LC8_A1 O=t128 and4 a=a3 b=NOT_LC3_A5 c=NOT_LC8-A1 d=x0 O=t129 and3 a=a3 b=_LC3_A5 c=_LC8_A1 O=t130 and4 a=NOTa3 b=NOT_LCS_A5 c=_L08_A1 d=xO 0=t131 or5 a=t127 b=t128 c=t129 d=t130 e=t131 O=_LC3_A1 and3 a=a3 b=_LC3_A5 c=NOT10 O=t132 and3 a=_LC3_A5 b=_L08_A1 c=NOTxo O=t133 and3 a=a3 b=NOT_LC3-A5 c=x0 O=t134 and3 a=NOT_LC3-A5 b=_LCB_A1 c=xo O=t135 or4 a=t132 b=t133 c=t134 d=t135 O=_LC6_A1 and3 a=NOT_LC2_A1 b=_LC2_A5 c=NOT_LC6_A1 0=t136 and3 a=_LC2_A1 b=NOT_LC2_A5 c=NOT_LC6_A1 O=t137 and3 a=NOT-LC2_A1 b=_LC2_A5 c=NOTxO O=t138 and3 a=_LC2_A1 b=NOT_LC2-A5 c=NOTx0 O=t139 and4 a=_LC2_A1 b=_LC2_A5 c=_LC6_A1 d=xO O=t140 and4 a=NOT_LC2_A1 b=NOT_LC2_A5 c=_L06_A1 d=xO 0=t141 or6 a=t136 b=t137 c=t138 d=t139 e=t140 f=t141 O=_LC7_A1 and4 a=NOT_LC2_A1 b=_LC2_A5 c=_LC6_A1 d=NOTx0 O=t142 and4 a=_LC2_A1 b=NOT_LC2_A5 c=_LC6_A1 d=NOTx0 O=t143 and4 a=_LC2_A1 b=_LC2_A5 c=_LC6_A1 d=x0 O=t144 and4 a=NOT_LC2_A1 b=NOT_LC2_A5 c=_L06_A1 d=x0 O=t145 or4 a=t142 b=t143 c=t144 d=t145 O=_LC1_A1 and2 a=NOT_LC1_A1 b=_LCB_B2 O=t146 and2 a=-LC8_B2 b=NOTx0 O=t147 and3 a=_LC1_A1 b=NOT-LC8_B2 c=xo O=t148 163 .gate or3 a=t146 b=t147 c=t148 O=_LC7_B2 .gate and2 a=_LC2_B2 b=-L08_B2 O=t149 .gate and2 a=NOT_LCl_A1 b=_LC2_B2 O=t150 .gate and2 a=_LC2_B2 b=NOTxO O=t151 .gate and4 a=_LC1_A1 b=NOT_LC2_B2 c=NOT_LC8_B2 d=xO O=t152 .gate or4 a=t149 b=t150 c=t151 d=t152 O=_LC5_B2 .gate and4 
a=_LCl_A1 b=NOT_LC2_B2 c=NOT_L08_B2 d=xO O=t153 .gate bufl a=t153 O=_LC6_B2 .gate and4 a=_LC1_A4 b=NOT_LC1_C9 c=-LC4_B2 d=_LC6_B2 O=t154 .gate and4 a=NOT_LCl_A4 b=_LCl_C9 c=,LC4-B2 d=_LC6_B2 O=t155 .gate and4 a=_LCi_A4 b=_LCl_C9 c=NOT_LC4_B2 d=_LC6_B2 O=t156 .gate and4 a=NOT_LC1_A4 b=NOT_LCI_C9 c=NOT_LC4_B2 d=_LC6_B2 O=t157 .gate and4 a=_LC1_A4 b=_LC1_C9 c=_LC4_B2 d=NOT_LC6_B2 O=t158 .gate and4 a=NOT_LC1_A4 b=NOT_LC1_C9 c=_LC4_B2 d=NOT_LC6_B2 O=t159 .gate and4 a=_LC1_A4 b=NOT_LC1_C9 c=NOT_LC4_B2 d=NOT_LC6_B2 O=t160 .gate and4 a=NOT-LC1_A4 b=_LC1_C9 c=NOT_LC4_B2 d=NOT_LC6_B2 O=t161 .gate or8 a=t154 b=t155 c=t156 d=t157 e=t158 f=t159 g=t160 h=t161 O=_LC3_B2 .gate and2 a=NOTx0 b=xl O=t162 .gate and2 a=10 b=NOTxl O=t163 .gate or2 a=t162 b=t163 O=_L01_A13 .gate and2 a=NOTxO b=xl O=t164 .gate bufl a=t164 O=_L01_C7 .gate and2 a=NOTx1 b=x2 O=t165 .gate and2 a=x1 b=NOTx2 O=t166 .gate or2 a=t165 b=t166 O=_LCI_A12 .gate and2 a=NOTx1 b=x2 O=t167 .gate bufl a=t167 O=_LC1_A11 .gate and2 a=NOTx2 b=13 O=t168 .gate and2 a=x2 b-NOTx3 0=t169 .gate or2 a=t168 b=t169 O-_LCS-A10 .end B.4 Network Levelization The levelization of the network is given below. ( 0) NODE= o LEVEL= 0 NAME: ROOT ( 1) NODE= 1 LEVEL= 0 NAME: a0 ( 2) NODE= 2 LEVEL= 0 NAME: a1 ( 3) NODE= 3 LEVEL= 0 NAME: a2 ( 4) NODE= 4 LEVEL= 0 NAME: a3 ( 5) NODE= 5 LEVEL= 0 NAME: 10 ( 6) NODE= 6 LEVEL= 0 NAME: x1 164 AAA/\AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA/\AAAAAA 7) 8) 9) 10) 11) 12) 13) 14) 15) 16) 17) 18) 19) 20) 21) 22) 23) 24) 25) 26) 27) 28) 29) 30) 31) 32) 33) 34) 35) 36) 37) 38) 39) 40) 41) 42) 43) 44) 45) 46) 47) 48) 49) 50) 51) 52) 53) NODE= 7 NODE= 8 NODE= NODE8 NODE8 NODE8 NODE8 NODE8 NODE8 NODE= NODE8 NODE8 NODE8 NODE8 NODE8 86 NUDE-205 NODE8212 NODE8213 NODE8214 NODE8 63 NODE= 65 NODE8 68 NODE8 72 NODE8 73 NODE8 74 NODE8 87 NODE8 88 NODE8 89 NODE=206 NODE8207 NODE8208 NODE=209 NODE=210 NODE8215 NODE=271 NODE=272 NODE8274 NODE8276 NODE8277 NODE=279 NODE8281 NODE=282 NODE8 9 NODE= 67 NODE= 70 NODE= 75 NODE8 90 LEVEL= LEVEL= LEVEL8 LEVEL8 LEVEL8 LEVEL8 LEVEL= LEVEL= LEVEL8 LEVEL8 LEVEL= LEVEL8 LEVEL8 LEVEL= LEVEL8 LEVEL8 LEVEL8 LEVEL8 LEVEL= LEVEL8 LEVEL8 LEVEL8 LEVEL8 LEVEL8 LEVEL= wwwwwNNMNNMMMMNNNNNMNMNNNNNMHHHHHHHHHHHHHHHHHOO NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: 12 x3 NOTaO NOTa1 NOTa2 NOTa3 NOTXO NOTxl NOTx2 NOTx3 t1 t3 t5 t6 t17 t109 t114 t115 t116 t0 t2 t4 t7 t8 t9 t18 t19 t20 -LC1_Cl4 t110 t111 t112 t113 t117 t162 t163 t164 t165 t166 t167 t168 t169 p0 _LC4_A10 _LC4_C3 _LC5_C3 -LC5_A4 165 54) 55) 56) 57) 58) 59) 60) 61) 62) 63) 64) 65) 66) 67) 68) 69) 70) 71) 72) 73) 74) 75) 76) 77) 78) 79) 80) 81) 82) 83) 84) 85) 86) 87) 88) 89) 90) 91) 92) 93) 94) 95) 96) 97) 98) 99) 100) NODE=211 NODE=216 NODE=273 NODE=275 NODE=278 NODE=280 NODE=283 NODE= 10 NODE= NODE8 NODE8 NODE= NODE= NODE= NODE8 NODE= NODE= NODE= NUDE8 NODE8 NODE= 82 NODE8101 NODE8103 NODE8113 NODE8155 NODE=157 NODE8167 NODE= 76 NODE= 77 NODE8 78 NODE= 83 NODE= 84 NODE8 91 NODE=100 NODE8102 NODE=110 NODE=111 NODE=112 NODE8115 NODE=122 NODE=124 NODE8154 NODE=156 NODE=164 NODE=165 NODE=166 NODE= 81 LEVEL= LEVEL8 LEVEL8 LEVEL8 LEVEL8 LEVEL8 LEVEL8 LEVEL8 LEVEL8 LEVEL8 LEVEL8 LEVEL8 LEVEL= LEVEL8 LEVEL8 LEVEL8 LEVEL8 LEVEL8 LEVEL8 LEVEL8 LEVEL8 LEVEL8 LEVEL8 LEVEL8 LEVEL8 0:010:0101mmmmmmmmmmmmmmmppphpppppppbphphbpppwwwwwww NAME: NAME: NAME: 
NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: _LC5_C6 _LC4_A1 -LC1_A13 _LC1_C7 -LC1_A12 _LC1_A11 _LC5_A10 p1 NOT_LCS-C3 NOT_LC4_C3 NOT_LCI_A12 NOT_LCS_A10 NOT-LCl_A11 NOT_LC4_A10 NOT_LC5_A4 NOT_LCI_A13 NOT_LCI_C7 NOT_LC4_A1 t13 t14 -LC6_C3 t27 t29 t37 t71 t73 t81 t10 t11 t12 t15 t16 t21 t26 t28 t34 t35 t36 t38 t44 t46 t70 t72 t78 t79 t80 _LC1_C3 166 AAA/NAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA/\AAAAAAAAAA 101) 102) 103) 104) 105) 106) 107) 108) 109) 110) 111) 112) 113) 114) 115) 116) 117) 118) 119) 120) 121) 122) 123) 124) 125) 126) 127) 128) 129) 130) 131) 132) 133) 134) 135) 136) 137) 138) 139) 140) 141) 142) 143) 144) 145) 146) 147) NODE= 85 NODE= 92 NODE=104 NODE=114 NODE=158 NODE=168 NODE= 23 NODE= 25 NODE= 29 NODE= 32 NODE= 53 NODE= 54 NODE= 93 NODE=105 NODE=106 NODE=107 NODE=118 NODE=119 NODE=121 NODE=123 NODE8128 NODE=133 NODE=139 NODE=144 NODE=146 NODE=148 NODE=159 NODE=160 NODE=161 NODE= 94 NODE=108 NODE=116 NODE=117 NODE=125 NODE=135 NODE=140 NODE=145 NODE=147 NODE=149 NODE=162 NODE= 26 NODE= 95 NODE=109 NODE=120 NODE=126 NODE=127 NODE=132 LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL8 LEVEL= LEVEL= LEVEL= LEVEL8 LEVEL= LEVEL= LEVEL= LEVEL= LEVEL8 LEVEL= LEVEL= LEVEL8 LEVEL= LEVEL= LEVEL= LEVEL8 LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL8 LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= 000000000mmmmmmmmmflfl‘lflflflflflfl‘lflflflVNNNNNNNNN00000Q NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: _LC3_C3 _LC2_CS _LC7_A10 -LC6_A10 -LC5_A5 _LC6_A5 NOT_LC6_A10 NOT_LCI_C3 NOT_LC2_C3 NOT_LC6_A5 NOT_LC5_A5 NOT_LC7_A10 -LC2_C9 t30 t31 t32 t41 t42 t43 t45 t49 t53 t58 t62 t64 t66 t74 t75 t76 t22 t33 t39 t40 _LC1_A10 t55 t59 t63 t65 t67 t77 NOT_LC1_A10 _LC3_C9 _LC3_A10 -LC2_A10 t47 t48 t52 167 AAAAA/KINAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 148) 149) 150) 151) 152) 153) 154) 155) 156) 157) 158) 159) 160) 161) 162) 163) 164) 165) 166) 167) 168) 169) 170) 171) 172) 173) 174) 175) 176) 177) 178) 179) 180) 181) 182) 183) 184) 185) 186) 187) 188) 189) 190) 191) 192) 193) 194) NODE=134 NODE=150 NODE=163 NODE= 19 NODE= 33 NODE= 34 NODE= 39 NODE= 96 NODE= 97 NODE=129 NODE=130 NODE=136 NODE=169 NODE=170 NODE=172 NODE=175 NODE=176 NODE=180 NODE=181 NODE=187 NODE=217 NODE=218 NODE=220 NODE=223 NODE=225 NODE= 27 NODE= 98 NODE=131 NODE=141 NODE=142 NODE=151 NODE=152 NODE=171 NODE=173 NODE=177 NODE=178 NODE=182 NODE=189 NODE=219 NODE=221 NODE=224 NODE=226 NODE= 36 NODE= 99 NODE=137 NODE=138 NODE=153 LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL8 LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL8 LEVEL= LEVEL= LEVEL= LEVEL= LEVEL8 LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL8 LEVEL= LEVEL= LEVEL= LEVEL8 LEVEL8 LEVEL= 000 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 12 12 12 12 12 NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: 
NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: t54 _LC3_A4 _LC4_A5 NOT_LC3_C9 NOT_LC3_A10 NOT_LC2_A10 NOT_LC4_A5 t23 t24 t50 t51 -LC4_A4 t82 t83 t85 t87 t88 t91 t92 t96 t118 t119 t121 t123 t125 NOT_LC4_A4 t25 _LC1-A8 t60 t61 t68 t69 t84 t86 t89 t90 t93 t98 t120 t122 t124 t126 NOT_LCI_A8 -LC1-C9 t56 t57 _LCI_A4 168 AAA/\AAAAAAAAA/KAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 195) NODE=174 LEVEL= 12 NAME: _LC3_A5 196) NODE=179 LEVEL= 12 NAME: _LC7_A5 197) NODE=183 LEVEL8 12 NAME: _LC2_A1 198) NODE=192 LEVEL8 12 NAME: t100 199) NODE=222 LEVEL= 12 NAME: _LC5_A1 200) NODE=227 LEVEL8 12 NAME: _LC8_A1 201) NODE= 11 LEVEL= 13 NAME: p2 202) NODE8 40 LEVEL= 13 NAME: NOT_LC8_A1 203) NODE= 41 LEVEL8 13 NAME: NOT_LC3_A5 204) NODE= 42 LEVEL8 13 NAME: NOT_LC2_A1 205) NUDE8 48 LEVEL8 13 NAME: NOT_LCI_09 206) NODE= 49 LEVEL8 13 NAME: NOT_LC1_A4 207) NODE=143 LEVEL8 13 NAME: _LC2_A4 208) NODE=184 LEVEL8 13 NAME: t94 209) NODE8186 LEVEL8 13 NAME: t95 210) NODE=188 LEVEL8 13 NAME: t97 211) NODE=228 LEVEL8 13 NAME: t127 212) NODE=231 LEVEL8 13 NAME: t130 213) NODE=234 LEVEL8 13 NAME: t132 214) NODE=235 LEVEL8 13 NAME: t133 215) NODE8 52 LEVEL8 14 NAME: NOT_LC2_A4 216) NODE=185 LEVEL= 14 NAME: _LC2_A5 217) NODE8190 LEVEL8 14 NAME: _LC1_A5 218) NODE8199 LEVEL8 14 NAME: t105 219) NODE8229 LEVEL= 14 NAME: t128 220) NODE=230 LEVEL8 14 NAME: t129 221) NODE=232 LEVEL8 14 NAME: t131 222) NODE8236 LEVEL8 14 NAME: t134 223) NODE8237 LEVEL8 14 NAME: t135 224) NODE8 35 LEVEL8 15 NAME: NOT_LC1_A5 225) NODE= 44 LEVEL8 15 NAME: NOT_LC2_A5 226) NUDE-193 LEVEL= 15 NAME: t101 227) NODE=195 LEVEL8 15 NAME: t102 228) NODE=196 LEVEL8 15 NAME: t103 229) NODE=233 LEVEL8 15 NAME: _LC3_A1 230) NODE=238 LEVEL8 15 NAME: _LC6_A1 231) NODE=241 LEVEL8 15 NAME: t138 232) NODE= 12 LEVEL= 16 NAME: p3 233) NODE= 43 LEVEL8 16 NAME: NOT_LC6_A1 234) NODE=191 LEVEL8 16 NAME: t99 235) NODE8197 LEVEL8 16 NAME: _LC1_B2 236) NODE8242 LEVEL8 16 NAME: t139 237) NODE=243 LEVEL= 16 NAME: t140 238) NODE=244 LEVEL= 16 NAME: t141 239) NODE=246 LEVEL= 16 NAME: t142 240) NODE=247 LEVEL= 16 NAME: t143 241) NODE=248 LEVEL= 16 NAME: t144 169 AAA/KAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 242) 243) 244) 245) 246) 247) 248) 249) 250) 251) 252) 253) 254) 255) 256) 257) 258) 259) 260) 261) 262) 263) 264) 265) 266) 267) 268) 269) 270) 271) 272) 273) 274) 275) 276) 277) 278) 279) 280) 281) 282) 283) NODE=249 NODE= 37 NODE=194 NODE=200 NODE=202 NODE82O3 NODE=239 NODE=240 NODE=250 NODE= 45 NODE= 46 NODE=198 NODE=204 NODE=245 NODE=252 NODE8 13 NODE= 50 NODE8201 NODE8251 NODE=253 NODE= 47 NODE=254 NODE=255 NODE=256 NODE=257 NODE= 14 NODE=258 NODE=260 NODE=259 NODE=261 NODE8 15 NODE8 51 NODE8262 NODE=263 NODE=264 NODE=265 NODE=266 NODE=267 NODE=268 NODE=269 NODE8270 NODE8 16 LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL8 LEVEL= LEVEL= LEVEL8 LEVEL8 LEVEL= LEVEL8 LEVEL= LEVEL= LEVEL= LEVEL8 LEVEL8 LEVEL= LEVEL8 LEVEL= LEVEL8 LEVEL8 LEVEL8 LEVEL8 LEVEL8 LEVEL8 LEVEL8 LEVEL= LEVEL8 LEVEL8 LEVEL8 LEVEL8 LEVEL8 LEVEL8 LEVEL= LEVEL8 LEVEL= LEVEL8 LEVEL8 NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: t145 NOT_LC1_B2 _LC8_B2 t106 t107 t108 t136 t137 _LC1_A1 NOT_LC1_A1 NOT-LC8_B2 t104 _LC4_B2 _LC7_A1 t147 p4 NOT-LC4_B2 -LC2_B2 t146 t148 NOT_LC2_B2 _LC7_B2 t149 t150 t151 p5 t152 t153 -L05_B2 _LC6-B2 p6 
NOT-LC6_B2 t154 t155 t156 t157 t158 t159 t160 t161 _LC3_B2 p7 170 B.5 SIS Activity and Power Estimation The switching activity and power estimates provided by SIS are given below. Script started on Fri Jun 11 02:25:50 1999 Combinational power estimation, with Zero delay model. Network: bth4.b1f, Power 8 2292.5 uw assuming 20 MHz Clock and Vdd 8 5V sis> time elapse: 1.0 seconds, total: 1.0 seconds sis> power_print Node [3393] Cap. 8 Node [3394] Cap. 8 Switch Prob. 8 Node [3395] Cap. Switch Prob. 8 .29 Power 8 Node _LC5_C3 Cap. 8 11 Switch Prob. 0.47 Power 8 12.9 Node NOT_LC5_C3 Cap. 8 8 Switch Prob. 8 0.47 Power 8 9.4 Node -LC4_C3 Cap. 8 12 Switch Prob. 8 0.22 Power 8 6.6 Node NOT_LC4_C3 Cap. 8 2 Switch Prob. 8 0.22 Power 8 1.1 Node _LC3_C9 Cap. 8 5 Switch Prob. 8 0.12 Power 8 1.5 Node NOT_L03_C9 Cap. 8 3 Switch Prob. 8 0.12 Power 8 0.9 Node _LCI_A12 Cap. 8 46 Switch Prob. 8 0.50 Power 8 57.5 Node NOT_LC1_A12 Cap. 8 24 Switch Prob. 8 0.50 Power 8 30.0 Node _LC5_A10 Cap. 8 12 Switch Prob. 8 0.50 Power 8 15.0 Node NOT-LCS-A10 Cap. 8 9 Switch Prob. 8 0.50 Power 8 11.2 Node _LCI_A11 Cap. 8 26 Switch Prob. 8 0.38 Power 8 24.4 Node NOT_LC1_A11 Cap. 8 18 Switch Prob. 8 0.38 Power 8 16.9 Node _LC6_A10 Cap. 8 16 Switch Prob. 8 0.34 Power 8 13.7 Node NOT_L06_A10 Cap. 8 7 Switch Prob. 8 0.34 Power 8 6.0 Node _LC4_A10 Cap. 8 17 Switch Prob. 8 0.47 Power 8 19.9 Node NOT-LC4_A10 Cap. 8 14 Switch Prob. 8 0.47 Power 8 16.4 Node _LCI_C3 Cap. 8 17 Switch Prob. 8 0.47 Power 8 19.9 Node NOT_L01_03 Cap. 8 14 Switch Prob. 8 0.47 Power 8 16.4 Node _LC1_A10 Cap. 8 16 Switch Prob. 8 0.38 Power 8 15.0 .46 Power 8 .43 Power 8 Switch Prob. 8 Node aO Cap. 8 59 Switch Prob. 8 0.50 Power 8 73.8 Node a1 Cap. 8 75 Switch Prob. 8 0.50 Power 8 93.8 Node a2 Cap. 8 55 Switch Prob. 8 0.50 Power 8 68.8 Node a3 Cap. 8 48 Switch Prob. 8 0.50 Power 8 60.0 Node :0 Cap. 8 75 Switch Prob. 8 0.50 Power 8 93.8 Node :1 Cap. 8 32 Switch Prob. 8 0.50 Power 8 40.0 Node x2 Cap. 8 52 Switch Prob. 8 0.50 Power 8 65.0 Node :3 Cap. 8 66 Switch Prob. 8 0.50 Power 8 82.5 Node [3388] Cap. 8 0 Switch Prob. 8 0.38 Power 8 0.0 Node [3389] Cap. 8 0 Switch Prob. 8 0.47 Power 8 0.0 Node [3390] Cap. 8 0 Switch Prob. 8 0.49 Power 8 0.0 Node [3391] Cap. 8 0 Switch Prob. 8 0.50 Power 8 0.0 Node [3392] Cap. 8 0 Switch Prob. 8 0.49 Power 8 0.0 0 0 0.0 0 0 0.0 0 0 0.0 171 Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node NOT_LC1_A10 Cap. 8 7 Switch Prob. 8 0.38 Power 8 6.6 -LC4_A4 Cap. 8 19 Switch Prob. 8 0.38 Power 8 17.8 NOT-LC4_A4 Cap. 8 6 Switch Prob. 8 0.38 Power 8 5.6 _LC5_A4 Cap. 8 23 Switch Prob. 8 0.47 Power 8 27.0 NOT_LC5_A4 Cap. 8 20 Switch Prob. 8 0.47 Power 8 23.4 -LC2_03 Cap. 8 21 Switch Prob. 8 0.12 Power 8 6.2 NOT_LC2_C3 Cap. 8 20 Switch Prob. 8 0.12 Power 8 5.9 _LCI_A13 Cap. 8 39 Switch Prob. 8 0.50 Power 8 48.8 NOT-LC1_A13 Cap. 8 13 Switch Prob. 8 0.50 Power 8 16.2 _LC1-C7 Cap. 8 33 Switch Prob. 8 0.38 Power 8 30.9 NOT-LC1_C7 Cap. 8 25 Switch Prob. 8 0.38 Power 8 23.4 _LC6_A5 Cap. 8 16 Switch Prob. 8 0.34 Power 8 13.7 NOT-LC6_A5 Cap. 8 7 Switch Prob. 8 0.34 Power 8 6.0 _LC3_A10 Cap. 8 17 Switch Prob. 8 0.47 Power 8 19.9 NOT_LC3_A10 Cap. 8 14 Switch Prob. 8 0.47 Power 8 16.4 _LC2-A10 Cap. 8 13 Switch Prob. 8 0.49 Power 8 16.0 NOT_LC2_A10 Cap. 8 9 Switch Prob. 8 0.49 Power 8 11.1 Switch Prob. 8 0.40 Power 8 12.1 2 Switch Prob. 8 0.40 Power 8 2.0 Switch Prob. 
8 0.49 Power 8 12.3 6 Switch Prob. 8 0.49 Power 8 7.4 Switch Prob. 8 0.22 Power 8 7.1 2 Switch Prob. 8 0.22 Power 8 1.1 Switch Prob. 8 0.43 Power 8 17.2 0 -L01_A5 Cap. 8 12 NOT-LC1_A5 Cap. 8 -LC1_A8 Cap. 8 10 NOT_LCI_A8 Cap. 8 _LCI_B2 Cap. 8 13 NOT_L01_B2 Cap. 8 _LC4_A1 Cap. 8 16 NOT_LC4_A1 Cap. 8 _LC4_A5 Cap. 8 17 NOT_LC4-A5 Cap. 8 -LC8_A1 Cap. 8 16 NOT_LC8-A1 Cap. 8 _LC3_A5 Cap. 8 17 NOT_L03_A5 Cap. 8 7 Switch Prob. 0.43 Power 8 7.5 Switch Prob. 8 .47 Power 8 19.9 14 Switch Prob. 8 0.47 Power 8 16.4 Switch Prob. 8 0.44 Power 8 17.6 7 Switch Prob. 8 0.44 Power 8 7.7 Switch Prob. 8 0.49 Power 8 20.9 14 Switch Prob. 8 0.49 Power 8 17.2 _LC2_A1 Cap. 8 20 Switch Prob. 8 0.50 Power 8 24.9 NOT_LC2_A1 Cap. 8 18 Switch Prob. 8 0.50 Power 8 22.4 _L06_A1 Cap. 8 27 Switch Prob. 8 0.44 Power 8 29.9 6 Switch Prob. 8 0.44 Power 8 6.7 Switch Prob. 8 0.27 Power 8 13.0 18 Switch Prob. 8 0.27 Power 8 12.3 Switch Prob. 8 0.25 Power 8 8.7 4 Switch Prob. 8 0.25 Power 8 2.5 _LC8_B2 Cap. 8 8 Switch Prob. 8 0.48 Power 8 9.6 NOT_LC8_B2 Cap. 8 11 Switch Prob. 8 0.48 Power 8 _LC2_B2 Cap. 8 8 Switch Prob. 8 0.43 Power 8 8.7 NOT_LC2_B2 Cap. 8 8 Switch Prob. 8 0.43 Power 8 8.7 _LCl_C9 Cap. 8 18 Switch Prob. 8 0.30 Power 8 13.7 NOT_LC1_C9 Cap. 8 16 Switch Prob. 8 0.30 Power 8 12.2 _LCl_A4 Cap. 8 18 Switch Prob. 8 0.09 Power 8 4.0 NOT_LC1_A4 Cap. 8 16 Switch Prob. 8 0.09 Power 8 NOT_LC6_A1 Cap. = _LC2_A5 Cap. - 19 NOT_LC2_A5 Cap. = _LC1_A1 Cap. = 14 NOT_L01_A1 Cap. = 13.2 3.6 172 Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node _LC4_B2 Cap. 8 18 Switch Prob. 8 0.03 Power 8 1.4 NOT_LC4_B2 Cap. 8 16 Switch Prob. 8 0.03 Power _LC6_B2 Cap. 8 17 Switch Prob. 8 0.01 Power NOT_LC6_B2 Cap. 8 16 Switch Prob. 8 0.01 Power _LC2_A4 Cap. 8 12 Switch Prob. 8 0.44 Power NOT_LC2_A4 Cap. 8 7 Switch Prob. 8 0.44 Power _LC5_A5 Cap. 8 9 Switch Prob. 8 0.47 Power 8 10.5 NOT_LC5_A5 Cap. 8 4 Switch Prob. 8 0.47 Power _LC7_A10 Cap. 8 9 Switch Prob. 8 NOT_LC7_A10 Cap. 8 4 Switch Prob. NOTaO Cap. 8 26 Switch Prob. 8 NOTa1 Cap. 8 21 Switch Prob. 8 NOTa2 Cap. 8 32 Switch Prob. 8 NOTa3 Cap. 8 21 Switch Prob. 8 NOTxO Cap. 8 41 Switch Prob. 8 NOTxl Cap. 8 21 Switch Prob. 8 NOTx2 Cap. 8 48 Switch Prob. 8 NOTx3 Cap. 8 20 Switch Prob. 8 _LC1_Cl4 Cap. 8 1 Switch Prob. "00000000 0.47 Power 8 8 0.47 Power .50 Power 8 32. .50 Power 8 26. .50 Power 8 40. .50 Power 8 26. .50 Power 8 51. .50 Power 8 26. .50 Power 8 60. .50 Power 8 25. 0.38 Power 8 CMUIOOHMNMHHMIOMMN ONHOOQNNNMMN‘INNWWNthw 0.3 13. 8 7.6 - 4.7 1 . 0 OOONNMONO'IIIO Enioioio'sz'qén. 16. 1 7 1.2 8 2.0 2 7 2 7 _LC5_06 Cap. 8 3 Switch Prob. 8 0.47 Power 8 _L05_A1 Cap. 8 3 Switch Prob. 8 0.49 Power 8 _L03_A1 Cap. 8 3 Switch Prob. 8 0.50 Power 8 _LC7_A1 Cap. 8 4 Switch Prob. 8 0.49 Power 8 _LC7_B2 Cap. 8 2 Switch Prob. 8 0.46 Power 8 _L05_B2 Cap. 8 3 Switch Prob. 8 0.43 Power 8 _LC3_B2 Cap. 8 5 Switch Prob. 8 0.29 Power to Cap. 8 5 Switch Prob. 8 0.22 Power t1 Cap. 8 5 Switch Prob. 8 0.22 Power 8 t2 Cap. 8 5 Switch Prob. 8 0.22 Power t3 Cap. 8 5 Switch Prob. 8 0.22 Power t4 Cap. 8 4 Switch Prob. 8 0.12 Power 8 t5 Cap. 8 4 Switch Prob. 8 0.12 Power 8 t6 Cap. 8 5 Switch Prob. 8 0.22 Power 8 t7 Cap. 8 5 Switch Prob. 8 0.22 Power 8 t8 Cap. 8 5 Switch Prob. 8 0.22 Power 8 t9 Cap. 8 6 Switch Prob. 8 0.12 Power 8 t10 Cap. 8 6 Switch Prob. 8 0.06 Power 8 t11 Cap. 8 6 Switch Prob. 8 0.00 Power 8 t12 Cap. 8 5 Switch Prob. 
8 0.40 Power 8 t13 Cap. 8 5 Switch Prob. 8 0.22 Power 8 t14 Cap. 8 5 Switch Prob. 8 0.00 Power 8 _LC6_C3 Cap. 8 13 Switch Prob. 8 0.49 Power t15 Cap. 8 4 Switch Prob. 8 0.17 Power 8 t16 Cap. 8 4 Switch Prob. 8 0.12 Power 8 _LC3_C3 Cap. 8 3 Switch Prob. 8 0.26 Power t17 Cap. 8 5 Switch Prob. 8 0.22 Power 8 t18 Cap. 8 5 Switch Prob. 8 0.22 Power 8 173 1. 0 0. 1 I50] ‘1 2 3 Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node t19 Cap. 8 t20 Cap. 8 t21 Cap. 8 _LC2_C9 Cap. t22 Cap. 8 t23 Cap. 8 t24 Cap. 8 t25 Cap. 8 t26 Cap. 8 t27 Cap. 8 t28 Cap. 8 t29 Cap. 8 t30 Cap. 8 t31 Cap. 8 t32 Cap. 8 t33 Cap. 8 t34 Cap. 8 t35 Cap. 8 t36 Cap. 8 t37 Cap. 8 t38 Cap. 8 t39 Cap. 8 t40 Cap. 8 t41 Cap. 8 t42 Cap. 8 t43 Cap. 8 t44 Cap. 8 t45 Cap. 8 t46 Cap. 8 t47 Cap. 8 t48 Cap. 8 t49 Cap. 8 t50 Cap. 8 t51 Cap. 8 t52 Cap. 8 t53 Cap. 8 t54 Cap. 8 t55 Cap. 8 t56 Cap. 8 t57 Cap. 8 t58 Cap. 8 t59 Cap. 8 t60 Cap. 8 t61 Cap. 8 t62 Cap. 8 t63 Cap. 8 t64 Cap. 8 0101010101010101a1010101010101mmmmmmmmmmmmmmmmammmmmmm37:38-80:IIComa! Switch Switch Switch Prob. Prob. Prob. 5 Switch Prob. Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. 8 Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. 8 Prob. Prob. 8 Prob. Prob. Prob. Prob. Prob. 0000000000000000000000000000000000000000000ll000 .22 .12 .12 Power Power Power 0.49 Power .12 .00 .00 .30 .22 .22 .22 .22 .38 .22 .22 .12 .17 .12 .12 .12 .30 .17 .24 .17 .03 .17 .22 .06 .12 .16 .03 .30 .18 .24 .19 .22 .03 .12 .00 .32 .03 .28 .00 .06 .06 .17 .03 174 Power Power 8 Power Power 8 Power Power 8 Power Power Power Power Power Power Power Power Power Power Power 8 Power Power 8 Power Power Power Power Power Power Power Power Power Power Power 8 Power Power Power Power Power Power Power Power Power Power Power Power 8 Power 0M000w00h01-‘0NMCAMW0MH0MMOMODNODHHHMHMMhMMMNw000llCHM .8 escr 83¢: C>u>co c>cn p-lqih 01¢» a>cn C’Cfl a>~418 01:8 05:8 a>cn balm 512» 811434 84-4 44.4 c><3 c>¢o 0530 a>-q Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node t65 Cap. 8 t66 Cap. 8 t67 Cap. 8 _L03_A4 Cap. t68 Cap. 8 t69 Cap. 8 t70 Cap. 8 t71 Cap. 8 t72 Cap. 8 t73 Cap. 8 t74 Cap. 8 t75 Cap. 8 t76 Cap. 8 t77 Cap. 8 t78 Cap. 8 t79 Cap. 8 t80 Cap. 8 t81 Cap. 8 t82 Cap. 8 t83 Cap. 8 t84 Cap. 8 t85 Cap. 8 t86 Cap. 8 t8? Cap. 8 t88 Cap. 8 t89 Cap. 8 t90 Cap. 8 _LC7_A5 Cap. t91 Cap. 8 t92 Cap. 8 t93 Cap. 8 t94 Cap. 8 t95 Cap. 8 t96 Cap. 8 t97 Cap. 8 t98 Cap. 8 t99 Cap. 8 t100 Cap. 8 t101 Cap. 8 t102 Cap. 8 t103 Cap. 8 t104 Cap. 8 t105 Cap. 8 t106 Cap. 8 t107 Cap. 8 t108 Cap. 8 t109 Cap. 8 mpppppwwpbhmmmmmpppummmmmmmmmmmmmmmmmmmmmphucacao: Switch Switch Switch 11 Switch Prob. 
- Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch 10 Switch Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Switch Prob. 8 Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. 8 Prob. Prob. Prob. Prob. Prob. 00000000000000000000000 0000000000000000000 “000Mw0Hkubt-80MHMMOJW munmqopmo¢owmm~zflpbp 175 Power 8 3.6 Power 8 0.0 Power 8 3.3 Power Power Power Power Power Power Power Power Power Power Power Power Power Power Power Power Power Power .47 Power Power 8 Power Power Power Power Power Power Power Power Power Power Power Power Power Power Power Power Power Power Power Power Power Power .40 Power 8 Power 8 3. 0 0 2 2 2 2 2 4 2 1 2. 1 1 1 3 2 3 1 1 1 3 1 1 cocnostisi-sobo'mininmookz'q'q'q'qlz'q'mi» 3 2 1 2 3 0 1 4. 3. 1 1 0 3 2 0 0 0 1 4 4 4 4 1 3 8 5 0 4 0 4 3 9 7 5 2 1 9 Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node t110 t111 t112 t113 t114 t115 t116 t117 t118 t119 t120 t121 t122 t123 t124 t125 t126 t127 t128 t129 t130 t131 t132 t133 t134 t135 t136 t137 t138 t139 t140 t141 t142 t143 t144 t145 t146 t147 t148 t149 t150 t151 t152 t153 t154 t155 t156 Cap. Cap. Cap. Cap. 8 Cap. - Cap. 8 Cap. 8 Cap. 8 Cap. Cap. Cap. Cap. Cap. Cap. 8 Cap. Cap. 8 Cap. Cap. 8 Cap. Cap. 8 Cap. 8 Cap. Cap. 8 Cap. Cap. Cap. Cap. Cap. Cap. Cap. Cap. Cap. Cap. Cap. Cap. Cap. 8 Cap. - Cap. 8 Cap. Cap. - Cap. 8 Cap. 8 Cap. 8 Cap. 8 Cap. 8 Cap. - Cap. 8 mmmwoscncnmbpbmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Prob. 8 Prob. Prob. 8 Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. - Prob. 8 Prob. Prob. Prob. Prob. 8 Prob. Prob. 8 Prob. Prob. 8 Prob. - Prob. 8 Prob. 8 Prob. - Prob. 8 Prob. 8 Prob. Prob. Prob. Prob. Prob. 8 Prob. Prob. 8 Prob. Prob. - Prob. 8 Prob. 8 Prob. Prob. 8 Prob. Prob. - Prob. 8 Prob. 8 Prob. 00000000000000000000000000000000000000000000000 .22 .22 .22 .22 .22 .22 .22 .22 .30 .19 .17 .14 .12 .17 .26 .06 .22 .34 .24 .16 .15 .10 .19 .10 .24 .19 .07 .36 .10 .27 .02 .12 .05 .09 .02 .12 .43 .29 .03 .27 .37 .25 .01 .01 .00 .00 .00 176 Power Power 8 Power Power Power 8 Power Power Power Power Power Power Power Power Power Power Power Power Power Power Power Power Power Power Power Power Power Power Power Power Power Power Power Power 8 Power Power 8 Power Power 8 Power 8 Power - Power 8 Power 8 Power 8 Power Power 8 Power - Power 8 Power ooooowpwowhpo~to~owupowwpMupmwpwowwwuwnwwmnnwwwto c><3 C>h8ld sac» 5.0310 o:cn10 01¢» a>ss.b oocntm ¢-c>cn.h 0530 ¢»c>co«q date.» one» alustn 81-4-8 «I-q-A ~a-q $337.18: Node t157 Cap. 8 6 Switch Prob. 8 0.00 Power — 0.0 Node t158 Cap. 8 6 Switch Prob. 8 0.00 Power 8 0.0 Node t159 Cap. 8 6 Switch Prob. 8 0.02 Power 8 0.3 Node t160 Cap. 8 6 Switch Prob. 8 0.03 Power 8 0.5 Node t161 Cap. 
8 6 Switch Prob. 8 0.25 Power 8 3.8 Node t162 Cap. 8 3 Switch Prob. 8 0.38 Power 8 2.8 Node t163 Cap. 8 3 Switch Prob. 8 0.38 Power 8 2.8 Node t164 Cap. 8 2 Switch Prob. 8 0.38 Power 8 1.9 Node t165 Cap. 8 3 Switch Prob. 8 0.38 Power -. 2.8 Node t166 Cap. 8 3 Switch Prob. 8 0.38 Power 8 2.8 Node t167 Cap. 8 2 Switch Prob. 8 0.38 Power 8 1.9 Node t168 Cap. 8 3 Switch Prob. 8 0.38 Power - 2.8 Node t169 Cap. 8 3 Switch Prob. 8 0.38 Power 8 2.8 Total Power: 2292 . 467728 B.6 BLAPE Activity and Power Estimation The switching activity and power estimates provided by BLAPE are given below. Activity Computation delay 8 0.03 Seconds Node Activity Capacitance a0 0.5000 cap859 a1 0.5000 cap875 a2 0 . 5000 cap855 a3 0 . 5000 cap848 x0 0 . 5000 cap875 11 0 . 5000 cap832 x2 0 . 5000 cap852 x3 0 . 5000 cap866 p0 0 . 3750 cap80 pl 0 . 4851 cap80 p2 0 . 4982 cap80 p3 0 . 4992 cap80 p4 0 . 4994 cap80 p5 0 . 4736 cap80 p6 O . 3864 cap80 p7 O . 4304 cap=0 NOT-LC5_C3 0 . 4672 cap88 NOT_LC4_C3 0 . 2129 cap=2 NOT_LC3_C9 O . 1318 cap83 NOT_LCI_A1 2 0 . 4922 cap824 N OT_LCS_A10 0 . 4922 cap89 177 NOT_LC1_A11 NOT_LC6_A10 NOT_LC4_A10 NOT_LCI_C3 NOT_LC1_A10 NOT_LC4_A4 NOT_LC5_A4 NOT_LC2_63 NOT_LC1_A13 NOT_LC1_C7 NOT_LC6_A5 NOT_LC3_A10 NOT_LC2_A10 NOT_LC1_A5 NOT_LC1_A8 NOT_LC1_B2 NOT_LC4_A1 NOT_LC4_A5 NOT_LC8_A1 NOT_LC3-A5 NOT_LC2-A1 NOT_LC6_A1 NOT_LC2_A5 NOT_LC1_A1 NOT_LC8_B2 NOT_LC2_B2 NOT_LC1_C9 NOT_LCl_A4 NOT_LC4_B2 NOT_LC6_B2 NOT_LC2_A4 NOT_LC5_A5 NOT_LC7_A10 NOTaO NOTa1 NOTa2 NOTa3 NOTxO NOTx1 NOTx2 NOTxS t0 t1 t2 t3 -LC4_A10 t4 0.3750 0.3811 0.4851 0.4978 0.4250 0.4493 0.4672 0.1303 0.4922 0.3750 0.3811 0.4901 0.4971 0.4587 .4999 .2831 .4851 .4901 .4736 .4951 .5000 .4687 .2629 .2908 .4960 .4918 .4082 .1377 .0690 .0344 .4975 .4758 0.4758 .5000 .5000 .5000 .5000 .5000 .5000 .5000 .5000 .2188 .2188 .2188 .2188 .4851 .1172 000000000000000000 00000000000000 cap818 cap87 cap814 cap814 cap87 cap=6 cap820 cap820 cap813 cap825 cap87 cap814 cap89 cap=2 cap=6 cap=2 cap87 cap814 cap87 cap814 cap818 cap=6 cap818 cap84 cap=11 cap88 cap816 cap816 cap816 cap816 cap87 cap84 cap84 cap826 cap821 cap832 cap821 cap841 cap821 cap848 cap820 cap=5 cap=5 cap=5 cap=5 cap816 cap83 178 t5 _LC4_C3 t6 t7 t8 t9 _LC5_03 t10 t11 t12 t13 t14 -LC1_C3 -LC6_C3 t15 t16 _LC3_03 t17 t18 t19 t20 -LCS_A4 t21 _LC2_C3 _LC2_C9 t22 -LC3_09 t23 t24 t25 _LC1_C9 t26 t27 t28 t29 _LC7_A10 t30 t31 t32 t33 _LC3_A10 ‘t34 t35 't36 ‘t37 _LCS_A10 1fi38 <3 C><3 C><3 c><3 c><3 c><3 c><3 c><3 c><3 c><> c><3 c><3 c><3 c><3 c><3 .1172 .2129 .2188 .2188 .2188 .1172 .4672 .0373 .0373 .4401 .1687 .1687 .4978 .4927 .1303 .1303 .2339 .2188 .2188 .2188 .1172 .4672 .1303 .1303 .4909 .1318 .1318 .0684 .0684 .3566 .4082 .2158 .1948 .2158 .1948 .4758 .4139 .3139 .3425 .0645 .4901 .1506 .1307 .1307 .1172 .3811 .3572 cap83 cap812 cap85 cap85 cap85 cap85 cap810 cap85 cap85 cap85 cap85 cap85 cap816 cap813 cap83 cap83 cap83 cap85 cap85 cap85 cap85 cap-22 cap-2 cap820 cap85 cap=2 cap84 cap84 cap84 cap84 cap818 cap85 cap85 cap85 cap85 cap88 cap85 cap85 cap85 cap85 cap816 cap85 cap85 cap85 cap85 cap815 cap85 t39 t40 t41 t42 _LC2_A10 t43 t44 t45 t46 _LC1_A10 t47 t48 t49 t50 t51 _LC1_A8 t52 t53 t54 t55 _LC4_A4 t56 t57 t58 t59 t60 t61 _LC2_A4 t62 t63 t64 t65 t66 t67 -LC3_A4 t68 t69 _LCl_A4 t70 t71 t72 t73 _LC5_A5 ‘t74 ‘t75 ‘t76 ‘t77 <3 C><3 C)<><><>¢D c><>¢3 c><3 c><><3 c:c><3 c><>¢3 c><><3 c><3 c>c><3 c><3 .2604 .1726 .1004 .0635 .4971 .1464 .2622 .0723 .1358 .4250 .1329 .0689 .3874 .2715 .1486 .4999 .1916 .2890 .0783 .1243 .4493 .0563 .3520 .0483 .3134 .0077 .1590 .4975 .0430 .2861 .0430 
.2861 .0129 .2494 .4939 .0729 .0729 .1377 .2158 .1948 .2158 .1948 .4758 .3425 .4139 .3139 .0645 cap85 cap85 cap85 cap85 cap812 cap85 cap85 cap85 cap85 cap815 cap85 cap85 cap85 cap85 cap85 cap89 cap85 cap85 cap85 cap85 cap818 cap85 cap85 cap85 cap85 cap85 cap85 cap-=10 cap85 cap85 cap85 cap85 cap85 cap85 cap89 cap83 cap83 cap818 cap85 cap85 cap85 cap85 cap88 cap85 cap85 cap85 cap85 180 -LC4_A5 fl? HS t80 t81 -LC6_A5 t82 t83 t84 t85 t86 -LC3_A5 t87 t88 t89 t90 _LC7_A5 t91 t92 t93 _LC2_A1 t94 _LC2_A5 t95 t96 t97 t98 _LC1_A5 t99 t100 t101 _LC8_B2 t102 t103 _LC1_B2 t104 t105 t106 _LC2-B2 t107 t108 _LC4_B2 t109 _LCI_C14 t110 t111 t112 cap816 cap85 cap85 cap85 cap85 cap815 cap85 cap85 cap85 cap85 cap85 cap816 cap85 cap85 cap85 cap85 cap89 cap84 cap84 cap84 cap820 cap=2 cap818 cap85 cap85 cap85 cap85 cap8 1 1 cap84 cap84 cap84 cap88 cap83 cap83 cap813 cap84 cap84 cap84 cap88 cap83 cap83 cap818 cap=2 cap80 cap85 cap85 cap85 181 1:31-08 - 1:". I J t113 0.2188 cap85 _L05-C6 0.4851 cap=2 t114 0.2188 cap85 t115 0.2188 cap85 t116 0.2188 cap85 t117 0.2188 cap85 -LC4_A1 0.4851 cap815 t118 0.4076 cap85 t119 0.2784 cap85 t120 0.1180 cap85 t121 0.2081 cap85 t122 0.0850 cap85 _LC5_A1 0.4982 cap=2 t123 0.2445 cap85 t124 0.1918 cap85 t125 0.2081 cap85 t126 0.1620 cap85 _LC8_A1 0.4736 cap815 t127 0.3986 cap85 t128 0.2809 cap85 t129 0.1289 cap85 t130 0.1892 cap85 t131 0.0829 cap85 _LC3_A1 0.4992 cap-2 t132 0.2371 cap85 t133 0.1892 cap85 t134 0.1998 cap85 t135 0.1583 cap85 _LC6_A1 0.4687 cap-26 t136 0.0929 cap85 t137 0.3875 cap85 t138 0.0751 cap85 t139 0.3320 cap85 t140 0.0286 cap85 t141 0.1463 cap=5 -LC7_A1 0.4994 cap=2 t142 0.0289 cap85 t143 0.1452 cap85 t144 0.0286 cap85 t145 0.1463 cap85 _LC1_A1 0.2908 cap813 t146 0.4947 cap84 t147 0.3963 cap84 t148 0.0772 cap84 _LC7_B2 0.4736 cap=2 t149 0.4257 cap=5 t150 0.4975 cap85 182 88881 t151 0.4050 cap85 t152 0.0344 cap85 -LC5_B2 0.3864 cap=2 t153 0.0344 cap=2 _LC6_B2 0.0344 cap816 t154 0.0001 cap85 t155 0.0003 cap85 t156 0.0007 cap85 t157 0.0221 cap=5 t158 0.0015 cap85 t159 0.0454 cap85 t160 0.0956 cap85 t161 0.3756 cap85 _LC3-B2 0.4304 cap=2 t162 0.3750 cap83 t163 0.3750 cap83 _LC1_A13 0.4922 cap839 t164 0.3750 cap=2 _LC1-C7 0.3750 cap80 t165 0.3750 cap83 t166 0.3750 cap83 _LCI_A12 0.4922 cap846 t167 0.3750 cap=2 _LCI,A11 0.3750 cap80 t168 0.3750 cap83 t169 0.3750 cap83 _LC5_A10 0.4922 cap812 Power 8 2289.34 uW assuming 20 MHz CLK, Vdd 8 5V The Bar and Level views of the 4-bit Booth multiplier are illustrated in Figures 6.5 and 6.4 The BLAPE implementation, using a depth-accuracy of 1 performs the activity and power estimation in just 0.03 seconds; SIS runs in 1.0 seconds. The percent error, when compared to SIS, is just 0.13%. 183 BIBLIOGRAPHY BIBLIOGRAPHY [1] M. Pedram. Power Minimization in IC Design: Principles and Applications. ACM Trans. on Design Automation of Electronic Systems, 1(1):3—-56, 1996. [2] N. Weste and K. Eshraghian. Principles of CM OS VLSI Design. Addison—Wesley Company, Reading, MA, 1985. [3] S. Devadas and S. Malik. A Survey of Optimization Techniques Targeting Low Power VLSI Circuits. In Proceedings of the 32nd Design Automation Conference, pages 242-247, June 1995. [4] A. P. Chandrakasan, S. Sheng, and R. W. Brodersen. Low-Power CMOS Design. IEEE Journal Of Solid-State Circuits, 27(4):472—484, April 1992. [5] A. Shen, A. Ghosh, S. Devadas, and K. Keutzer. On Average Power Dissipation and Random Pattern Testability of Combinational Logic Circuits. In Proceedings of the Int ’1 Conference on Computer-Aided Design, pages 402—407, November 1992. 
[6] W. Swartz and C. Sechen. New Algorithms for the Placement and Routing of Macro Cells. IEEE International Conference on Computer-Aided Design, pages 336-339, 1990.
[7] A. P. Chandrakasan, R. Allmon, A. Stratakos, and R. W. Brodersen. Design of Portable Systems. In Proceedings of CICC, pages 1211-1218, May 1994.
[8] D-S Chen, M. Sarrafzadeh, and G. K. H. Yeap. State Encoding of Finite State Machines for Low Power Design. In Proceedings of the International Symposium on Circuits and Systems, volume 3, pages 2309-2312, 1995.
[9] G. E. Sobelman and D. L. Raatz. Low-Power Multiplier Design Using Delayed Evaluation. In Proceedings of the International Symposium on Circuits and Systems, pages 1564-1566, 1995.
[10] V. G. Moshnyaga and K. Tamaru. A Comparative Study of Switching Activity Reduction Techniques for Design of Low-Power Multipliers. In Proceedings of the International Symposium on Circuits and Systems, pages 1560-1563, 1995.
[11] C-Y. Tsui, M. Pedram, and A. M. Despain. Technology Decomposition and Mapping Targeting Low Power Dissipation. In Proceedings of the 30th Design Automation Conference, pages 68-73, June 1993.
[12] J. Monteiro, S. Devadas, P. Ashar, and A. Mauskar. Scheduling Techniques to Enable Power Management. In Proceedings of the 33rd Design Automation Conference, pages 349-352, June 1996.
[13] M. Alidina, J. Monteiro, S. Devadas, A. Ghosh, and M. Papaefthymiou. Precomputation-Based Sequential Logic Optimization for Low Power. In Proceedings of the Int'l Conference on Computer-Aided Design, pages 74-81, November 1994.
[14] C. K. Lennard, P. Buch, and A. R. Newton. Logic Synthesis Using Power-Sensitive Don't Care Sets. In Proceedings of the ACM Int'l Symposium on Low Power Electronics & Design, August 1996.
[15] V. Tiwari, S. Malik, and A. Wolfe. Power Analysis of Embedded Software: A First Step Towards Software Power Minimization. IEEE Transactions on VLSI Systems, 2(4):437-445, December 1994.
[16] W. C. Athas, L. J. Svensson, J. G. Koller, N. Tzartzanis, and E. Y-C Chou. Low-Power Digital Systems Based on Adiabatic-Switching Principles. IEEE Transactions on VLSI Systems, 2(4):398-407, December 1994.
[17] D. Conner. Making the jump to HDL-based programmable-logic design. EDN Magazine, pages 181-187, September 1997.
[18] S. Katkoori, N. Kumar, and R. Vemuri. High Level Profiling Based Low Power Synthesis Technique. In Proceedings of the International Conference on Computer Design, pages 759-765, October 2-5, 1995.
[19] R. Burch, F. Najm, P. Yang, and T. Trick. A Monte Carlo Approach for Power Estimation. IEEE Transactions on VLSI Systems, 1(1):63-71, March 1993.
[20] S. M. Ross. Introduction to Probability Models. Academic Press Inc., San Diego, CA 92101, 5th edition, 1993.
[21] K. S. Shanmugan and A. M. Breipohl. Random Signals: Detection, Estimation and Data Analysis. John Wiley & Sons, Inc., New York, NY, 1st edition, 1988.
[22] G. I. Stamoulis. A Monte-Carlo Approach for the Accurate and Efficient Estimation of Average Transition Probabilities in Sequential Logic Circuits. In Proceedings of the IEEE 1996 Custom IC Conference, pages 221-224, May 1996.
[23] M. Xakellis and F. Najm. Statistical Estimation of the Switching Activity in Digital Circuits. In Proceedings of the 31st ACM/IEEE Design Automation Conference, pages 728-733, 1994.
[24] F. N. Najm, S. Goel, and I. N. Hajj. Power Estimation in Sequential Circuits. In Proceedings of the 32nd ACM/IEEE Design Automation Conference, pages 635-640, June 1995.
[25] F. N. Najm and M. G. Xakellis. Statistical Estimation of the Switching Activity in VLSI Circuits. VLSI Design, 1996.
[26] A. Ghosh, S. Devadas, K. Keutzer, and J. White. Estimation of Average Switching Activity in Combinational and Sequential Circuits. In Proceedings of the 29th Design Automation Conference, pages 253-259, June 1992.
[27] R. E. Bryant. Graph-Based Algorithms for Boolean Function Manipulation. IEEE Transactions on Computers, C-35:677-691, August 1986.
[28] S. Chakravarty. On the complexity of using BDDs for the synthesis and analysis of boolean circuits. In Proceedings of the 27th Annual Allerton Conference on Communication, Control and Computing, pages 730-739, 1989.
[29] F. Najm. Transition Density: A New Measure of Activity in Digital Circuits. IEEE Transactions on Computer-Aided Design, 12(2):310-323, February 1993.
[30] K. S. Brace, R. L. Rudell, and R. E. Bryant. Efficient Implementation of a BDD Package. In Proceedings of the 27th ACM/IEEE Design Automation Conference, pages 40-45, June 1990.
[31] Bhanu Kapoor. Improving the Accuracy of Circuit Activity Measurement. In Proceedings of the 31st ACM/IEEE Design Automation Conference, pages 734-739, 1994.
[32] C-Y. Tsui, M. Pedram, and A. M. Despain. Exact and Approximate Methods For Calculating Signal and Transition Probabilities in FSMs. In Proceedings of the 31st Design Automation Conference, pages 19-23, June 1994.
[33] J. Monteiro, S. Devadas, and B. Lin. A Methodology for Efficient Estimation of Switching Activity in Sequential Logic Circuits. In Proceedings of the 31st Design Automation Conference, pages 12-17, June 1994.
[34] D. I. Cheng, M. Marek-Sadowska, and K-T Cheng. Speeding Up Power Estimation by Topological Analysis. IEEE Custom Integrated Circuits Conference, pages 623-626, 1995.
[35] S. Katkoori, L. Rader, N. Kumar, and R. Vemuri. A Profile Driven Approach for Low Power Synthesis. In Proceedings of VLSI, pages 759-765, Japan, 1995.
[36] P. E. Landman and J. Rabaey. Power Estimation for High Level Synthesis. In Proceedings of the European Conference on Design Automation, pages 361-366, February 1993.
[37] M. Nemani and F. N. Najm. Towards a High-Level Power Estimation Capability. IEEE Transactions on Computer-Aided Design, 15(6):588-598, June 1996.
[38] F. Najm. Towards a high-level power estimation capability. In Proceedings of the 1995 International Symposium on Low Power Design, pages 87-92, April 23-26, 1995.
[39] Gary Hachtel and Fabio Somenzi. Logic Synthesis and Verification Algorithms. Kluwer Academic Publishers, Norwell, MA 02061, 1996.
[40] C. Y. Lee. Representation of Switching Circuits by Binary-Decision Programs. Bell System Technical Journal, 38:985-999, July 1959.
[41] S. B. Akers. Binary Decision Diagrams. IEEE Transactions on Computers, C-27(6):509-516, June 1978.
[42] R. E. Bryant. Binary Decision Diagrams and Beyond: Enabling Technologies for Formal Verification. International Conference on Computer-Aided Design, pages 236-243, November 1995.
[43] A. Shen, S. Devadas, and A. Ghosh. Probabilistic Manipulation of Boolean Functions Using Free Boolean Diagrams. IEEE Transactions on Computer-Aided Design, 14(1):87-95, January 1995.
[44] R. Camposano, L. F. Saunders, and R. M. Tabet. VHDL as Input for High-Level Synthesis. IEEE Design & Test of Computers, pages 43-49, March 1991.
[45] F. Mailhot and G. De Micheli. Technology mapping using Boolean matching. In European Conference on Design Automation, pages 180-185, March 1990.
[46] Ronnie L. Wright and Michael A. Shanblatt. Reducing BDD Size By Exploiting Structural Connectivity. In Proceedings of the 9th Great Lakes Symposium on VLSI (to appear), March 1999.
[47] Sharad C. Seth, Lilu Pan, and Vishwani D. Agrawal. PREDICT - Probabilistic Estimation of Digital Circuit Testability. IEEE International Symposium on Fault-Tolerant Computing, pages 220-225, June 1985.
[48] A. V. Aho, J. E. Hopcroft, and J. D. Ullman. The Design and Analysis of Computer Algorithms. Addison-Wesley Publishing Company, Reading, MA, 1st edition, 1974.
[49] A. Diaz, M. Jimenez, and M. Shanblatt. Integer Pair Representation of Binary Terms and Equations. In Proceedings of the 41st Midwest Symposium on Circuits and Systems, August 1998.
[50] J. Monteiro, S. Devadas, A. Ghosh, K. Keutzer, and J. White. Estimation of Average Switching Activity in Combinational Logic Circuits Using Symbolic Simulation. IEEE Transactions on Computer-Aided Design, 16(1):121-127, January 1997.
[51] H. Mehta, M. Borah, R. M. Owens, and M. J. Irwin. Accurate Estimation of Combinational Circuit Activity. In Proceedings of the 32nd ACM/IEEE Design Automation Conference, pages 618-622, June 1995.
[52] R. S. Martin and J. Knight. Power-Profiler: Optimizing ASIC's Power Consumption At The Behavioral Level. In Proceedings of the 32nd ACM/IEEE Design Automation Conference, pages 42-47, June 1995.
[53] F. N. Najm. Power Estimation Techniques for Integrated Circuits. IEEE/ACM International Conference on Computer-Aided Design, pages 492-499, 1995.