
This is to certify that the
thesis entitled
Performance-Design Tradeoff of

Hierarchical VLSI Design Entry Points

presented by
Man-Kuan Vai

has been accepted towards fulfillment
of the requirements for

Master's degree in Elect. Engr.

Michael A. Shanblatt
Major professor

 


PERFORMANCE-DESIGN TRADEOFF
OF

HIERARCHICAL VLSI DESIGN ENTRY POINTS

BY

Man-Kuan Vai

A THESIS
Submitted to
Michigan State University

in partial fulfillment of the requirements
for the degree of

MASTER OF SCIENCE

Department of Electrical Engineering and
Systems Science

1985

ABSTRACT

PERFORMANCE-DESIGN TRADEOFF
OF

HIERARCHICAL VLSI DESIGN ENTRY POINTS

BY

Man-Kuan Vai

This research relates to VLSI design methodology, and especially to
the performance versus design task tradeoff of specifying functionally
identical circuits at various levels. The layouts of two examples, a
ripple-carry adder and a Braun array multiplier, are designed with the
assistance of a CAE system. The design entry points of transistor and
gate level are considered.

The two case study circuits are designed independently at both
levels and are evaluated with respect to their performance and design
complexity. The comparative results indicate that better performance
can be achieved by starting a design from the transistor level.
However, the design complexity of the circuit is found to be lower in a
gate level design.

To my parents and wife

Mr. and Mrs. Han-Kit Vai and Jin-Yu

ACKNOWLEDGEMENTS

I wish to acknowledge and thank my major advisor, Dr. Michael A.
Shanblatt, who introduced me to this fascinating field and gave me
much guidance and encouragement in the course of this research.

I also wish to thank the committee members, Dr. D. K. Reinhard and
Dr. E. D. Goodman, for their valuable suggestions and comments on this
work.

Finally, I owe many thanks to Jin-Yu for her emotional support.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
I.   INTRODUCTION
     1.1 Problem Statement
     1.2 Approach
II.  BACKGROUND
     2.1 One-Bit Full Adder
     2.2 Ripple-Carry Adder
     2.3 Array Multiplier
     2.4 Implementation Technology
     2.5 Design Rules
     2.6 Delay Time Model
     2.7 Computer-Aided-Engineering of VLSI Circuits
III. DEVELOPMENT OF DESIGN EXAMPLES
     3.1 Selection of Circuits
     3.2 Gate Library
     3.3 Full Adder Design
         3.3.1 Transistor Level Design
         3.3.2 Gate Level Design
     3.4 Ripple-Carry Adder
     3.5 Braun Array Multiplier
IV.  DESIGN EVALUATION
     4.1 Criteria of Evaluation
     4.2 Device Area Calculation
     4.3 Propagation Delay Calculation
         4.3.1 Delay Time Model
         4.3.2 Load Capacitance Estimation
         4.3.3 Results
     4.4 Time-Area Complexity
     4.5 Design Complexity
     4.6 Comparison
V.   CONCLUSION
     5.1 Summary
     5.2 Contributions
     5.3 Future Development
BIBLIOGRAPHY

LIST OF TABLES

Table
3.1   Cell list for transistor level multiplier.
3.2   Cell list for gate level multiplier.
4.1   The dimensional data of the full adders.
4.2   The areas of ripple-carry adders.
4.3   The dimensional data of the cells for Braun array multiplier.
4.4   The areas of Braun array multiplier.
4.5   The dimensional data of the transistors.
4.6   The typical physical parameters of the designs.
4.7   The delay times of the logic gates.
4.8   The effective capacitances of the gate inputs.
4.9   The propagation delays of ripple-carry adders.
4.10  The propagation delays of Braun array multipliers.
4.11  The time-area complexities of the design examples.
4.12  The design complexities of the design examples.

LIST OF FIGURES

Figure
1.1   The procedure of integrated circuit design.
2.1   Logic gate diagram of a one-bit full adder with hard-wired logic.
2.2   Logic gate diagram of a one-bit full adder using only NOR gates.
2.3   Block diagram showing the construction of a ripple-carry adder.
2.4   Block diagram of a Braun array multiplier.
2.5   Schematic diagrams of basic NMOS logic gates.
3.1   The layout of an inverter cell.
3.2   The layout of a two-input NAND cell.
3.3   The layout of a two-input NOR cell.
3.4   The layout of a horizontal three-input NOR cell.
3.5   The layout of a vertical three-input NOR cell.
3.6   The layout of a four-input NOR cell.
3.7   The layout of a transistor level full adder cell.
3.8   The layout of a gate level full adder cell.
3.9   The layout of a transistor level 4-bit ripple-carry adder.
3.10  The layout of a gate level 4-bit ripple-carry adder.
3.11  The rearranged block diagram of a Braun array multiplier.
3.12  The layout of cell 1 for multiplier design.
3.13  The layout of cell 2 for multiplier design.
3.14  The layout of cell 3 for multiplier design.
3.15  The layout of cell 4 for multiplier design.
3.16  The layout of cell 4A for multiplier design.
3.17  The layout of cell 4B for multiplier design.
3.18  The layout of cell 5 for multiplier design.
3.19  The layout of cell 5A for multiplier design.
3.20  The layout of cell 6 for multiplier design.
3.21  The layout of cell 7 for multiplier design.
3.22  The layout of cell 8 for multiplier design.
3.23  The layout of cell 9 for multiplier design.
3.24  The layout of cell 10 for multiplier design.
3.25  The layout of cell 10A for multiplier design.
3.26  The layout of cell 10B for multiplier design.
3.27  The layout of cell 11 for multiplier design.
3.28  The layout of cell 11A for multiplier design.
3.29  Tessellation map for transistor level multiplier.
3.30  Tessellation map for gate level multiplier.
3.31  The layout of transistor level 5-by-5 Braun array multiplier.
3.32  The layout of gate level 5-by-5 Braun array multiplier.

CHAPTER I

INTRODUCTION

High speed VLSI (Very Large Scale Integration) has created a new
challenge for circuit designers. Numerous algorithms and architectures
have been proposed to take advantage of VLSI capabilities [1,2,3]. New
design concepts, vastly different from those used in conventional
design, have been and continue to be developed in order to facilitate
efficient, manageable design.

Circuits and systems once requiring many individual chips can now
be built on a single chip with VLSI technology. Various problems of
reliability and performance that unavoidably arise from combining many
discrete components have been eliminated or reduced. But the
complicated process of VLSI design has introduced a new set of
reliability and performance problems which are harder to visualize and
more challenging to solve.

Due to the complexity of a VLSI design, not all algorithms and
architectures are suitable for VLSI implementation. An architecture
eligible for VLSI implementation must foremost possess a certain degree
of design regularity. This is also called device modularity and relates
to the capacity for device tessellation. A complicated VLSI design can
be simplified into the design of several building blocks by taking
advantage of its regularity. This is the "divide-and-conquer"

philosophy of VLSI design.

Gate count was conventionally used in discrete component designs
for the purpose of evaluation of the design complexity. However, the
chip area and delay time represent a better measure of cost efficiency
in VLSI designs. In fact, logic gates are cheap in a VLSI design. It
is the interconnection or communication requirements which mainly
contribute to the performance and cost effectiveness of a design.
Interconnections not only use chip area but also play an important role
in propagation delay.

It can be concluded from the above considerations that a good
algorithm or architecture for VLSI implementation must possess the

following properties [1]:

1. The architecture should be implemented by only a few different
   types of simple cells.

2. The architecture's data and control flow should be simple and
   regular, ideally connecting only nearest neighbors.

3. The architecture should use extensive pipelining and
   multiprocessing.

Many circuits meet these requirements. A few examples include
random access memory (RAM), read only memory (ROM), programmable logic
array (PLA) and, of interest in this work, the systolic array.

Architectures which meet the above-mentioned properties tend to
have a reduced design cost. Only a few simple cells have to be
designed, and the cells on the chip are merely copies of these few basic
ones. Regular interconnections imply that modularity and extensibility
are achievable, so that a large chip can be formed by a tessellation of
the basic cells. The characteristic of pipelining and multiprocessing
means that a special-purpose chip can be implemented simply by including
many identical cells on the chip in either a vertically parallel
(pipeline) or horizontally parallel (multiprocessor) configuration.
Ideally, a combination of both can be used. Finally, the regularity or
modularity of an architecture enables a hierarchical design technique to

be applied.

1.1 Problem Statement

The conglomerate process of integrated circuit design is
traditionally regarded as a difficult field because of the necessity of
expertise relating to solid state physics. But the recent development
of simplified design rules and computerized support tools provides a
method for designers to experiment with circuit options without concern
for the underlying physical phenomena. Naive VLSI designers can be
successful after a minimum amount of practice with these new design

tools.

Unfortunately, one of the drawbacks in VLSI technology is the high
design cost. A designer must be able to use an efficient approach aimed
at producing valid, working chip layouts at a reasonable cost in both
time and dollars. At present, design costs dominate the whole cost of
VLSI manufacture, and this trend will continue into the foreseeable
future.

Design cost is directly related to the so-called turn-around time
which is the time interval from the receipt of a device specification to
final manufacturing output. Therefore, design time, which is a major
contributing factor to turn-around time, plays an important role in
design cost.

Two examples are provided here in order to demonstrate the
seriousness of the problem of design time [4]. It is reported that the
M68000, which is a 16-bit microprocessor by Motorola, required 52
man-years of design effort. Another popular 16-bit microprocessor, the
Intel 8086, required 13 man-years of effort merely for layout.
Obviously, few custom applications can support time-costs of this
magnitude.

One reasonable way to reduce the design cost is by mass production
so that the initial development cost is shared by many customers. This
strategy is effective for general-purpose chips, especially in the area
of standard SSI or MSI chips. However, other approaches must be used to
keep the design cost low since many VLSI designs are for special-purpose
applications and will not be made in quantities large enough to

significantly reduce the design cost per chip.

Hierarchical design is one of the approaches that can be used to
reduce the design cost. As mentioned above, the design task of a VLSI
chip can be greatly simplified by architectural regularity which allows
the entire chip to be constructed merely by tessellating some basic
building blocks. This concept can be generalized to a lower level. No
matter how complicated a VLSI circuit may be, the elements involved at
the lowest level of design are primarily only different types of
transistors. Transistors can be connected in a hierarchical design to
generate logic gates, such as NANDs, NORs, etc., which constitute the
basic elements at the next higher design level. The logic gates can be
arranged to form processing elements or cells, such as full adders and
shift registers. The processing elements can then be tessellated to
implement the desired algorithm.

Elements or cells for various functions can be stored in a library
from which they are accessed for later designs. For example, the layout
of several general-purpose logic gates can be predesigned, evaluated,
and stored in the library of a CAE system. A number of the stored logic
gates can be recalled, placed, and routed to form the function and
interconnections required to implement a desired cell. The same idea
also applies to the construction of a larger device by means of cells.
A design can thus be started at the different levels of transistors,
gates, cells, or more complex predefined elements. This design
procedure is illustrated by a flow chart in Figure 1.1.

The designer has to determine the entry point of the design
procedure after the architecture is defined. It is assumed that a
variety of gates or cells are predesigned and available for this
procedure. The steps in the design procedure are highly interrelated.
It can also be seen from Figure 1.1 that VLSI design is an iterative
procedure.

[Flow chart: DEFINE ARCHITECTURE; STARTING LEVEL? (CELL LEVEL, GATE
LEVEL, TRANSISTOR LEVEL); PLACEMENT, ROUTING; TESSELLATION; CONNECTION;
TESTING, EVALUATION; FABRICATION]

Figure 1.1 The procedure of integrated circuit design.

The entry point in the design procedure will determine the
performance and design cost of a chip. Generally speaking, the lower
the level of the initial design, the better the performance of the final
product in terms of a time-area parameter. But at the lower levels the
design time increases. This tradeoff is reasonable since at a lower
level the design has more flexibility and the possibility of obtaining a
more optimal design is much higher. This advantage is obtained at the
cost of more design work, which in turn makes the design time longer.
In contrast, basic elements can be obtained from the library if the
design is started at a higher level. However, the library elements at
any level may not be best suited for a specific design specification
since they were prepared without the knowledge of future tailored
requirements. The performance of the final design may thus be affected.

A design may be started at the gate level or higher if the chip
must be produced in a relatively short time and the performance is not a
critical requirement. In contrast, the transistor level may be the
required starting level if the performance is crucial and there is no
rush for completion. Therefore, in addition to the chip performance,
the time allowance for completion of a chip design must also be
considered before the designer can make the decision as to the specific
design level entry point. This decision is not always trivial due to
the complicated relationships in the time-area complexity resulting from
designs started at different levels. Therefore, it is desirable to
know, a priori, information about the performance-cost tradeoff of
starting a design at different levels.

1.2 Approach

The goal of this research is to investigate the performance-cost
tradeoff of different design starting levels. For this purpose,
functionally identical circuits are designed at different levels to
study their relative performance and design complexity. Two circuits, a
ripple-carry adder and an array multiplier, are selected as working
examples. These circuits are designed from both the transistor and gate
levels.

The detailed layouts of both working examples are obtained with the
aid of a CAE system so that a realistic evaluation can be done on the
designs. Measures of time-area complexity and the design complexity are
formulated for the purpose of comparison. The results of this research
are intended to contribute to the development of a unified design

methodology for VLSI.

CHAPTER II

BACKGROUND

The function and structure of a one-bit full adder, which is the
basic building block for ripple-carry adders and array multipliers, are
provided in this chapter. Then, the principles of ripple-carry adders
and array multipliers are explained. Finally, the rules and

computerized tools for VLSI design are described.

2.1 One-Bit Full Adder

The full adder is a basic functional unit which is used in many
arithmetic devices. The operation of a full adder can be described in

Boolean form as

$$S_i = A_i \oplus B_i \oplus C_i, \tag{2-1}$$

and

$$C_{i+1} = A_i B_i + B_i C_i + C_i A_i, \tag{2-2}$$

where $A_i$ and $B_i$ are inputs to the current stage and $C_i$ is the carry from
the previous stage.
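As a quick check of Equations (2-1) and (2-2), the following minimal sketch evaluates both expressions for all eight input combinations and compares the result with ordinary integer addition. The function names `full_adder` and `truth_table` are illustrative choices, not part of the original work.

```python
from itertools import product


def full_adder(a, b, c):
    """Return (sum, carry_out) for one-bit inputs a, b and carry-in c."""
    s = a ^ b ^ c                          # Equation (2-1): three-input XOR
    carry = (a & b) | (b & c) | (c & a)    # Equation (2-2): majority of the inputs
    return s, carry


def truth_table():
    """List (a, b, c, sum, carry_out) for all eight input combinations."""
    return [(a, b, c, *full_adder(a, b, c))
            for a, b, c in product((0, 1), repeat=3)]


if __name__ == "__main__":
    for a, b, c, s, carry in truth_table():
        # (carry, s) read as a two-bit number should equal a + b + c.
        assert 2 * carry + s == a + b + c
        print(a, b, c, "->", s, carry)
```

Reading the carry and sum outputs as a two-bit binary number gives the count of ones at the inputs, which is why the same cell can be cascaded to add multi-bit operands.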

This set of Boolean equations can be manipulated into many
logically equivalent circuits [5]. The hardwired logic gate
implementation of a full adder is shown in Figure 2.1 [6]. This design,
with only two levels of gate delay, provides the minimum propagation
time in the sense of gate delay. However, the hardwired logic gates in
this circuit restrict the implementation to certain IC technologies, such
as open collector circuitry and NMOS. Figure 2.2 provides another
circuit version of a full adder using only NOR gates [5]. This circuit,
having three levels of gate delay, is the optimal design without using
hardwired logic gates [5].

Figure 2.1 Logic gate diagram of a one-bit full adder
with hard-wired logic [6].

The one-bit full adder is the basic building block in the examples
of this research and thus its performance has a direct effect on that of

the desired VLSI chips.

2.2 Ripple-Carry Adder

A ripple-carry adder is selected as a one-dimensional tessellation
example in this research. A ripple-carry adder is formed by connecting
one-bit full adders in a linear manner as illustrated in Figure 2.3.

The regularity and localized interconnectivity of this circuit
make it eligible for VLSI implementation and as a working example in
this work even though it is considered to be a slow adder [6]. While
faster adder circuits are known, such as carry lookahead and conditional
sum adders, they lack the regularity and connectivity requirements
described previously. The propagation time of an n-bit ripple-carry
adder is $n t_{FA}$, where $t_{FA}$ is the delay time of the one-bit full
adder and $n$ is the number of full adders in cascade. Ripple-carry
addition is more likely to be used as a functional block in a more
complicated design rather than as a stand-alone chip.

Figure 2.2 Logic gate diagram of a one-bit full adder
using only NOR gates [5].

Figure 2.3 Block diagram showing the construction
of a ripple-carry adder.
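The linear cascade described in this section can be sketched in a few lines. This is an illustrative behavioral model only, not the layout procedure used in this work; the names `ripple_carry_add` and `ripple_carry_delay` and the numeric value chosen for the full adder delay are assumptions.

```python
def full_adder(a, b, c):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ c
    carry = (a & b) | (b & c) | (c & a)
    return s, carry


def ripple_carry_add(x, y, n, carry_in=0):
    """Add two unsigned n-bit integers with n cascaded full adders.

    Returns (n-bit sum, carry out of the last stage).
    """
    carry = carry_in
    total = 0
    for i in range(n):
        # Stage i cannot settle until the carry of stage i-1 is available.
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i
    return total, carry


def ripple_carry_delay(n, t_fa):
    """Worst-case propagation time n * t_FA of an n-bit ripple-carry adder."""
    return n * t_fa


if __name__ == "__main__":
    total, carry = ripple_carry_add(0b1011, 0b0110, 4)
    print(total, carry)
    # t_fa = 10.0 is an arbitrary time unit chosen for illustration.
    print(ripple_carry_delay(4, 10.0))
```

The loop makes the source of the delay visible: each stage depends on the carry of the stage before it, so the worst-case path passes through all n full adders.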

2.3 Array Multiplier

High-speed multiplication is an essential function in many digital
systems. The speed is important in various applications requiring real
time arithmetic calculations. Conventional add-and-shift multipliers
are less expensive in the sense of chip area requirements; however, they
are too slow to satisfy many performance demands.

Both signed and unsigned array multipliers have been developed [6].
The main difference between these two types of array multipliers is that
the former can handle signed operands directly without the need for
complementing circuitry. Neglecting this complementing circuitry for
signed operands, the overall architectural characteristics of both types
of array multipliers are quite similar. The Braun array, an unsigned
array multiplier, is selected as a two-dimensional tessellation example
in this research [6,7].

The circuit diagram of a 5-by-5 Braun array multiplier is shown in
Figure 2.4. The partial product terms, $a_i b_j$, $i = 0$ to 4, $j = 0$ to 4,
are called summands. The summands are generated in parallel by an
appropriate number of AND gates. These summands are then fed to the
full adders for operation. In general, an n-by-n multiplier needs
$n(n-1)$ full adders and $n^2$ AND gates. The total delay time of an n-by-n
multiplier is $t_{AND} + 2(n-1)t_{FA}$. This can be verified by tracing the
worst case delay path in Figure 2.4, where $t_{AND}$ and $t_{FA}$ are the
propagation delay times of an AND gate and a full adder, respectively.
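The adder count and delay expression quoted above can be checked with a small bit-level model. The sketch below is an illustration only, not the layout developed in this work: it assumes the usual Braun arrangement of n-1 carry-save rows followed by one ripple-carry row, and the names `braun_multiply` and `braun_delay` are hypothetical.

```python
def full_adder(a, b, c):
    """One-bit full adder: returns (sum, carry_out)."""
    return a ^ b ^ c, (a & b) | (b & c) | (c & a)


def braun_multiply(a, b, n):
    """Multiply two unsigned n-bit integers with a Braun-style array.

    Returns (product, number of full adders used).
    """
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> j) & 1 for j in range(n)]
    # summand[j][i] = a_i AND b_j, with weight 2**(i + j): n*n AND gates.
    summand = [[a_bits[i] & b_bits[j] for i in range(n)] for j in range(n)]

    product_bits = [0] * (2 * n)
    adders = 0
    sums = list(summand[0])        # sums[i] has weight i + (current row)
    carries = [0] * (n - 1)        # carries[i] has weight i + (current row) + 1
    product_bits[0] = sums[0]

    # Rows 1 .. n-1: carry-save rows of n-1 full adders each.
    for j in range(1, n):
        new_sums, new_carries = [], []
        for i in range(n - 1):
            s, c = full_adder(summand[j][i], sums[i + 1], carries[i])
            new_sums.append(s)
            new_carries.append(c)
            adders += 1
        new_sums.append(summand[j][n - 1])   # leftmost summand passes through
        sums, carries = new_sums, new_carries
        product_bits[j] = sums[0]

    # Final row: n-1 full adders connected as a ripple-carry adder.
    ripple = 0
    for k in range(n - 1):
        s, ripple = full_adder(sums[k + 1], carries[k], ripple)
        product_bits[n + k] = s
        adders += 1
    product_bits[2 * n - 1] = ripple

    product = sum(bit << w for w, bit in enumerate(product_bits))
    return product, adders


def braun_delay(n, t_and, t_fa):
    """Worst-case delay t_AND + 2(n-1) t_FA of an n-by-n array."""
    return t_and + 2 * (n - 1) * t_fa
```

The two loops correspond to the two halves of the delay expression: a signal may pass down through n-1 carry-save rows and then across the n-1 adders of the final ripple row.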

2.4 Implementation Technology

Many IC technologies can be used for integrated circuits. Some
examples of IC technologies include bipolar, NMOS, PMOS, and CMOS. Each of
these technologies has its advantages and disadvantages. Some of the
factors that must be considered in choosing a technology include circuit
density, richness of available circuit functions, performance per unit
power, the topological properties of circuit interconnection paths,
suitability for total system implementation, and general availability of
processing facilities [8]. Mead and Conway chose N-channel MOS
technology for the reason that the layout prepared by NMOS technology
can be easily scaled down as the technology advances [9]. This
technology is also chosen in this work to implement the working
examples.

Figure 2.4 Block diagram of a Braun array multiplier [6].

The basic element in NMOS technology is an inverter, which is shown
in Figure 2.5. The pull-down transistor is an enhancement mode device
with a positive threshold voltage, and the pull-up transistor is a
depletion mode device with a negative threshold voltage. The gate and
source of the depletion mode device are connected together to provide a
zero gate voltage so that it is always active and acts as a load
resistor.

Figure 2.5 Schematic diagrams of basic NMOS logic gates
(inverter, NAND, NOR) [9].

An enhancement mode transistor is formed in VLSI by crossing a
polysilicon line over a diffusion line, and a depletion mode transistor
is formed by the same procedure plus the application of ion implantation
to achieve a negative threshold voltage. The overlapped region of the
polysilicon and diffusion lines determines the transistor channel. The
channel length-to-width ratio, $Z$, of a transistor is an important
parameter in NMOS technology. This ratio of the pull-up transistor,
$Z_{pu}$, must be at least four times that of the pull-down transistor,
$Z_{pd}$ [9]. Positive logic is used in NMOS circuits with the logic levels
of approximately 0 and 5 volts. NAND and NOR gates can be constructed
by simple modifications of the inverter circuit.
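The ratio requirement can be expressed as a simple check. The sketch below is a minimal illustration; the function names and the channel dimensions in the example are assumptions chosen for demonstration, not values taken from the layouts in this work.

```python
def aspect_ratio(length, width):
    """Channel length-to-width ratio Z of a single transistor."""
    return length / width


def inverter_ratio(z_pu, z_pd):
    """Ratio of the pull-up Z to the pull-down Z."""
    return z_pu / z_pd


def satisfies_ratio_rule(z_pu, z_pd, minimum=4.0):
    """True when Z_pu is at least `minimum` times Z_pd."""
    return inverter_ratio(z_pu, z_pd) >= minimum


if __name__ == "__main__":
    # Hypothetical dimensions in arbitrary length units.
    z_pu = aspect_ratio(length=8, width=2)   # long, narrow pull-up channel
    z_pd = aspect_ratio(length=2, width=2)   # square pull-down channel
    print(z_pu, z_pd, satisfies_ratio_rule(z_pu, z_pd))
```

Because only the ratio matters, the check is independent of the absolute length unit used for the layout.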

2.5 Design Rules

Design rules for VLSI are a set of rules stating the permissible
geometries, including minimum allowable values for the widths,
separations, extensions, overlaps, etc., that can be used by a designer
in the integrated layout of a circuit. These rules assure that the
patterns generated are within the resolution of the fabrication process
and that they do not violate the device physics required for the proper
operation of transistors and interconnections formed by the process.

Mead and Conway have developed a structured VLSI design method [9].
A set of design rules in dimensionless form is provided as constraints
on the allowable ratios of certain distances to a basic length unit.
This length unit is approximately 1 micron for current research
processes [10]. The pattern resolution of optical lithography is
predicted to be about 0.5 microns by 1997 [11]. Moreover, new
techniques including electron-beam and x-ray lithography have promised
an even lower pattern resolution limitation [12].

The advantage of dimensionless design rules is that designs
implemented accordingly can be easily scaled down as the fabrication
process progresses. As the integrated circuit fabrication technology
advances, the basic length unit decreases and thus the layout element
geometries, which are a function of the basic length unit, also
decrease. Therefore, the design may have a reasonable longevity.
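The effect of a dimensionless layout can be illustrated with a short calculation. This is a sketch under simple assumptions: every dimension is stored as a multiple of the basic length unit, and the cell size used in the example is hypothetical.

```python
def to_microns(size_in_units, unit_microns):
    """Convert a dimension given in basic length units to microns."""
    return size_in_units * unit_microns


def scaled_area(width_units, height_units, unit_microns):
    """Area in square microns of a rectangle specified in basic length units."""
    return (to_microns(width_units, unit_microns)
            * to_microns(height_units, unit_microns))


if __name__ == "__main__":
    # A hypothetical cell of 10 by 20 units, evaluated for two processes.
    for unit in (1.0, 0.5):
        print(unit, scaled_area(10, 20, unit))
```

Halving the basic length unit leaves the stored layout unchanged while the fabricated area falls to one quarter, which is the sense in which such a design can outlive the process for which it was first drawn.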

2.6 Delay Time Model

Another contribution of Mead and Conway is a delay-time model which
is used as a basic tool for determining the delay time of a logic gate
[9]. This model recognizes that the delay time of a node depends on the
total capacitance of that node together with the gate capacitance and
transit time of the driving transistor. The transit time is defined as
the average time required for an electron to move from source to drain.
Assume that the inverter ratio, $K$, is the ratio of $Z_{pu}$ to $Z_{pd}$. Then the
falling and rising times for this inverter driving an identical inverter
are $\tau$ and $K\tau$, respectively, where $\tau$ is the transit time. An inverter
with load capacitance $C_L$ requires $(C_L/C_g)\tau$ and $(C_L/C_g)K\tau$ of
falling and rising time, respectively, where $C_g$ is the gate capacitance.
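The two expressions of this simple model translate directly into code. The sketch below is illustrative only; the function names and the numeric values in the example are assumptions, and the load and gate capacitances need only share the same unit.

```python
def falling_time(c_load, c_gate, tau):
    """Falling time (C_L / C_g) * tau of an inverter driving load C_L."""
    return (c_load / c_gate) * tau


def rising_time(c_load, c_gate, tau, k):
    """Rising time (C_L / C_g) * K * tau, where K = Z_pu / Z_pd."""
    return (c_load / c_gate) * k * tau


if __name__ == "__main__":
    # Hypothetical values: load equal to two gate capacitances, K = 4.
    tau = 0.5          # transit time in arbitrary time units
    print(falling_time(2.0, 1.0, tau))
    print(rising_time(2.0, 1.0, tau, k=4.0))
```

With an inverter ratio of four, the rising transition is four times slower than the falling one, so the pull-up path usually dominates the worst-case delay of a gate.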

This model is easy to use. However, in a practical circuit the
speed of NMOS device operation must be determined more realistically by
the speed with which capacitors can be charged and discharged [13]. The
model as described tends to give an underestimated delay time [14].
Therefore, a revised model which estimates the delay time by means of
charging and discharging abilities of a logic gate is used in this
research to evaluate the speed of the designs [14].

The charging/discharging model estimates the delay time of a logic
gate by considering its ability for charging and discharging the
effective load capacitance. The effective load capacitance comprises
the input capacitances of the driven logic gates and the capacitances of
the signal paths. The rising time of a logic gate is determined by how
fast the load capacitance can be charged to the voltage level
corresponding to logic 1 by the pull-up transistor. On the other hand,
the falling time of a logic gate is determined by the speed with which
the pull-down transistor discharges the load capacitance to the voltage
level corresponding to logic 0.

2.7 Computer-Aided-Engineering of VLSI Circuits

The ability to bring new ideas to production faster is the key to
economic success as the integrated circuit becomes increasingly
sophisticated. Until recently the design processes were relatively
unautomated [4]. Designers manually created the drawings needed to
implement the layout of a chip. Nowadays, VLSI design relies heavily on
the support of special computer systems called
Computer-Aided-Engineering (CAE) systems. The function of CAE, often
lumped under the more general term CAD, is to provide the capability to
produce better designs faster and with fewer errors. As with any
automation, the primary goal of a CAE system is cost reduction. In the
broadest sense, CAE implies the use of a computer system with
specialized hardware and software to assist in everything from design,
simulation, and testing, through ultimate manufacturing.

The Computervision CAD system in the Case Center of Michigan State
University is used to support this research. The Computervision system
provides a software package, CADDS2, specially developed for integrated
circuit design. The capabilities of layout generation, design rule
checking, cell tessellation, layout storage and other indispensable
graphic manipulations are available on this system. The layouts
generated can be converted into prescribed data formats which can be
sent to the silicon foundry for fabrication. A computer language,
Integrated Circuit Programming Language (ICPL), can be used for
automatic tessellation and other design purposes.

All the layouts presented in this work are generated on this
system. Programs written in ICPL are used to build the entire chip by
the tessellation of building blocks. In addition, the system assists in
the calculation of chip area and the estimation of design complexity.

CHAPTER III

DEVELOPMENT OF DESIGN EXAMPLES

This chapter describes the steps in designing the working examples.
The layouts of the circuits, produced according to the Mead and Conway

design rules, are then presented and described.

3.1 Selection of Circuits

The one-bit full adder, which is the basic functional block
involved in this research, can be implemented in many logically
equivalent, yet structurally different, circuits. The first step in the
design process will thus be the selection of full adder circuits most
conducive to the tessellation and VLSI constraints of this project.

The considerations applied in this research for selecting a circuit
to implement a desired function are its gate count and number of delay
levels. The gate count of a circuit gives a rough estimation of the
area consumed on the chip and the number of delay levels provides
information on its operation speed.

The full adder circuit, shown in Figure 2.1, has a gate count of
ten and two delay levels if the hardwired AND gates are considered to
consume negligible chip area and have sufficiently small delay time.

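This circuit is chosen for use in the transistor level design since
it is the best full adder in the sense of gate count and number of delay
levels [6]. The operation of this circuit can be expressed by the
Boolean equations

$$ s_i = \overline{\bar{A}_i \bar{B}_i \bar{C}_i} \cdot \overline{\bar{A}_i B_i C_i} \cdot \overline{A_i \bar{B}_i C_i} \cdot \overline{A_i B_i \bar{C}_i} \tag{3-1} $$

and

$$ C_{i+1} = \overline{\bar{A}_i \bar{B}_i} \cdot \overline{\bar{A}_i \bar{C}_i} \cdot \overline{\bar{B}_i \bar{C}_i} \tag{3-2} $$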
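The two-level form used at the transistor level (NAND gates whose outputs are tied together as a hardwired AND) can be checked exhaustively against ordinary binary addition. The following is a minimal sketch; the function names are illustrative and do not come from the thesis or from ICPL.

```python
from itertools import product


def nand(*inputs):
    """Logical NAND of any number of 0/1 inputs."""
    return 0 if all(inputs) else 1


def full_adder_wired_nand(a, b, c):
    """Sum and carry-out formed as a hardwired AND of NAND gate outputs."""
    na, nb, nc = 1 - a, 1 - b, 1 - c  # input complements from the inverters
    s = nand(na, nb, nc) & nand(na, b, c) & nand(a, nb, c) & nand(a, b, nc)
    c_out = nand(na, nb) & nand(na, nc) & nand(nb, nc)
    return s, c_out


def agrees_with_addition():
    """True if the circuit matches a + b + c for all eight input patterns."""
    return all(
        full_adder_wired_nand(a, b, c) == ((a + b + c) % 2, (a + b + c) // 2)
        for a, b, c in product((0, 1), repeat=3)
    )
```

The gate tally in this sketch (three inverters, four three-input NANDs and three two-input NANDs) agrees with the gate count of ten quoted above.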

A special restriction must be considered in the selection of a full
adder circuit for the gate level design. The ratio, Z_pu/Z_pd, of a
complex gate formed of hardwired individual logic gates will be changed
since all the pull-down transistors are now connected in parallel. The
layout then has to be further modified to maintain the required ratio.
The logic gates stored in the gate library are not supposed to be
modified in this research so that the advantages of gate level design
can be fully utilized. This consideration prohibits the two-level full
adder in Figure 2.1 from being used in the gate level designs.

The circuit shown in Figure 2.2, having a gate count of twelve and
three delay levels, is selected for use in the gate level designs.
The reason for this choice is that this circuit is optimal in the sense
of delay time and gate count if hardwired logic cannot be used [5].
The operation of this circuit can be described by the Boolean equations

$$ s_i = \overline{\overline{A_i + B_i + C_i} + \overline{A_i + \bar{B}_i + \bar{C}_i} + \overline{\bar{A}_i + B_i + \bar{C}_i} + \overline{\bar{A}_i + \bar{B}_i + C_i}} \tag{3-3} $$

and

$$ C_{i+1} = \overline{\overline{A_i + B_i} + \overline{A_i + C_i} + \overline{B_i + C_i}} \tag{3-4} $$
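The three-level NOR-NOR form used at the gate level can be verified in the same exhaustive manner. This is a sketch under the assumption that the sum is a four-input NOR of four three-input NORs and the carry is a three-input NOR of three two-input NORs, as the gate level cell description in Section 3.3.2 suggests; the function names are illustrative.

```python
from itertools import product


def nor(*inputs):
    """Logical NOR of any number of 0/1 inputs."""
    return 0 if any(inputs) else 1


def full_adder_nor(a, b, c):
    """Sum and carry-out built only from inverters and NOR gates."""
    na, nb, nc = 1 - a, 1 - b, 1 - c  # first delay level: inverters
    s = nor(nor(a, b, c), nor(a, nb, nc), nor(na, b, nc), nor(na, nb, c))
    c_out = nor(nor(a, b), nor(a, c), nor(b, c))
    return s, c_out


def nor_form_is_full_adder():
    """True if the NOR-NOR circuit matches a + b + c for all inputs."""
    return all(
        full_adder_nor(a, b, c) == ((a + b + c) % 2, (a + b + c) // 2)
        for a, b, c in product((0, 1), repeat=3)
    )
```

Counting three inverters, four three-input NORs, one four-input NOR, three two-input NORs and one three-input NOR gives the gate count of twelve stated in the text.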

3.2 Gate Library

The design layouts of general logic gates are stored in the
Computervision CAD system to form a gate library. The logic gates
needed in this research include inverters, two-input NAND, two-input
NOR, three-input NOR and four-input NOR gates. Two different
configurations are provided for the three-input NOR gates. All three
inputs are oriented horizontally in the first version and vertically in
the second version.

The layouts of these gates are designed to be universal so that
they can be readily pulled from the library and connected to form the
desired functional blocks. The ratio Z_pu/Z_pd is four for all the
library gates. This is the minimum requirement. The layouts of these
library gates are shown in Figures 3.1 to 3.6.

These and other layout figures in this thesis are shown in color so
as to represent different layers more clearly. The diffusion layer is
shown in green, the polysilicon layer is shown in red, the metal layer
is shown in blue and both the ion implantation and contact cut layers
are shown in black.

3.3 Full Adder Design

The full adders designed at both transistor and gate levels are
described in this section.

 

Figure 3.1 The layout of an inverter cell.

Figure 3.2 The layout of a two-input NAND cell.

Figure 3.3 The layout of a two-input NOR cell.

Figure 3.4 The layout of a horizontal three-input NOR cell.

Figure 3.5 The layout of a vertical three-input NOR cell.

Figure 3.6 The layout of a four-input NOR cell.

3.3.1 Transistor Level Design

The layout of the full adder of Figure 2.1 when designed at the
transistor level is shown in Figure 3.7. Two metal lines, required for
power supply, run horizontally across the top and bottom of the full
adder cell and all transistors are placed within the space defined
between them. A third metal line for ground connection passes through
the cell and divides it into two regions. The upper region contains the
transistors for generating the input complements. The inputs and their
complements are fed to the lower region by means of polysilicon lines
which form transistors with the diffusion lines. Crossover problems are
incurred whenever it is required to run a polysilicon line across a
diffusion line without forming an undesired transistor. Crossovers of
this type are solved by using metal lines. The transistors contained in
the lower region of the cell generate the sum and carry-out of the full
adder.

The external communication lines of the full adder cells are
located so that individual cells can be readily connected side-by-side,
with the carry-out of a full adder connected to the carry-in of the
full adder at the next stage.

 

Figure 3.7 The layout of a transistor level full adder cell.


3.3.2 Gate Level Design

The layout of the full adder cell corresponding to Figure 2.2 and
designed at gate level is shown in Figure 3.8. Four metal lines run
horizontally across the cell. The top and bottom metal lines are
provided for power supply and the other two are used for ground
connection. In this manner, the cell is divided into three regions.
The top region of the cell contains the inverters for input complements
and three two-input NOR gates cooperating with a three-input NOR gate to
produce the carry-out of the full adder. The bottom region of the cell
accommodates four three-input NOR gates and a four-input NOR gate to
generate the sum output. The middle region provides space for the
routing between logic gates.

All the logic gates are available from the library. The major work
at this level of design involves the placement and routing of the
appropriate gates.

3.4 Ripple-Carry Adder

The full adder cells described in the above sections are ready for
use in the construction of ripple-carry adders, since their intercell
connections were configured at design time to support linear
tessellation.

Figure 3.8 The layout of a gate level full adder cell.

The appropriate number of full adder cells are tessellated in a
linear manner by means of a program written in ICPL. Figures 3.9 and
3.10 show the layouts of 4-bit ripple-carry adders designed at transistor
and gate levels, respectively. In both cases, the operands are fed from
the top of the ripple-carry adder and the result is obtained at the
bottom.

3.5 Braun Array Multiplier

The layout work required to implement the Braun array multiplier is
more complicated. It was shown in Figure 2.4 that a Braun array
multiplier is formed by a two-dimensional connection of full adders and
summand-generating AND gates.

The I/O connections of a chip are typically located at its
boundaries. This presents a problem relating to how the inputs are
transported to the respective full adders placed in the internal area of
the array. The full adders can be placed on the chip according to the
configuration implied in Figure 2.4. However, almost half of the chip
area will be wasted if a square or rectangular chip, which is typical in
a commercial process, is used to implement the Braun array multiplier as
it appears in Figure 2.4. To solve this problem, the full adder array is
rearranged to a square or rectangular configuration before it is
implemented.

 

Figure 3.9 The layout of a transistor level 4-bit ripple-carry
adder.

Figure 3.10 The layout of a gate level 4-bit ripple-carry adder.

Another problem is the implementation of AND gates for the
generation of summands. The AND function cannot be obtained directly in
NMOS and has to be formed by a NAND gate followed by an inverter. This
implies that there will be a two level delay. This problem is solved by
using NAND gates to replace the AND gates and exchanging the roles of
the related inputs and their complements in the circuit.

The array is rearranged to a rectangular configuration shown in
Figure 3.11 and the NAND gates are incorporated as part of the full
adder cells instead of being treated as separate cells. This also helps
to solve the input problem since, after the rearrangement, the cells in
the same row will require the same b_i and those in the same column will
require the same a_i. The input operands now go in the vertical and
horizontal directions as opposed to the diagonal and horizontal
directions of the original version.

The required inputs of the various full adder cells in the array
are not the same. The cells at the top row of the array need two
summand inputs and a zero input. The cells at the bottom row perform a
ripple-carry addition. Other full adders have some of their inputs
connected to neighbor cells. Several full adder cells, different in the
sense of input requirements, are thus prepared for tessellation.

The layouts of all the cells prepared for the tessellation to form
Braun array multipliers are shown in Figures 3.12 to 3.28. Most of the
cells are basically a combination of a full adder and one or more NAND
gates for preparing the summands. The tessellation map in Figure 3.29

shows the relationship between different cells and their locations

Figure 3.11 The rearranged block diagram of a Braun array
multiplier.

 

Figure 3.12 The layout of cell 1 for multiplier design.

Figure 3.13 The layout of cell 2 for multiplier design.

Figure 3.14 The layout of cell 3 for multiplier design.

Figure 3.15 The layout of cell 4 for multiplier design.

Figure 3.16 The layout of cell 4A for multiplier design.

Figure 3.17 The layout of cell 4B for multiplier design.

Figure 3.18 The layout of cell 5 for multiplier design.

Figure 3.19 The layout of cell 5A for multiplier design.

Figure 3.20 The layout of cell 6 for multiplier design.

Figure 3.21 The layout of cell 7 for multiplier design.

Figure 3.22 The layout of cell 8 for multiplier design.

Figure 3.23 The layout of cell 9 for multiplier design.

Figure 3.24 The layout of cell 10 for multiplier design.

Figure 3.25 The layout of cell 10A for multiplier design.

Figure 3.26 The layout of cell 10B for multiplier design.

Figure 3.27 The layout of cell 11 for multiplier design.

Figure 3.28 The layout of cell 11A for multiplier design.

 

 

 

 

 

 

 

 

 

 

 

CELL 3    CELL 2    ...    CELL 2    CELL 2    CELL 1
CELL 6    CELL 5A   ...    CELL 5    CELL 5    CELL 5
CELL 6    CELL 5A   ...    CELL 5    CELL 5    CELL 5
  :          :               :         :         :
CELL 6    CELL 5A   ...    CELL 5    CELL 5    CELL 5
          CELL 4B   ...    CELL 4    CELL 4    CELL 4A

Figure 3.29 Tessellation map for transistor level multiplier.

in the transistor level multiplier array design. Figure 3.30 is a
similar tessellation map provided for the gate level multiplier design.

The number of different cells required for the tessellation of a
transistor level n-by-n Braun array multiplier is listed in Table 3.1,
while Table 3.2 is a similar list for gate level multiplier design.
Programs written in ICPL are used to produce the layout of n-by-n Braun
array multipliers designed by tessellation of the appropriate building
blocks at both levels. The tessellations are carried out according to
the above maps. The results of designing a 5-by-5 Braun array

 

Table 3.1 Cell list for transistor level multiplier.

Cell        Quantity

Cell 1      1
Cell 2      n-2
Cell 3      1
Cell 4      n-3
Cell 4A     1
Cell 4B     1
Cell 5      (n-2)^2
Cell 5A     n-2
Cell 6      n-2

 

Table 3.2 Cell list for gate level multiplier.

Cell        Quantity

Cell 7      1
Cell 8      n-3
Cell 9      1
Cell 10     n-3
Cell 10A    1
Cell 10B    1
Cell 11     (n-2)^2
Cell 11A    n-2
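The cell lists can be cross-checked against the known size of a Braun array, which contains n(n-1) full adders for an n-by-n multiplier. The sketch below encodes the quantities of Tables 3.1 and 3.2 as functions of n. Treating cells 3 and 6 as narrow edge cells that hold no full adder is an assumption made here (suggested by their 17 micron width in the dimensional data), and the function names are illustrative.

```python
def transistor_level_cells(n):
    """Cell quantities for a transistor level n-by-n multiplier (Table 3.1)."""
    return {
        "1": 1, "2": n - 2, "3": 1,
        "4": n - 3, "4A": 1, "4B": 1,
        "5": (n - 2) ** 2, "5A": n - 2, "6": n - 2,
    }


def gate_level_cells(n):
    """Cell quantities for a gate level n-by-n multiplier (Table 3.2)."""
    return {
        "7": 1, "8": n - 3, "9": 1,
        "10": n - 3, "10A": 1, "10B": 1,
        "11": (n - 2) ** 2, "11A": n - 2,
    }


def full_adder_cell_total(cells, edge_cells=("3", "6")):
    """Number of cells that contain a full adder (edge cells excluded)."""
    return sum(q for name, q in cells.items() if name not in edge_cells)
```

For n = 5, both lists give 20 full adder cells, which equals n(n-1).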

multiplier at the transistor and gate levels are shown in Figures 3.31
and 3.32 to demonstrate the operations of these tessellation programs.

 

CELL 9     ...    CELL 8     CELL 8     CELL 7
CELL 11A   ...    CELL 11    CELL 11    CELL 11
CELL 11A   ...    CELL 11    CELL 11    CELL 11
   :                 :          :          :
CELL 11A   ...    CELL 11    CELL 11    CELL 11
CELL 10B   ...    CELL 10    CELL 10    CELL 10A

Figure 3.30 Tessellation map for gate level multiplier.

 

 

Figure 3.31 The layout of transistor level
5-by-5 Braun array multiplier.

Figure 3.32 The layout of gate level
5-by-5 Braun array multiplier.

 

CHAPTER IV

DESIGN EVALUATION

The main objective of this research is to determine the performance
tradeoffs of designing VLSI chips from different starting levels. The
criteria for evaluating designs are defined and the circuits produced in

this research are then evaluated and compared in this chapter.

4.1 Criteria of Evaluation

Chip area and propagation time are conventionally used to evaluate
the performance of an IC chip. In most cases, these two parameters
oppose each other and speed is often achieved at the cost of chip area.
Chip area and propagation time are combined in this research to define a
term called time-area complexity. The time-area complexity is used to
measure the performance of the examples in this work.

The absolute design time, counted in weeks or months spent by the
designer, is a subjective parameter. This design time does not provide
useful information for comparison since it depends on many human factors
including the designer's experience. A measure of design complexity,
indicating the degree of difficulty encountered in the design, will be
used in this thesis to estimate the relative design time. The design
complexity of a chip will be measured by means of a count of its
components and the density of its layout.

4.2 Device Area Calculation

The area of a cell is readily calculated from its layout diagram.
The cells produced in this research are substantially rectangular in
shape. The area of a cell is thus the product of its length and width.

The total area of a device can then be computed by summing the areas of
its building blocks. Because all the layouts in this thesis are
produced with the basic length unit, λ, equal to 1 micron, the
following data can be interpreted either in units of λ or in microns.
The use of this basic length unit has the advantage that the results are
technology independent. This implies that these data are still valid
even when the layouts are scaled down.

The dimensional data, including the lengths, widths and areas, of
the full adder cells provided for ripple-carry adders are listed in
Table 4.1.

A total of n one-bit full adders are required for an n-bit
ripple-carry adder (RCA). The total area of an n-bit ripple-carry adder
can thus be calculated by

$$ A_{RCA} = n A_{FA} \tag{4-1} $$

where A_FA is the area of the full adder used for tessellation in the
ripple-carry adder. The areas of various sizes of ripple-carry adder
are listed in Table 4.2.

Table 4.1 The dimensional data of the full adders.

Cell                           Length     Width      Area
                               (micron)   (micron)   (micron^2)

Transistor level full adder    80         57         4,560
Gate level full adder          162        106        17,172

Table 4.2 The areas of ripple-carry adders.

Size      Transistor level       Gate level
          design (micron^2)      design (micron^2)

4-bit     18,240                 68,688
8-bit     36,480                 137,376
16-bit    72,960                 274,752
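The ripple-carry adder areas follow directly from the full adder dimensions. The following minimal sketch reproduces them from the length and width of each full adder cell; the dictionary and function names are illustrative.

```python
# Full adder cell dimensions in microns (length, width), from the
# dimensional data of the full adders.
FULL_ADDER_DIMENSIONS = {
    "transistor": (80, 57),
    "gate": (162, 106),
}


def full_adder_area(level):
    """Area of one full adder cell in square microns."""
    length, width = FULL_ADDER_DIMENSIONS[level]
    return length * width


def ripple_carry_adder_area(n_bits, level):
    """Area of an n-bit ripple-carry adder: n full adder cells side by side."""
    return n_bits * full_adder_area(level)
```

For example, `ripple_carry_adder_area(4, "transistor")` evaluates to 18240 and `ripple_carry_adder_area(16, "gate")` to 274752.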

The area calculation of the Braun array multiplier (BAM) is
slightly more complicated since several types of cells are involved. In
addition, there are cell area overlaps in the tessellation procedure due
to the intercell communications.

The dimensional data of the cells prepared for the Braun array
multiplier are listed in Table 4.3.

The tessellation maps (Figures 3.29 and 3.30) and the cell lists
(Tables 3.1 and 3.2) of the Braun array multiplier are used to develop
equations for calculating the area of an n-by-n Braun array multiplier.
The total area of an n-by-n Braun array multiplier (A_BAMT), designed at
transistor level, is given in micron^2 by

$$ A_{BAMT} = ((n-1) \cdot 64 + 15)(117 + 106(n-2) + 83) \tag{4-2} $$

The total area of an n-by-n Braun array multiplier (A_BAMG), designed at
gate level, is given in micron^2 by

$$ A_{BAMG} = ((n-2) \cdot 173 + 180)(130 + 123(n-2) + 120) \tag{4-3} $$

The areas of various sizes of Braun array multipliers as calculated
by these equations are listed in Table 4.4.
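The two area equations are simple width-times-height products and can be evaluated directly. The sketch below assumes that the first factor is the array width and the second the array height, with the constants already net of cell overlaps; the function names are illustrative.

```python
def braun_area_transistor(n):
    """Area in square microns of a transistor level n-by-n Braun array."""
    width = (n - 1) * 64 + 15
    height = 117 + 106 * (n - 2) + 83
    return width * height


def braun_area_gate(n):
    """Area in square microns of a gate level n-by-n Braun array."""
    width = (n - 2) * 173 + 180
    height = 130 + 123 * (n - 2) + 120
    return width * height
```

For n = 5 these evaluate to 140378 and 432681 square microns, respectively.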

4.3 Propagation Delay Calculations

Table 4.3 The dimensional data of the cells for
Braun array multiplier.

Cell        Length     Width      Area
            (micron)   (micron)   (micron^2)

Cell 1      117        64         7,488
Cell 2      117        64         7,488
Cell 3      117        17         1,989
Cell 4      87         64         5,568
Cell 4A     87         64         5,568
Cell 4B     86         64         5,504
Cell 5      110        64         7,040
Cell 5A     110        64         7,040
Cell 6      106        17         1,802
Cell 7      157        173        27,161
Cell 8      134        173        23,182
Cell 9      134        180        24,120
Cell 10     120        173        20,760
Cell 10A    120        173        20,760
Cell 10B    120        173        20,760
Cell 11     127        173        21,971
Cell 11A    127        180        22,860

Table 4.4 The areas of Braun array multiplier.

Size        Transistor level       Gate level
            design (micron^2)      design (micron^2)

5-by-5      140,378                432,681
8-by-8      387,068                1,203,384
16-by-16    1,641,900              5,131,144

Practical logic gates and circuits unavoidably take time to
generate valid outputs from the applied inputs. This time delay is an
important parameter in IC design since it must be used for determining
the clocking period of a system formed by logic circuits.

The output of a logic gate may, at any time, be switched to either
one (high-level) or zero (low-level). The unequal configurations of
pull-up and pull-down transistors used in NMOS logic gates result in an
inherent feature of asymmetric pull-up (rising) and pull-down (falling)
times [9]. Usually, the longer of these two is the pull-up time and it
is thus considered as the worst case delay time of the logic gate.
Thus, in the following description, the term "delay time of a logic
gate" refers to its pull-up time.

The computation of propagation time involves finding the worst case
signal path, in the sense of delay time, between the inputs and outputs
of the circuit. The propagation time of a circuit is then estimated by
summing up the delay times of the active gates constituting this worst
case path.

Due to the circuit complexity, it is not always easy to locate the
worst case signal path. Fortunately, complex hardware usually can be
partitioned into functional blocks. Therefore, the worst case signal
path may be defined in terms of functional blocks instead of logic
gates. Thus, the propagation time of a chip can be calculated by adding
up the delay times of the functional blocks in the worst case signal
path.

The sizes of the pull-up and pull-down transistors used in the
designs are listed in Table 4.5 and will be applied in the following
propagation delay calculations.

In addition, since these calculations relate to the physical parameters
of the transistors, such as doping concentration, threshold voltages,
oxide thickness, etc., reasonable assumptions using typical values are
listed in Table 4.6 [9,14].

The charging/discharging model for the calculation of delay time is
described in the following section. Finally, the results of calculation
using this model are provided.

4.3.1 Delay Time Model

Table 4.5 The dimensional data of the transistors.

Design level   Gate                         L/W (pull-up)   L/W (pull-down)
                                            (micron)        (micron)

Transistor     Inverter                     4/2             2/4
               Sum-generating hardwired
               3-input NAND                 24/2            2/2 (each input)
               Carry-generating hardwired
               2-input NAND                 16/2            2/2 (each input)
               2-input NAND                 8/2             2/4 (each input)

Gate           Inverter                     4/2             2/4
               2-input NOR                  4/2             2/4 (each input)
               3-input NOR                  4/2             2/4 (each input)
               4-input NOR                  4/2             2/4 (each input)
               2-input NAND                 8/2             2/2 (each input)

Table 4.6 The typical physical parameters of the designs.

Threshold voltage of depletion mode transistor      -4 V
Threshold voltage of enhancement mode transistor    1 V
N-type impurity doping concentration                10^16 cm^-3
Electron mobility at 300 K                          1000 cm^2/V-sec
Oxide thickness between transistor gate
and channel                                         250 Å
Voltage supply (V_dd)                               5 V
High level gate output                              5 V
Low level gate output                               0 V
Dielectric constant of oxide (ε_ox)                 3.4515×10^-11 F/m

One way to compute the speed of a logic circuit is to define a unit
gate delay time corresponding to one level of logic [6]. A good measure
of this unit gate delay time would be the delay time of a NAND or NOR
gate. The delay time in a multilevel logic circuit can then be
determined by the number of equivalent NAND gate delay levels or the
number of levels.

This is probably the most simplistic way to estimate the
propagation time of a logic circuit, but it does not take into
consideration the loading effect of the logic gates and can only be used
as a very rough estimate. It is useful at the beginning of the design
process in order to decide which algorithm or architecture is to be
chosen for the implementation of a desired function. Actually, the
result from this can only be obtained in an ideal case and is a lower
bound on the propagation time. In order to fully evaluate a design, a
more precise model which takes loading effect into consideration must be
used [14].

The delay time of a logic gate is directly related to the driving
capability of its transistors. The pull-up time is limited by the
effective load capacitance and the charging current provided by the
pull-up transistor. The pull-down time is determined by the effective
load capacitance and the discharging current drained by the pull-down
transistor. The pull-up time, T_pu, and pull-down time, T_pd, can be
estimated by

$$ T_{pu} = C_L V_H / I_{pu} \tag{4-4} $$

and

$$ T_{pd} = C_L V_H / I_{pd} \tag{4-5} $$

where I_pu and I_pd are the average pull-up and pull-down currents,
respectively, C_L is the effective load capacitance and V_H is the high
state output voltage. The average current through the load capacitance
may be calculated by taking the average of the currents supplied or
drained by a transistor over its active (saturation and triode) regions
[14]. The average pull-up and pull-down currents can be expressed by

$$ I_{pu} = \mu_n C_{ox} V_{th}^2 (V_{dd} + V_{th}/3) / (2 Z_{pu} V_{dd}) \tag{4-6} $$

and

$$ I_{pd} = \mu_n C_{ox} (V_{dd} - V_{th})^2 (2 V_{dd} + V_{th}) / (6 Z_{pd} V_{dd}) \tag{4-7} $$

where μ_n is the electron mobility, C_ox is the capacitance
produced by the transistor gate oxide, V_dd is the supply voltage and
V_th is the threshold voltage. Z_pu and Z_pd are the length-width
ratios of the pull-up and pull-down transistors, respectively.

Substituting the average currents, I_pu and I_pd, into equations 4-4
and 4-5, respectively, will give

$$ T_{pu} = 2 Z_{pu} V_{dd}^2 C_L / \{ \mu_n C_{ox} [ V_{th}^2 (V_{dd} + V_{th}/3) ] \} \tag{4-8} $$

and

$$ T_{pd} = 6 Z_{pd} V_{dd}^2 C_L / \{ \mu_n C_{ox} [ (V_{dd} - V_{th})^2 (2 V_{dd} + V_{th}) ] \} \tag{4-9} $$

According to this model, the delay times of various logic gates
involved in this research, in terms of the effective load capacitance
C_L, are listed in Table 4.7.
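The coefficient that multiplies C_L in the delay model can be evaluated numerically from the typical parameters of Table 4.6. The sketch below rests on several assumptions that are not stated explicitly in the text: C_ox in the current equations is taken per unit area (the oxide dielectric constant divided by the oxide thickness), the pull-up expression uses the depletion mode threshold, the pull-down expression uses the enhancement mode threshold, and V_H equals V_dd. All names are illustrative. Values are computed in SI units.

```python
EPS_OX = 3.4515e-11      # dielectric constant of oxide, F/m
OXIDE_THICKNESS = 250e-10  # 250 angstroms, in metres
MOBILITY = 1000e-4       # 1000 cm^2/V-sec, in m^2/V-sec
V_DD = 5.0               # supply voltage, V
V_TH_DEPLETION = -4.0    # depletion mode threshold, V
V_TH_ENHANCEMENT = 1.0   # enhancement mode threshold, V

# Assumption: oxide capacitance per unit area, F/m^2.
C_OX_PER_AREA = EPS_OX / OXIDE_THICKNESS


def pull_up_coefficient(z_pu):
    """T_pu / C_L for a pull-up with length-width ratio z_pu."""
    v_th = V_TH_DEPLETION
    return 2 * z_pu * V_DD**2 / (
        MOBILITY * C_OX_PER_AREA * v_th**2 * (V_DD + v_th / 3)
    )


def pull_down_coefficient(z_pd):
    """T_pd / C_L for a pull-down with length-width ratio z_pd."""
    v_th = V_TH_ENHANCEMENT
    return 6 * z_pd * V_DD**2 / (
        MOBILITY * C_OX_PER_AREA * (V_DD - v_th)**2 * (2 * V_DD + v_th)
    )
```

Under these assumptions an inverter pull-up (L/W of 4/2, so Z_pu = 2) gives a coefficient of roughly 1.23×10^4, close to the value tabulated for the inverter, and the coefficient scales linearly with Z_pu. The pull-down coefficient of the same inverter comes out several times smaller, consistent with the pull-up time being the worst case.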

4.3.2 Load Capacitance Estimation

It is essential to estimate the effective load capacitance
C_L before the charging/discharging model can be used. Consider the case
where the input of a logic gate is connected to the output of another
logic gate. The total capacitance, C_L, appearing at the output of the
driving gate will be calculated as follows.

The major capacitance related to the load capacitance is the
transistor gate capacitance, C_ox, due to the oxide interposed between
the gate and substrate of the pull-down transistor in the loading logic
gate.

Table 4.7 The delay times of the logic gates (in terms of C_L).

Design level   Gate                         Delay time
                                            (nsec/F)

Transistor     Inverter                     1.24×10^4 C_L
               Sum-generating hardwired
               3-input NAND                 7.42×10^4 C_L
               Carry-generating hardwired
               2-input NAND                 4.95×10^4 C_L
               2-input NAND                 2.47×10^4 C_L

Gate           Inverter                     1.24×10^4 C_L
               2-input NOR                  1.24×10^4 C_L
               3-input NOR                  1.24×10^4 C_L
               4-input NOR                  1.24×10^4 C_L
               2-input NAND                 2.47×10^4 C_L

The gate capacitance, C_ox, of the pull-down transistor in a loading gate
may be estimated by

$$ C_{ox} = \varepsilon_{ox} L W / D \tag{4-10} $$

where ε_ox is the dielectric constant of the oxide layer, L and W are the
length and width of the transistor channel, and D is the thickness of
the oxide layer interposed between the gate and the channel [14]. This
gate capacitance can be approximately divided equally into
gate-to-source capacitance, C_gs, and gate-to-drain capacitance,
C_gd [15]. The gate-to-source capacitance may be directly accounted for
in the effective loading capacitance. However, the gate-to-drain
capacitance will be charged in one direction for one polarity of input
and in the opposite direction for the opposite polarity input. Thus,
its effect on the system is twice that of an equivalent parasitic
capacitance to ground. The gate-to-drain capacitance should be
approximately doubled and added to the gate-to-source capacitance
[9,15]. There are other minor parasitic capacitances, lumped as C_stray,
associated with the transistor. These are assumed to be one-tenth of the
input capacitance in this model [16].

In integrated circuits, the capacitances of circuit nodes are due
not only to the gate input capacitances but also to the capacitances to
ground of the signal paths connected to the nodes [9]. This type of
capacitance is not negligible. While gate input capacitances are
typically an order of magnitude greater per unit area than the
capacitances of the signal paths, the signal paths are often much larger
in area than the associated gate regions. Therefore, a substantial
fraction of the delay encountered may be accounted for by the
communication paths.

It is impractical to calculate these capacitances by considering
the signal paths piece by piece. The worst case signal path
capacitance, C_path, estimated by considering the communication path
which produces the largest capacitance in the cell, is used for the
propagation time calculation in this research. Only the capacitances
between immediately adjacent lines on the same layer are considered.
The other capacitances due to far apart lines or lines on different
layers are neglected because of their relatively smaller values.

The path capacitance is estimated for the transistor and gate level
full adder cells as 4×10^-15 F and 9×10^-15 F, respectively [15]. The
path capacitance for the cells in a Braun array multiplier designed at
transistor level is the same as that in the transistor level full adder
cell. However, this capacitance is estimated as 2×10^-14 F for the cells
in the gate level Braun array multiplier, due to the much longer worst
case communication path involved.

The total effective load capacitance due to one loading gate is
defined as the sum of the gate-to-source capacitance, the doubled
gate-to-drain capacitance and the stray capacitance of its input
transistor and the associated path capacitance. This relationship is
expressed mathematically by

$$ C_L = C_{gs} + 2 C_{gd} + C_{stray} + C_{path} \tag{4-11} $$

The effective load capacitances associated with the inputs of the
logic gates used in this research are listed in Table 4.8.
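The load capacitance estimate can be worked through numerically for a single gate input. The sketch below assumes that C_stray is one-tenth of the gate oxide capacitance C_ox and that the path capacitance is supplied separately (zero by default); both choices are interpretations made for illustration, and the function names are not from the thesis.

```python
EPS_OX = 3.4515e-11        # dielectric constant of oxide, F/m
OXIDE_THICKNESS = 250e-10  # 250 angstroms, in metres


def gate_oxide_capacitance(length_um, width_um):
    """C_ox = eps_ox * L * W / D for a channel given in microns."""
    area_m2 = (length_um * 1e-6) * (width_um * 1e-6)
    return EPS_OX * area_m2 / OXIDE_THICKNESS


def effective_input_capacitance(length_um, width_um, c_path=0.0):
    """C_gs + 2*C_gd + C_stray + C_path for one loading transistor."""
    c_ox = gate_oxide_capacitance(length_um, width_um)
    c_gs = c_ox / 2          # gate capacitance split equally
    c_gd = c_ox / 2
    c_stray = 0.1 * c_ox     # assumption: one-tenth of C_ox
    return c_gs + 2 * c_gd + c_stray + c_path
```

With these assumptions a 2/4 pull-down input gives about 1.77×10^-14 F and a 2/2 input about 8.8×10^-15 F, before any path capacitance is added.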

Table 4.8 The effective capacitances of the gate inputs.

Design level   Gate                         Effective input
                                            capacitance

Transistor     Inverter                     1.77×10^-14 F
               Sum-generating hardwired
               3-input NAND                 8.83×10^-15 F
               Carry-generating hardwired
               2-input NAND                 8.83×10^-15 F
               2-input NAND                 8.83×10^-15 F

Gate           Inverter                     1.77×10^-14 F
               2-input NOR                  1.77×10^-14 F
               3-input NOR                  1.77×10^-14 F
               4-input NOR                  1.77×10^-14 F
               2-input NAND                 8.83×10^-15 F

4.3.3 Results

The delay time of a full adder cell must be considered in two cases
depending on the nature of any intercell connection. The worst case
signal path of a full adder cell can be considered as the path from
operand inputs to the carry-out bit if only the carry-out bit is used
for intercell connection. The delay times of the full adder cells
designed at transistor and gate levels are calculated according to this
signal path as 2.438 and 2.090 nsec, respectively. The worst case
signal path of the full adder cell must also be considered as the path
from operand inputs to its sum bit output if the sum bit is used for
communication between cells. The delay times of the cells designed at
transistor and gate levels are then 3.408 and 2.181 nsec, respectively.

The worst case signal path of a full adder cell used in a
ripple-carry adder is defined from operand input to carry output. This
is because the intercell connections are achieved by means of connecting
the nearest neighbor carry-out/carry-in lines. The propagation delay of
the ripple-carry adder is thus the sum of the carry-out delays of the
individual cells. Some representative values are listed in Table 4.9.

Table 4.9 The propagation delays of ripple-carry adders.

Size      Transistor level    Gate level
          design              design
          (nsec)              (nsec)

4-bit     9.752               8.36
8-bit     19.504              16.72
16-bit    39.008              33.44
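Because the carry ripples through every cell, the adder delay is simply the number of bits times the carry-out delay of one full adder. A minimal sketch, with illustrative names, using the carry-path delays quoted above:

```python
# Operand-to-carry-out delay of one full adder cell, in nsec.
CARRY_DELAY_NSEC = {"transistor": 2.438, "gate": 2.090}


def ripple_carry_delay(n_bits, level):
    """Propagation delay in nsec of an n-bit ripple-carry adder."""
    return n_bits * CARRY_DELAY_NSEC[level]
```

For a 4-bit adder this gives about 9.752 nsec at the transistor level and 8.36 nsec at the gate level.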

The full adder cells used in a Braun array multiplier are divided
into two groups, those which use preformed summand inputs and those
which perform ripple-carry addition at the bottom of the array. The
worst case signal path in the first group of cells must be taken as from
their operand inputs to their sum bit outputs while in the second group
of cells it is considered as from their operand inputs to their
carry-out outputs. The worst case signal path of the entire Braun array
includes the NAND gates for generating the summands. The propagation
delay of an n-by-n Braun array multiplier is given by the equation

$$ T_{BAM} = T_{NAND} + (n-1) T_{FA1} + T_{RCA} \tag{4-12} $$

where T_NAND is the delay time of a two-input NAND gate, T_FA1 is the
delay time of a one-bit full adder using summand inputs and T_RCA is the
delay time of the (n-1)-bit ripple-carry adder at the bottom of the
array. The results of some representative array sizes are listed in
Table 4.10.
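The multiplier delay equation can be sketched as a small function. The NAND gate delay is not quoted in the text, so it is left as a parameter; the value of 0.970 nsec used in the usage note is an assumption, chosen only because it is consistent with the tabulated transistor level results. The function and parameter names are illustrative.

```python
def braun_array_delay(n, t_nand, t_fa_sum, t_fa_carry):
    """Worst case delay in nsec of an n-by-n Braun array multiplier.

    t_nand     -- delay of the summand-generating two-input NAND gate
    t_fa_sum   -- operand-to-sum delay of a full adder in the array rows
    t_fa_carry -- operand-to-carry delay of a full adder in the bottom row
    """
    t_rca = (n - 1) * t_fa_carry  # (n-1)-bit ripple-carry adder
    return t_nand + (n - 1) * t_fa_sum + t_rca
```

With the transistor level full adder delays of 3.408 and 2.438 nsec and the assumed NAND delay of 0.970 nsec, `braun_array_delay(5, 0.970, 3.408, 2.438)` gives approximately 24.354 nsec.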

4.4 Time-Area Complexity

The time-area complexity is defined as the product of propagation
time and chip area. This parameter is used as a measure of the
performance of the designs proposed in this research. This figure
should be as small as possible for performance optimization. The
time-area complexities of the design examples are listed in Table 4.11.
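As a worked illustration of this definition, the product can be formed from the area and delay figures already derived for the ripple-carry adders. The function name is illustrative.

```python
def time_area_complexity(area_um2, delay_nsec):
    """Product of chip area (micron^2) and propagation time (nsec)."""
    return area_um2 * delay_nsec


# 4-bit ripple-carry adder: areas and delays from the preceding sections.
transistor_4bit = time_area_complexity(18240, 9.752)
gate_4bit = time_area_complexity(68688, 8.36)
```

The two products are approximately 177,876 and 574,232 micron^2-nsec, so the transistor level adder has the smaller (better) figure even though it is slightly slower.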

 

 

Table 4.10 The propagation delays of Braun array multipliers.

Size        Transistor level    Gate level
            design              design
            (nsec)              (nsec)

5-by-5      24.354              18.614
8-by-8      41.892              31.427
16-by-16    88.660              66.025

Table 4.11 The time-area complexities of the design examples.

Circuit                       Transistor level     Gate level
                              design               design
                              (micron^2-nsec)      (micron^2-nsec)

4-bit ripple-carry adder      177,876              574,232
8-bit ripple-carry adder      711,506              2,296,927
16-bit ripple-carry adder     2,846,024            9,187,707
5-by-5 Braun array            3,418,766            8,053,924
8-by-8 Braun array            16,215,053           37,818,749
16-by-16 Braun array          145,570,854          333,652,639

4.5 Design Complexity

The design time is also one of the important factors in the
determination of the starting level of a particular design. One factor
determining the complexity of a design is the number of transistors
used. The number of transistors contributes to the difficulty measure
of placement, routing and interconnection problems which will eventually
determine the required design time.

Design compactness, which pertains to the density of elements on a
unit chip area, is achieved by spending more design time. This design
time is spent in either man-hours, for a hand layout design, or in a
combination of man-hours and CPU time, for a computer-assisted layout.
The number of transistors per unit chip area is thus an additional index
of design complexity.

As a comparative figure of merit, the design complexity of a
certain chip is defined in this research as the product of the number of
transistors used and the density of the device.
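Since density is the number of transistors per unit area, this figure of merit reduces to the square of the transistor count divided by the cell area. The sketch below illustrates the calculation. The transistor counts of 26 and 40 are assumptions inferred here from the gate structures (two transistors per inverter, k+1 per k-input NOR, and one shared pull-up for each hardwired NAND group); they are not quoted in the text. The function name is illustrative.

```python
def design_complexity(transistor_count, area_um2):
    """Transistor count multiplied by density (transistors per micron^2)."""
    density = transistor_count / area_um2
    return transistor_count * density


# Assumed transistor counts; areas are those of the full adder cells.
transistor_level_fa = design_complexity(26, 4560)
gate_level_fa = design_complexity(40, 17172)
```

Under these assumed counts the transistor level full adder gives about 0.148 and the gate level full adder about 0.093.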

The design complexity of ripple-carry adders and Braun array
multipliers is related to the building cells used in the tessellation.
The design complexity of a ripple-carry adder will be the same as that
of the full adder cell used. It should be noted that the design
complexity of a Braun array multiplier does not depend on its size since
any size of Braun array multiplier can be readily generated by means of
the same tessellation program with a well-defined set of cells. The
cells used in the formation of a Braun array multiplier are based on the
modification of the full adder cells and have a similar degree of design
complexity. Since a comparative result is desired here, it is
sufficient to consider the most complicated cell in a Braun array
multiplier to evaluate its design complexity. The design complexities
of the working examples are listed in Table 4.12.

Table 4.12 The design complexities of the design examples.

Circuit                   Transistor level    Gate level
                          design              design
Ripple-carry adder        0.148               0.093
Braun array multiplier    0.154               0.102

4.6 Comparison

The designs produced in this research are compared according to the
above parameters. The results show that the area of a circuit designed
at transistor level is much smaller than that of a functionally
identical circuit designed at the gate level. This is reasonable since
more flexibility is available in a lower design level. With respect to
propagation time, the gate level designs are found to be slightly faster
than their transistor counterparts according to the charging/discharging
delay time model. However, the performance index, calculated by taking
the product of the propagation time and the chip area of a circuit,
shows that a better performance can be achieved by starting the design
at the transistor level. The area and performance index results were
anticipated intuitively and come as no surprise.

The result of the delay time comparison, however, was not
anticipated. The selection of circuits was based on the criteria of
minimum gate count and number of delay levels. The transistor level
designs are supposed to be faster than their gate level counterparts
under this consideration. However, the results of this research show
that there is no strict relationship between the delay levels and the
propagation time.

The reason for this result is that, in NMOS technology, the actual
gate delay time heavily depends on the size of its pull-up transistor
and the load capacitance. The standard assumptions, such as the speed
of a NAND gate being the same as that of a NOR gate, are not correct in
this IC technology. This is because the channel length of the pull-up
transistor in a NAND gate has to be increased to maintain the
appropriate pull-up/pull-down ratio while a NOR gate does not have the
same problem. The fan-out of a logic gate can also slow down the speed
of the driving gate since a larger fan-out implies that the load
capacitance of the driving gate is larger. These situations have not
been considered in the delay time estimation using gate level delays.
In conclusion, this supports the assertion that the gate delay model can
only be used as a very rough estimation of the propagation time. The
types of logic gates and the fan-out conditions must also be considered
in the final selection of a circuit.
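
The effect described above can be illustrated with a first-order sketch
in which the gate delay is taken as the pull-up resistance times the
load capacitance. This is a simplification and is not the delay model
used in this work: the resistance and capacitance values, the 4:1 ratio
and all names are assumptions for illustration. In a ratioed NMOS NAND
gate the pull-down transistors are in series, so the pull-up resistance
is taken to scale with the number of inputs, while in a NOR gate it does
not.

```python
R_PULLDOWN = 10e3        # ohms, one enhancement-mode pull-down (assumed)
RATIO = 4                # pull-up/pull-down resistance ratio (assumed)
C_GATE_INPUT = 0.01e-12  # farads per driven gate input (assumed)


def pullup_resistance(gate_type, num_inputs):
    """Pull-up resistance needed to keep the assumed ratio."""
    if gate_type == "NAND":
        # Series pull-downs add up, so the pull-up must grow with them.
        return RATIO * R_PULLDOWN * num_inputs
    if gate_type == "NOR":
        # Parallel pull-downs: the worst case is a single one conducting.
        return RATIO * R_PULLDOWN
    raise ValueError(f"unknown gate type: {gate_type}")


def rise_delay(gate_type, num_inputs, fan_out, c_path=0.0):
    """First-order pull-up delay estimate, R_pullup * C_load, in seconds."""
    c_load = fan_out * C_GATE_INPUT + c_path
    return pullup_resistance(gate_type, num_inputs) * c_load
```

Under these assumptions a two-input NAND gate is twice as slow as a
two-input NOR gate driving the same load, and tripling the fan-out of
either gate triples its delay, so two circuits with the same number of
delay levels can differ considerably in propagation time.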


A known source of inaccuracy in the propagation delay calculation
is the estimation of load capacitances. For example, the path
capacitance is estimated roughly by considering the worst case delay
path. This may result in an estimated delay time that is longer than
the actual one.

The comparison of design complexity unsurprisingly verifies the
intuitive notion that higher design complexity is involved in the
circuits designed at the lower level.

In summary, this research demonstrates that, in a VLSI design
environment, better circuit performance can be achieved by starting the
design at a lower level. However, the design complexity encountered at
this level will be higher.

 

 

CHAPTER V

CONCLUSION

In this work, the tradeoff of designing VLSI circuits at two
different starting levels has been studied. The layouts of two
examples, a ripple-carry adder and a Braun array multiplier, have been
produced and compared for this objective.

5.1 Summary

Advances in VLSI technology have provided new promise for the
custom implementation of dedicated circuits. The number of discrete
components required in a circuit is greatly reduced by the utilization
of VLSI chips. This eliminates the connections between many individual
components and the associated problems.

Even though a thorough understanding of the entire VLSI technology is
rather difficult, an emerging set of simplified design rules and
computerized design tools enables an engineer with little knowledge of
solid state physics to successfully experiment with VLSI designs.

However, there are some drawbacks in this newly established field.
The major drawback is the high design cost which dominates the entire
cost of a VLSI project. New design methodologies must continue to be
developed and applied to reduce this cost. Hierarchical design is one
way to meet this end.

The basic elements used in a VLSI circuit are different types of
transistors. The transistors can be connected in a certain manner to
form logic gates. Cells or functional units can then be constructed
with the logic gates. The cells are connected to form the desired
circuits.

The logic gates or functional blocks can be predesigned and stored
to form a library. A design can then be started at different levels.
The lowest design level is the transistor level, which implies that the
designer has the highest flexibility in the design. This normally
results in a better design which is achieved at the cost of a longer
design time. The predesigned library gates or cells can be used in a
higher level design, and the work is thus simplified into the placement
and routing of logic gates or cells. The performance of the resultant
system may be affected due to the fact that the library gates or cells
are designed to be universal, usually without knowledge of the exact
requirement of end designs in which they will be used.

The objective of this research is to investigate the performance
tradeoffs resulting from starting representative designs at two distinct
design levels, the transistor and gate levels. The layouts of two
working examples, a ripple-carry adder and a Braun array multiplier,
are designed independently at both levels with the assistance of a CAE
system. The circuits produced are then evaluated and compared.

The criteria used in this research for evaluation are time-area
complexity and design complexity. Time-area complexity, defined as the
product of propagation time and chip area, provides a comparative
measure of the performance of a chip. Design complexity is defined as
the product of the transistor quantity used in the design and the
density of elements on a unit chip area. This provides a comparative
measure of the design complexity which is related to the design time.

The various candidate circuits for the implementation of a full
adder, which is the major building block in both working examples, are
chosen according to their gate counts and gate delay levels. The full
adder circuit selected for the transistor level design is a hard-wired
circuit with two delay levels. Another circuit having three delay
levels is chosen for the gate level design. The selection for gate
level design is based on the assumption that the library gates are fixed
and cannot be user modified so as to take full advantage of design at
this level. A complex gate generated by hard-wiring several individual
NMOS gates will have an inadequate pull-up/pull-down ratio.

The circuit layouts, which are based on the Mead and Conway design
rules for NMOS IC technology, are presented in Chapter III. These
include the library logic gates, the full adders, the ripple-carry
adders, the cells for Braun array multipliers and the multipliers
themselves. All of these were uniquely designed and parameterized for
the purpose of this study.

The ripple-carry adders are formed using an ICPL program
tessellating its full adder cells in a linear manner. The array of a
Braun array multiplier is first rearranged to solve the problems of
signal communication and efficient use of chip area. A program written
in ICPL is used to tessellate the basic cells according to predefined
tessellation maps to form the Braun array multipliers. In both the
ripple-carry adder and Braun array multiplier cases the ICPL
tessellation program enables a designer to produce the layouts, and thus
the major portion of the design specifications for ultimate fabrication,
by merely specifying the desired array size in bits.
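
The idea of generating a layout from the array size alone can be
sketched as follows. The sketch is written in Python rather than ICPL,
and the cell names, cell dimensions and placement record are
hypothetical. It only shows how an array size in bits, or a tessellation
map, can determine every cell position.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    cell: str   # name of the library cell being instantiated
    x: int      # lower-left corner, in layout units
    y: int


def tessellate_ripple_carry_adder(bits, cell_width, cell="full_adder"):
    """Place `bits` copies of one cell side by side along the x axis."""
    if bits < 1:
        raise ValueError("bits must be at least 1")
    return [Placement(cell, i * cell_width, 0) for i in range(bits)]


def tessellate_from_map(tess_map, cell_width, cell_height):
    """Place cells on a grid according to a 2-D map of cell names.

    Entries that are None leave the grid position empty.
    """
    placements = []
    for row, names in enumerate(tess_map):
        for col, name in enumerate(names):
            if name is not None:
                placements.append(
                    Placement(name, col * cell_width, row * cell_height))
    return placements
```

For example, `tessellate_ripple_carry_adder(4, 50)` places four cells at
x = 0, 50, 100 and 150, so changing the adder size only changes the
first argument.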

The chip areas and propagation times of the working examples are
calculated and presented in Chapter IV. The time-area complexity and
the design complexity of both examples are also obtained.

The charging/discharging model is used in the calculation of
propagation delay. This model is based on the charging/discharging
ability of a logic gate and its effective load capacitance. The
delay time of a logic gate is defined as the time needed by the driving
gate to charge the load capacitance up to the voltage level
corresponding to logic 1. The load capacitance includes the input
capacitance of the logic gates at the next stage and the communication
path capacitance.
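
One common first-order way to express such a charging delay, shown here
only as a sketch and not necessarily the formulation used in Chapter IV,
treats the driving gate as a fixed resistance R charging the load C from
0 V toward the supply voltage VDD. The time to reach the logic 1 level
V1 is then R*C*ln(VDD / (VDD - V1)). All names and values below are
assumptions.

```python
import math


def load_capacitance(next_stage_inputs, c_path):
    """Sum of the next-stage input capacitances plus path capacitance."""
    return sum(next_stage_inputs) + c_path


def charge_delay(r_drive, c_load, v_logic1, vdd):
    """Time for an RC node starting at 0 V to reach v_logic1."""
    if not 0 < v_logic1 < vdd:
        raise ValueError("v_logic1 must lie between 0 and vdd")
    return r_drive * c_load * math.log(vdd / (vdd - v_logic1))
```

In this form the delay grows linearly with both the drive resistance and
the total load, which is why the pull-up size and the fan-out both enter
the estimate.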

The design examples are then compared in terms of the chip
performance and design complexity parameters. The results demonstrate
that, for functionally identical circuits, the design started at the
lower level will have better circuit performance, which is obtained at
the cost of higher design complexity.

 


5.2 Contributions

This research can be viewed as one of the many steps in
establishing a unified VLSI design methodology of which the final goal
is total design automation.

Hierarchical design is an efficient approach to reduce the high
design cost of a VLSI circuit. In a hierarchical design, the design
procedure can be entered at different starting points. In other words,
the designer has to decide the entry point of the design procedure,
which will eventually determine the performance and design cost of the
desired chip. Even though it is commonly anticipated that better
performance can be obtained at the cost of longer design time if the
design is started at a lower level, the decision of which starting level
should be chosen is not trivial.

This research studied the performance and design complexity
tradeoff of two different entry points in the hierarchical design
procedure. The results of designing two circuits, a ripple-carry adder
and a Braun array multiplier, showed that the lower level design had
better performance but encountered higher design complexity.

The complete design procedure, from the selection of circuit for a
desired function to the generation of layout diagrams, has been
performed in this research. As mentioned above, the comparison of
propagation delay disclosed a result which was not anticipated at the
beginning of the design procedure. The gate level design has a delay
time which is less than that of the transistor level design. This
demonstrates that the commonly used consideration of choosing a circuit
according to its gate count and delay levels is not always appropriate.
This is because the basic assumption in this consideration, that each
gate level has the same contribution to the propagation delay of the
entire circuit, is incorrect in NMOS technology.

5.3 Future Development

One of the important performance factors in this project, the
propagation time, is calculated without the assistance of the CAE
system. The fact that the delay time model used for this purpose can
only provide a rough estimation is regrettable but is a function of the
available tools.

The normal practice of VLSI design requires that the chip be
fabricated before precise measurement of the propagation time is
possible. The future tasks for this research would be to do a
simulation on the circuits designed and then send the mask information
to a silicon foundry for prototype fabrication. More meaningful results
can then be obtained.

The gate library can be refined by including more gates with
different configurations and pull-up/pull-down ratios to meet the
purpose of various applications. A more informative method would be to
define a level somewhere between the transistor and gate levels by
storing different sizes of depletion and enhancement mode transistors in
the library. The hybrid advantages of both transistor and gate level
designs can then be incorporated into the circuits.

Computer programs can be written to generate transistors or gates
automatically on a CAE system according to the design rules. Many human
errors can thus be eliminated.

The necessity of defining a more complete set of criteria for
selecting a circuit to implement a desired function is demonstrated in
the above description. One of the future development tasks must be in
this direction.

BIBLIOGRAPHY

1.  Foster, M. J. and Kung, H. T., "The Design of Special-Purpose
    VLSI Chips," Computer, Vol. 13 (January 1980), pp. 26-40.

2.  Kung, H. T., "Let's Design Algorithms for VLSI Systems," Proc.
    Conf. on Very Large Scale Integration (January 1979),
    pp. 75-90.

3.  Fairbairn, D. G., "VLSI: A New Frontier for Systems Designers,"
    Computer, Vol. 15 (January 1982), pp. 87-96.

4.  Losleben, P., "Computer Aided Design for VLSI," Very Large
    Scale Integration (VLSI): Fundamentals and Applications, edited
    by Barbe, D. F., Springer-Verlag Berlin Heidelberg, New York
    (1982), pp. 89-125.

5.  Liu, T. K., Hohulin, K. R., Shiau, L. E., and Muroga, S.,
    "Optimal One-Bit Full Adders with Different Types of Gates,"
    IEEE Trans. on Computers, Vol. C-23, No. 1 (January 1974),
    pp. 63-70.

6.  Hwang, K., Computer Arithmetic, John Wiley and Sons, New York
    (1979).

7.  Braun, E. L., Digital Computer Design, Academic Press, New York
    (1963).

8.  Tobias, J. R., "LSI/VLSI Building Blocks," Computer, Vol. 14
    (August 1981), pp. 83-101.

9.  Mead, C. and Conway, L., Introduction to VLSI Systems,
    Addison-Wesley Pub. Co., Reading, Massachusetts (1980).

10. Beyers, J. W. et al., "A 32-Bit VLSI CPU Chip," IEEE Journal
    of Solid-State Circuits, Vol. SC-16, No. 5 (October 1981),
    pp. 537-547.

11. Keyes, R. W., "Physical Limits in Semiconductor Electronics,"
    Science, Vol. 195 (March 1977), pp. 1230-1235.

12. Edison, J. C., "Fast Electron-Beam Lithography," IEEE Spectrum
    (July 1981), pp. 24-28.

13. Taub, H. and Schilling, D., Digital Integrated Electronics,
    McGraw-Hill Inc. (1977).

14. Reinhard, D. K., Integrated Circuit Engineering Notes, Dept.
    of Electrical Engineering and Systems Science, Michigan State
    University (1985).

15. Barna, A., VHSIC Technologies and Tradeoffs, John Wiley and
    Sons, New York (1981).

16. Personal communication with Reinhard, D. K., Dept. of
    Electrical Engineering and System Science, Michigan State
    University (December, 1984).
