lllllllllllllm
LIBRARY H l
Michigan State

University

 

 

 

 

 

This is to certify that the

thesis entitled

A VLSI Chip Architecture For The B-spline
Surface Evaluation

presented by

Chia-Yiu Maa

has been accepted towards fulﬁllment
of the requirements for

Master . Science
degree in

 

 

agar/gm Mar

Major professor

Date 8-28-87

0-7 639 MS U i: an Afﬁrmative Action/Equal Opportunity Institution

 

MSU
LIBRARIES
.___.

 

RETURNING MATERIALS:
Place in book drop to
remove this checkout from
your record. FINES will
be charged if book is
returned after the date
stamped below.

 

 

 

 

 

A VLSI CHIP ARCHITECTURE FOR THE

B-SPLINB SURFACE EVALUATION

By

Chia-Yiu Maa

A THESIS

Submitted to
Michigan State University
in partial fulﬁllment of the requirements
for the degree of

MASTER OF SCIENCE

Department of Electrical Engineering

1987

ABSTRACT

A VLSI CHIP ARCHITECTURE FOR THE
B-SPLINE SURFACE EVALUATION

By

Chia-Yiu Maa

This thesis describes the architecture design of the SE (Surface Evaluation) chip,
a VLSI chip dedicated to the B-spline surface evaluation, which can be used to ease
the Computer-Aided Geometric Design (CAGD) of B-spline surfaces. The
reconﬁgurable structure of the SE chip can perform on-chip blending function formula-
tion and high-speed vector-matrix multiplication at less hardware cost. Pipelining and
parallel processing techniques are applied to the inter- and intra-functional unit designs
to speed up the calculation of the surface evaluation algorithm. The symbolic example
and the description of the SE chip written in RTL (Register Transfer Language) verify
the correctness of the design. Area estimation conﬁrms the feasibility of a semi-
custom implementation approach with current technology and speed estimation shows
at least an order of magnitude improvement over existing graphics hardware in per-
forming surface evaluation. The SE chip architecture can also be used to execute the

coordinate transformation, tangent and normal vector derivation, and shading.

ACKNOWLEDGMENTS

I wish to express my greatest appreciation to my parents for getting me started
right and continued support and encouragement throughout my studies, and my wife
Mei-Ling, without her patience, understanding and love, nothing would have become
reality. I express my deep gratitude to my advisors, Dr. M. Shanblatt and Dr. E.
Goodman, for their guidance and encouragement throughout this research. Special
thank is also due to Jane Hawkins for providing the mathematical preliminary of B-

spline and helpful discussions.

Table of Contents

 

LIST OF TABLES ............. - ............... - -_ v
LIST OF FIGURES ................................................................................................... vi
1. INTRODUCTION ................................................................................................. 1
1.1 Problem Statement .................................................................................. 3

1.2 Approach ................................................................................................. 4

1.3 Overview of the Thesis .......................................................................... 6

II. VLSI System Design ........................................................................................... 7
2.1 Characteristics and Trends in VLSI Design .......................................... 7

2.2 Design Methodology .............................................................................. 9

2.3 New Design Technology ........................................................................ 14

III. MATHEMATICAL PRELIMINARIES ............................................................ 17
3.1 B-spline Curve ........................................................................................ 17

3.1.1 Knot Vector ............................................................................. 20

3.1.2 Blending Functions .................................................................. 23

3.2 Rational B-spline .................................................................................... 30

3.3 B-spline Surface ..................................................................................... 34

IV. LITERATURE REVIEW ................................................................................... 39
4.1 Uniform Subdivision .............................................................................. 39

4.2 Adaptive Subdivision” ............................................................................. 40

4.3 Cox-deBoor Algorithm ........................................................................... 41

4.4 Pickelmann’s Algorithm ......................................................................... 43

V. SURFACE EVALUATION AND SHADING ................................................... 48
5.1 Evaluation Algorithm ............................................................................. 48

5.2 The Derivative and Normal Vectors ...................................................... 51

iii

5.3 Shading ................................................................................................... 53

VI. ARCHITECTURE DESIGN .............................................................................. 57
6.1 System Speciﬁcation ............................................................................... 58

6.1.1 Design Objective ..................................................................... 58

6.1.2 I/O Speciﬁcation ...................................................................... 59

6.2 Architecture Development ...................................................................... 60

6.2.1 Functional Units ..................... 60

6.2.2 Timing and Pipelining ............................................................. 70

6.2.3 Storage Scheme ....................................................................... 73

6.3 Architecture of SE chip .......................................................................... 78

6.3.1 Architecture Synthesis ............................................................. 78

6.3.2 Hardware Description in RTL ................................................ 82

VII. ARCHITECTURE EVALUATION ................................................................. 84
7.1 Architecture Veriﬁcation ........................................................................ 85

7.2 Area Estimation ...................................................................................... 88

7.3 Speed Estimation .................................................................................... 92

VIII. CONCLUSION ................................................................................................ 95
APPENDIX 1 ............................................................................................................. 98
APPENDIX 2 ............................................................................................................. 110
APPENDIX 3 ............................................................................................................. 1 11
APPENDIX 4 ............................................................................................................. 113
APPENDIX 5 ............................................................................................................. 116
APPENDIX 6 ............................................................................................................. 118
BIBLIOGRAPHY ...................................................................................................... 120

iv

3.1
4.1
6.1
7.1
7.2

LIST OF TABLES

Blending function table .of the example. ....................................................... 28
Subpatch and corresponding control points. .................................................. 46
Cycle count for different evaluation routines. ............................................... 82
Transistor count of the SE chip. .................................................................... 89
Evaluation time for different evaluation routines. ........................................ 94

1.1
1.2
2.1
2.2
2.3
2.4
3.1

3.2 '

3.3
3.4
3.5
3.6
3.7
3.8
3.9
3.10
3.11
4.1
4.2
5.1
5.2
6.1

6.2

6.3

6.4

LIST OF FIGURES

Viewing of 3D objects. . ..................................................................................
An enhanced graphics pipeline [2]. ...............................................................
Speed versus integration scale [3]. ......................................... ‘ .......................
Hierarchical layout. ........................................................................................
The "Y-chart" display of design representations. ..........................................
Cooperating expert agents for VLSI design [8]. ...........................................
B-spline curve. ................................................................................................
Example of locality. .......................................................................................
Blending function cascade. ............................................................................
Blending function tree. ...................................................................................
Blending function tree - segment 1. ..............................................................
Blending function tree - segment 2. .................................................... _ ..........
Plot of blending function. ..............................................................................
Third order B-spline. ......................................................................................
Strictly rational B-spline. ...............................................................................
B-spline control point net. .............................................................................
B-spline surface. ............................................................................................
B-spline control point net [11]. ......................................................................
B-spline surface [11]. .....................................................................................
Normal vector and tangent vector of a point on the surface [18]. ...............
The geometry of shading [l2]. ......................................................................

The block diagram of a 5-by-5 Baugh-Wooley two’s complement
array multiplier [24]. ......................................................................................

(a) Basic MAC block. (b) MAC array structure to compute a
polynomial of order 3. (c) MAC array structure to ﬁnd the
ﬁrst derivative of a polynomial with order 3. ...............................................

The block diagram for calculating the blending function and
the ﬁrst derivative. .........................................................................................

Block diagram for calculating of the inner product of X and Y. ................

vi

13
15
18
22
24
25
26
27
29
30
33
37
38
47
47
52
55

6O

62

64

6.5
6.6
6.7

6.8
6.9

6.10

6.11
6.12
6.13
6.14
6.15
6.16
6.17
6.18
7.1

A four-operand adder formed by CSA tree. .................................................. 65

 

The functional block diagram of a full carry lookahead adder. ................... 65
Logic circuits for realizing the carry-generate, carry-propagate,

and sum functions. ......................................................................................... 66
The schematic logic circuit of a 4-bit carry lookahead (CLA) unit. ............ 67
A carry lookahead array divider with 8-bit divisor in two’s

complement representation [25]. ................................................................... 68
Three basic types of cells to be used in constructing the division

array of Figure 6.9 [25]. ................................................................................ 69
Thegeneral form ofadatapath [23]. .......................................................... 71
The two-phase clocking scheme [26]. .......................................................... 71
Pipelined four-segment multiplier and its clock scheme. ............................ 72
Dataﬁow of vector-matrix multiplication. ..................................................... 74
Skewed storage pattern with (a) 8:0, and (b) Sal. ..................................... 76
Skewed storage with output multiplexing. ........................ 77
Simpliﬁed diagram of the SE chip. .................... . - -- - ................. 79
Multiplying-and-adding (M&A) unit. ........................................................... 81
Flowchart of the evaluation algorithm in the SE chip. ................................. 86

vii

CHAPTER I

INTRODUCTION

Geometric modelling is a broad field in computer graphics which
uses computational geometry to develop representation techniques for
three-dimensional objects. It can be divided into two sub-fields: 1)
surface modelling, which represents three-dimensional objects using
various types of closed surfaces (quadratics, polygonal meshes,
parametric bicubic patches, etc.), and 2) solid modelling, which aims at
developing object representations which are complete, unambiguous,
realizable, accurate, efficient, productive, flexible and minimal.
Surface modelling is especially useful in the aerospace and automotive
industries, while solid modelling is gradually becoming useful for a
variety of mechanical CAD/CAM applications. In this thesis, we shall
deal only with the topic of surface modelling, but insofar as most solid
modelling systems utilize a curve -- and surface -- based boundary
representation (B-rep) for at least graphics calculations, the work
described here is quite directly.applicable to solid modelling system
development.

The process used to generate the realistic computer images from a
geometric data base is called image synthesis. It generally involves
several steps:

1) transformation of the data base from world coordinates to the

2)

3)

4)

appropriate viewing coordinates (see Figure 1.1);

elimination of hidden surfaces by way of depth sorting (sort the
polygons by their distance from the viewpoint and to place them
into the refresh buffer in order of decreasing distance), 2.
buffering (a buffer is used to record the smallest 2 value
encountered for each (x,y) as the search porgresses), scan-line
processing (by using scan-line coherence and edge coherence to
determine the projecting result of all polygons onto the xy-
plane), or area subdivision (recursively subdivide the area of
the projection plane image until it is easy to decide which
polygon or polygons are visible in the area);

shading the visible surfaces by taking the light sources, the
surface characteristics, and the relative positions and
orientations of the surfaces and sources into consideration;
taking account of shadows, transparent and translucent

materials, surface texture, surface detail, and aliasing (some

 

 

 

 

 

31) world “PM '0'“ Normalised device
coonduune . coondhudee coondhuﬂes
deecrl u l f |
p (“1 cup ﬁlﬂhuﬂ» E Fﬁoﬁun.onhb uJEP%%E"‘%¢ Trenenn1n
*dew'vchune ‘ﬂp u“, in 23:55; °d '—€>lhﬂﬁdggﬂﬂeelbe>
pnubc n P '3 .eoo ll tee e co

 

 

 

 

 

 

 

 

 

 

 

Figure 1.1. Viewing of 30 objects.

of these can be handled by ray tracing).
The formulation of a realistic three-dimensional image does not
necessarily follow the order of steps stated above. For example, a

chip set designed by Hewlett-Packard Co. has a graphics pipeline as

shown in Figure 1.2 [2].

 

 

 

 

 

 

 

 

 

 

 

 

531$:th :33. runs “WC“ scan me n
cigar F4, .32," same “I 9522* u'vecnotusn "I “5": "WA?" —.
usr sums
.—__. ' .___. Inn—m l—J, .—. lL—J. _.
onuavxut
.uuiiiin‘e'l‘ﬁiom rul'éi‘l'l'fma. scan couvsnsnon IMAGE nuwv

 

 

 

 

 

Figure 1.2. An enhanced graphics pipeline [2].

1.1 Problem Statement

Researchers at the A. H. Case Center for Computer-Aided Design,
Michigan State University, have developed a surface assessment package
called MSU COLORSCOPE to produce accurate shaded images by calculating
and shading surfaces evaluated at the pixel level. Recently, several
algorithms developed by Pickelmann [I] allowed for more efficient
evaluation of the entire range of rational/non-rational curves and
surfaces. Though these algorithms speed up the response time of the

interactive CAD/CAM system which uses them, the time for generating a

shaded image is still too long (typically more than few minutes) and
thus far away from the goal of high quality "real-time" realistic image
generating for the interactive CAD graphics system.

1 It has been shown that the time required for three-dimensional
surface evaluation and transformation demands a significant percentage
of the total time needed to realize a new shaded image. If we can
download the surface evaluation and transformation to high speed
hardware, the overall computation time will be reduced dramatically.

I The surface evaluation routines shown in [l] involve numerous
matrix multiplications and thus are candidates good for implementing on
regular structured hardware. The very best candidate for the hardware
implementation is VLSI (Very Large Scale Integration), which has shown
great potential for improving system performance by implementing
concurrent numerical algorithms directly in hardware. Furthermore,
advances in CAD (Computer-Aided Design) and CAB (ComputereAided
Engineering) systems for VLSI have enabled such implementation to be carried

out in a custom design with relatively low costs in both time and money.
1.2 Approach

The purpose of this research is to investigate and specify the
major hardware components of a custom-designed VLSI chip architecture
for the surface evaluation algorithm given in [l].

The surface evaluation chip is to work as a coprocessor to the

host, a graphics processor taking care of all the graphics processing

except for surface evaluation. It downloads necessary information from
the host and evaluates either a point or a patch (by recusively
evaluating the point on the patch). Fixed-point calculations are
assumed throughout the design. This implies that any necessary scaling
is performed in the host system.

The overall design follows the top-down approach. The system
criteria will be stated first, followed by the definition of each
building block. When specifying the system criteria, the I/O problem,
one of the main considerations in dealing with matrix operations, must
be solved in order to guarantee better performance. A proper VLSI array
structure suited for matrix multiplication is also to be employed in the
design.

In order to achieve the highest system throughout, the pipeline
concept is extensively applied to the inter- and intra-functional units.
The timing scheme for supporting pipelining will also be introduced.

The storage scheme of the surface evaluation chip will be addressed
particularly because of the special pattern of memory access used. A
reconfigurable structure with higher resource utilization and less
number of functional components will then be given.

In order to simulate as well as verify the design, the chip
architecture will be expressed in the form of a program written in RTL
(Register Transfer Language). By comparing the simulation results with
the software surface evaluation algorithm, mistakes will be corrected

and modifications will be made to improve the system performance.

1.3 A Overview of the Thesis

Chapter Two introduces basic VLSI technology and design
methodologies. Chapter Three presents the mathematical preliminaries
required for the surface evaluation algorithm. A literature review is
given in Chapter Four. Chapter Five gives the surface evaluation
algorithm and the idea of shading. The development of the chip
architecture is covered in Chapter Six. The simulation results and

conclusions are presented in Chapter Seven.

CHAPTER II

VLSI SYSTEM DESIGN

The goal of this thesis is to develop a dedicated VLSI chip for
surface evaluation. This chapter introduces the characteristics as well
as trends of VLSI technology. The discussion will emphasize semicustom
and custom VLSI, also called application-specific integrated circuit
(ASIC) design. VLSI design methodologies such the silicon compiler and

AI approaches will also be covered.
2.1 Characteristics and Trends in VLSI Design

Recent advances in VLSI technology -- in system and circuit design
environments, process sophistication, manufacturing cost effectiveness,
reliability, and packaging -- have spurred the identification of new
system applications which are now feasible with more advanced
technology. Offering excellent noise immunity, wide operating margins
with respect to power supply voltage and temperature, lower drive
currents, decreased power consumption, ease of design for VLSI circuits,
and scalability to submicrometer dimensions with improved performance
and reliability relative to alternative technologies, CMOS is the most
suitable technology for the majority of these emerging applications.

CMOS designs present a disadvantage in that the die size required is

larger than that required by NMOS for the same circuit. New
technologies, however, such as domino circuits and clocked circuits, are
reducing the die size difference between CHDS and NMOS.

In the ultra-high-speed region, ECL and CML technologies are used.
Their performance superiority is irresistible, but other costs are
higher. In a much higher speed region, new semiconductor devices are
being explored. Gallium arsenide devices are expected to be applied to
real-world systems in the near future and are one of the most promising
technologies. Figure 2.1 shows the distribution of operation speed and
scale of integration for gate arrays achieved by various technologies
[3].

Two CHOS technologies have been prominently used: the silicon-gate

Se tt Mum;

 

 

 

 

speed (net
! si-qete Obs (emote metal!
. l
I
5 t St-oete 008 ldoeble metal)
-------- -- -J----
l r ﬂ
5 . m :
l *o_e—c-oqpoj '
2 i i' . |
r4 I '
Ll - I
it ,tzuztztztgja--__-J
F _-— - _. '
' a
' o
' I
0.5? I o
o r
I r
L ................. ‘J
CCL
on}
Scale of

 

 

 

1 .‘c—
0.1 Inteeretton

10° 1 S 1000 I 5 10000 2 ‘°.‘.’

Figure 2.1. Speed versus integration scale [3].

CHOS and metal-gate CMOS. The former has become dominant because it
allows for higher operation speed and higher degrees of integration by
using finer pattern geometries. For example, 1.2 to 1.5 pm design rule
devices exhibit 1.0 to 1.2 ns delays and subnanosecond delay is expected
by using submicrometer rules [4];

The types of integrated circuits that are used for semicustom and

custom designs include:

1) Gate Arrays (GAs). These are prefabricated unwired standard
transistor arrays. The function of the gate array is defined
by properly interconnecting transistor patterns on the chip.

2) Standard Cells (SCs). These allow for the design of
application-specific integration circuits by composing circuits
from elements stored in cell library.

3) Full Custom integration circuits (FCs). These are hand
customized by system designers. Full custom designs are the
most time-consuming to produce, but are capable of the highest

density.
2.2 Design Methodology

..Custom chips developed for particular applications tend to be more
dedicated to the target system and less general purpose. The term
"application-specific 16' clearly describes the present situation where
many different types of circuits are being used in small quantities for

specific applications. Smaller size, lower cost, higher speed and

10

higher reliability have been goals in the development of ASICs. Higher
integration of devices, however, has led to longer design times, and
made it difficult to maintain design reliability. In order to solve
this dilemma, research and development work on various customization
technologies and automated design technologies is being accelerated,
with emphasis on design methodologies and automated tools.

Design methodology is defined as a set of codified techniques that
are applicable to the VLSI design process. The objective of studying
design methodology according to Baller [5] is to facilitate the
creation of better designs. A final design must be functionally and
physically correct, qualitatively acceptable, testable, and easily
modified.

The prevalent design methodology today is hierarchical in nature.

A hierarchical layout using the standard cell approach is given as an
example in Figure 2.2 [6]. The VLSI techniques are categorized by
those used for design, or synthesis, of a VLSI circuit and those used
for its verification. In both categories, a further distinction is made
between techniques relating to physical and functional views of the
design. Physical design supports partitioning, layout, and topological
analysis at all design levels. Functionality, testability, and physical
design must be considered in parallel throughout the design process.

A design model is first defined according to some initial
objectives and requirements. A top-down design process is normally used
to decompose the desired operations of a circuit into a network of

orchestrated smaller and simpler functional modules. Once a functional

11

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

F—
—1[ 11"08UFF18' ]
_t ‘
l I
z’ 5 R. ./
C) (D
A. L IJ I! E
B P C3 3
u , U
P 1' P
F , F F
f“ A
A. I C
V V
C:th'R.L L
R C>rw ,e ........ 3 E
“ ': —‘—
rat-55:...” .04

 

LAYOUT OP BLOCKS

 

 

 

 

 

STANDARD CELL LIBRARY

Figure 2.2. Hierarchical layout.

implementation strategy has been determined, a bottom-up process is
commonly used to complete the physical design. Design results are
simulated at different levels throughout the design process to evaluate
correctness and performance [7].

The ”Y" chart shown in Figure 2.3 has three axes that depict
functional (behavioral), structural (architecture), and geometric

(physical) representations of VLSI design. The functional axis

12

describes the functionality of the design, the structural axis
represents the structural implementation of the design, and the
geometric dimension represents the hardware realization of the design.
Higher level of abstractions are encountered moving out radially from
the origin. Commonly used levels of structural representation are the
processor/memory/switch (PMS), the register-transfer, and the circuit
level. Geometric representations include layout planning, cells, and
physical mask geometries. VLSI design methodology, from this
perspective, is a sequence of one-to-many mappings from higher to lower
levels of abstraction and between different design representations in
the design space. A particular mapping sequence taken by a certain
school of thought, not surprisingly, more or less reflects a particular
environment (usually expressed in needs and constraints of human,
technological and financial resources) under which that particular
methodology has evolved.

The design tools available to IC designers today usually encompass
several of the lower level representations shown on the Y chart. For
example, the designer can expect to enter at the logic level or perhaps
the circuit level and find support for layout, logic and circuit
simulation, design rule checking, and timing and critical path analysis.
Many of these systems support designer interaction at any level of the
design process and provide consistent database services across the

entire range of design activities [4].

13

mucrurm.

summon mm”
Mmﬂomry/Swttoh Subm-
Wt ‘l‘l'enster Mm
Circulte 3.01..“ We

._ leak Geometriee'

-Celis

_ layout Planning

 

cmmmc
MUTATION

Figure 2.3. The "Y-chart" display of design representations.

14
2.3 New Design Technology

In software design, for the ease of programming, algorithms are
written in high level languages. The same principle is applied to
hardware design for which high level hardware description languages
(HDL) are necessary. More than clarifying module specification
definitions as in the early age, a ideal HDL should provide both
functional and structural descriptions at all levels of abstraction.
Many HDL's exist, but no one is widely used.

Besides HDL, silicon compilers have drawn much attention from CAE'
system developers. Silicon compilers have the goal of generating mask
patterns directly from an HDL. In this case, the entire physical design
process of a VLSI system can be reduced into a compilation process such
that most of the characteristics of the chip under design can be
accurately estimated. This allows system designers to focus on higher
level design and to experiment with various algorithm/architecture
alternatives with much less effort and cost. An elementary version of a
silicon compiler is the module generator used to generate the mask
layouts of some regular modules, such as basic cells in gate arrays and
standard cells. "Concorde" is an example of one released by Seattle
Silicon Technology and is more precisely classified as a macro-compiler.
But the chip geometry generated by a macro-compiler is usually not
optimal. How to automatically generate close-to-optimal geometry and
achieve full compilation rather than macro-compilation are the two

issues remaining to be solved for the silicon compiler in the near

15

future.

There has recently been much interest in using expert systems
modeled on human expert knowledge. Such systems aim to capture the
style and expertise of the human expert and provide flexibility to deal with
a wide range of applications. Designs in VLSI usually involve
certain forms of heuristics, such as those used for placement and
routing, and thus it is appropriate to use expert system techniques to convert
a hierarchical circuit description into a full-custom VLSI
layout. The expert system for VLSI design may consist of a number of
cooperating expert agents connected through a central manager. An
example is given in Figure 2.4 [8]. Much effort has gone into the
development of an expert system for VLSI design. Several issues,

however, still remain open.

 

Structural moot

 

Physical output

 

 

 

Figure 2.4. Cooperating expert agents for VLSI design [8].

16

l) The expert system should contain as much knowledge of VLSI
design as possible.
2) The machine's learning capabilities should be augmented.
3) Special artificial intelligence machines msut be deveIOped
for providing high throughput expert systems.
This thesis focuses mainly on the design of a surface evaluation
chip at the system, algorithmic,-and register transfer levels. The

final design will be expressed by a program written in RTL (Resigter

Transfer Language).

CHAPTER III

MATHEMATICAL PRELIMINARIES

This chapter covers the mathematical preliminaries on which the
surface evaluation algorithm is based. The B-spline space curve and
rational B-spline space curve will be introduced first. Next the B-
spline surface patch is defined and extended to the rational B-spline
patch. In order to give the reader a clear picture of B-splines, simple
examples are given. Finally, various classifications of B-spline are

discussed.
3.1 B-spline Curve

A curve can be represented by a piecewise linear approximation in
which each segment of the approximation is a straight line. Likewise, a
curve can be represented as a piecewise polynomial approximation in
which each segment of the composite figure is defined by a polynomial.

A piecewise polynomial curve is called a spline.[10]

A B-spline is a parametric spline which consists of one or more
segments. Each segment represents the curve over a range of the
parameter t, for example 1 s t < 2. The curve is partially defined by a
set of control points { P0 P1 P2 ..... Pn I (coordinate values) which

produce an open-sided polygon when connected sequentially by straight

i7

18

lines. Each segment of the curve is defined by a subset of the control
points. A three-segment cubic B-spline curve with its control point

polygon is shown in Figure 2.1. Segment ends are denoted by an X. The
control points that are used to compute a particular segment are listed

beside that segment [11].

 

 

 

 

 

Y

4+ P:

3 a

a. ‘ Pa

Fed-'29:

1.. t3 p

P 39‘? 3
I I P
F% I 1 3 ‘14 HI x

Figure 3.1. B-spline curve.

Let P(t) be the Cartesian position along a curve as a function of
the parametric variable t. A segment of the spline is defined by the

following equation:

19

n
P(t) - 2 P N (c) (2.1)
1-01 1.1;

where

P1 is the control point,

n+1 is the number of control points,

Ni,k is the blending function associated with P1, and
k is the order of the blending functions.

The order k of the B-spline curve is the degree + 1, i.e., a quadratic
is an order 3 curve and a cubic is an order 4 curve. P1 represents the
vector of the control point coordinates such as [xi,y1] for a 2-space
curve, and likewise P(t) represents the vector of solution equations for
the coordinates of points along the curve, such as [x(t),y(t)]. If the
number of control points exceeds the order of the curve, the B-spline
will have more than one segment. In this case, only a subset of the of
control points is used for each segment, because the result of
formulation of the blending functions associated with these unused
control points for the segment are zero. Examples will be given in the
following sections.

The number of segments the curve will have can then be represented

as a function of the number of control points, n41 and the order, k, as
number of segments - n - k + 2. (3.2)

The parameter range for each segment can be determined by simple

observation of the knot vector which will be introduced next.

20

3.1.1 Knot Vector

The blending functions are determined by a set of values called
knots. They are displayed as [ x0 x1 x2 x3 ...... x1 ] and are called
the knot vector. In order to make the curve pass through the end
points, the first and last entries of the knot are repeated k times
where k is the order of the curve. Thus xO-x1-...-xk_1 and x£_(k-1)-
...-xl-1-xl. The length of the knot vector refers to the number of
elements (knots) in the vector. The above vector has length - 1+1. The
length is a function of the control points (n+1) and the order (k) of
the curve, given by n+k+l. The order of the curve must be less than or
equal to the number of control points.[11]

For example, if n+1 - 4 and k - 4 (cubic), the knot vector is [ 0 0
0 0 1 1 1 l ]. This curve is a special case of the B-spline, called the
Bezier spline, and has only one segment. If n+l-4 and k~3 (quadratic),
one possible knot vector is [ 0 0 O 1 2 2 2 ]. There are n - k + 2 - 2
segments. The first segment of the curve, produced by the knot vector
shown, has the parameter range t-O to t-l and a zero blending function
with the fourth control point. The second segment has a zero function
associated with the first control point and covers the parameter range
t-l to 12-2.

Since each segment of a spline depends on a subset of control
points, each control point affects only a local area of the B-spline
curve. This property is called locality and is one of the key features

of the B-spline. The B-spline curve can be modified by simply moving

21

the locations of some control points. An example of locality is given
in Figure 3.2 with n94, kp3 (quadratic), and knot vector [ O O O l 2 3 3
3] which results in a three segment curve. The first segment depends
only on the first three control points and has the parameter range of
t-O to t-l. The second segment depends on the middle three control
points and has the parameter range of t-l to t-2. The third and last
segment depends on the last three control points and has the parameter
range of t-2 to t-3. In Figure 3.2(a), the location of the first
control point is moved, and in Figure 3.2(b) the second control point is
moved.

A knot vector with a repeated interior knot, such as [ 0 0 0 l l 2
2 2 ] with n94 and k-3, causes the length of the second segment of the
curve to be zero. Usually, the order of the continuous derivative of a
B-spline curve at its segment joints is up to k-2. For example, the
order of a B-spline curve with k-4 should have continuous zero, first,
and second derivatives. Repeating a knot once (multiplicity of knot -
2) causes the curve to lose the highest order derivative continuity at
the segment joint corresponding to that knot. The relationship between
multiplicity m, the order of the curve k, and the order of the
derivatives that are continuous across a segment joint is (0, 1, ..., k-
m-l). Thus, if the B-spline curve is a cubic (k-4) and an interior knot
has a multiplicity of 4, then k-m—l--1 and the curve will be in general

discontinuous pointwise and in all derivatives.

22

 

 

 

 

 

(b)

Figure 3.2. Example of locality (a) effect of moving the first
control point, (b) effect of moving the second control point.

23

It is not necessary that the knot values be integer. If all
interior knots are consecutive integeters, this type of knot vector is
called a uniform knot vector. Relaxing a uniform knot vector to allow
repeated interior knots an produces enhanced uniform knot vector [1].
The general knot vector with monotonically non-decreasing real knot
values produces a nonuniform knot vector, for example [ O O 0 .3 .7 1 1

l ].
3.1.2 Blending Functions

As illustrated in the previous examples, the relationships between
the knot vector, the control points, and the curve are that the curve is
defined by the control points and the blending functions which are

functions of the knots. The formal definition of blending functions are

1 if xi 5 t < x1+1
N1 1“) "

0 otherwise

(t - x ) N (t) (k - t) N (t)
”1 km _ 1 i,k-1 + 1+1: i+1,k-1 . (3.3)

xi+k-l ’ xi xi+k ' x1+1

 

 

The x1 is the value of the ith position of the knot vector, for example:

x-[oooorzaass 7 s 9 9 9 9].

i - 0 1 2 3 4 5 6 7 8 9 10 ll 12 l3 14 15

From equation (2.3),if xi - x1+1 , then Ni,l(t) - 0, which causes the

24

curve to lose the highest-order derivative continuity as mentioned in
the last section. For each segment there is one Ni,1(t) that is
nonzero. This nonzero function cascades down to the final blending
functions associated with that segment as shown in the Figure 3.3. Note
that the number of nonzero blending functions per segment is equal to

the order of the curve.[ll]

"1,1.
.Ni-1,2 .Ni.2.
”W ’ “is? “ii-t
Ni'ir‘ Ni'2:; Ni'1:; Ii'?

Figure 3.3. Blending function cascade.

A clear example of blending functions quoted from [2] is:

Control points : (0,0) , (2,2) , (3,1) , (3,0)
Order : k - 3

Knot vector : [ 0 0 0 1 2 2 2 ]

l - 0 l 2 3 4 5 6.

3

From equation (2.1),the spline equation is P(t) - Z

i 0

Pi Ni,3° There

25

are four blending functions NO,3’ N1’3, N2’3, and N3,3 that need to be
formulated. Tracing from the bottom of the tree shown in Figure 3.4, we

know which blending functions need to be computed.

o o 1 it one I it l(t<2 o ‘ o
s”... NI. N... AC. ,N..

N 3',
\\ / \ / \ / \\ / \ 0/

edidz Iqtz I‘ll ”‘11, xbrzl
\ / \ / \ / \ /
N N N

OJ 13 21 ”'31

Figure 3.4. Blending function tree.

In this example, only two of N are not zero, which agrees with

i,1
the fact that the curve will have two segments. The function tree used

for determining the first segment is given in Figure 3.5 Therefore, the

blending functions N N 3 , and N2 3 are nonzero and will be a

0,3 ’ 1

function of N1,2 and N2.2

into equation (3.2). Then one can get the following blending functions:

found by substituting the corresponding values

(x - t) N
N1,2 - - 1 - t (3.4)
x - x

3 2

26

 

 

Figure 3.5. Blending function tree - segment 1.

 

 

 

 

 

 

 

(t - x ) N
2 2,1
N2,2 ' x _ x ' t (3.5)
3 2
(t - X ) N (x ; t) N
No 3 - 1 012 + 3 1'2 - c2 - 2c + 1 (3.6)
x2 ' x0 x3 ' x1
(t - x ) N (x - t) N
N1 3 - 1 1'2 + a 2'2 - -1.5c2 + 2: (3.7)
x3 ‘ x1 x4 ‘ x2
(t - x ) N (x - c) N
N2 3 - 2 2'2 + 5 3'2 - .5c2 (3.8)
xé'xz XS'XB

Note that the convention of 0/0 - 0 is used when computing the blending
functions.
For the second segment the blending functions N1 3, N2 3, and N3 3

are nonzero and can be formulated by N2 2 and N2 3 as shown in Figure

3.6. These

are computed in

(x4 - t) N

 

 

 

 

 

27

a manner similar to the first segment as:

 

 

 

3,1 _ 2 _ t
"a'x3
(t-x)N
3 3,1 _ t _ 1
“4"‘3
(t - x ) N (X ' t) N
1 1.2 + 4 2'2 - ,5c2 - 2t + 2
X3-X1 xa'xz
(c - x ) N (x - t) N
2 2.2 + 5 3.2 - -1,5t2 + 4c - 2
xa'xz x5.x3
(II-X)" (X‘t)N
3 3.2 + 5 4'2 - t2 - 2t + 1.
xS-x3 x6°xh

(3.9)

(3.10)

(3.11)

(3.12)

(3.13)

 

Figure 3.6.

 

Blending function tree - segment 2.

28

Table 3.1 summarizes the blending functions computed above and
Figure 3.7 gives a plot of the blending functions.
After the blending functions have been determined, the spline

equation of the first segment, 0 s t < 1, can be expressed as

F t2 - 2t + 1 ‘
x(t) 0 2 3 3 2
- -1.St + 2: (3.14)
y(t) 0 2 1 0 2
0.5t
b O A

 

 

and for the second segment, where 1 s t < 2,

. o .
x(t) 0 2 3 3 2
- 0.5: - 2t + 2 (3.15)
y(t) 0 2 1 0 2
-1.St + 2t - 2
. t2 - 2: + 1

 

 

Table 3.1. Blending function table of the example.

 

1 Ni 3(1:), 0 s t < 1 N1,3(t)’ 1 s t < 2
o c2 - 2: + 1 o

1 -1.5c2 + 2t . 0.5:2 - 2: + 2

2 0.5:2 -1.5c2 + 4c - 2

3 0 t2 - 2t + 1

 

 

29

 

 

 

 

 

 

 

 

 

 

 

 

l . 0
d ...e I
8 t /
I L
0 0 . I a, /
Q‘ N
3 ‘ I/ 0.3 N 3.3 .7
e s ,p“ . m‘
A 9 O 3 1" ‘.§a. .z”' ‘Na, /
e r. O ‘. ’
g e. I N g \. / h N \\
d .0. ./ u \ ’ u \
F f ‘. ” .\ \(\
u 0 ° “V I. . e I ‘X ‘ ‘1
n f a git 3% ’7 \
c o e i \ \
t d 'l x s\. I \\
i e, 4 I ‘. j r \\
o o e 2 "I ‘v. ’7 ’ ¥‘ I \‘
ﬂ / ’ h. ‘\ \
3 it I a .e. ‘ \ \
f x” / ‘s ‘\
' 0 ° e...“ I .‘.
. ' o.- I T I I. W I f I r I I I F“? ‘l
0.0 0.5 1.0 1.5 2.0
Pam-eta: - t
Figure 3.7. Plot of blending functions.

The plot of this two segment curve is shown in Figure 3.8 along

with the polygon of control points; the segment joint is marked with an

X.

As demonstrated in the above example, when the blending functions

for a segment of a B-spline are cmputed, only a subset of the knot

vector is used.

knot vector.

This subset of the knot vector is called the effective

The length of the effective knot vector is equal to 2(k-

l), where k is the order. In above example, the length of the effective

30

 

 

  

 

4:? ate 1:!

Figure 3.8. Third order B-spline.

knot vector is 2(3-l)-4, which agrees with the computation of blending
functions. The only knot values used in the computation of the blending

functions of the first segment are x and x4, or [ O 0 1 2 ],

19 x2! x3!
while the blending functions for the second segment use only the knot

values x2, x3, x4, and x or [ 0 1 2 2 ].

5’

3.2 Rational B-splines

A rational B-spline is one in which the control points are
expressed in homogeneous coordinates. A point is given an additional
coordinate which can be thought of as a scale factor. Instead of P(x,y)

a point is represented as P'(wx, wy, w). The last coordinate is the

31

homogeneous variable. There are an infinite number of homogeneous
representations of a single coordinate point, for example the coordinate
x-l, y-2, or (1,2), can be represented by (1,2,1), (2,4,2), or (5,10,5).
If the homogeneous variable is equal to 1, then the other variables
represent the actual Cartesian coordinates. If the homogeneous
variable, also called the weight in rational B-splines, is not equal to
1, then the transformation back to actual coordinates, P(x,y,l), is
calculated by dividing each coordinate of P' by the calculated value of
w.

There is no effect on the blending functions when the control
points are expressed in homogeneous coordinates, because the blending
functions depend only on the knot vector. A point WP(t) along the
spline in homogeneous coordinates can then be expressed as

n

time) - X NC?1 * N1 k(c), (3.16)

i-O ’
where the WCP1 are the homogeneous coordinate values of the weighted
control points. WP(t) represents wx(t), wy(t), and w(t), which is the
vector of homogeneous coordinate values. The Cartesian coordinate
values of points P(t) along the spline are then calculated as follows.

wx(t)
x(t) - -—————— (3.17)
w(t)
wy(t)

y(t) ' ---- (3.18)
w(t)

Extending the earlier example,let the weights of control points

32

(0,0), (2,2), (3,1), and (3,0) be 1, 2, l, and 1, respectively, for

example. The control points in weighted or homogeneous form are:

'0
I

(0.0.1):‘

'11
I

(4.4.2);

'11
I

(3.1.1);

P - (3,0,1).

The computation of coordinates WP(t) for the first segment is then

 

 

 

 

 

 

straightforward.
wx(t) ' o a 3 c2 - 2 c + 1
wy(t) - o a 1 -1.5 c2 + 2 t
w(t) . 1 2 1 L .s c2 .
. 2 1
-4.5 t + 8 t -
- 2 (3.19)
-5.5 t + 8 t
h -1.5 t2 + 2 t + 1 .
-a.5 :2 + 8 c -s.5 :2 + 8 c
Thus x(t) - 2 83d Y(t) ' 2
-1.5 t + 2 t + 1 -l.5 t + 2 t + 1

for 0 s t < 1. Notice that x(t) and y(t) are now ratios of polynomials

in t. Similarly, we can calculate the WP(t) for the second segment.

F .

 

 

wx(t) a 3 3 .5 c2 - 2 c + 2
wy(t) - a 1 o -1.5 c2 + a t - 2
w(t) 2 1 1 _ c2 - 2 c + 1 .

33

 

 

 

 

.S t2 - 2 t + 5
_ 2 (3.20)
S t - 4 t + 6
b5t2-2t+3‘
.5c2-2c+5 .5t2-4t+6
Thus x(t) - 2 and y(t) - 2
.St-2c+3 .5t-2t+3

for l s t < 2.

The solid line curve in Figure 3.9 is the rational B-spline derived
above and the dashed line is the nonrational B-spline from the previous
example. Note how the double weight on the (2,2) coordinate pulled the

curve toward it.[ll]

 

 

 

Figure 3.9. Strictly rational B-spline.

34

Weighting each of the control points gives another degree of
freedom to the curve per control point and allows the representation of
curves that can not be fitted with nonrational B-splines, such as
circular arcs derived from conic sections.

The example shown above is restricted to 2-dimensional curves. The
representation of curves in 3-dimension space (x(t), y(t), and z(t))
with B-splines can be achieved by simply adding another row to the

control point matrix for the calculation of zw(t).
3.3 B-spline Surface

The extension from the spline curve to the surface patch is done by
adding a second parametric coordinate and an associated knot vector. The
most commonly used surface representation is the four sided patch. Each
point on the patch is a function of two parameters, s and t. One edge
or side of the patch and its opposite side are chosen as curves that
depend only on s and the other two edges are functions only of t. Both
s-varying sides share the same knot vector, as do both t-varying sides.
The s knot vector does not have to equal the t knot vector. They can be
of different orders and have a different number of control points. If
the s edges have m+1 control points and the t edges have n+1, then there
will be a total of (m+1)o(n+l) control points. The control points
joined together by straight lines form what is called a control point
net. The general shape of the patch will mimic the net (more precisely,

the surface is contained within the convex hull formed by the control

points).

35

The general formula for determining the coordinates of a point on

the B-spline surface is

P(s,t) -130 1:0 M],h(s) - P1,] - Ni,k(t) (3.21)
or in matrix form
P(s,t)- M0,h(s) Ml,h(s) .... Mm,h(s) PPOO P01 .... Pogq. N0,k(t) I (3.22)
P10 P11 .... Pln Nl,k(t)
.ng _.njk.., J

where

1.1

n+1
m+l

N11

M11

is

is

is

is

is

is

is

one

the

the

the

the

one

0116

 

 

 

 

of the (n+1)-(m+l) control points,

order of the curve in the t direction,

order of the curve in the s direction,
number of control points in the t direction,
number of control points in the 3 direction,
of the n+1 blending functions of order k,

of the m+l blending functions of order h.

Just as the B-spline curve consisted of segments, the Bospline

surface consists of subpatches. The s edge B-splines have (m-h+2)

segments and the t edge B-splines have (n-k+2) segments. Each subpatch

is formed by a segment of the s edge paired with a segment of the t

edge. Therefore, the number of subpatches is (m-h+2)-(n-k+2). Each

36

subpatch covers a range of s and a range of t.

Example [2]:
g_gdgg§ m+l-4, h-3 (quadratic), knot vector [ 0 0 O l 2 2 2 ]

;_§ggg§ n+l-2, k-2 (linear), knot vector [ 0 0 1 l ]

There are two subpatches in this case, since there are two segments
in the 3 direction and one in the t direction. One subpatch covers the
range of s-O»1 and t-Oel, and the other covers the range of s-1~2 and
t-Oel. There are altogether (n+1)-(m+l) - 8 control points, but each
subpatch use only a subset of the control points. The number of control
points that contribute to each subpatch is h-k, which is 6 in this
example. The subpatch with the range of s-Oel and t-0~1 has the

following blending functions.

2

”0,3(8) - s - 23 + 1 N0,2(t) - 1 - t
M1’3(s) — -1.5s2 + 23 N1,2(t) - c
M2'3(s) - 0.532

M3,3(s) - 0

The coordinate of a surface point on this subpatch can then be

calculated as

’ P P l
P(s,t) - 32-25+l -1.532+25 0.532 00 01 l-t
P10 P11 t (3.23)
P20 P21

 

 

37

The first column elements of the control point matrix are the first
three control points of one of the s edges. The first row elements of
the control point matrix are the two control points on one t edge. The
surface points for the other subpatch with the range of s-leZ and t-O~1

are computed by

p a

P P

P(s,t)- 0.532-25+2 -1.532+as-4 32-23+l 0.5e2 1° 11 l-t
P20 P21 t (3.24)
lP3° P31

 

 

If we choose the control points given in Figure 3.10 for this

example, we can generate the B-spline surface shown in Figure 3.11.[1l]

More complex surfaces may be constructed if a weight is also

assigned to each control point of the B—spline surface. To determine

 

 

 

Figure 3.10. B-spline control point net.

38

We!”
5p, SUXKNCH 2

y ,. ﬂ.

 

 

 

 

Figure 3.11. B-spline surface.

points on the surface, equation (3.16) is used in the homogeneous
coordinate system as described above._

More examples of B-spline curves and surfaces can be found in [11]
and in [1]. The next chapter will give a literature review of the work

being done by others related to surface representation and evaluation.

CHAPTER IV

LITERATURE REVIEW

A fundamental area of interest in computer graphics and computer-
aided geometric design is the representation of a 3-space object whose
shape is delineated by free-form surfaces. Various mathematical
models as well as special purpose hardware have been developed to meet
the requirement for implementation in an interactive computer-aided
geometric design system. This chapter presents a review of techniques

in use for the display of free-form surfaces.
4.1 Uniform Subdivision

There are two stages involved in displaying parametrically defined
surfaces in most existing system -- creating a polygonal approximation
of the surface and displaying the approximation by use of the standard
polygon display techniques [10]. Uniform subdivision is the simplest
technique to produce the polygonal database for displaying the surface.
In uniform subdivision, a single patch is divided uniformly in parameter
space and the resulting mesh of evaluated points are treated as vertices
of the approximating triangles, for example. The more subdivisions, the

smoother the approximating surface.

39

40

Note that the vertices of the approximating triangle are used for
interpolating the points on the patch. Yet the control points used in
B-splines define the patch rather than interpolate the patch. This is
the key difference between B-spline and piecewise linear approximation.

The first attempt at displaying a bicubic surface, a surface with
order equal to 4 in both the s and t directions, directly from the
parametric mathematical definition of the surface (i.e., without
polygonalizing) was done by Catmull [13]. Catmull subdivided the patch
in parameter space down to the pixel basis. Then the subpatch was

passed to displaying and shading routines.
4.2 Adaptive Subdivision

Adaptive subdivision, proposed by Lane and Carpenter [14],
adaptively divides a patch into polygons according to the flatness of
the patch. A "flat" patch can be displayed as a single polygon while a
curved patch is subdivided into smaller subpatches until each of the
subpatches meets the criteria of the flatness test. Compared to uniform
subdivision, adaptive subdivision generates ”enough" polygons to
adequately represent the surface whereas the uniform subdivision spends
a longer time to generate ”more than enough“ polygons [10].

All surface patches are represented in terms of an appropriate
tensor product of Bernstein basis functions. Using Bernstein basis
functions, curves are generated by interpolating their first and last

control points. Then the points are assembled into quadrilaterals. The

41

flatness test is carried out on the quadrilaterals, which are further
subdivided if necessary. The final subdivision produces polygons for
display by a conventional routine [10].

In order to decide which portion is the most curved one of the
segment or patch, the calculation of curvature is usually involved. Yet
the evaluation of curvature is difficult and time-consuming. Van Book
has since developed a method similar to the Lane-Carpenter method, but
in which no calculation of curvature is necessary. Van Hook evaluated
more than enough points on the surface to achieve the flatness test.
This method was classified by Whitton [10] as a semi-adaptive

subdivision.

4.3 Cox-deBoor Algorithm

The most popular B-spline algorithm in many CAD/CAM systems is the
Cox-deBoor algorithm [16,17]. It works well for point evaluation as
well as the derivative evaluation of the point. The algorithm for
formulating the blending functions presented by deBoor [17] is

reproduced below:

N(1,l) - l;
for s - 1 to k-l
begin

DP(s) - t -t;

1+8

DM(s) - t-ti+l-s;

42

N(l,s+l) - O;

for r - 1 to s

begin
M - N(r,s) / ( DP(r) + DM(s+l-r) );
N(r,s+l) - N(r,s+1) + DP(r) * M;
N(r+1,s+1) - DM(s+l-r) * M;

endfor,

endfor.

Where
k is the order of the blending functions,
t1 are the knots of the knot vector,

and t1 5 t < t1+1.

The B-spline is then evaluated by

k-l

P(t) - 2 CP . N (c), (4.1)
i-O i i,k

where Ni k is the kth column of N in the above algorithm.

Note that N is a triangular matrix. To evaluate derivatives, the
appropriate column of N would be used. The general routine given by.I
deBoor for derivative evaluation is

k- -1

me - <k-1> - - - (k-j) Aim - Ni 1w(t). (4.2)
1-0 '

43

where
j is the order of derivative and 0 s j < k,

N

th
i,k-j is the (k-j) column of N,

and

(0) _
A1 CPi’

(.1) _ (J-l) _ (J-l)
A (A1 A 1

1 1- t

(4.3)

) / (ti+k-j ' 1"

4.4 Pickelmann's Algorithm

One problem embedded in the Cox-deBoor algorithm is that when the
knot values of the knot vector become very large, the error will also
increase. Knot vectors are, in Pickelmann's work [1], translated (or
shifted) so that the parameter range being evaluated (the middle of the
effective knot vector) is always [0,1). This translation both limits
the number of possible effective knot vectors and has a beneficial
effect on numerical accuracy of the blending function calculations. The
CURBS algorithm proposed by Pickelmann [1] thus produces a three-to-one
reduction in machine operations over the Cox-deBoor algorithm, for cubic
splines. For higher order splines, the reduction ratio is even larger.

From the last chapter, only the effective knot vector (a subset of
the knot vector is used to compute any given blending function. Also,
the length of the effective knot vector is 2(k-1). Pickelmann showed
that a B-spline with an infinite number of possible knot vectors can be
processed as several individual subsegments, each with one effective

knot vector. In what follows, only enhanced uniform knot vectors will

44

be considered (note that Pickelmann showed that this is not actually a
restriction on the class of all nonuniform rational B-splines; he
derived an algorithm for converting an arbitrary NURB surface into one
with an enhanced uniform knot vector). And for EURB knot vectors, using
the knot vector translation just mentioned, the number of possible

effective knot vectors is finite and equal to 2(2k-4).

For a cubic,
k-4, there are 2(2o4-4) or 16 possible effective knot vectors for a
subsegment. For k-S, there are 64 possible knot vectors.

Each of the 2(2k'4) knot vectors yields a unique set of blending
functions. In order to easily determine which set of blending functions
should be used for a given subsegment, each subsegment is given a label
or pointer to the correct set of preformulated blending functions. The
label is determined by a special subroutine in which the effective knot
vector is compressed into a binary-encoded label. Then the shape of a
subpatch is formed by the labels in both the s and t directions and the

corresponding control points. To get a clear picture of the algorithm,

an example is given as follows.

W
§_§ggg§ - number of control points : m+l - 4
order : h - 3 knot vector : [ O O 0 l 2 2 2 ]
§_§gggg - number of control points : n+1 - 4

order : k - 3 knot vector : [ O O O 1 2 2 2 ]

45

Control point matrix : P00 P01 P02 P03
P10 P11 P12 P13
P20 P21 P22 P23

P30 P31 P32 P33

 

 

For k-3, there are 2(2.3-h)

- 4 possible knot vectors. The above
knot vector uses only two of them, [ 0 0 l 2 ] and [ 0 l 2 2 ], and the
second effective knot vector can be translated to [ -l 0 l 1 ] which
will have the same label as [ o 1 2 2 1. The label of [ 0 0‘1 2 1 is
LABl - 2, and the label of [ -l 0 l 1 ] is LABZ - l. The way to label
the effective knot vector was given in [1].

Since (n-k+2) - 2, we know there are two segments in both the s and
t directions, which results altogether in four subpatches. The blending
functions of one of the four subpatches can then be retrieved from the
preformulated blending function basis according to LABl and LAB2. The
labels and the corresponding control points used to form the subpatch
are given in Table 4.1. Figure 4.1 shows the B-spline control net and
Figure 4.2 gives the B-spline surface.

Conceptually, the four subpatches have different parameter ranges
.as shown in the above table. But for purposes of computation,
subpatches can be treated individually as if each of them had the

parameter range of s-Oel and t-Oel, so long as the "bookkepping" of the

proper control point indices is maintained.

46

TABLE 4.1. Subpatch and corresponding control points

 

 

 

 

subpatch label of s label of t control ponints
P P P

s _ 0‘1 00 01 02
LABl LABl P P P

t _ 0+1 10 11 12

P20 P21 P22
P P P

s _ 1‘2 10 11 12
LABZ LABl P P P

t _ 041 20 21 22

P30 P31 P32
P P P

s _ 0‘1 01 02 03
LABl LABZ P P P

t _ 1+2 11 12 13

P21 P22 P23
P P P

s _ 1‘2 11 12 13
LAB2 LABZ P P P

t _ 1‘2 21 22 23

P31 P32 P33

 

 

 

 

47

 

 

Figure 4.1. B—spline control point net [11].

 

Figure 4.2. B-spline surface [11].

CHAPTER V

SURFACE EVALUATION AND SHADING

This chapter describes the surface evaluation algorithm for both
nonrational B-splines and rational B-splines. The computation of the
derivative and the normal vector of the points on the B-spline surface
are given. Finally, based on the normal of the surface, the concept of
shading is introduced and the shaded image of the B-spline surface is

visualized.
5.1 Evaluation Algorithm

From equation (3.20) we know a point on the B-spline patch can be

represented in matrix form as
P(s,t) - [BF(8)] ' [CP] ' [BF(t)] (5 1)

where CP is the control point matrix with size (m+l)x(n+l), and BF(s)

and BF(t) are the blending function vectors of s and t, respectively.
In the last chapter, we showed that the computation of the points on a
subpatch can be treated individually such that the point on a subpatch

can now be defined as

P'(S.t) - [BF(S)] ' [09'] ° [BF(t)] (5 2)

48

49

where CP' is the corresponding control point matrix with size hxk, and
each element of BF(s) and of BF(t) is a polynomial of s and of t,

respectively. For a bicubic surface, h-kn4, we may rewrite equation

(5.2) as
23 ’1‘
P'(s,t) - [ 1 s s s ] M4x4 CP'4x4 N4x4 . (5.3)
t
t2
btad

 

 

u4x4 and N4x4 are the blending coefficient matrices by which the
blending function is formulated. This assumes that all the blending
coefficient matrices have been preformulated according to the labels
determined from the effective knot vector. Note that the formulation of
the blending functions is done only once and the results are stored for
later use. The algorithm, as stated below, also assumes that the
subsegment labels are calculated or read in with the control points that
define the surface.

The algorithm requires the following information for evaluating a
subpatch: .

1) corresponding control point matrix for the subpatch (CP');

2) subpatch labels in s and in t direction (LABS and LABT);

3) orders of the subpatch in s and in t direction (h and k);

4) parametric value (between 0.0 and 1.0) (s and t);

5) the value of rational indicator (RATNOT).

50

The major steps of the algorithm for evaluating a subpatch are:

Step 1:

Step 2:

Step 3:

Step 4:

Step 5:

Use LABS and h to retrieve the proper set of blending
coefficient matrix, uhxh' Similarly use LABT and k to

retrieve kak.

Construct the a vector [ l s ... sh.1 ];
construct the t vector [ 1 t ... tk"1 ].
Evaluate blending functions
h-
[BF(s)l-[ls..-s ll-Iﬂhxh];
k-l T
[BF(t)] - [ kak ] s [ 1 t ... t ]

Calculate homogeneous points

WX - [BF(S)] - [CP’(X)l - [BF(t)];

WY - [BF(S)] ° [CP'(y)l ° [BF(t)];

WZ - [BF(S)l ° [CP'(2)] - [BF(t)];

and if it is a strictly rational B-spline, then

W ' [BF(S)] ° [CP'(V)] ' [BF(t)]-

Calculate the three-dimensional point if strictly rational
X - wX/W:
Y - wY/W;

z - w2/w.

51
5.2 The Derivative and Normal Vectors

If the derivatives of the point with respect to the parametric
variable are required, the algorithm must be modified. The appropriate
derivative of the a vector and t vector are taken and used in step 3 to
evaluate an alternative set of blending functions. This set of blending
function derivatives is then used in step 4. An example to get the

derivative of bicubic patch in both the s and t directions is given

 

 

below.
2 P 1 1
3.113.? ' l 0 1 23 39 ] M4x4 CP 4x4 N4x4 t (5.4)
t2
b t3 d
- 1 ‘2 3 M CP' N ' o - S 5
§£1§251 [ 3 3 3 1 4x4 4x4 4x4 1 ( ° )
2::
D 3:2 d

 

 

If the surface is nonrational, then aX/as, aY/as, and 62/83 can be
obtained by substituting CP'(x), CP'(y), and CP'(z), respectively. The
derivatives in the t direction can be derived similarly by applying
equation (5.5). If the surface is strictly rational, then the chain
rule must be employed to replace step 5 to calculate the derivative.

For example, to calculate aX/as it would be necessary to use

52

dX__§.LliXAil_.L-_M__ -.&H.- (5.6)

as as W as as

The tangent vector to parametric curve P - P(s,te), where to is a
constant, is a multiple of the vector aP/as - ( aX/as aY/as aZ/as ).
Similarly, the tangent vector to the curve P - P(se, t) is a multiple of
aP/at - ( aX/at aY/at 62/8t ). The tangent plane at the intersection of
these curves at P(se, to) contains these two tangent vectors, so that
the normal to the surface is a multiple of their vector product (see

Figure 5.1). The unit normal vector n is then given by

 

Figure 5.1. Normal vector and tangent vectors of a
point on the surface [18].

53

X

n ' | aP/ae x aP/ac |

i j k
where ( aP/as x aP/at ) - 8X/as BY/as 32/33 - ai + bj + ck

BX/at av/at 82/8t

and | aP/as x aP/ac | - (a2 + b2 + c2)“.

5:3 Shading

The process of taking a three-dimensional description of a scene
and generating a two-dimensional array (typically 1024x1024) of
intensities (pixels) that will be displayed on a CRT is called
rendering. The description of a scene may be modeled by polygons,
parametric patches, or splines. Generally, rendering of a scene also
needs information about the (one or more) light sources, such as the
intensity, direction, and model of the light sources. Many alternative
models for light sources and their interactions with surfaces are in
common usage.

Rendering begins by transforming the objects into the eye-
coordinate system, or what is called viewing transformation (clipping,
perspective transformation also included) in which objects outside the
field of view are clipped away (see Figure 1.1). The remaining objects
must be compared to decide which objects, and/or portions of objects,

are visible from the view point. Then the hidden surface removal

54

problem is taken into account. Three popular hidden surface removing
algorithms are the z-buffer, priority, and scan-line algorithms [20].

The next step in visualizing an image, after hidden surfaces have
been removed, is to shade the visible surfaces taking into account the
light sources, surface characteristics, and the positions and
orientations of the surfaces and sources. Next, we examine the
interaction of different components of point light sources shining on
the surfaces of objects.

The light reflected from a surface can be modeled in a common
simplification with two components, diffuse and specular. For diffuse
reflection, scattering of light is assumed to be equally in all
directions, so that the surfaces appear to have the same brightness from
all viewing angles. Ideal specular surface, however, reradiates light
in only one direction, the reflected light direction. For real objects,
the reflected light contains both diffuse and specular components. For
example, illuminate an apple with a bright light; the highlight is
caused by specular reflection, while the light reflected from the rest
of the apple is caused by diffuse reflection. In order to generate
realistic images, both components need to be taken into consideration.
In Figure 5.2, E is unit vector that points to the eye while L is unit
vector to the light source. E' indicates the reflected eye direction and
N is the unit normal vector at the point P. It is very simple to
compute the diffuse component. According to Lambert's Law, the diffuse
component can be calculated by N-L, the inner product of N and L. The

computation of the specular component, however, is much more

55

 

 

 

 

 

 

Figure 5.2 The geometry of shading [12]

complicated.
An approximation of the specular component proposed by Phong [21]

took the form
"(L)'Cosn(a) (5.7)

where t is the incident angle and a is the angle between E' and L. W(t)
is used to model the change of the ratio of incident light to reflected
light as the incident angle changes. In practice, according to [20],
W(L) has been ignored by most implementers. The order of cos(a), n, is
the shininess factor. When L and E' in the same direction, i.e. a-O,
cosn(a) reaches maximum, and it will decrease very fast in the off-axis

direction as n increase.

56

More complex methods to calculate the specular component can be
found in [19]. The modeling of a third component, termed ambient light,
is given also in [19].

Once we know how to shade a point, we can shade a surface according
to the same principle. Rather than going through the extra shading
computation for every point to be displayed (i.e., pixel), several
approximation methods have been proposed [21,22]. The drawback of these
approximation methods is that they do not generally represent the original
surface precisely. If high-speed hardware was available to reduce the
computation cost of shading for each point, more precise visualizing of
a surface would then be possible. Based on the B-splines and the
algorithm introduced in previous chapters, a VLSI chip architecture is
proposed in the next chapter which performs a crucial part of the work
to achieve real-time shading, now a bottleneck of interactive computer

graphics.

CHAPTER VI

ARCHITECTURE DESIGN

The architecture of a computational system can be characterized in
many ways depending on the designer's view as well as the design level.
The architecture here is collectively defined by a specification of the
functional capabilities of its physical components, the logical
structure of their interconnections, the nature of the information flow,

and the control mechanism.

In the section on system specification, the design objectives and
criteria as well as special functional configuration requirements will
be presented. The functional units and the clocking scheme used in the
design are discussed in detail. The data flow is then manipulated in
such a way that maximum system throughput is achieved. The architecture
which meets the system specification is then presented in block diagram
form. The algorithm of surface evaluation is expressed in the form of a

program written in RTL (Register Transfer Language).

57

58

6.1 System Specification

6.1.1 Design Objective

The objective is to design a basic VLSI chip architecture dedicated
to the fixed-point computation of the surface evaluation for a host
graphics processor. This chip will work as a coprocessor to the host.
It will be activated whenever a surface evaluation request is invoked;
otherwise, it remains in a quiescent mode.

The following criteria have been established, based on the
application requirement analysis, to guide in making various design

decisions.

CMOS technology with a feature size of 1.5pm or below and a
clock of up to ZOMHz are to be used.

0 Blending coefficient matrices will be stored in ROM and the
blending functions will be generated on chip in order to avoid
possible I/O bottlenecks.

o In order to simplify the overall design, the surface to be
evaluated is assumed to be defined by a bicubic patch.

0 Intra-functional 'unit pipelining is to be used in order to
achieve higher system throughput.

0 Minimum cycle time and minimum interconnections are sought, with
the former being a major concern.
In the design of the SE (Surface Evaluator) chip, the three
following rules from Mead and Conway [23], which are adequate to

guarantee the correct functioning of a bus-oriented chip design, are

59

rigidly followed:

Rule 1: The system is driven by a two-phase, nonoverlapped clock;
input of phase one is from phase two output and vice-
versa.

Rule 2: Functional units are timeless C/L (Combinational Logic);
operations on data are separated by the two-phase clock.

Rule 3: A standard bus scheme is used; data transfer through the

bus occurs during phase one, and the bus is precharged
during phase two.

6.1.2 I/O Specification

The input to the proposed SE chip consists of LABS and LABT, values
of both a and t, an information bit for indicating rational or
nonrational, and the three (four if rational) 4-by-4 control point
matrices. The output of the SE chip is the three-dimensional point (x,
y, 2). If the normal vector at this particular point is desired,
further calculations are needed. All of the information, except for the
rational indication bit, are 32-bit, two's complement, fixed-point
representations. Necessary pre-scaling and post-scaling of the data are
assumed to be done by the host processor.

Communication of the control point matrices is the bottleneck of
the SE chip. Efficient pipelining of the input stage and the functional
units of the SE chip in order to obtain the highest throughput, is,

therefore, the major design endeavor.

60

6.2 Architecture Development
6.2.1 Functional Units

The elementary arithmetic operations needed for the surface
evaluation calculation are multiplication, addition, and division. The
schematic block diagram of a Baugh-Wboley two's complement array
multiplier [24] is shown in Figure 6.1. It is the basic structure of a

multiply-and-add cell (MAC).

J. be ‘4 b:

a For i.) ' 0. 1. 2. 3 and (i,1) - (4, 4)

I

05.33 osj$3
“ego “350 “2‘0 ‘1 1’0 ‘050

,b
zb‘ - .M o o o o

1.0, ‘

 

 

a. b. 1 z “:1 Q
, o mimic
We 0 f

l

 

 

 

 

Figure 6.1. The block diagram of a S-by-S Baugh-Wooley two's
complement array multiplier [24].

61

The first step of the surface evaluation calculation is to form the
blending functions of s and of t. If we compute the blending function
of s, for example, by the multiplication of vector [1 s 32 s3] and the
corresponding blending coefficient matrix chosen according to LABS, we
need to calculate the square and the cube of s beforehand. This
requires at least two more cycles to perform. However, there is an
alternative way of forming the blending functions, which does not
require computing the square and the cube in advance.

If the MAC, represented in block form shown in Figure 6.2(a), is
used, then the polynomial of a variable x with order equal to 3 can be
obtained by using three MAC's as shown in Figure 6.2(b). Calculation of
the first derivative of a polynomial of x with order - 3 can then be
performed by the block diagram shown in Figure 6.2(c). Based on this
approach, Figure 6.3 gives the data flow and the structure to calculate
the blending function of s as well as its first derivative. The
blending function of t and its first derivative can be obtained
similarly. Note that in Figure 6.3 there is a need to multiply by 2 and
3 (*2 and *3) in the forming of the first derivative. Actually, no
multiplication is necessary here. Multiplication by 2 can be achieved
by just shifting the input data one bit left and multiplication by 3 can
be obtained by adding three duplicates of the operand. The reason for
adding the 4-stage delay is that piplined MAC will has 4 segments as
will be covered shortly.

In the evaluation process, many matrix-vector multiplications are

involved. After multiplying elements of the vector [1 s 52 $3] with

62

(a)

P(x)
=((<A1l X)+A2)!X+A3)IX+A4

 

 

(b)
are 3
3““
MAC P’(x)
. n x
( C)

Figure 6.2. (a) Basic MAC block. (b) MAC array structure to compute
a polynomial of order 3. (c) MAC array structure to find the first
derivative of a polynomial with order - 3.

 

63

A41 A42 A43

A31 A38 A33
A21 A82 A23
A11 A12 A13

 

 

 

 

 

A44

A34

A24

A14
4-stage

4nu delay
unit

, 4DU

 

* T ll MAC IT IMAC IT l] MAC }——BFT(I)
,. BFT(2)

BF T(3)

 

 

BFT(4)

 

 

 

 

ilg IT If; eTl ll MAC l—-—DBFT(1)
DBFT<2>

 

DBFT(3)
DBFT(4)

Figure 6.3. The block diagram for calculating the blending
functions for evaluating the point and the first

derivative.

64

the corresponding elements of a row (or a column) of the blending
coefficient matrix, we need to sum up the products to get the final
result. The MAC can be used to achieve matrix-vector multiplication as
given in Figure 6.4. This requires four MAC cycles to get the final
result; a time cost that is too high. However, we may perform the
multiplications concurrently and then add the products in one step. In
so doing, we need a special multi-operand adder. A four-operand adder
formed by carry-save adder (CSA) trees is given in Figure 6.5. Note
that the '~' on the carry output lines indicates that the carries are
shifted left one bit with a zero entering from the right end before

they are fed into the CSA inputs at the next level. The partial carry
and partial sum from the last stage of the CSA is then feed into a carry
lookahead adder. The functional block diagram of a full carry lookahead
adder is shown in Fig. 6.6 and the details of each unit are given in

Figures 6.7 and 6.8.

 

 

 

X1—‘ — - .
~~= ts—iI—‘m gnaw ......
‘ of X and Y

Figure 6.4. Block diagram for calculation of the inner product of
X and Y.

 

 

65

 

 

Figure 6.5. A four-operand adder formed by CSA tree.

  

LkhchvI-I. III

  
   
      
    
 

Carry Generate/Propagate Unit.

8-bit CLA Unit c4: C-

Summe tion Unit

C-$(——-(A+I)+C.

Figure 6.6. The functional block diagram of a full carry
lookahead adder.

 

 

 

l
AHA—"+1

Figure 6.7. Logic circuits for realizing the carry-generate,
carry-propagate, and sum functions.

 

 

 

 

 

67

P6, P6,, PG,PG,

       

'?.f:bi£_CL-_A_U2£

C1

Ca C2 C1 Co

Figure 6.8. The schematic logic circuit of a h-bit carry
lookahead (CLA) unit.

The last requried functional unit is a divider. There are three
types of cellular array dividers: restoring division, nonrestoring
division, and nonrestoring with carry lookahead array dividers. The
area cost of the CLA dividers is higher than the other two array
dividers, but on the average it is 5 times faster [25]. Since in the SE

chip speed is the major concern, the nonrestoring CLA array divider is

68

chosen for the calculation of division in the evaluation of a rational
surface. Figure 6.9 shows the diagram of a carry lookahead array
divider and Figure 6.10 gives the three basic types of logic cells used

in Figure 6.9 [25].

N0 5‘ ”I " ”I ‘ N1 - ”a 0mm N - No. ~,~,~,~,~,~.A,~.
o 0 I o o l Dmsov u - Do. 0,0,1“).
Ououenl o - on. 0‘ 919,0.

 

 

 

 

 

 

 

 

 

aIEJIIEIINII ”'
: ‘I a: ﬂ WNW;
:‘TCV—EE .ﬂﬂﬂk

Figure 6.9. A carry lookahead array divider with 8-bit divisor
in two's complement representation [25].

69

S1,” Cf. O‘a‘s

\’ II OSIsn

m

 

 

 

 

 

 

 

 

 

 

 

“'1
i tK'
I
I
I
I
l
I
1 1c!
l "
|
I
.._1
q
4:K‘

 

 

 

 

 

 

 

..--q -_——J

----J

 

   

C,‘
(Ea. 8.55!

(C)

Figure 6.10. Three basic types of cells to be used in constructing
the division array of Figure 6.9. (a) The A cell located at the jth
digit position of the ith row; (b) the 8 cell located at sign

position of the ith row; and (c) the CLA cell for the ith row [25].

70

6.2.2 Timing and Pipelining

In the synchronous discipline of design, tasks are accomplished
through a logical sequence of events with each event realized under the
constraint of the physical timing of transporting information
(electrical signals) from one point to another. This dual meaning of
timing is bound by a system-wide clock which serves both as the sequence
reference and as the time reference. Hence the design of the clocking
scheme has paramount important to the system's correct functioning and
performance [26].

The general form of a data path and the corresponding two-phase
clock scheme are illustrated in Figure 6.11 and 6.12.

Since the evaluation process involves repeated vector-matrix
multiplications, it is of great advantage to have a multiplier and an
adder form a computational pipeline. This is because the piped
operation will save bus bandwidth and increase the overall throughput as
well as eliminate the need to store temporary results.

From the timing diagram of the two-phase clocking scheme, it is
clear that the minimum clock period is the sum of the maximum C/L delays
of phase-one and phase-two plus some small propagation overhead and
preset time. Intuitively, the C/L delay due to the multiplier may
constitute the largest delay component in the minimum clock period
calculation and thus one may further be motivated to introduce
additional "subpipelining" into the multiplier to shorten the clock

period.

71

Phasel 0P1 Phase2 0P2 Phasel

i I i I I

a I
I I I I I I
l

——a

 

 

 

 

 

R C/L R C/L R

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Figure 6.11. The general form of a data path [23].

 

 

r————————-(ﬂock perkxl-————————1
Phase] s/f 1\_.__.__

 

Phase2 1 \

 

 

 

 

 

 

lax delay lax delay”)l
of C/L of c L
a .— odd / ...,,‘I .-
Deuq'ux . lﬁ1metthmm
Deuw'un
Phasel Phase2 for Phasez
Pnuun.thne
for Phasel

Figure 6.12. The two-phase clocking scheme [26].

A pipelined four-segemnt multiplier is given in Figure 6.13 with
its clocking scheme [26]. Note that the clocking scheme used here is

slightly different from the general single phase latches as shown in

 

 

 

 

 

 

 

 

R -—->
Dynamic
Raghner
l-stage
dehy
nun; ...... l_____
——a mu I——-a mu ———1 mu
Q a; _, 9.19.: ___I¢: Q.
R BIN”: in -3InPYa-9I R -aIMPY3
...... ...J L........J..____J
(n)
25; Cycle Time
._/ \__.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

1DU I

as.

R I«)IMI’Y4

 

 

 

4

ﬂ:

 

 

Is—e.

Worst case path delay

Figure 5.13.

(b)

clock scheme.

Ihuty'ICyniha 131nm:

(a) Pipelined four-segment multiplier and (b) its

 

 

 

R

9c

73

Figure 6.12. When using the clocking scheme in Figure 6.12, each C/L
section is actually activated during only half of a clock cycle.
However, the clocking scheme in Figure 6.13 allows the C/L sections to
operate in a single phase and thus allows a shorter clock period; the
price is a slight increase in chip area because of the use of dynamic
registers.

The advantage of pipelined multiplication of the blending functions
and control point matrix is in more efficient computation. The data
flow and the interconnection between adder and multiplier are
illustrated in Figure 6.14. The adder and four four-segment multipliers
form a whole pipeline for calculation of the inner product.

Note that the above discussion of pipelining doesn't include the
divider. It should be segmented into 10 segments for the sake of

synchronizing with the system clock.
6.2.3 Storage Scheme

The 3E chip architecture takes advantage of parallelism and
pipelining, but two key problems have not yet been discussed: I/O and
the data storage scheme. In the case of evaluation of bicubic rational
surfaces, four 4-by-4 control point matrices need to be fed into the SE
chip. Because of the constraint on the number of I/O pins, it is
impossible to input the data in a few cycles. But one thing should be

noted: as long as the values of s and t and the labels are given, the

          
   
 

 

1:5 PX(21)
t4 PX(11)
t3 PV<41>
t2 Hun/(31)
t1 PV(21)
1:0 PV(11)
Figure 6.14.

BFTQ)

  

74

PX(22)
PX<12>

PV(42)
PV(32)
PV<22>
PV<12)

  

BFT(2)

4-0PERAND

PX<23)
PX<13)

was)
PV<33>
was)
was)

BFT(3)

PX(24)
PX(14)

PV<44)

PIA/(34)
PV<24>
PV<14>

EFT“)

t5 PVl
t6 PVZ
t7 PV3
t8 PV4

t9 PXl
1:10 PX2

Dataflow of vector-matrix multiplication.

7S

formation of blending functions can be overlapped with the input of
control point matrices. By so doing, the I/O bottleneck can be reduced
if not eliminated.

The question remaining is how to store the data in order to support
the pipelined process. One straightforward way is to store the control
point matrices in registers. The area cost of using registers is quite
high and the control of that many registers is complicated. An
alternative is to store the data in RAM (Random Access Memory), which is
easier to control and occupies much less chip area. In the SE design,
the latter is used to store the control point matrices.

As stated in the last section, the inner product multiplications
are executed concurrently. In order to provide the necessary data for
parallel multiplications, the RAM is divided into four modules and each
module stores one fourth of the three (four, if rational) control point
matrices. But row accessing as well as column accessing are necessary
for the matrix-vector multiplication. In order to access either a row
or a column of the control point matrices in one cycle, a special skewed
storage scheme is used.

Let S (stride or skewing distance) be the distance measured in
columns that each array row has been shifted relative to the row above

it. Let H be the total number of memory modules, m(a ) be the memory

1.1

module number of element a and q(a ) be the local address within

13 ' 13

that module. Then the storage scheme can be represented by the

following equations:

76

m(aij) - (i-S + j) mod N

4(611) - i

(6.1)

(6.2)

The storage pattern used in the SE chip is a skewed storage scheme

with 8-1 and M—4; an example is given in Figure 6.15. A multiplexing

 

 

 

 

 

 

 

 

 

 

(a)

 

 

 

 

 

 

 

 

 

 

(b)

Figure 6.15. Skewed storage pattern with (a) S-0, and (b) S-l.

77

 

INPUT UUTPUT

 

01 00 n1 D2 03 n4
0 0 A11 A12 A13 A14
0 1 A21 A22 A23 .A24 _

'A31 A32 A33 A34
A41 A42 A43 A44

HH
...0

 

Figure 6.16. Skewed storage with output multiplexing.

78

output circuit of the skewed storage module is shown in Figure 6.16.
The table in the figure gives an example of row accessing. If column
accessing is desired, proper addressing must be applied to each module

in order to get the appropriate elements.
6.3 Architecture of SE Chip
6.3.1 Architecture Synthesis

After the presentation of the details of the functional units,
clock scheme, and storage scheme, a simplified block diagram of the SE
chip can now be drawn as shown in Figure 6.17. When the SB chip is
activated by the host processor, one of the two main routines in the
control memory will become active according to the rational/nonrational
indication bit received from the system control input. One routine is
in charge of rational surface evaluation and the other takes care of the
nonrational case.

The first four input items received by the I/O circuit are the
values of s, t, LABS, and LABT, which will be stored in the register
file. As long as these four items are known, the formation of blending
functions of s and t can be performed in the arithmetic unit by
retrieving the proper blending coefficient matrices according to LABS
and LABT. At the same time, the I/O circuit will store the incoming
control point matrices in RAM based on the skewed storage scheme. The

register keeps the blending function vectors, the intermediate output of

79

System ‘ System Data

Control
input Input/ Output

Q TEX m/Wiicmcm 7f

'_ L \U/
_- m (com-nor.

77’ point matrices)

 

 

 

 

 

 

 

 

 

 

CONTROL MEMORY

 

 

 

 

 

 

 

 

 

 

 

 

 

 

II? ii

A
son I Am 1001c mm
(mm ——\,
.______7/

com .
rumors)

 

:I mm“ IJ/KJ—

x(‘I.

 

 

 

 

 

Fig. 6.17. Simplified diagram of SE chip.

80

the matrix-vector multiplication, and the final result. All
calculations, including formation of blending function, matrix-vector
multiplication, and division, are performed in the arithmetic unit.
According to the functional dependency, the arithmetic unit can be
subdivided into two parts: division unit and multiplying-and-adding
(M&A) unit. The data flow of the division unit is trivial. It receives
data WX, WY, W2, and W (if rational) from the register file and then
sends the division results X, Y, and 2 back to the register file. Data
flow and interconnection of the M&A unit, however, is much more
complicated.

The design of the MﬁA unit must enable calculation of both blending
functions and matrix-vector multiplication with a minimum number of
functional units. To meet this requirement, the circuit diagram shown
in Figure 6.18 is developed. The interconnection of the M&A unit is
determined by only one control signal, MODE. When MODE is low, the M&A
unit acts as the circuit showed in Figure 6.3. All the 0th input bits
of the multiplexers will be selected, thus the blending functions as
well as the first derivatives of the blending functions, can be
computed. If MODE becomes high, the M&A unit acts as four parallel
multipliers with a four-operand adder to sum up the inner-product, as
showed in Figure 6.14.

For simplicity, Figure 6.18 shows only the interconnection of the
M&A unit with MODE-O. When MODE-l, one of the inputs of the four active
MAC's is from RAM and the others are from the register file. Note also

that the multiplexing is now implicitly embedded in the four-operand

81

.us22 A<azc mcseea-ecu-m=ssHaAussz .ms.e «Aswan

Ag<

 

aazaauzesu
3:33.. co

asauspaeev
ulna“

secons..
axauoaac“
3:38..

 

5.32m 330536 3233 955233.30

 

 

m=uaaa

82

adder. As MODE-O, the addition is performed on the first four inputs
(pins 0, l, 2, and 3) of the adder, and while MODE-l, the addition is

performed on the other four inputs (pins 4, 5, 6, and 7).

6.3.2 Hardware Description in RTL

The last section covered only the structural specification of the
SE chip. The functional description is yet to be given. A program
written in RTL (Register Transfer Language) and given in the Appendix 1
is used to functionaly describe and verify the SE chip. The RTL program
specifies the data flow among the functional units and storage units.
Table 6.1 gives the cycle count for different evaluation routines based
on the RTL program. The patch evaluation in the table means the
evaluation of multiple points on the same patch. The startup time is

the time mainly to input the control point matrices whereas the average

Table 6.1. Cycle count for diﬂerent evaluation routines

 

   

Routine

nonrational point evaluation 52 79

nonrational patch evaluation 52 49
rational point evaluation 68 110

 

 

rational patch evaluation 68 64

 

83

cycle time is the average time to evaluate multiple points on the patch
except for the first evaluated point. The average cycle time for single
point evaluation is longer than the average time for patch evaluation
(or multiple point evaluation), because for single point evaluation, it
includes startup delay for each evaluation routine, but for the patch
evaluation, only the evaluation of the first point involves the startup
delay. But one should note that if the multiple points to be evaluated
were defined by two separate effective knot vectors, then you have to pay
the startup delay twice. In other words, as long as the point to be
evaluated across the patch boundary defined by the effective knot
vector, you have to repay the price for startup time delay. The
drawback of the RTL program is that, because of its simplicity, some
details of the architecture can not be described thoroughly, such as the

skewed storage scheme.

CHAPTER VII

ARCHITECTURE EVALUATION

Speed and area costs are the two most prevailing parameters for
evaluating the ”goodness" of a hardware design. Without physical design
and simulation, however, it is impossible to get the precise values of
these parameters. Even worse, different technologies, such as NMOS or
CMOS, and different physical design approaches, like gate array,
macrocell, or full-custom design, result in different time/area
evaluations. Without at least a qualitative estimation of the time/area
complexity of the chip, however, a decision of whether or not to carry

on the physical design work is difficult to make.

In this research, no attempt is made to acquire accurate speed and
area costs of the SE chip. Yet, some basic statistics of the chip are
provided in order to show the practicality of implementation of the chip
under current technology. This chapter starts with the architecture
verification of the SE chip. The area estimation is then given in detail

followed by the speed estimation.

84

85
7.1 Architecture Verification

To better understand the evaluation algorithm in the SE chip, a
flowchart is given in Figure 7.1. There are two main routines, one for
the evaluation of nonrational spline points and the other for the
evaluation of rational spline points. When the SE chip is activated
according to the value of RAT, it takes four cycles to input the
parameters, s and t, and their labels, LABS and LABT. As long as the
parameters and their labels are known, the formulation of the blending
functions and their first derivatives can be performed in parallel with
the input of the control point matrices. Then the Cartesian coordinates
of the point are evaluated and sent back to the host. If a rational
spline point is to be evaluated, three more divisions are necessary to
obtain the desired coordinates. If a patch is to be evaluated, then the
parameter values for the evaluation of next patch point will be given
followed by the formulation of blending functions and their first
derivatives, otherwise stop. Note that when evaluating points of a
patch, the control point matrices need to be brought in only once.

A symbolic example is given in the Appendices to illustrate the
dataflow of the evaluation of a nonrational point. Starting from top of
the flowchart in Figure 7.1, the input of s, t, LABS, and LABT are
trivial and thus not shown in the example. Appendix 2 gives the input
sequences of control point matrices and Xij, Yij, and Zij (the Cartesian

coordinates of a control point). Appendix 3 shows the content of each

86

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

    
  

hunte.t
labs. and labt
I.
formulating input input (stimulating
a! blending control eeatnl e! blending
functions at point point function e!
s and at t natrbes matrices s and o! t
and their and their
ﬁrst delive- that derive-
thu uses
! - [EH-)l‘iCle‘thn 1" - E:.q:{[a']]:[[:‘))ll
Y - [BK-)l'ichl‘ilmll 1’1 - s) CPI '-
. I"! - [BK-)l‘icPrl‘ian
2 [BK-)l‘ichl‘lllim n _ [BK-)1 {cal-[sun]
l l
output a Y. and Z X - PW
’ Y - 97/"
z - 92/"

 

 

 

l

I output 2.1. and z'

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Figure 7.1. Flowchart of the evaluation algorithm in the SE chip.

87

latch element in every cycle for the formulation of blending functions
and their first derivatives. Appendix 4 gives the result in each
functional stage. Sij and T1] are the blending coefficient matrices
retrieved from ROM according to IABS and LABT, respectively. BFS(i) and
BFT(i) are the blending functions and DBFS(i) and DBFT(i) are their
first derivatives. All the intermediate output are shown in
parenthesises. For example, (M243) is the intermediate output in the
third stage of the second MAC unit.

' Similarly, Appendix 5 shows the contents of each element for the
evaluation of the Cartesian coordinates and Appendix 6 gives the result
in each functional stage. The IT(i,j) in the Appendices are used to
store the result of the inner product of BFS and the control point
matrices.

It is noted that in Appendices 3 and 5, the contents of the latches
are strobed in at the beginning of each cycle and held throughout the
cycle whereas in Appendices 4 and 6, the result of each functional stage
is valid at the end of the cycle. If a patch evaluation is desired, the
same time table in Appendices 3 and 4 can be used to formulate the
blending functions and their first derivatives of next point to be
evaluated. The principle of this example can be easily applied to the
evaluation of rational spline points by considering one more control

point matrix and three more divisions.

88
7.2 Area Estimation

The real size of a VLSI chip depends on many physical design
factors such as transistor layout, placement of building blocks, and
routing (wiring). Optimal design is very hard to obtain in any aspect
and even a close-to-optimal solutions cost the designer much effort to
achieve.

Without actual layout, placement, and routing of the chip, we are
short of knowledge of all physical design factors about the chip. To
overcome this dilemma, this research used the physical device count, or
transistor count more precisely, as the major parameter in order to get
a realistic picture of the area required for the SE chip.

The transistor counts for various function units and components in
the SE chip are listed in Table 7.1. All transistor counts are based
on CMOS technology.

The central part of the MAC is a Baugh-Wooley array multiplier
which requires n(n-l)+3 full adders, n2 2-input AND gates, and Zn
inverters for the multiplication of two n-bit inputs, where n is the
bandwidth of the data [24]t In terms of physical layout, the full
adder, 2-input AND gate, and inverter take 28, 6, and 2 transistors,
respectively [29]. Thus the total transistor count for a n-by-n Baugh-

Wooley multiplier is

NT(BW) - [n-(n-l) + 31-23 + n2-6 + 2n-2

- 34n2 - 24n + 84 (7.1)

89

Table 7.1. Transistor count of the SE chip.

 

 

 

Function cell type No. of Transistors No. of Total No. of
. per units Units Transistors
MAC Dynamic Reg 10 32x40 12,800
Delay stage 8 32x17 4,352
Array Multiplier 34,132 5 170,660
2—1 MU X 4 32x10 1,280
Ripple-carry Add 28 51:48 6,720
subtotal 195,812
Divider Dynamic Reg 12 32x30 11,520
Array Divider 72,708 1 72,708
subtotal 84,228
Adder Dynamic Reg 12 64x4 3,072
2—1 MUX 4 64x4 1,024
CSA Tree 24 64x2 3,072
CLA(16x4) 4,100 1 4,100
subtotal 1 1,268
RAM RAM cell 6 32x64+64x32 24,576
Decoder 4 20 80
Driving Buf 4 l6x42+4x42 4,032
+2x42+2x42
4-1 MUX 12 32x4 1,536
subtotal 30,224
ROM ROM cell 1 32x16x16 8,192
Decoder 4 63 252
Driving Buf 4 64x(2+8+32) 10,752
subtotal 19,196
Driver 110 Buf 12 32x2 768
Delay unit 2-phase latch 8 32x15 3,840
Register 2-phase latch 14 32x2+64x4 3,300
+4x2+l+l
Grand Total 348,636

 

 

90

The transistor count of an array divider is not available, but the
gate count of a nonrestoring with l-level CLA and carry-save divider is
shown in [30]. A unit gate assumed in [24] is a 2-input NAND gate which
requires 4 transistors when implemented in CMOS technology. 80 the
transistor count of a nonrestoring array divider (AD) can be roughly
estimated by

NT(AD) - (18n2 + 28n + 11).4

- 72n2 + 112n + 44, (7.2)

where n is 15 in this case instead of 16, because the sign bit does not
contribute.

One might dispute the accuracy of this estimation. It is true that
the number of transistors will increase dramatically if there are many
gates with more than two input pins. On the other hand, hardware
implementation at the transistor level usually takes only 1/3 or 1/4 the
area when compared to implementation at the gate level [31]. These two
factors may balance each other out somewhat and thus make the above
estimation still reasonable.

The rest of the work on counting the transistors is based on CMOS
circuits taken from [29]. .In counting the transistors of the CLA, a
two-level CLA(8x4) is assumed, which takes one 8-bit CLA and eight 4-bit
BCLA (Block Carry Lookahead Adder) units.

Another issue to note is the signal driving capability. No signal
can drive an infinite number of loads, and thus certain fan-out has to

be defined in order to guarantee the validity of the driving signal. In

91

the SE chip, parallel retrieval of data from RAM or ROM requires one
decoded signal to drive 64 storage cells simultaneously, but this is
impossible under current technology. A reasonable solution is to use a
cascaded driving buffers. If we assume conservatively that one driving
buffer can drive four successive buffers, then three buffer stages are
good enough to drive the 64 cells. This is used to calculate the number
of transistors in the driving buffers for the RAM and ROM. Also note
that the I/O buffer requires larger area in order to drive a large
external load, so the I/O driving buffer is usually counted separately.

The transistor count of the control logic remains unknown until the
actual logic design is carried out. One may implement the control
circuit by ROM or PLA (Programmable Logic Array), whichever is more
convenient. With the help of the program written in RTL, this can be
easily achieved.

Under current VLSI technology, more than several million devices
have been successively fabricated in a single chip [33]. For a gate array,
it is quite normal to have more than 100,000 gates per chip and each
gate with 4 transistors [34]. According to these criteria and the grand
transistor count, it is possible that the SE architecture could be
fabricated on a single chip.

In this research, I counted the transistors of the SE chip first
based on 16-bit data bandwidth and obtained nearly 120,000 transistors
for the grand total count. When we increased the data bandwidth up to
32-bit, the overall increase is less than 3 times, even though the

increases of the array multipliers and array divider are 4 times. The

92

same principle would apply to a 64-bit data bandwidth; it is likely to
get a grand total count of nearly 1,000,000 transistors. In this case,
an SE chip set, instead of a single SE chip, will be the choice to
fulfill the proposed architecture. Also noted that replacing the array
divider with a reciprocal generating unit for the generation of the
reciprocal of a divisor may somewhat reduce the space cost for

fulfilling the division operations.
7.3 Speed Estimation

The SE architecure uses a two-phase nonoverlapped clock scheme as
mentioned previously. Thus, the minimum clock period is the sum of the
duty clock widths of phase one and phase two. The phase one duty clock
width, according to the specified clocking scheme, is the worst case
path propagation delay. Consider the transfer of data from the register
to the input latch of the functional units in Figure 6.18 as the worst
case path. The delay components include the register response time, bus
propagation delay, and the input latch response time. Following the
estimation process in [26], the phase one delay would be 12ns if a 0.6ns
gate delay is assumed [34].

The phase two clock width is the maximum combinational logic delay.
In the SE chip, one stage of the multiplier pipe or divider pipe can be
considered as the maximum combinational logic delay. Kawahito et al.
reported on the design and fabrication of a high-speed 32x32-bit binary

multiplier using 2-pm CMOS technology with a multiply time of S9ns [38].

93

A CMOS 32x32-bit Wallace tree multiplier with a multiply time of S6ns
has been also reported [39]. Since in the SE chip the multiplier has
been segmented into four stages and the delay time for the four stages
are about the same, the phase two clock is set to lSns, which is nearly
one fourth of the multiply time.

Summing the delay times of phase one and phase two, the clock
period of the SE chip is 27ns. Table 7.2 gives the computation time for
different evaluation routines. Compare with the transform engine of the
HP 32OSRX graphics workstation, which adopts 32-bit IEEE floating point
data format and transforms 180,000 coordinates per second, the SE chip
is 12.6 times faster for nonrational patch evaluation and 9.6 times
faster for the rational patch evaluation (noted that three coordinates
per evaluation in the SE architecture). But reader should note that the
SE chip adopted 32-bit fix-point operation whereas the HP 3208RX used
32-bit floating point operation. Also it is not clear whether the
performance of the HP 3208RX to transform 180,000 coordinates per second
is recorded for coordinate transformation (such as rotation or shifting)
or B-spline point evaluation or both. If it is for coordinate
transformation, the only operations are matrix-vector multiplications.
When the matrix-vector multiplication implemented in the SE chip, based
on the above criteria, 216ns is enough to calculate the multiplication
of 4-by-4 matrix and 4-by-l vector. In other words, 4.63 millions of
such matrix-vector multiplications per second is achieveable by using

the SE architecture.

94

Table 7.2. Evaluation time for different evaluation routines

 

 

 

Routine ‘ Startup Average Evalu- Number of Evalu-

Delay (us) ation Time Q13) ations per second
nonrational point evaluation 1,404 2,133 468,823
nonrational patch evaluation 1,404 1,323 755,857
rational point evaluation 1,836 2,970 336,700

rational patch evaluation 1,836 1,728 578,703

 

 

CHAPTER VIII

CONCLUSION

The goal of this research is to develop a dedicated VLSI chip for
surface evaluation, and the proposed SE chip architecture satisfies the
major design objectives. The architecture of the SE chip may not be
optimal, which is not our goal, however much effort has been devoted in
order to achieve higher system performance.

Compared to the hardware given in [10], the main advantage of the
SE chip is speed. This is mainly due to the introduction of pipelining
of functional units. Other features, such as parallel data retrieving,
and the overlapping of the input of control point matrices and the
formulating of the blending functions, also guarantee the high
throughput for the system. One feature not found in any other known
hardware design is the on-chip formulation of the blending function. It
not only saves time in calculating the blending functions, but also
prevents unnecessary I/O of the blending functions. In order to achieve
higher utilization of the functional units as well as to save hardware
components, a reconfigurable structure has been proposed for the M&A
unit.

In Chapter 6, the cycle counts for various evaluation routines are
calculated under the assumption that the chip receives one control point

coordinate at a time from the host processor. If we increase the I/O

95

96

bandwidth to send or receive two or even four control point coordinates
at a time by increasing the number of I/O pads and of I/O buffers, we
can reduce the startup time for patch evaluation and the overall
evaluation time for point evaluation.

Besides surface evaluation, different point transformations, such
as position translation, rotation, scaling and reflection, can easily be
done by the proposed architecture. Parallel projection and central
projection can also be achieved by the SE chip with a little more
complicated control routine.

Recall that in the formulation of the blending functions, their
first derivatives are being calculated at the same time, but they remain
unused for the following computation. The derivatives of the blending
functions are mainly for the calculating of the tangent and normal
vectors of the evaluated point, and the vectors can then be used to
shade the patch.

According to Pickelmann [l], a complete algorithm for interactive
design of B-spline surface should include the conversion of non-uniform
knot vectors to enhanced uniform knot vectors (defined in Section
3.1.1), knot vector translation, enhanced uniform knot vector
compression (or labelling), blending function formulation, point
evaluation, tangent vector and normal vector calculation, and shading.
The SE chip can now perform blending function formulation and point
evaluation. The routines for computing the tangent and normal vector
and shading are still to be done. The rest of the algorithm is assumed

to be accomplished by the host processor. The rest of the algorithm,

97

however, can be fully implemented into dedicated hardware. If this
implementation can be achieved at reasonable cost, it would be possible
to design a chip, or a chip set, to handle the preprocessing for the
interactive design of B2spline surfaces. But the data transmission
between the preprocessing chip or chip set and the SE chip is still a
bottleneck. A possible solution to this is to design a specific
processor to accomplish the whole algorithm. Future research should
emphasize developing a dedicated VLSI processor for the Computer-Aided

Geometric Design (CAGD) of B-spline surfaces.

APPENDICES

APPENDIX 1

DESCRIPTION OF THE SE CHIP IN RTL

98

/* THE RTL DESCRIPTION OF SURFACE EVALUATION CHIP */
type REGISTERI : Bit<31:o>:

REGISTERZ : Bit<63z0>:

DYNREG : Bit<63=0>:

LATCHl : Bit<31:O>:

LATCHZ : Bit<63:O>:

RAM : Bit<31=0>:

ROM : Bit<31:0>:

var /* basic functional elements 1"/
of MACl

upr1_1,
MPYZ_1.
MPY3_1,
MPY4_1,
MPYS_1,

ADDER or 311:263 : o>:
Dl,D2,D3,D4,D5,D6,D7,D8,09,DlO or

112213,
MPY2_2,
MPY3_2,
MPY4_2,
MPYS 2 '

MPY1_3.
MPY2_3,
MPY3_3,
MPY4_3,
MPYS_3,
/*

/* storage elements */

BLD(16,4,4) OF ROM:
CPM(4,4,4) OF RAM:
IT(4,4) OF RAM:

RBLDlZp4) OF RAM:

MPY1_4 or
MPY2_4 or

MPY3_4
MPY4_4
MPY5_4
4-operand Adder */

OF
OF
OF

Bit<63:O>:
Bit<63=0>;
Bit<63:0>:
Bit<63=0>:
Bit<63=0>:

Bit<63:0>:

/*
/*
/*
/*
/*

/* intermediate product term */

segments
segments
segments
segments
segments

/* blending coefficient matrices */
/* control point matrices */

/* for storing blending functions */

of MACZ
of MAC3
of MAC4
of MACS

RBLDD(2,4) OF RAM: /* first derivatives of blending functions */

/* dynamic registers and latches */

M1D3, M203, M3D3, M4D3, M503 OF LATCHZ: /* 3-stage delay within MAC's */
/* l-stage delay within MACl and MAC2 */

MlDlo

DLY8 OF LATCHI:
LAD3 OF LATCHl:

M2Dl OF LATCHZ:
DLY4 OF LATCHl:

/* 4-stage delay unit */
/* 8-stage delay unit */
/* 3-stage delay */

*/
*/
*/
*/
*/

/* segments of Dividor */

Ll_1, L142 OF DYNREG: /* input latches of MACl */
L2_l, L2_2 OF DYNREG: /* input latches of MACZ */
L3_l, L3_2 OF DYNREG; /* input latches of MAC3 */
L4_1, L4_2 OF DYNREG; /* input latches of MAC4 */
LS_l, L5_2 OF DYNREG; /* input latches of MACS */
LAl, LAZ, LA3, LA4 OF DYNREG: /* input latches of Adder */
LDl, LDZ OF DYNREG; /* input latches of Dividor */

/* registers and control signal */

INPUT, OUTPUT OF REGISTERl: /* input and output buffer */

RS, RT OF REGISTERI: /* registers for parameter s and t */

R(4) OF REGISTERZ; /* registers for WX, WY,WZ, and W */

LABS, LABT OF Bit<3:0>; /* labels of s and of t */

RAT OF Bit<0:o>: /* l for rational, 0 for nonrational */

PATCH OF Bit<0:0>; /* l for patch evaluation, 0 for point
evaluation */ '

Begin /* the routine for nonrational spline point evaluation */
Cycle (0): if RAT-O then
begin
(1): RS<-INPUT:
(2): RT<-INPUT:

(3): LABS<-INPUT:

(4):
(5):

(6):

(7):

(8):

(9):

(10):

(ll):

(12):
(13):
(14):
(15):
(16):

/* blending functions
(17):

(18):
(19):
(20):

99

LABT<-INPUT;

CPM(l,l,l)<-INPUT, Ll_l<-BLD(LABS,1,1),
MlD3<-BLD(LABS:1,2), DL4<-BLD(LABS,1,3),
DL8<-BLD(LABS,1,4), Ll_2<-RS, LAD3<-BLD(LABS,1,1):
CPM(2.1.1)<-INPUT. L1_l<-BLD(LABS,2,1),
M1D3<-BLD(LABS,2,2), DL4<-BLD(LABS,2,3),
DL8<-BLD(LABS,2,4), Ll_2<-RS, LAD3<~BLD(LABS,2,1):
CPM(3' 1’ 1)<-INPUT' L1_l<-BLD (LABS: 3' 1) '

M1D3<-BLD (LABS, 3, 2), DL4<-BLD (LABS, 3, 3) r
DL8<-BLD(LABS,3,4), Ll _2<-RS, LAD3<-BLD(LABS,3,1):
CPM(1,1,2)<-INPUT, Ll_l<-BLD(LABS,4, 1),
M1D3<-BLD(LABS,4,2), DL4<-BLD(LABS, 4, 3),
DL8<-BLD(LABS,4,4), L1_2<-RS, LAl<-O, LA2<-LAD3,
LA3<-LAD3, LA4<-LAD3, LAD3<-BLD (LABS, 4, 1) o'
CPM(2,1,2)<-INPUT, L4 l<-ADDER, L4_2<-RS,

M4D3<-M1Dl, L2_1<-MPYl_4, L2_2<-RS LA1<-O, LA2<-LAD3,
LA3<-LAD3, LA4<-LAD3;
CPM(3,l,2)<-INPUT, L4 _l<-ADDER, L4 _2<-RS,
M4DB<-M1Dl, L2_ l<-MPY1_4, L2 _2<-RS- LAl<-0, LA2<-LAD3,
LA3<-LAD3, LA4<-LAD3:
CPM(l,l,3)<-INPUT, L4 _l<-ADDER, L4 _2<-RS,
M4D3<-M1Dl, L2_1<-MPY1_4, L2 _2<-RS- LA1<-O, LA2<-LAD3,
LA3<-LAD3, LA4<-LAD3:
CPM(2,1,3)<- INPUT, L4 _l<-ADDER, L4_2<-RS,
M4D3<-M1Dl, L2_ l<-MPY1_4, L2_2<-RS:
CPM(3,1,3)<-INPUT, L3_l<-MPY2_4, L3_2<-RS,
LS_l<-MPY4_4, L5_2<-RS, MSD3<~M2D1:
CPM(l,l,4)<-INPUT, L3 _1<-MPY2 _4, L3 _2<-RS,
L$_l<-MPY4_4, LS_2<-RS, M5D3<-M2Dl:
CPM(2,1,4)<-INPUT, L3 _l<-MPY2 _4, L3 _2<-RS,
L5_1<-MPY4_4, LS _2<-RS, MSD3<~M2D1:
CPM(3, l, 4)<- INPUT, L3 _l<-MPY2_4, L3_2<-RS,
L$_l<-MPY4_4, LS_2<-RS, MSDB<-M2Dl;
of s ------------------------------------ */
CPM(l,2,2)<-INPUT, RBLD(1,1)<-MPY3_4,
RBLDD(1,1)<-MPY5_4:
CPM(2,2,2)<-INPUT. RBLD(1,2)<—MPY3_4,
RBLDD(l,2)<-MPYS_4:
CPM(3,2,2)<-INPUT, RBLD(1,3)<-MPY3_4,
RBLDD(1,3)<-MPY5_4:
CPM(l,2,3)<-INPUT, RBLD(1,4)<-MPY3_4,
RBLDD(l,4)<-MPY5_4:

/* ............................................................ */

(23):

(24):

: CPM(2,2,3)<-INPUT,

: CPM(33233)<-INPUT'

Ll_l<-BLD(LABT,1,1),
MlD3<-BLD(LABT,1,2), DL4<-BLD(LABT, 1, 3),
DL8<-BLD(LABT,1,4), L1 _2<-RT, LAD3<~BLD(LABT,1,1);
L1 _l<-BLD(LABT, 2, 1),
MlD3<-BLD(LABT,2,2), DL4<-BLD(LABT, 2, 3).
DL8<-BLD(LABT,2,4), L1 _2<-RT, LAD3<-BLD(LABT,2,1);
CPM(l, 2, 4)<- INPUT, L1 _l<-BLD(LABT, 3, 1),
M1D3<-BLD(LABT, 3, 2), BL4<-BLD(LABT, 3 3),
DL8<-BLD(LABT, 3, 4), Ll_2<-RT, LAD3<-BLD(LABT,3,1);
CPM(2,2,4)<-INPUT, Ll_l<-BLD(LABT,4,1),
M1D3<-BLD(LABT,4,2). DL4<-3LD(LABT,4.3).
DL8<-BLD(LABT,4,4), Ll_2<-RT, LAl<-O, LA2<-LAD3,
LA3<-LADS, LA4<-LAD3, LAD3<~BLD(LABT,4,1);

(25): CPM(3,2,4)<-INPUT, L4_l<-ADDER, L4_2<-RT,
M4D3<-MlDl, L2_1<-MPY1_4,L2_2<—RTLAl<-O, LA2<-LAD3,
LA3<-LAD3, LA4<-LAD3:

(26): CPM(1,2,1)<-INPUT, L4 _1<-ADDER, L4_2<-RT,
M4D3<-M1Dl, L2_ l<-MPY1_4, L2_2<-RTLA1<-O, LA2<-LAD3,
LA3<—LAD3, LA4<-LAD3:

(27): CPM(2,2,1)<-INPUT, L4_l<-ADDER, L4_2<-RT,
M4DS<-M1Dl, L2_l<-MPYl_4, L2_2<-RTLAl<-O, LA2<-LAD3,
LA3<-LAD3, LA4<-LAD3:

(28): CPM(3,2,1)<- INPUT, L4 _l<—ADEER, L4 _2<—RT,
M4D3<-M1Dl, L2_1<-MPY1_4, L2 _2<-RT:

(29): CPM(l,3,3)<-INPUT, L3 _1<-MPY2 _4, L3 _2<-RT,
L5_1<-MPY4_4, L5 _2<-RT, MSD3<-M2D1:

(30): CPM(Z, 3, 3)<-INPUT, L3 _l<-MPY2 _4, L3 _2<-RT,
L5_ l<-MPY4 _4, L5 _2<-RT, M5D3<-M2Dl;

(31): CPM(3, 3, 3)<-INPUT, L3_ l<-MPY2 J, L3 _2<-RT,
L5_l<-MPY4_4, L5 _2<-RT, M5D3<-M2Dl;

(32): CPM(l,3,4)<-INPUT, L3_l<-MPY2_4, L3_2<-RT,
L5_l<-MPY4_4, L5_2<-RT, MSDB<-M2D1:

/* blending functions of t --------------------------------------- */

(33): CPM(2,3,4)<-INPUT, RBLD(2,1)<-MPY3_4,
RBLDD(2,1)<-MPY5_4:

(34): CPM(3,3,4)<-INPUT, RBLD(2,2)<-MPY3_4,
RBLDD(2,2)<-MPY5_4:

(35): CPM(l,3,l)<-INPUT, RBLD(2,3)<-MPY3_4,
RBLDD(2,3)<-MPY5_4:

(36): CPM(2,3,1)<-INPUT, RBLD(2,4)<-MP23_4,
RBLDD(2,4)<-MPY5_4;

/* ............................................................... */

(37): CPM(3,3,1)<-INPUT

(38): CPM(l,3,2)<-INPUT

(39): CPM(2,3,2)<-INPUT

(40): CPM(3,3,2)<-INPUT

(41): CPM(l,4,4)<-INPUT

(42): CPM(2,4,4)<-INPUT

(43): CPM(3,4,4)<-INPUT

(44): CPM(l,4,1)<-INPUT

(45): CPM(2,4,1)<-INPUT

(46): CPM(3,4,1)<-INPUT

(47): CPM(1,4,2)<-INPUT

(48): CPM(2,4,2)<-INPUT

(49): CPM(3,4,2)<-INPUT

(50): CPM(1,4,3)<-INPUT

(51): CPM(2,4,3)<-INPUT

(52): CPM(3,4,3)<-INPUT

(53): Ll_l<-CPM(l,1,l), L1_2<-RBLD(1,1), M103<-O,
L2_l<-CPM(l,l,2), L2_2<-RBLD(1,2), M2D3<-O,
L4_1<-CPM(1,1,3), L4_2<-RBLD(1,3), M4D3<-O,
L5_1<-CPM(1,1,4), L5_2<-RBLD(1,4), MSD3<-O:

(54): Ll_1<-CPM(1,2,2), L1_2<-RBLD(1,1), MlD3<-O,
L2_1<-CPM(1,2,3), L2_2<-RBLD(1,2), MZD3<-O,
L4_l<-CPM(1,2,4), L4_2<-RBLD(1,3), M4D3<-0,
L5_1<-CPM(1,2,1), L5_2<-RBLD(l,4), M5D3<-0:

(55): Ll_l<-CPM(1,3,3), Ll_2<-RBLD(1,1), M103<-0,
L2_1<-CPM(1,3,4), L2_2<-RBLD(1,2), M2D3<-O,
L4_1<-CPM(1,3,1), L4_2<-RBLD(1,3), M4D3<-O,

100

(56):

(S7):

(58):

(59):

(60):

(61):

(62):

(63):

(64):

(65):

(66):

101

L1_1<-CPM(1,4,4), L1_2<-RBLD(1,1), MlD3<-0,
L4_1<-CPM(1,4,2), L4_2<-RBLD(1,3), M4D3<-0,
L5 1<-CPM(1,4,3), L5 _2<-RBLD(1, 4), M5D3<-0:
L1 1<-CPM(2,1,1), L1:2<-RBLD(1,1), MlD3<-Or
L2 1<-CPM(2,1,2), L2- _2<-RBLD(1, 2), M203<-0,
L4 1<-CPM(2,1,3), L4:2<-RBLD(1, 3), M4D3<-0,
L5 1<-CPM(2,1,4), L3_2<-RBLD(1,4), M5D3<-0,
LA1<~MPY1_4, LA2<-MPY2_4, m3<~m>x4_4,
LA4<-MPY5_4;

L1_1<-CPM(2,2,2), L1 _2<-RBLD(1, 1), M1D3<-0,
L2_1<-CPM(2,2,3), L2 _2<-RBLD(1, 2), M203<-0,
L4_1<-CPM(2,2,4), L4 :,2<-RBLD(1 3), M4D3<-0,
L5_1<-CPM(2,2,1), L5:2<-RBLD(1, 4), M5D3<-0,
LA1<-MPY1_4, LA2<-Mp¥2_4, LA3<-MPY4_4,
LA4<-MPY5_4 , IT (1, 1) <-ADDER;

L1 _1<-CPM(2, 3, 3), L1 _2<-RBLD(1, 1), MlD3<-0,
L2- _1<-CPM(2, 3,4), L2- _2<-RBLD(1, 2), M2D3<- -0,
L4-_1<-CPM(2,3,1), L4:2<-RBLD(1, 3), M4D3<-0:
LS- _1<-CPM(2, 3, 2), LS_2<-RBLD(1,4), M5D3<-0,
LA1<-MPY1_4, LAZ<~MPY2 _4, LA3<-MPY4 _4,
LA4<-MPY5_4, 12(1, 2)<-ADDER:
L1_1<~CPM(2,4,4), L1_2<-RBLD(1,1), MlD3<-0,
L2_1<-CPM(2,4,1), L2 _2<-RBLD(1, 2), M2D3<-0,
L4_1<-CPM(2,4,2), L4 :2<-RBLD(1, 3), M4D3<-0,
L5_1<-CPM(2,4,3), L5- _2<-RBLD(1,4), M5D3<-0,
m1<~upr1_4, LA2<-MPY2 _4, mourn _4,
LA4<-MPY5_4, 11(1, 3)<-ADDER:
L1__1<-c1>t4(3.1,1), L1_2<-RBLD(1,1), MlD3<-O,
L2_1<-CPM(3, 1,2), L2 _2<-RBLD(1,2), MZD3<-0,
L4:1<-CPM(3, 1,3), L4 :2<-RBLD(1, 3), M4D3<- -0,
L5- _1<-CPM(3, 1, 4), L5- _2<-RBLD(1, 4), M5D3<- -0,
LA1<-MPY1_4, LA2<‘MPY2 _4, LA3<-MPY4 _4,
LA4<-MPY5 :4, IT(1, 4)<-ADDER:

L1_1<-CPM(3, 2,2), L1_2<-RBLD(1,1), M1D3<-0,
L2- _1<-CPM(3, 2, 3), L2 :2<-RBLD(1, 2), M2D3<-0,
L4- _1<-CPM(3, 2, 4), L4 :2<-RBLD(1, 3), M4D3<-0,
L5__1<-CPM(3,2,1), LS- _2<-RBLD(1, 4), M5D3<-O,
LA1<-MPY1 _4, LA2<-MPY2 _4, LA3<-MPY4_4,
LA4<-MPY5- _4, IT(2,1)<-ADDER:

L1_1<-CPM(3, 3, 3), L1_2<-RBLD(1,1), M1D3<-0,
L2- _1<-CPM(3, 3,4), LZ— _2<-RBLD(1, 2), MZD3<-0,
L4 :1<-CPM(3, 3, 1), L4 :2<-RBLD(1, 3), M4D3<-O,
L5- _1<-CPM(3, 3, 2), LS- _2<-RBLD(1, 4), M5D3<- -O,
LA1<-MPY1_4, LA2<-MPY2 _4, LA3<-MPY4 _4,
LA4<-MPY5- _4, IT(2, 2)<-ADDER;

L1_1<-CPM(3, 4,4), L1_2<-RBLD(1,1), MlD3<-Oo
L2-_1<-CPM(3, 4,1), L2- _2<-RBLD(1, 2), MZD3<-0,
L4 :1<-CPM(3,4, 2), L4 :2<-RBLD(1, 3), M4D3<-0,
L5 :1<-CPM(3, 4,3), L5- _2<-RBLD(1,4), MSD3<- -0,
LAl<-MPY1_4, LA2<-MPY2 _4, LA3<-MPY4 _4,
LA4<-MPY5- _4, IT(2, 3)<-ADDBR:

LA1<—MPY1_4, LA2<-MPY2_4, LA3<-MPY4_4,
LA4<-MP‘1'5_4 , IT (2 , 4) <-ADDER:

LA1<-MPY1_4, LA2<-MPY2_4, LA3<-MPY4_4,

(67):

LA1<-MPY1_4 I
LA4<-MPYS_4,

102

IT(3,1)<-ADDER:
LA2<-MPYZ_4, LA3<-MPY4_4,
IT(3,2)<-ADDER:

(68): LA1<-MPYL_4, LA2<~MPY2_4, LA3<-MPY4_4,
LA4<~MPY5_4, IT(3,3)<-ADDER:

(69): IT(3,4)<—ADDER:

(70): L1_1<-IT(1,1), L1_2<-RBLD(2,1), MID3<-O,
L2_1<-IT(1,2), L2 _2<-RBLD(2, 2), M203<-0,
L4_1<-IT(1,3), L4 _2<-RBLD(2, 3), M4D3<-0,
L5_1<-IT(1,4), L5 _2<-RBLD(2, 4), MSDB<-0:

(71): L1_1<-IT(2,1), L1:2<-RBLD(2,1), M1D3<-0,
L2_1<-IT(2,2), L2 :2<-RBLD(2, 2), M2D3<-O,
L4_1<-IT(2,3), L4 :2<-RBLD(2, 3), M4D3<-0,
L5_1<—IT(2,4), LS :2<~RBLD(2, 4), MSD3<-O;

(72): LL_1<-IT(3,1), L1_2<-RBLD(2, 1), M1D3<-0,
L2_1<-IT(3,2), L2 _2<-RBLD(2, 2), M2D3<-0,
L4_1<-IT(3,3), L4 :2<-RBLD(2, 3), M403<-0,
L$_1<-IT(3,4), L5 _2<-RBLD(2, 4), M503<-0;

(73):

(74): LA1<-MPY1_4, LA2<-MPY2_4, LA3<—MPY4_4,
LA4<-MPY5_4:

(75): LA1<-MPY1_4, LAZ<~MPY2_4, LA3<—MPY4_4,
LA4<-MPY5_4, R(2)<-ADDER:

(76): LA1<-MPY1_4, LA2<-MPY2_4, LA3<-MPY4 _4,
LA4<~MPY$_4, R(3)<-ADDER, OUTPUT<-R(2);

(77): R(4)<-ADDER, OUTPUT<-R(3):

(78): OUTPUT<-R(4):

(79): it (PATCH-O or (3-1 and t-1)) then stop; /* stop here*/

else begin
/* input new parameter values */

(80): RS<-INPUT;

(81): RT<-INPUT:

(82): L1_1<-BLD(LABS,1,1), M103<-BLD(LABS,1,2),
DL4<-BLD(LABS, 1,3), DL8<-BLD(LABS,1,4),
L1_2<-RS, LAD3<-BLD(LABS,1,1):

(83): L1_1<-BLD(LABS,2,1), M1D3<-BLD(LABS,2,2),
DL4<-BLD(LABS, 2, 3), DL8<—BLD(LABS,2,4),
L1_2<—RS, LAD3<-BLD(LABS,2,1):

(84): L1_1<-BLD(LABS,3,1), MlDB<-BLD(LABS,3,2),
DL4<-BLD(LABS,3,3), DL8<-BLD(LABS,3,4),
L1_2<-RS, LAD3<~BLD(LABS,3,1):

(8S): L1_1<-BLD(LABS,4,1), MlD3<-BLD(LABS,4,2),
DL4<-BLD(LABS,4,3), DL8<-BLD(LABS,4,4),
L1_2<-RS, LAD3<~BLD(LABS,4,1), LA1<-o, LA2<-LAD3,
LA3<-LADB, LA4<-LAD3:

(86): L1_1<-BLD(LABT,1,1), M103<-BLD(LABT,1,2),
DL4<-BLD(LABT,1,3), DL8<-BLD(LABT,1,4),
L1_2<-RT, LAD3<-BLD(LABT,1,1), LA1<-O, LA2<-LAD3,
LA3<-LAD3, LA4<-LAD3, L4_ 1<-ADDER, L4 _2<-RS,
M4D3<-M1D1, L2_ 1<-MPY1_ 4, L2 _2<-RS,:

(87): L1_1<-BLD(LABT,-2,1), M1D3<-BLD(LABT, 2, 2),

(88):

DL4<-BLD(LABT, 2, 3):

DL8<-BLD(LABT, 2, 4):

L1_2<-RT, LAD3<-BLD(LABT,2,1), LA1<-O, LA2<-LAD3,
LA3<-LAD3, LA4<-LAD3, L4 _1<-ADDER, L4 _2<-RS,
M4D3<-M1Dl, L2_ l<-MPY1_4, L2 _2<-RS:

L1_1<-BLD(LABTT3,1), MlD3<-BLD(LABT, 3,2),

(89):

(90):

(91):

(92):

(93):

(94):
'(95):
(96):
<97):
(9a):
(99):

(100):
(101):

103

DL4<-BLD(LABT,3,3), DL8<-BLD(LABT,3,4),

L1_2<-RT, LAD3<-BLD(LABT,3,1), LA1<-o, LA2<—LAD3,
LA3<-LAD3, LA4<-LAD3, L4_l<-ADDER, L4_2<-RS,
M4D3<-MlDl, L2 _1<-u921_4, L2_2<-as:

L1_ l<-BLD(LABT,4,1), M1D3<-BLD(LABT,4, 2),
DL4<-3LD(LABT, 4, 3), DL8<-BLD(LABT, 4, 4),

Ll_2<-RT, LAD3<-BLD(LABT,4,1), LA1<-o, LA2<-LAD3,
LA3<-LA03, LA4<-LAD3, L4 _1<-ADDER, L4_2<-Rs,
M4D3<~M1D1, L2_ 1<-MPY1_4, L2 _2<-RS:

LA1<-o, LA2<-LAD3, LA3<-LA93, LA4<-LAD3,
L3_1<-MPY2_4, L3_2<-RS, L$_l<-MPY4_4, Ls_2<-Rs,
M503<~uzn1, LA1<-o, LA2<-LAD3,LA3<-LAD3, LA4<-LAD3,
L4_1<-ADDER, L4_2<-RT, M403<-M1D1, L2_1<-MPY1_4,

L2_2<-RT:
LA1<-O, LA2<-LAD3,
L3_1<-MPY2_4,
L4_1<-ADDER,
L2_2<-RT:
LAI<-0, LA2<‘LAD3'
L3_1<-MPY2_4,
MSD3<-M2D1,
L4_1<-ADDER'
L2 -2<-RT;
L3:1<-MPY2_4,
MSD3<~M2D1,
L4_1<-ADDER,
L2 -2<-RT:
L3:1<-MPY2 J,
M5D3<-M201,
L3_1<-MPY2 J,
M5D3<~M2D1, RBLD(1,
L3_1<-MPY2_4,
M5D3<-M2D1,
L3_1<-MPY2 J,
MSD3<-M2D1, RBLD(1,
RBLD(2, 2)<-MPY3- J,
RBLD(2, 3)<-MPY3- _4,
RBLD(2,4)<-MPY3_4,

end else:

end:

/* the routine for rational spline point

else begin

(1):
(2):
(3):
(4):
(5):

(6):

(7):

RS<-INPUT:
RT<-INPUT:
LABS<-INPUT:
LABT<-INPUT:
CPM(1,1,1)<-INPUT,
M1D3<-BLD(LABS,1,2)
DL8<-BLD(LABS,1,4),
CPM(1,1,2)<-INPUT,
MlDB<-BLD(LABS,2,2)
DL8<—BLD(LABS,2,4),
CPM(l,l,3)<-INPUT,

L3 _2<-RS '
LAl<:o '
L4_2<-AT, M4D3<-M1D1,

m1<:o '

LAI<:O '
L4_2<-RT, M4D3<-MlDl,

RBLD(1,

RBLD(1,

LA3<-LAD3, LA4<-LAD3,

L5_1<-MPY4_4, L5_2<—RS,
LA2<-LAD3,LA3<-LAD3, LA4<-LAD3,
L2_l<-MPY1_4,
m3<-LAD3 ' m4<-LAD3 '

L5_1<-MPY4_4, L5 _2<—RS,
LA2<-LAD3, LA3<-LAD3, LA4<-LADB,
L2_l<-MPY1_4,

L5_ l<-MPY4 _4, LS _2<-RS,
LA2<-LADB, LA3<-LAD3, LA4<-LAD3,
L2_l<-MPYl_4,

1) <‘MPY3_ 4 '

LS_2<-RT,
mLDD(1,1)<-Mpys_4:
L5_1<-MPY4 _4, L5 _2<-RT,
2)<-MPY3 J, ABLBDu, 2)<-MPYS_ 4,-
LS_ l<-MPY4 _4, LS _2<-RT,
3)<-MPY3 _4, ABLED(1,3‘)’<-MPYL4;
L5_ l<-MPY4 _4, LS _2<-RT,
4)<-MPY3 J, RBLDD(1, 4)<-MPYL 4:
RBLDD(2,1)<-MPY5_4:
ABLDD(2,2)<-Mpys_4:
RBLDD(2,3)<-MPYS_4:

RBLDD(2,4)<-MPY5_4, go to (53):

evaluation */

L1_l<-BLD(LABS,1,1),

, DL4<-BLD(LABs,1,3),

Ll_2<-RS, LAD3<-BLD(LABS,l,l):
L1_1<-BLD(LABs,2,1),

, DL4<-BLD(LABS,2,3),

L1_2<-RS, LAD3<-BLD(LABS,2,1):
L1_l<-BLD(LABS,3,1),

/* blending functions

(8):

(9):

(10):

(11):

(12):
(13):
(14):
(15):
(16):

(17):
(18):
(19):
(20):

(23):

(24):

(25):

(26):

(27):

104

MID3<-BLD (LABS, 3, 2) , DL4<-BLD (LABS, 3, 3) ,
DLB<¢BLD(LABS,3,4), Ll_2<-RS, LAD3<¢BLD(LABS,3,1):
CPM(1,1'4)<-INPUT, L1_1<-BLD(LABStqll-)I

M1D3<-BLD (LABS, 4, 2) , DL4<-BLD (LABS: 4, 3) ,
DL8<-BLD(LABS,4,4), Ll_2<-RS, LAl<-0, LA2<-LAD3,
LA3<-LAD3, LA4<-LAD3, LAD3<-BLD(LABS,4,1);
CPM(1,2,2)<'INPUT, L4_1<-ADDER, L4_2<-R3,
M4D3<-MlDl, L2_l<-MPY1_4, L2_2<-RS LA1<-0, LA2<-LAD3,
LA3<-LAD3, LA4<-LAD3:

CPM(1,2,3)<-INPUT, L4_1<-ADDER, L4_2<-RS,
M4D3<-M101, L2_1<-MPY1_4, L2_2<-RS LA1<-o, LA2<-LAD3,
LA3<-LAD3, LA4<-LAD3;

CPM(1,2,4)<-INPUT, L4_1<-ADDER, L4_2<-RS,
M4D3<-MlDl, L2_l<-MPY1_4, L2_2<-As LA1<-o, LA2<-LAD3,
LA3<-LAD3, LA4<'LAD33

CPM(1,2,1)<‘INPUT, L4_1<-ADDER, L4_2<-RS,
M4DB<-M1Dl, L2_1<-MPYl_4, L2_2<—RS:
CPM(1,3,3)<-INPUT, L3_1<-Mpyz_4, L3_2<-RS,
Ls_1<-MPY4_4, L5_2<-RS, MSD3<-M2D1:
CPM(l,3,4)<-INPUT, L3_1<-MPY2_4, L3_2<-RS,
Ls_1<4upy4_4, Ls_2<-As, M5D3<-M2D1;
CPM(1,3,1)<-INPUT, L3_l<-MPY2_4, L3_2<-RS,
L5_1<-MPY4_4, L5_2<-RS, M5D3<-M2D1:
CPM(1,3,2)<-INPUT, L3_1<-MPY2_4, L3_;<-RS,
L5_l<-MPY4_4, L5_2<-Rs, MSD3<-M2D1:

of s ------------------------------------ */
CPM(1,4,4)<-INPUT, RBLD(l,1)<-MPY3_4,
RBLDD(1,1)<-MPY5_4:

CPM(1,4,1)<-INPUT. RBLD(1,2)<-MPY3_4,
RBLDD(1,2)<-MPY5_4:

CPM(1,4,2)<-INPUT, RBLD(1,3)<-MPY3_4,
RBLDD(1,3)<-MPYS_4:

CPM(1,4,3)<-INPUT, RBLD(1,4)<-MPY3_4,
RBLDD(1,4)<-MPYS_4:

MlDB<-BLD(LABT,1,2), BL4<~BLD<LABT,1,3),
DL8<-BLD(LABT,1,4), L1_2<—RT, LAD3<-BLD(LABT,1,1):
CPM(2,1,2)<-INPUT, Ll_1<-BLD(LABT,2,1),
M1D3<-BLD(LABT,2,2), DL4<-BLD(LABT,2,3),
DL8<~BLD(LABT,2,4), Ll_2<-RT, LAD3<-BLD(LABT,2,1):
CPM(2,1,3)<-INPUT, Ll_1<-BLD(LABT,3,1),
MlD3<-BLD(LABT,3,2), DL4<-BLD(LABT,3,3),
DL8<-BLD(LABT,3,4), Ll_2<-RT, LAD3<-BLD(LABT,3,1);
CPM(2,1,4)<-INPUT, L1_1<-BLD(LABT,4,1),
M103<~BLD(LABT,4,2), DL4<-BLD(LABT,4,3),
DL8<-BLD(LABT,4,4), L1_2<—RT, LA1<-o, LA2<-LAD3,
LA3<-LADB, LA4<-LAD3, LAD3<-BLD(LABT,4,1):
CPM(2,2,2)<-INPUT, L4_1<-ADDER, L4_2<-RT,
M4D3<-M1Dl, L2_1<-MPY1_4,L2_2<-RTLA1<-O, LA2<-LA93,
LA3<-LAD3, LA4<-LAD3:

CPM(2,2,3)<-INPUT, L4_1<-ADDER, L4_2<-RT,
M4D3<-M101, L2_1<-MPY1_4, L2_2<-RTLAl<-O, LA2<-LA03,
LA3<-LAD3, LA4<-LAD3:

CPM(2,2,4)<-INPUT, L4_l<-ADDER, L4_2<-RT,
M4D3<-M1D1, L2_l<-MPY1_4, L2_2<-RTLAl<-O, LA2<-LAD3,

(28):
(29):
(30):
(31):
(32):

/* blending functions
(33):

(34):
(35):
(36):

/* ---------------------------------------------------------------
. CPM(3,1,1)<-INPUT
: CPM(3,1,2)<-INPUT
: CPM(3,1,3)<-INPUT
: CPM(3,1,4)<-INPUT
' CPM(3'2'2)<-INPUT
: CPM(3,2,3)<-INPUT
: CPM(3,2,4)<-INPUT
: CPM(3,2,1)<-INPUT
: CPM(3,3,3)<-INPUT
: CPM(3,3,4)<-INPUT
: CPM(3,3,1)<-INPUT
: CPM(3,3,2)<-INPUT
. CPM(3,4,4)<-INPUT
: CPM(3,4,1)<-INPUT
: CPM(3,4,2)<-INPUT
: CPM(3,4,3)<-INPUT
: CPM(4,1,1)<-INPUT
: CPM(4,1,2)<-INPUT
: CPM(4,1,3)<-INPUT
: CPM(4,1,4)<-INPUT
: CPM(4,2,2)<-INPUT
: CPM(4,2,3)<-INPUT
: CPM(4,2,4)<-INPUT
: CPM(4,2,1)<-INPUT
: CPM(4,3,3)<-INPUT
: CPM(4,3,4)<-INPUT
: CPM(4,3,1)<-INPUT
: CPM(4,3,2)<-INPUT
: CPM(4,4,4)<-INPUT
: CPM(4,4,1)<-INPUT
: CPM(4,4,2)<-INPUT
: CPM(4, 4,3)<- -INPUT
: L1_1<-CPM(1, 1,1), L1_2<-RBLD(1,1), MlD3<-0,
L2-_1<-CPM(1, 1,2), L2 :2<-RBLD(1, 2), M2D3<-0,
L4-_1<-CPM(1, 1,3), L4 :2<-RBLD(1, 3), M4D3<-0,

105

LA3<-LAD3, LA4<~LAD3:

CPM(2,2,1)<-INPUT, L4 _1<-ADEER, L4_2<-RT,

M4D3<-M1D1, L2_1<-MPY1_4, L2 _2<-RT:

CPM(Z, 3, 3)<- INPUT, L3 _1<-MPY2 _4, L3 _2<-RT,

L5_1<-MPY4_4, L5 _2<-RT, M5D3<-M2D1;

CPM(Z, 3,4)<-INPUT, L3_l<-MPY2_4, L3_2<-RT,

L5 l<-MPY4_4, L5_2<-RT, M5D3<-M2D1:

cpﬁ(2,3,1)<-INPUT, L3_ l<-MPY2 _4, L3 _2<-RT,

L5_1<-MPY4 _4, L5 _2<-RT, M5D3<-M2D1:

LS_1<-MPY4_4, Ls_2<-RE, MSD3<-M2D1:

of t ' -------------------------------------

CPM(2,4,4)<-INPUT, RBLD(2,1)<-MPY3_4,
RBLDD(2,1)<-MPY5_4:
CPM(2,4,1)<-INPUT, RBLD(2,2)<-MPY3_4,
RBLDD(2,2)<-Mpys_4;
CPM(2,4,2)<-INPUT, RBLD(2,3)<-MPY3_4,
RBLDD(2,3)<-MPYS_4:
CPM(2,4,3)<-INPUT, RBLD(2,4)<-MPY3_4,
RBLDD(2,4)<-MPYS_4:

*/

(70):

(71):

(72):

(73):

(74):

(75):

(76):

(77):

(78):

(79):

(80):

106

L3_1<-CPM(1,1,4),
L1_1<-CPM(1,2,2),
L2_1<-CPM(1,2,3),
L4_1<-CPM(1,2,4),
L5_1<-CPM(1,2,1),
11_1<-CPM(1,3,3),
12_1<-cm4(1, 3, 4) .
14_1<-CPM(1,3,1),
15_1<-c2u(1,3,2),
11_1<-c2u(1,4,4),
L2_1<-CPM(1,4,1),
14_1<-CPM(1,4,2),
L5_1<-CPM(1'4'3)'
11_1<-CPM(2,1,1),
L2_1<-CPM(2,1,2),
L4_1<-CPM(2,1,3), L4 :2<-RBLD(1, 3),
L5_1<-CPM(2,1,4), L5- _2<-RBLD(1,4),
LAl<-MPYI_4 , 1A2<-MPY2J,
LA4<-MPY5_ 4,

LS_2<-RBLD(1,4),
L1_2<-RBLD(1,1),
L2_2<-RBLD(1,2),
L4_2<-RBLD(1,3),
L5_2<-RBLD(1'4)'
L1_2<-RBLD(1,1),
L2 _2<-RBLD(1, 2),
L4 :2<-RBLD(1, 3),
LS- _2<-RBLD(1, 4),
L1:2<-RBLD(1,1),
L2 :2<"RBLD(1 2):
L4 :2<-RBLD(1, 3),
LS- _2<-RBLD(1, 4),
L1:2<-RBLD(1,1),
L2 _2<-RBLD(1, 2),

MSD3<-0;
M1D3<-0,
M203<-0'
M4D3<-0,
M5D3<-0:
MID3<-0,
M203<-O'
M4D3<-0,
MSD3<-0:
MlD3<-0,
M203<-0,
M4DB<-O,
MSDB<-0:
M1D3<-0,
M2D3<-0'
M403<-0,
M5D3<-0,

L1_1<-CPM(2, 2, 2):
L2:1<-CPM (2, 2, 3) I
L4 :1<-CPM(2, 2, 4):

L1_2<-RBLD(1,1), M103<-0,

15Y_1<-cpM(2,2,1),
LAI<~MPY1_4, 1A2<-MPY2 _4,
LNK-MPYS J, 11(1, l)<-ADDER:

11 _1<-c2u(2, 3, 3), 11_2<-RBLD(1,1),
12Y _1<-CPM(2, 3,4), 12 _2<-RBLD(1, 2),
14Y_1<-c2u(2,3,1), 14 _2<-RBLD(1, 3),
15Y _1<-c2u(2, 3,2), 15 :2<-RBLD(1, 4),
LAl<-MPY1_4, LA2<-MPYZ J,
LA4<-MprsY J, 11(1, 2)<-ADDER:

11 _1<-CPM(2, 4 4), L1_2<-RBLD(1,1),
L2:1<-CPM(2,4,1), 12 _2<-RBLD(1, 2),
14_1<-CPM(2,4,2), 14 _2<-RBLD(1, 3),
L3_1<-CPM(2,4,3), 15 _2<-RBLD(1, 4),
LA1<-MPY1_4, 1A2<-MPY2_4,
LA4<-MPYS J, 12(1, 3)<-ADDER;

L1_1<-CPM(3,1,1), L1_2<-RBLD(1,1),
12Y _1<-CPM(3, 1, 2), 12 _2<-RBLD(1, 2),
14Y _1<-c2u(3, 1, 3), 14 Y2<-RBLD<1, 3),
15Y _1<-CPM(3, 1,4), 15Y _2<-RBLD(1, 4),
LA1<-MPY1 J, 1A2<-MpY2_4,
1A4<-MPYSY _4, 11(1, 4)<-ADDER:

L1_1<-CPM(3, 2, 2), 11 _2<-RBLD(1, 1),
12Y _1<-CPM(3, 2, 3), 12Y _2<-RBLD(1, 2),
14 Y1<-CPM(3, 2, 4), 14 3<-RBLD(1, 3),
L5_1<-CPM(3,2,1), L5_2<-RBLD(1, 4),
LA1<-MPYI_4, LA2<~MPY2_4,
LA4<~MPYS J, 11(2, l)<-ADDER;

11 _1<-CPM(3, 3, 3), L1_2<-RBLD(1,1),
12Y _1<-CPM(3, 3, 4), 12Y _2<-RBLD(1, 2),
L4_1<-CPM(3,3,1), 14 3<-RBLD(1, 3),
15_1<-CPM(3,3,2), 15 Y2<-RBLD<1, 4),
1A1<-MPY1_4, 1A2<-MPY2_4,
LA4<-MPY5_4, IT(2,2)<-ADDER:

L1_1<-CPM(3,4,4), L1_2<-RBLD(1,1),

LS U2<-RBLD(1 4):

LA3<-MPY4 _4,

M103<-O'
M2D3<-O,
M4D3<-0,
M5D3(-O(

LA3<-MPY4 _4,

M103<-O,
M203<-0,
M4D3<-O(
M5D3<-0(

LA3<-MPY4_4,

MID3<-0,
M2D3<-0,
M4D3<-0,
M5D3<-0,

LA3<-MPY4_4,

M103<-0,
M203<-0,
M403<-O,
MSD3<-O,

LA3<-MPY4_4,

MID3<-0,
M2D3<- -0,
M403<- -0,
M5D3<- -0,

(81):

(82):

(83):

(84):

(85):
(86):
(87):

(88):

(89):
(90):

(91L

(92)“

(93):

(94):

107

L2_1<-CPM(3,4,1), L2 _2<-RBLD(1, 2), M203<-0,
L5_1<-CPM(3,4,3), LS:2<-RBLD(1: 4), M5D3<-0,
LA1<-Mpr1_4, LA2<-MPY2_4, LA3<-MPY4_4,
LA4<-MPY5_4, 12(2, 3)<-ADDER:
L1_1<-CPM(4,1,1), L1_2<-RBLD(1,1), MlDB<-0,
L2_1<-CPM(4,1,2), L2 :,2<-RBLD(1 2), M203<-o,
L4_1<-CPM(4,1,3), L4 :2<-RBLD(1, 3), M4D3<-0,
L5_1<-CPM(4,1,4), L5- _2<-RBLD(1,4), MSD3<-0,
LAl<-MPYL_4, LA2<-MPY2 _4, LA3<-MPY4 _4,
LA4<-MPY5_4, 11(2.4)<-Xnoea; '
L1_1<-CPM(4,2, 2), L1_2<-RBLD(1I1)I M1D3<-0:
L2_1<-CPM(4,2,3), L2_2<-RBLD(1,2), M2D3<-0,
L4 _1<-CPM(4, 2 4), L4 _2<-RBLD(1, 3), M4D3<-0,
L5-_1<-CPM(4,2,1), L5:2<-RBLD(1p4)p M503<-0,
LA1<-MPY1_4, LA2<-MPY2 _4, LA3<-MPY4_4,
LA4<-MPY5- _4, IT(3,1)<-ADDER;

L1_1<-CPM(4, 3, 3), L1_2<-RBLD(1,1), MlD3<-0,
L2 _1<-CPM(4, 3 4), L2_2<-RBLD(1,2), M2D3<-0,
L4_ _1<-CPM(4 3 1), L4_2<-RBLD(1,3), M403<-0,
L5- _1<-CPM(4, 3 2), L5_2<-RBLD(1,4), MSD3<-Or
LA1<-MPY1_4, LA2<-MPY2_4, LA3<-MPY4_4,
LA4<-MPY5_4 I IT (3! 2 ) ('ADDER;

L1_1<-CPM(4, 4, 4), L1_2<-RBLD(1,1), M1D3<-0,
L2- _1<-CPH(4, 4 1), L2 :2<-RBLD(1, 2), M2D3<-0,
L4- _1<-CPM(4, 4 2), L4 :2<-RBLD(1, 3), M4D3<-O,
L5. _1<-CPM(4, 4 3), L5. _2<-RBLD(1,4), M503<-0r
LA1<-MPYL_4, LA2<-MPY2 _4, LA3<-Mpy4_4,
LA4<-MPY$_4, 11(3 3)<-XDDER:

LA1<-MPY1_4, LA2<-MPY2_4, LA3<-MPY4_4,
LA4<~MPY5- _4, IT(3,4)<-ADDER;

LA1<-MPY1 J, LA2<-MPYZ_4, LA3<-MPY4_4,
LA4<-MPY5- _4, IT(4,1)<-ADDER:

LA1<-MPY1 J, LA2<-MPYZ_4, LA3<-MPY4_4,
LA4<-MPYS:4, IT(4,2)<-ADDER:

LA1<-MPY1_4, LA2<-MPY2_4, LA3<-MPY4_4,
LA4<-MPYS_4, IT(4,3)<-ADDER;
IT(4,4)<-ADDER:

LL_1<-IT(1,1), Ll_2<-RBLD(2,1), M1D3<-0,
L2_1<-IT(1,2), L2_2<-RBLD(2,2), M203<-O,

L4 _1<-IT(1, 3), L4_2<-RBLD(2,3), M4D3<-0,
L5:1<-IT(1 4), L5_2<-RBLD(2,4), MSD3<-0:
L1:1<-IT(2,1), LL_2<-RBLD(2,1), M1D3<-O,
L2_1<-IT(2 2), L2_2<-RBLD(2,2), M203<-0,
104:1 <‘IT‘2' 3), L4_2<-RBLD(213)I M4D3<"0,

L5 _1<-IT(2, 4), L5_2<-RBLD(2,4), M5D3<-O:
L1_1<-—IT(3 1), L1_2<-RBLD(2.1), MlD3<-0,
L2_ _1<-IT(3, 2), L2_2<-RBLD(2,2), M2D3<-0,
L4_1<- IT(3, 3), L4_2<-RBLD(2,3), M403<-0,
L5_1<-IT(3,4), L5_2<-RBLD(2,4), M5D3<-O:
L1_1<-IT(4,1), Ll_2<-RBLD(2,1), M1D3<-O,
L2_1<-IT(4,2), L2_2<-RBLD(2,2), MZD3<-0,
L4_1<-IT(4,3), L4_2<-RBLD(2,3), M4DB<-0,
L5_1<-IT(4,4), L5_2<-RBLD(2,4), M503<-0;
LA1<-Mpy1_4, LA2<-MPY2_4, LA3<-Mpy4_4,
LA4<-MPY$_4:

108

(95): LA1<-MPY1_4, LA2<-MPY2_4, LA3<-MPY4_4,
LA4<-MPYS_4, R(1) <-ADDER;

(96): LAl<-MPY1_4, LA2<-MPY2_4, LA3<-MPY4_4,
LA4<-MPYS_4, LD1<-ADDER, LD2<-R(1);

(97): LA1<-MPY1_4, LA2<-MPY2_4, LA3<-MPY4_4,
LA4<-MPY5_4, LDl<-ADDER, LDZ<-R(1):

(98): R(4)<-ADDER, LDl<-ADDER, LD2<-R(1):

(99):

(100):

(101):

(102):

(103):

(104):

(105):

(106): R(2)<-D10:

(107): R(3)<-DIO, OUTPUT<-R(2);

(108): R(4)<-D10, OUTPUT<-R(3):

(109): OUTPUT<-R(4):

(110): it (PATCH-0 or (s-l and t-1)) then stop;

else begin
/* input new parameter values */

(111): RS<- -INPUT:

(112): RT<-INPUT:

(113): L1_1<-BLD(LABS,1,1), M1D3<-BLD(LABS,1,2),
DL4<~BLD(LABS,1,3), DL8<-BLD(LABS,1,4),
L1_2<-RS, LAD3<-BLD(LABS,1,1);

(114): L1_1<-BLD(LABS,2,1), MlD3<-BLD(LABS,2,2),
DL4<-BLD(LABS,2,3), DL8<-BLD(LABS,2,4),
L1_2<-RS, LAD3<-BLD(LABS,2,1);
DL4<-BLD(LABS,3,3), DL8<-BLD(LABS,3,4),
L1_2<-RS, LAD3<-BLD(LABS,3,1);

(116): L1_1<-BLD(LABS,4,1), M103<-BLD(LABS,4,2),
DL4<-BLD(LABS, 4, 3), DL8<-BLD(LABS,4,4),
L1_2<-RS, LAD3<-BLD(LABS, 4,1), LA1<-0, LA2<~LAD3,
LA3<-LAD3, LA4<-LAD3:

(117): L1_1<-BLD(LABT, 1,1), MlD3<-BLD(LABT,1,2),
DL4<-BLD(LABT,1,3), DL8<-BLD(LABT,1,4),
L1_2<-RT, LAD3<-BLD(LABT,1,1), LA1<-0, LA2<-LAD3,
LA3<-LAD3, LA4<-LAD3, L4_1<-ADDER, L4_2<-RS,
M4D3<-M101, L2_1<-MPY1_4, L2_2<-Rs,;

(118): L1_1<-BLD(LABT,2,1), M103<-BLD(LABT,2,2),
DL4<-BLD(LABT,2,3), DL8<-BLD(LABT,2,4),
L1_2<-RT, LAD3<-BLD(LABT,2,1), LA1<-0, LAZ<-LAD3,
LA3<-LAD3, LA4<-LAD3, L4 _1<-ADDER, L4 _2<-RS,
M4D3<-M1D1, L2_1<-MPY1_4, L2 _2<-RS:

(119): L1 _1<-BLD(LABT, 3, 1), M103<-BLD(LABT, 3, 2),
DL4<-BLD(LABT, 3,3), DL8<-BLD(LABT, 3, 4),
L1_2<-RT, LAD3<-BLD(LABT,3,1), LA1<-0, LA2<-LAD3,
LA3<-LADB, LA4<-LAD3, L4 _1<-ADDER, L4_2<-RS,
M4D3<~MlDl, L2_1<-MPY1_4, L2 _2<-RS:

(120): L1_1<-BLD(LABT,4,1), M1D3<~BLD(LABT, 4,2),

DL4<-BLD(LABT, 4, 3), DL8<-BLD(LABT,4,4),

L1_2<-RT, LADB<-BLD(LABT,4,1), LA1<-0, LA2<-LAD3,
LA3<-LAD3, LA4<-LAD3, L4_1<-ADDER, L4_2<-RS,
M4D3<-M101, L2_1<-MPY1_4, L2_2<-RS:

/* stop here *

End.

(121):

(122):

(123):

(124):

(125):

(126):
(127):

(128):

(129):
(130):
(131)“

(132):

109

LA1<-O, LA2<-LAD3,

L3_1<-MPY2_4, L3 _2<-RS, LS_1<-MPY4_4,
LAI<:0,

m3<-LAD3, m4<- m3,
L5_ 2<-RS p

LA2<~LAD3, LA3<- LAD3, -LA4<-LAD3,

L4_1<-ADDER, L4_2<-RT, M4D3<-M1D1, L2_l<-MPY1_4,

L2_2<-RT:
L3_1<-MPY2_4,

L3_ 2<-RS I
MSD3<-M201, LAl<-0,

LA3<-LAD3, LA4<-LAD3,
L5_1<-MPY4_4, L5 _2<-RS,
LA2<-LAD3, LA3<-LAD3, -LA4<-LAD3,

L2_2<-RT:
LA1<-O, LA2<-LAD3,
L3_1<—MPY2_4,

LA3<-LAD3, LA4<-LAD3:
L5_1<-MPY4_4, LS_2<-RS,

L4_1<-ADDER, L4_2<-RT, M4D3<-MlDl, L2_1<-MPY1_4,

L2_2<-RT:

L3 l<-MPY2_4, L3_2<-RS,
LAI<"O'

max-mm,

L5_1<-MPY4_4 , L5_2<-RS,
LA2<-LAD3, LA3<-LAD3, LA4<-LA03,

L4__l<-ADDER, L4_2<-RT, M403<-M101, L2_1<-MPY1_4,

L2_2<-RT:

L5 1<-MPY4 _4, L5 _2<-RT,

MSDB<4M2017 RBLDTl,1)<-MPY3 _4, RBLBD(1,1')'<-mys_4;

L5_1<-MPY4 4, LS_2<-RT,

M5D3<-M201, RBLD(1,2)<-MPY3_ 4, Rsnﬁb(1, 2)<-MPY5_ 4;

L3_2<-RT,

MSD3<-M201, RBLD(1,3)<-MPY3_4, RBLDD(1, 3)<-MPY5_ 4:

L3_2<-RT,

LS_l<-MPY4_4 , LS_2<-RT,

MSD3<-M2D1, RBLD(1, 4)<-MPY3 4, RBLDD(1,4)<-MPYS_4:

RBLD(2,1)<-MPY3_4,
RBLD(2, 2)<-MPY3 _4,
331.0(2, 3)<-MPY3:4,
RBLD (2, 4) <-MPY3_4 ,

end else:

end else;

11st (2,1) <-MPYS_4 .-
RBLDD (2, 2) <-MPYS_4 ,-
RBLDD (2, 3) <-m¥5_4 ,-
RBLDD(2,4)<-MPYS_4, go to (69);

APPENDIX 2

INPUT SEQUENCE OF CONTROL POINT MATRICES

Cycle

(5):

(6):

(7):

(8):

(9):
(10):
(11):
(12):
(13):
(14):
(15):
(16):
(17):
(18):
(19):
(20):
(21):
(22):
(23):
(24):
(25):
(26):
(27):
(28):
(29):
(30):
(31):
(32):
(33):
(34):
(35):
(36):
(37):
(38):
(39):
(40):
(41):
(42):
(43):
(44):
(45):
(46):
(47):
(48):
(49):
(50):
(51):
(52):

CPM(1,1,1)
CPM(2,1,1)
CPM(3,1,1)
CPM(1,1,2)
CPM(2,1,2)
CPM(3,1,2)
CPM(1,1,3)
CPM(2,1,3)
CPM(3,1,3)
CPM(1,1,4)
CPM(2,1,4)
CPM(3,1,4)
CPM(1,2,2)
CPM(2,2,2)
CPM(3,2,2)
CPM(1,2,3)
CPM(2,2,3)
CPM(3,2,3)
CPM(1,2,4)
CPM(2,2,4)
CPM(3,2,4)
CPM(1,2,1)
CPM(2,2,1)
CPM(3,2,1)
CPM(1,3,3)
CPM(2,3,3)
CPM(3,3,3)
CPM(1,3,4)
CPM(2,3,4)
CPM(3,3,4)
CPM(1,3,1)
CPM(2,3,1)
CPM(3,3,1)
CPM(1,3,2)
CPM(2,3,2)
CPM(3,3,2)
CPM(1,4,4)
CPM(2,4,4)
CPM(3,4,4)
CPM(1,4,1)
CPM(2,4,1)
CPM(3,4,1)
CPM(1,4,2)
CPM(2,4,2)
CPM(3,4,2)
CPM(1,4,3)
CPM(2I413)
CPM(3,4,3)

(-
<-
<-
<-
<-
<—
<-
<-
(-
<-
<-
<—
<-
(-
<4—
(-
<-
<-
<-
<-
<-
<-
<-
<-
<—
<-
<-
<-
<-
<-
<-
<—
<-
<-
<-
<-
<-
<-
(—
(.-
<-
(-
<—
<—
<-
<-
<-
<—

110

X11
Y11
211
X12
Y12
212
X13
Y13
213
X14
Y14
214
X21
Y21
221
X22
Y22
222
X23
Y23
223
X24
Y24
224
X31
Y31
231
X32
Y32
232
X33
Y33
233
X34
Y34
234
X41
Y41
Z41
X42
Y42
242
X43
Y43
243
X44
Y44
244

APPENDIX 3

TIME TABLE OF LATCH ELEMENTS (I)

lll

Tmubleofhchelmtsfathefamdzﬁmof
Malling fractions and their ﬁrst deﬁvatives.

 

Cycle 1.4133 LA! LA2 LAB 1.44 L1_1 L1_2 M1D3 L2_1 L2_2 M2D3
5 $11 811 s 812

6 521 521 s 522

7 $31 831 5 S32

8 341 o 511 $11 S11 841 5 S42

9 o 521 $21 $21 (M1_4) s $13
10 0 $31 $31 $31 (M1_4) s 323
11 o 541 S41 S41 (M1_4) s 533
12 (Ml_4) s 543
13

14

15

16

17

18

19

2o

21 111 111 1 112

22 1'21 121 1 122

23 131 131 1 132

24 141 o 111 111 111 141 1 142

25 o 121 121 121 (Ml_4) 1 123
26 o 131 131 131 (Ml__4) 1 133
27 o 141 141 141 (Ml_4) 1 143
23

29

3o

31

32

112

Tune table of lamb elements for the formulation of blending
functions and their ﬁrst deﬁvatives (cont’d).

 

L3_1 L3_2 M303 L4_1 L4_2 M4D3 LS_1 L5_2 M5D3 D4 D8
5 $13 814
6 823 824
7 $33 $34
8 S43 S44
9 3x511 S 211812
10 311521 8 211822
11 311531 S 211832
12 311541 S 211842
13 (MZ_4) S 814 (M4_4) S $13
14 (MZ_4) S 824 (M4_4) S 823
15 (MZ_4) S 834 (M4_4) S 833
16 (M2_4) S S44 (M4_4) S S43
17
18
19
20
21 T13 T14
22 T23 1‘24
23 T33 T34
24 T43 T44
25 3lel '1' 2xT12
26 311121 ‘1‘ 211122
27 311131 '1‘ 2x132
28 3xT4l ‘1' 211142
29 (M2_4) 1 114 (M4_4) 1 . 113
.30 (mg) 1 124 (M4_4) 1 123
31 (1123) 1 134 (M4_4) 1 133
32 (M2_4) 1 144 (M4_4) 1 143

APPENDIX 4

TIME TABLE OF FUNCTIONAL STAGES (I)

Tune table of ﬁnncdonal stage: fat the fmnulation of

113

blending functions and their ﬁrst derivatives.

 

Cycle ADDER MPY1_1 MPY1_2 MPY1_3 MPY1_4 MPY2_1 MPY2_2 MPY2_3 MPY2_4
5 (M1_4)

'6 (M1_4) (M1_2)

7 (M1_4) (M1_2) (M1_3)

8 3x811 (M1_4) (M1_2) (M1_3) (M1_4)

9 3x821 (M1_2) (M1_3) (M1_4) (M2_1)

10 311831 (M1_3) (M1_4) (1123) (M2_2)

11 311841 (M1_4) (M2_1) (M2_2) (M2_3).

12 (M2_1) (M23) (M23) (M2_4)
13 (M23) (M2_3) (M2_4)
14 012.3) (M23)
15 (M2_4)
16

17

18

19

20

21 (M1_4)

22 (M1_4) (M1_2)

23 (M1_4) (M1_2) (Ml-3)

2: 3lel (M1_4) (M1_2) (M1_3) (M1__4)

25 311121 (M1_2) (M1_3) (M1_4) (M2_1)

26 311131 (M1_3) (M1_4) (M2_1) (M2_2)

27 311141 (M1_4) (M2_1) (M2_2) (M2_3)

28 (M2_l) (M2_2) (M23) (M2_4)
29 (M2_2) (M2_3) (M2_4)
30 (M2_3) (M2_4)
31 (M2_4)
32

33

34

35

114

Tim table of functional stages for the {trunnion of blending
functions and their ﬁrst derivative: (cont’d).

0N0

MPY‘3_1

MPY3_2 MPY3__3

MPY3_4

MPY4_1

MPY4_2 MPY4_3

MPY4_4

 

ONQGM

11
12
13
14
15
16
17
18
19
20

13.528

26
27
28
29
30
3 1
32
33

35

(M33)
(M33)
(M33)
(M33)

(M33)
(M33)
(M33)
(M33)

(M33)
(M33)
(M33)
(3133)

(M33)
(M33)
(M33)
(M33)

M(3_3)
M(3_3)
M(3_3)
M(3_3)

M(3_3)
M(3_3)
M(3_3)
M(3_3)

31-30)
31:50)
BFSQ)
393(4)

BFI'( l)
BFI' (2)
BFI‘(3)
BFI'(4)

(M43)
(M43)
(M43)
(M43)

(M43)
(M43)
(M43)
(M43)

(M43)
(M43)
(M43)
(M43)

(M43)
(M43)
(M43)
(M43)

(M43)
(M43)
(M43)
(M43)

(M4_3)
(M4_3)
(M4_3)
(M4_3)

(M4_4)
(M4_4)
(M4_4)
(M4_4)

(M4_4)
(M4_4)
(M4_4)
(M4_4)

115

Time table of functional stage. for the formulation of blending
functions and their ﬁn: derivatives (oont'd).

 

Cycle MPYS_1 MPY5_2 MPY5_3 MPY5_4
5

6

7

8

9

1o

11

12

13 (M5_1)

14 (M53) (M53)

15 (MS3) (MS3) (M53)-

16 (M5_1) (M5_2) (M5_3) napsu)
17 (M33) (M53) bursa)
18 (1453) DBFS(3)
l9 DBFS(4)
20

21

22

23

24

25

26

27

28

29 (M5_1)

30 (M5_1) (M5_2)

31 (1453) (M53) (M53)

32 (M5_1) (M5_2) (M5_3) DBFl‘(l)
33 (M5_2) (M5_3) Dar-1(2)
34 (M53) Dal-1(3)
35 oat-1(4)

APPENDIX 5

TIME TABLE OF LATCH ELEMENTS (II)

TimetableoflacchelememsfortheevaluationofX,Y,de.

116

 

Cycle 1.4133 1.41 1.112 1.43 1.14.4 L1_1 1.1.2 M1133
53 X11 BFS(1) o
54 1121 BFS(1) o
55 x31 81-30) 0
56 X41 BFS(1) o
57 (M1_4) (M2_4) (M4_4) (M5_4) Yll BFSO) o
58 (M1_4) (M2_4) (M4_4) (MS_4) 1(21 BFS(1) o
59 (M1_4) (M2_4) (M4_4) (M5_4) 1131 BFS(1) o
60 (M1_4) (M2_4) (M4_4) (M5_4) 1141 BFS(1) o
61 (M1_4) (M2_4) (M4_4) (M5_4) 211 BFS(1) o
62 (M1_4) (M2_4) (M4_4) (M5_4) 221 BFS(1) o
63 (M1_4) (M2_4) (M4_4) (M5_4) 231 BFS(1) o
64 (M1_4) (M2_4) (M4_4) (M5_4) 241 BFS(1) o
65 (M1_4) (M2_4) (M4_4) (M5_4)

66 (M1_4) (M2_4) (M4_4) (M5_4)

67 (M13) (Mz__4) (M4_4) (1453)

68 (M1_4) (MZ_4) (M4_4) (M5_4)

69

70 110.1) 331(1) 0
71 11(2,1) 331(1) 0
72 1113.1) 311(1) 0
73

74 (M1_4) (M23) (M4_4) (1)453)

75 (M1_4) (M2_4) (M4_4) (M5_4)

76 (M1_4) (112_4) (M4_4) (M5_4)

117

Time table of latch elements for the evaluation of X. Y, and Z (cont'd).

 

Cycle L2_1 1.2_2 112133 L4_1 L4_2 M4133 1.53 L5__2 M5D3
53 X12 358(2) 0 x13 358(3) 0 x14 358(4) 0
54 x22 358(2) 0 1123 358(3) 0 x24 358(4) 0
55 1132 358(2) 0 x33 358(3) 0 x34 358(4) 0
56 X42 358(2) 0 X43 358(3) 0 X44 358(4) 0
57 1112 358(2) 0 1113 358(3) 0 1114 358(4) 0
58 1122 358(2) 0 1123 358(3) 0 124 358(4) 0
59 Y32 358(2) 0 1133 358(3) . o 1134 358(4) 0
60 1142 358(2) 0 1(43 358(3) 0 1144 358(4) 0
61 212 358(2) 0 213 358(3) 0 214 358(4) 0
62 222 358(2) 0 223 358(3) 0 224 358(4) 0
63 232 358(2) 0 233 358(3) 0 234 358(4) 0
64 242 358(2) 0 243 358(3) 0 244 358(4) 0
65
66
67
68
69
7o 11(1,2) 351(2) 0 11(1,3) 351(3) 0 11(1,4) 351(4) 0
71 1112.2) 351(2) 0 « 11(23) 351(3) 0 11(24) 351(4) 0
72 11(32) 351(2) 0 1103) 351(3) 0 11(3,4) 351(4) 0
73
74

‘3“
GM

APPENDIX 6

TIME TABLE OF FUNCTIONAL STAGES (II)

118

TinntableofMACmgafortheevaludmofX.Y,andZ.

 

Cycle ADDER MPY1_1 31mg MPY1_3 my m_1 MPY2_2 MPY2_3 MPY2_4
53 (M1_l) (3123)
54 (M13) (M13) (1123) (M23)
55 (M13) (M1_2) (M13) (1123) (M23) (Mz3)
56 (M13) (M13) (M13) (M1_4) (112.1) (M23) (1123) (1123)
57 110.1) (M13) (M1_2) (M13) (M13) (112.1) (M23) (M23) (M2_4)
58 1102) (M13) (M13) (M13) (M1_4) (1123) (M23) (1123) (M23)
59 1103) (M13) (M13) (M13) (M13) (1123) (1123) (1123) (M2_4)
60 1111.4) (M13) (M13) (M13) (M13) (M23) (M23) (1123) (312.4)
61 110.1) (M13) (M13) (M13) (M13) (M23) (M2_2) (1123) (142.4)
62 1112.2) (M13) (M1_2) (M13) (M13) (112.1) (M2_2) (1123) (M23)
65 1105) (M13) (M13) (M13) (M13) (1412.1) (M23) (1123) (M23)
64 132.4) (M13) (M13) (M13) (M1_4) (M23) (M23) (1123) (11123)
65 110.1) (M13) (M13) (M13) (M23) M3) (1123)
66 110.2) (M13) (M13) (1123) (1123)
67 113.5) (M13) (1123)
61 110.4)
69
7o (M1_l) (M23)
71 (M1_l) (M1_2) (M2_l) (M2_2)
72 (M13) (M1_2) (M13) (112-1) (M2_2) (1)123)
73 (M13) (M13) (M1_4) (1123) (1123) (112.4)
74 x (M1_3) (M1_4) (M23) (M2_4)
75 Y (Ml_4) (M2_4)
76 Z

119

Time table of functional me: for 111: «111mm ofX, Y, and Z (cont'd).

 

Cycle MPY4_1 MPY4__2 11m; MPY4_4 MPY5_1 1151153 MPY5_3 1151153
53 (M4_l) (M5_1)
54 (M43) (M43) (M5_1) (1153)
55 (M43) (M43) (M43) (M53) (1153) (1153)
56 (M43) (1143) (M43) (M4_4) (M53) (M53) (M53) (M53)
57 (1143) (M43) (1113) (M43) (M53) (1153) (1153) (M5_1)
51 (M43) (M13) (M43) (M4_4) (1153) (1153) (1153) (115.4)
59 (1143) (M43) (M43) (M4_4) (M53) (1153) (M53) (M5_4)
60 (M43) (M43) (M43) (M4_4) (M5_1) (1153) (M53) (115.4)
61 (M43) (M43) (M43) (M43) (M5_1) (1153) (1153) (M53)
62 (M43) (M43) (M43) (M43) (M5_1) (M5_2) (1153) (115.4)
63 (M43) (M43) (M43) (1143) (M5_1) (1153) (1153) (115.4)
64 (M43) (1143) (M43) (M4_4) (M5_1) (1153) (M53) (1153)
65 (M43) (M43) (M4_4) (1153) (M53) . (M53)
66 (M43) (M4_4) (1153) (M5_4)
67 (M4_4) (M5_4)
68
69
70 (M43) (M53)
71 (M43) (M43) (M53) (1153)
72 (M43) (M43) (1143) (M5_1) (M53) (M53)
73 (1143) (M43) (M4_4) (1153) (M53) (115.4)
74 (M4_3) (M4_4) (M5_3) (M5_4)
75 (M4_4) (M5_4)

76

BIBLIOGRAPHY

10.

11.

12.

13.

120

BIBLIOGRAPHY

Pickelmann, M.N., “The Design of Rational B-spline Algorithms for
Interactive Color Shading of Surfaces,“ Ph.D. dissertation, Michigan
State University, 1985.

'Nov 3-D Image Can Be Moved in Real Time,“ Electronics, August 7,
1985. pp. 97-100.

”Survey of custom and semicustom ICs,' VLSI Des., Oct. 1984, pp. 30-
43.

Bolton, U.C. and Gavin, R.K., "A Perspective on CMOS Technology
Trends," Proceedings of the 1523, Vol. 7h, No. 12, Dec 1986, pp.
1646-1668.-

Baller, H.H., 'VLSI/Software Engineering Design Methodology," IEEE
1984 workshop on the Engineering of VLSI and Software, IEEE Computer
Society Press, 1984, pp. 11-14.

Okuda, 8., Sugai, M., and Goto, 8.. "Semicustom and Custom LSI
Technology,“ Proceedings of the 1535, Vol. 74, No. 12, Dec 1986, pp.
1636-1645.

Vai, M.K~ and Shanblatt, M.A,, 'Some Issues in the—Design.Automation
of VLSI,“ Technical Report, Department of Electrical Engineering,
Michigan State Unversity, 1986.

Watanabe, W, and Ackland, 8., "FLUTE: An Expert Floor Planner for
Full-custom VLSI Design,“ IEEE Design & Test, Feb. 1987.

Subrahmanyam, P.A., “Synapse: An Expert System for VLSI Design,"
computer, July 1986, pp. 78-89.

Uhitton, M.C., I'Special Purpose Hardware For the Display of Free
Form Surfaces,“ Master thesis, North Carolina State University,
1984.

Hawkins, J.L., I"I'he Wonder of B-splines,‘ Case Center for Computer-
Aided Engineering and Manufacturing, Michigan State University,
1986.

Vanderploeg, M.J., "Surface Assessment Using Color Graphics," Ph.D.
dissertation, Michigan State University, 1982.

Catmull, E., ”A Subdivision Algorithm for Computer Display of Curved
Surfaces," Dept. of Computer Science, University of Utah, Dec. 1974.

14.

15.

16.

17.

18.

19.

20.

21.

22.

23.

24.

25.

26.

27.

28.

29.

121

Lane. J.M., Carpenter, L.C.. Whitted, T., and Blinn, J.F., "Scan
Line Methods for Displaying Parametrically Defined Surfaces,"
Communication of the ACH, Vol. 23, No. 1, Jan. 1980, pp. 23-34.

Lane, J.M. and Riesenfeld, R.F., ”A Theoretical Development for the
Computer Generation and Display of Piecewise Polynomial Surfaces,"

IEEE Trans. on Pattern Recognition and Machine Intelligence, Jan.
1980. pp. 35-46.

Cox, M.G., ”The Numerical Evaluation of B-splines," J. Inst. Maths.
Applics, Vol. 10, 1972, pp. 134-147.

DeBoor, C., 'On Calculation with B-splines,' J. Approx. Theory, Vol.
6, 1972, pp. 50-62.

W. I.D. Faux- and
M.J. Pratt, Ellis Horwood Limited, 1983, England.

MW. J.D. Foley and A.
Van Dam, AddisoneWesley Publishing Company, 1982.

Amanatides, J., "Realism in Computer Graphics: A Survey," Computer
Graphics and Applications, Vol. 7, No.1, Jan. 1987, pp. 44-56.

Phong, B.T.'111umination for Computer General Pictures,“ comm.
ACM, V61. 18, No: 6, June 1975, pp. 311- 317.

Gouraud, M., “Continuous Shading of Curved Surfaces,” IEEE Trans.
Computers, Vol. C-20, No. 6, June 1971, pp. 623-629.

, C. Mead and L. Conway, Readings, MA:
Addison-Wesley, 1978.

ggnnuggx_5;1;hng;19, K. Mwang, John Wiley and Son, 1979, p. 184.

K. Hwang, op. cit., Chapter 8.

Leung, 8.8. and Shanblatt, M.A., “A VLSI Chip Architecture for the
Computation of the Kinematic Solution of a Robotic Manipulator,"

Dept. of EESS, Michigan State University, April 1985.

Gajski, D.D. and Kuhn, R.H., "New VLSI Tools," IEEE Computer, Vol.
16, No. 11, Dec. 1983, pp. 11-14.

Hartmann, A.C., "Software or Silicon? The Designer's Option,"
Proceedings of the 1552, Vol. 74, No. 6, June 1986, pp. 861-874.

211ng1nlgg_g£_§ﬂ9§_!L§1_Qg§1gn, N. Waste and K. Eshraghian, Addison-

30.

31.

32.

33.

34.

35.

36.

37.

38.

39.

122

Wesley, 1985.
K. Mwang, op. cit., p. 279.

Vai, M.K., "Performance-Design Tradeoff of Hierarchical VLSI Design
Entry Points," M.S. Thesis, Michigan State University, 1985.

Ramachandran, V., "On Driving Many Long Wires in a VLSI Layout," J.
of the ACM, Vol. 33, No. 4, Oct. 1986, pp.687-701.

'TRN's Superchip Passes First Milestone," Electronics, July 10,
1986. PP. 49-54.

Panasuck, C., “High-density Gate Arrays," Electronic Design, Oct.
16, 1986. Pp. 94-100.

Uinard, M., ”Standard-cell Libraries Expand," Electronic Design,
Sept. 11, 1986, pp. 179-184.

Guttag, M., Aken, J.V., and Asal, M., "Requirements for a VLSI
Graphics Processor,“ Computer Graphics and Application, Vol. 6, No.
1, Jan. 1986, pp.32-47. “

Ikedo, T., "High-Speed Techniques for a 3-D Color Graphics Terminal,"
computer Graphics and Application, Vol. 4, No. 3, May
1984, pp. 46-58.

Kawahito, 8., Kameyama, M., Miguchi, T., and Yamada, M., “A. High-
Speed Compact Multiplier Based on Multiple-Valued Bi-directional
Current-Mode Circuit,“ Proc. 17th Int’l Symposium on Hu1tiple-Valued
Logic, Computer Society Press, Los Angles, Calif., May 1987, pp.
172-180.

Canal, A.E., et a1., ”A CMOS 32b Wallace Tree Multiplier-
Accumulator,” 1355 Digest Int’l Solid-State Circuits Cbnf., Feb.
1986, pp. 194-195.

HICHIGRN STRT

I 1111

E UNIV. LIBRRRIES
IIIH 1111111le1111111
9 91 37 1 2

312 3015