lllllllllllllm LIBRARY H l Michigan State University This is to certify that the thesis entitled A VLSI Chip Architecture For The B-spline Surface Evaluation presented by Chia-Yiu Maa has been accepted towards fulfillment of the requirements for Master . Science degree in agar/gm Mar Major professor Date 8-28-87 0-7 639 MS U i: an Affirmative Action/Equal Opportunity Institution MSU LIBRARIES .___. RETURNING MATERIALS: Place in book drop to remove this checkout from your record. FINES will be charged if book is returned after the date stamped below. A VLSI CHIP ARCHITECTURE FOR THE B-SPLINB SURFACE EVALUATION By Chia-Yiu Maa A THESIS Submitted to Michigan State University in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE Department of Electrical Engineering 1987 ABSTRACT A VLSI CHIP ARCHITECTURE FOR THE B-SPLINE SURFACE EVALUATION By Chia-Yiu Maa This thesis describes the architecture design of the SE (Surface Evaluation) chip, a VLSI chip dedicated to the B-spline surface evaluation, which can be used to ease the Computer-Aided Geometric Design (CAGD) of B-spline surfaces. The reconfigurable structure of the SE chip can perform on-chip blending function formula- tion and high-speed vector-matrix multiplication at less hardware cost. Pipelining and parallel processing techniques are applied to the inter- and intra-functional unit designs to speed up the calculation of the surface evaluation algorithm. The symbolic example and the description of the SE chip written in RTL (Register Transfer Language) verify the correctness of the design. Area estimation confirms the feasibility of a semi- custom implementation approach with current technology and speed estimation shows at least an order of magnitude improvement over existing graphics hardware in per- forming surface evaluation. The SE chip architecture can also be used to execute the coordinate transformation, tangent and normal vector derivation, and shading. ACKNOWLEDGMENTS I wish to express my greatest appreciation to my parents for getting me started right and continued support and encouragement throughout my studies, and my wife Mei-Ling, without her patience, understanding and love, nothing would have become reality. I express my deep gratitude to my advisors, Dr. M. Shanblatt and Dr. E. Goodman, for their guidance and encouragement throughout this research. Special thank is also due to Jane Hawkins for providing the mathematical preliminary of B- spline and helpful discussions. Table of Contents LIST OF TABLES ............. - ............... - -_ v LIST OF FIGURES ................................................................................................... vi 1. INTRODUCTION ................................................................................................. 1 1.1 Problem Statement .................................................................................. 3 1.2 Approach ................................................................................................. 4 1.3 Overview of the Thesis .......................................................................... 6 II. VLSI System Design ........................................................................................... 7 2.1 Characteristics and Trends in VLSI Design .......................................... 7 2.2 Design Methodology .............................................................................. 9 2.3 New Design Technology ........................................................................ 14 III. MATHEMATICAL PRELIMINARIES ............................................................ 17 3.1 B-spline Curve ........................................................................................ 17 3.1.1 Knot Vector ............................................................................. 20 3.1.2 Blending Functions .................................................................. 23 3.2 Rational B-spline .................................................................................... 30 3.3 B-spline Surface ..................................................................................... 34 IV. LITERATURE REVIEW ................................................................................... 39 4.1 Uniform Subdivision .............................................................................. 39 4.2 Adaptive Subdivision” ............................................................................. 40 4.3 Cox-deBoor Algorithm ........................................................................... 41 4.4 Pickelmann’s Algorithm ......................................................................... 43 V. SURFACE EVALUATION AND SHADING ................................................... 48 5.1 Evaluation Algorithm ............................................................................. 48 5.2 The Derivative and Normal Vectors ...................................................... 51 iii 5.3 Shading ................................................................................................... 53 VI. ARCHITECTURE DESIGN .............................................................................. 57 6.1 System Specification ............................................................................... 58 6.1.1 Design Objective ..................................................................... 58 6.1.2 I/O Specification ...................................................................... 59 6.2 Architecture Development ...................................................................... 60 6.2.1 Functional Units ..................... 60 6.2.2 Timing and Pipelining ............................................................. 70 6.2.3 Storage Scheme ....................................................................... 73 6.3 Architecture of SE chip .......................................................................... 78 6.3.1 Architecture Synthesis ............................................................. 78 6.3.2 Hardware Description in RTL ................................................ 82 VII. ARCHITECTURE EVALUATION ................................................................. 84 7.1 Architecture Verification ........................................................................ 85 7.2 Area Estimation ...................................................................................... 88 7.3 Speed Estimation .................................................................................... 92 VIII. CONCLUSION ................................................................................................ 95 APPENDIX 1 ............................................................................................................. 98 APPENDIX 2 ............................................................................................................. 110 APPENDIX 3 ............................................................................................................. 1 11 APPENDIX 4 ............................................................................................................. 113 APPENDIX 5 ............................................................................................................. 116 APPENDIX 6 ............................................................................................................. 118 BIBLIOGRAPHY ...................................................................................................... 120 iv 3.1 4.1 6.1 7.1 7.2 LIST OF TABLES Blending function table .of the example. ....................................................... 28 Subpatch and corresponding control points. .................................................. 46 Cycle count for different evaluation routines. ............................................... 82 Transistor count of the SE chip. .................................................................... 89 Evaluation time for different evaluation routines. ........................................ 94 1.1 1.2 2.1 2.2 2.3 2.4 3.1 3.2 ' 3.3 3.4 3.5 3.6 3.7 3.8 3.9 3.10 3.11 4.1 4.2 5.1 5.2 6.1 6.2 6.3 6.4 LIST OF FIGURES Viewing of 3D objects. . .................................................................................. An enhanced graphics pipeline [2]. ............................................................... Speed versus integration scale [3]. ......................................... ‘ ....................... Hierarchical layout. ........................................................................................ The "Y-chart" display of design representations. .......................................... Cooperating expert agents for VLSI design [8]. ........................................... B-spline curve. ................................................................................................ Example of locality. ....................................................................................... Blending function cascade. ............................................................................ Blending function tree. ................................................................................... Blending function tree - segment 1. .............................................................. Blending function tree - segment 2. .................................................... _ .......... Plot of blending function. .............................................................................. Third order B-spline. ...................................................................................... Strictly rational B-spline. ............................................................................... B-spline control point net. ............................................................................. B-spline surface. ............................................................................................ B-spline control point net [11]. ...................................................................... B-spline surface [11]. ..................................................................................... Normal vector and tangent vector of a point on the surface [18]. ............... The geometry of shading [l2]. ...................................................................... The block diagram of a 5-by-5 Baugh-Wooley two’s complement array multiplier [24]. ...................................................................................... (a) Basic MAC block. (b) MAC array structure to compute a polynomial of order 3. (c) MAC array structure to find the first derivative of a polynomial with order 3. ............................................... The block diagram for calculating the blending function and the first derivative. ......................................................................................... Block diagram for calculating of the inner product of X and Y. ................ vi 13 15 18 22 24 25 26 27 29 30 33 37 38 47 47 52 55 6O 62 64 6.5 6.6 6.7 6.8 6.9 6.10 6.11 6.12 6.13 6.14 6.15 6.16 6.17 6.18 7.1 A four-operand adder formed by CSA tree. .................................................. 65 The functional block diagram of a full carry lookahead adder. ................... 65 Logic circuits for realizing the carry-generate, carry-propagate, and sum functions. ......................................................................................... 66 The schematic logic circuit of a 4-bit carry lookahead (CLA) unit. ............ 67 A carry lookahead array divider with 8-bit divisor in two’s complement representation [25]. ................................................................... 68 Three basic types of cells to be used in constructing the division array of Figure 6.9 [25]. ................................................................................ 69 Thegeneral form ofadatapath [23]. .......................................................... 71 The two-phase clocking scheme [26]. .......................................................... 71 Pipelined four-segment multiplier and its clock scheme. ............................ 72 Datafiow of vector-matrix multiplication. ..................................................... 74 Skewed storage pattern with (a) 8:0, and (b) Sal. ..................................... 76 Skewed storage with output multiplexing. ........................ 77 Simplified diagram of the SE chip. .................... . - -- - ................. 79 Multiplying-and-adding (M&A) unit. ........................................................... 81 Flowchart of the evaluation algorithm in the SE chip. ................................. 86 vii CHAPTER I INTRODUCTION Geometric modelling is a broad field in computer graphics which uses computational geometry to develop representation techniques for three-dimensional objects. It can be divided into two sub-fields: 1) surface modelling, which represents three-dimensional objects using various types of closed surfaces (quadratics, polygonal meshes, parametric bicubic patches, etc.), and 2) solid modelling, which aims at developing object representations which are complete, unambiguous, realizable, accurate, efficient, productive, flexible and minimal. Surface modelling is especially useful in the aerospace and automotive industries, while solid modelling is gradually becoming useful for a variety of mechanical CAD/CAM applications. In this thesis, we shall deal only with the topic of surface modelling, but insofar as most solid modelling systems utilize a curve -- and surface -- based boundary representation (B-rep) for at least graphics calculations, the work described here is quite directly.applicable to solid modelling system development. The process used to generate the realistic computer images from a geometric data base is called image synthesis. It generally involves several steps: 1) transformation of the data base from world coordinates to the 2) 3) 4) appropriate viewing coordinates (see Figure 1.1); elimination of hidden surfaces by way of depth sorting (sort the polygons by their distance from the viewpoint and to place them into the refresh buffer in order of decreasing distance), 2. buffering (a buffer is used to record the smallest 2 value encountered for each (x,y) as the search porgresses), scan-line processing (by using scan-line coherence and edge coherence to determine the projecting result of all polygons onto the xy- plane), or area subdivision (recursively subdivide the area of the projection plane image until it is easy to decide which polygon or polygons are visible in the area); shading the visible surfaces by taking the light sources, the surface characteristics, and the relative positions and orientations of the surfaces and sources into consideration; taking account of shadows, transparent and translucent materials, surface texture, surface detail, and aliasing (some 31) world “PM '0'“ Normalised device coonduune . coondhudee coondhufles deecrl u l f | p (“1 cup filflhufl» E Ffiofiun.onhb uJEP%%E"‘%¢ Trenenn1n *dew'vchune ‘flp u“, in 23:55; °d '—€>lhflfidggflfleelbe> pnubc n P '3 .eoo ll tee e co Figure 1.1. Viewing of 30 objects. of these can be handled by ray tracing). The formulation of a realistic three-dimensional image does not necessarily follow the order of steps stated above. For example, a chip set designed by Hewlett-Packard Co. has a graphics pipeline as shown in Figure 1.2 [2]. 531$:th :33. runs “WC“ scan me n cigar F4, .32," same “I 9522* u'vecnotusn "I “5": "WA?" —. usr sums .—__. ' .___. Inn—m l—J, .—. lL—J. _. onuavxut .uuiiiin‘e'l‘fiiom rul'éi‘l'l'fma. scan couvsnsnon IMAGE nuwv Figure 1.2. An enhanced graphics pipeline [2]. 1.1 Problem Statement Researchers at the A. H. Case Center for Computer-Aided Design, Michigan State University, have developed a surface assessment package called MSU COLORSCOPE to produce accurate shaded images by calculating and shading surfaces evaluated at the pixel level. Recently, several algorithms developed by Pickelmann [I] allowed for more efficient evaluation of the entire range of rational/non-rational curves and surfaces. Though these algorithms speed up the response time of the interactive CAD/CAM system which uses them, the time for generating a shaded image is still too long (typically more than few minutes) and thus far away from the goal of high quality "real-time" realistic image generating for the interactive CAD graphics system. 1 It has been shown that the time required for three-dimensional surface evaluation and transformation demands a significant percentage of the total time needed to realize a new shaded image. If we can download the surface evaluation and transformation to high speed hardware, the overall computation time will be reduced dramatically. I The surface evaluation routines shown in [l] involve numerous matrix multiplications and thus are candidates good for implementing on regular structured hardware. The very best candidate for the hardware implementation is VLSI (Very Large Scale Integration), which has shown great potential for improving system performance by implementing concurrent numerical algorithms directly in hardware. Furthermore, advances in CAD (Computer-Aided Design) and CAB (ComputereAided Engineering) systems for VLSI have enabled such implementation to be carried out in a custom design with relatively low costs in both time and money. 1.2 Approach The purpose of this research is to investigate and specify the major hardware components of a custom-designed VLSI chip architecture for the surface evaluation algorithm given in [l]. The surface evaluation chip is to work as a coprocessor to the host, a graphics processor taking care of all the graphics processing except for surface evaluation. It downloads necessary information from the host and evaluates either a point or a patch (by recusively evaluating the point on the patch). Fixed-point calculations are assumed throughout the design. This implies that any necessary scaling is performed in the host system. The overall design follows the top-down approach. The system criteria will be stated first, followed by the definition of each building block. When specifying the system criteria, the I/O problem, one of the main considerations in dealing with matrix operations, must be solved in order to guarantee better performance. A proper VLSI array structure suited for matrix multiplication is also to be employed in the design. In order to achieve the highest system throughout, the pipeline concept is extensively applied to the inter- and intra-functional units. The timing scheme for supporting pipelining will also be introduced. The storage scheme of the surface evaluation chip will be addressed particularly because of the special pattern of memory access used. A reconfigurable structure with higher resource utilization and less number of functional components will then be given. In order to simulate as well as verify the design, the chip architecture will be expressed in the form of a program written in RTL (Register Transfer Language). By comparing the simulation results with the software surface evaluation algorithm, mistakes will be corrected and modifications will be made to improve the system performance. 1.3 A Overview of the Thesis Chapter Two introduces basic VLSI technology and design methodologies. Chapter Three presents the mathematical preliminaries required for the surface evaluation algorithm. A literature review is given in Chapter Four. Chapter Five gives the surface evaluation algorithm and the idea of shading. The development of the chip architecture is covered in Chapter Six. The simulation results and conclusions are presented in Chapter Seven. CHAPTER II VLSI SYSTEM DESIGN The goal of this thesis is to develop a dedicated VLSI chip for surface evaluation. This chapter introduces the characteristics as well as trends of VLSI technology. The discussion will emphasize semicustom and custom VLSI, also called application-specific integrated circuit (ASIC) design. VLSI design methodologies such the silicon compiler and AI approaches will also be covered. 2.1 Characteristics and Trends in VLSI Design Recent advances in VLSI technology -- in system and circuit design environments, process sophistication, manufacturing cost effectiveness, reliability, and packaging -- have spurred the identification of new system applications which are now feasible with more advanced technology. Offering excellent noise immunity, wide operating margins with respect to power supply voltage and temperature, lower drive currents, decreased power consumption, ease of design for VLSI circuits, and scalability to submicrometer dimensions with improved performance and reliability relative to alternative technologies, CMOS is the most suitable technology for the majority of these emerging applications. CMOS designs present a disadvantage in that the die size required is larger than that required by NMOS for the same circuit. New technologies, however, such as domino circuits and clocked circuits, are reducing the die size difference between CHDS and NMOS. In the ultra-high-speed region, ECL and CML technologies are used. Their performance superiority is irresistible, but other costs are higher. In a much higher speed region, new semiconductor devices are being explored. Gallium arsenide devices are expected to be applied to real-world systems in the near future and are one of the most promising technologies. Figure 2.1 shows the distribution of operation speed and scale of integration for gate arrays achieved by various technologies [3]. Two CHOS technologies have been prominently used: the silicon-gate Se tt Mum; speed (net ! si-qete Obs (emote metal! . l I 5 t St-oete 008 ldoeble metal) -------- -- -J---- l r fl 5 . m : l *o_e—c-oqpoj ' 2 i i' . | r4 I ' Ll - I it ,tzuztztztgja--__-J F _-— - _. ' ' a ' o ' I 0.5? I o o r I r L ................. ‘J CCL on} Scale of 1 .‘c— 0.1 Inteeretton 10° 1 S 1000 I 5 10000 2 ‘°.‘.’ Figure 2.1. Speed versus integration scale [3]. CHOS and metal-gate CMOS. The former has become dominant because it allows for higher operation speed and higher degrees of integration by using finer pattern geometries. For example, 1.2 to 1.5 pm design rule devices exhibit 1.0 to 1.2 ns delays and subnanosecond delay is expected by using submicrometer rules [4]; The types of integrated circuits that are used for semicustom and custom designs include: 1) Gate Arrays (GAs). These are prefabricated unwired standard transistor arrays. The function of the gate array is defined by properly interconnecting transistor patterns on the chip. 2) Standard Cells (SCs). These allow for the design of application-specific integration circuits by composing circuits from elements stored in cell library. 3) Full Custom integration circuits (FCs). These are hand customized by system designers. Full custom designs are the most time-consuming to produce, but are capable of the highest density. 2.2 Design Methodology ..Custom chips developed for particular applications tend to be more dedicated to the target system and less general purpose. The term "application-specific 16' clearly describes the present situation where many different types of circuits are being used in small quantities for specific applications. Smaller size, lower cost, higher speed and 10 higher reliability have been goals in the development of ASICs. Higher integration of devices, however, has led to longer design times, and made it difficult to maintain design reliability. In order to solve this dilemma, research and development work on various customization technologies and automated design technologies is being accelerated, with emphasis on design methodologies and automated tools. Design methodology is defined as a set of codified techniques that are applicable to the VLSI design process. The objective of studying design methodology according to Baller [5] is to facilitate the creation of better designs. A final design must be functionally and physically correct, qualitatively acceptable, testable, and easily modified. The prevalent design methodology today is hierarchical in nature. A hierarchical layout using the standard cell approach is given as an example in Figure 2.2 [6]. The VLSI techniques are categorized by those used for design, or synthesis, of a VLSI circuit and those used for its verification. In both categories, a further distinction is made between techniques relating to physical and functional views of the design. Physical design supports partitioning, layout, and topological analysis at all design levels. Functionality, testability, and physical design must be considered in parallel throughout the design process. A design model is first defined according to some initial objectives and requirements. A top-down design process is normally used to decompose the desired operations of a circuit into a network of orchestrated smaller and simpler functional modules. Once a functional 11 F— —1[ 11"08UFF18' ] _t ‘ l I z’ 5 R. ./ C) (D A. L IJ I! E B P C3 3 u , U P 1' P F , F F f“ A A. I C V V C:th'R.L L R C>rw ,e ........ 3 E “ ': —‘— rat-55:...” .04 LAYOUT OP BLOCKS STANDARD CELL LIBRARY Figure 2.2. Hierarchical layout. implementation strategy has been determined, a bottom-up process is commonly used to complete the physical design. Design results are simulated at different levels throughout the design process to evaluate correctness and performance [7]. The ”Y" chart shown in Figure 2.3 has three axes that depict functional (behavioral), structural (architecture), and geometric (physical) representations of VLSI design. The functional axis 12 describes the functionality of the design, the structural axis represents the structural implementation of the design, and the geometric dimension represents the hardware realization of the design. Higher level of abstractions are encountered moving out radially from the origin. Commonly used levels of structural representation are the processor/memory/switch (PMS), the register-transfer, and the circuit level. Geometric representations include layout planning, cells, and physical mask geometries. VLSI design methodology, from this perspective, is a sequence of one-to-many mappings from higher to lower levels of abstraction and between different design representations in the design space. A particular mapping sequence taken by a certain school of thought, not surprisingly, more or less reflects a particular environment (usually expressed in needs and constraints of human, technological and financial resources) under which that particular methodology has evolved. The design tools available to IC designers today usually encompass several of the lower level representations shown on the Y chart. For example, the designer can expect to enter at the logic level or perhaps the circuit level and find support for layout, logic and circuit simulation, design rule checking, and timing and critical path analysis. Many of these systems support designer interaction at any level of the design process and provide consistent database services across the entire range of design activities [4]. 13 mucrurm. summon mm” Mmflomry/Swttoh Subm- Wt ‘l‘l'enster Mm Circulte 3.01..“ We ._ leak Geometriee' -Celis _ layout Planning cmmmc MUTATION Figure 2.3. The "Y-chart" display of design representations. 14 2.3 New Design Technology In software design, for the ease of programming, algorithms are written in high level languages. The same principle is applied to hardware design for which high level hardware description languages (HDL) are necessary. More than clarifying module specification definitions as in the early age, a ideal HDL should provide both functional and structural descriptions at all levels of abstraction. Many HDL's exist, but no one is widely used. Besides HDL, silicon compilers have drawn much attention from CAE' system developers. Silicon compilers have the goal of generating mask patterns directly from an HDL. In this case, the entire physical design process of a VLSI system can be reduced into a compilation process such that most of the characteristics of the chip under design can be accurately estimated. This allows system designers to focus on higher level design and to experiment with various algorithm/architecture alternatives with much less effort and cost. An elementary version of a silicon compiler is the module generator used to generate the mask layouts of some regular modules, such as basic cells in gate arrays and standard cells. "Concorde" is an example of one released by Seattle Silicon Technology and is more precisely classified as a macro-compiler. But the chip geometry generated by a macro-compiler is usually not optimal. How to automatically generate close-to-optimal geometry and achieve full compilation rather than macro-compilation are the two issues remaining to be solved for the silicon compiler in the near 15 future. There has recently been much interest in using expert systems modeled on human expert knowledge. Such systems aim to capture the style and expertise of the human expert and provide flexibility to deal with a wide range of applications. Designs in VLSI usually involve certain forms of heuristics, such as those used for placement and routing, and thus it is appropriate to use expert system techniques to convert a hierarchical circuit description into a full-custom VLSI layout. The expert system for VLSI design may consist of a number of cooperating expert agents connected through a central manager. An example is given in Figure 2.4 [8]. Much effort has gone into the development of an expert system for VLSI design. Several issues, however, still remain open. Structural moot Physical output Figure 2.4. Cooperating expert agents for VLSI design [8]. 16 l) The expert system should contain as much knowledge of VLSI design as possible. 2) The machine's learning capabilities should be augmented. 3) Special artificial intelligence machines msut be deveIOped for providing high throughput expert systems. This thesis focuses mainly on the design of a surface evaluation chip at the system, algorithmic,-and register transfer levels. The final design will be expressed by a program written in RTL (Resigter Transfer Language). CHAPTER III MATHEMATICAL PRELIMINARIES This chapter covers the mathematical preliminaries on which the surface evaluation algorithm is based. The B-spline space curve and rational B-spline space curve will be introduced first. Next the B- spline surface patch is defined and extended to the rational B-spline patch. In order to give the reader a clear picture of B-splines, simple examples are given. Finally, various classifications of B-spline are discussed. 3.1 B-spline Curve A curve can be represented by a piecewise linear approximation in which each segment of the approximation is a straight line. Likewise, a curve can be represented as a piecewise polynomial approximation in which each segment of the composite figure is defined by a polynomial. A piecewise polynomial curve is called a spline.[10] A B-spline is a parametric spline which consists of one or more segments. Each segment represents the curve over a range of the parameter t, for example 1 s t < 2. The curve is partially defined by a set of control points { P0 P1 P2 ..... Pn I (coordinate values) which produce an open-sided polygon when connected sequentially by straight i7 18 lines. Each segment of the curve is defined by a subset of the control points. A three-segment cubic B-spline curve with its control point polygon is shown in Figure 2.1. Segment ends are denoted by an X. The control points that are used to compute a particular segment are listed beside that segment [11]. Y 4+ P: 3 a a. ‘ Pa Fed-'29: 1.. t3 p P 39‘? 3 I I P F% I 1 3 ‘14 HI x Figure 3.1. B-spline curve. Let P(t) be the Cartesian position along a curve as a function of the parametric variable t. A segment of the spline is defined by the following equation: 19 n P(t) - 2 P N (c) (2.1) 1-01 1.1; where P1 is the control point, n+1 is the number of control points, Ni,k is the blending function associated with P1, and k is the order of the blending functions. The order k of the B-spline curve is the degree + 1, i.e., a quadratic is an order 3 curve and a cubic is an order 4 curve. P1 represents the vector of the control point coordinates such as [xi,y1] for a 2-space curve, and likewise P(t) represents the vector of solution equations for the coordinates of points along the curve, such as [x(t),y(t)]. If the number of control points exceeds the order of the curve, the B-spline will have more than one segment. In this case, only a subset of the of control points is used for each segment, because the result of formulation of the blending functions associated with these unused control points for the segment are zero. Examples will be given in the following sections. The number of segments the curve will have can then be represented as a function of the number of control points, n41 and the order, k, as number of segments - n - k + 2. (3.2) The parameter range for each segment can be determined by simple observation of the knot vector which will be introduced next. 20 3.1.1 Knot Vector The blending functions are determined by a set of values called knots. They are displayed as [ x0 x1 x2 x3 ...... x1 ] and are called the knot vector. In order to make the curve pass through the end points, the first and last entries of the knot are repeated k times where k is the order of the curve. Thus xO-x1-...-xk_1 and x£_(k-1)- ...-xl-1-xl. The length of the knot vector refers to the number of elements (knots) in the vector. The above vector has length - 1+1. The length is a function of the control points (n+1) and the order (k) of the curve, given by n+k+l. The order of the curve must be less than or equal to the number of control points.[11] For example, if n+1 - 4 and k - 4 (cubic), the knot vector is [ 0 0 0 0 1 1 1 l ]. This curve is a special case of the B-spline, called the Bezier spline, and has only one segment. If n+l-4 and k~3 (quadratic), one possible knot vector is [ 0 0 O 1 2 2 2 ]. There are n - k + 2 - 2 segments. The first segment of the curve, produced by the knot vector shown, has the parameter range t-O to t-l and a zero blending function with the fourth control point. The second segment has a zero function associated with the first control point and covers the parameter range t-l to 12-2. Since each segment of a spline depends on a subset of control points, each control point affects only a local area of the B-spline curve. This property is called locality and is one of the key features of the B-spline. The B-spline curve can be modified by simply moving 21 the locations of some control points. An example of locality is given in Figure 3.2 with n94, kp3 (quadratic), and knot vector [ O O O l 2 3 3 3] which results in a three segment curve. The first segment depends only on the first three control points and has the parameter range of t-O to t-l. The second segment depends on the middle three control points and has the parameter range of t-l to t-2. The third and last segment depends on the last three control points and has the parameter range of t-2 to t-3. In Figure 3.2(a), the location of the first control point is moved, and in Figure 3.2(b) the second control point is moved. A knot vector with a repeated interior knot, such as [ 0 0 0 l l 2 2 2 ] with n94 and k-3, causes the length of the second segment of the curve to be zero. Usually, the order of the continuous derivative of a B-spline curve at its segment joints is up to k-2. For example, the order of a B-spline curve with k-4 should have continuous zero, first, and second derivatives. Repeating a knot once (multiplicity of knot - 2) causes the curve to lose the highest order derivative continuity at the segment joint corresponding to that knot. The relationship between multiplicity m, the order of the curve k, and the order of the derivatives that are continuous across a segment joint is (0, 1, ..., k- m-l). Thus, if the B-spline curve is a cubic (k-4) and an interior knot has a multiplicity of 4, then k-m—l--1 and the curve will be in general discontinuous pointwise and in all derivatives. 22 (b) Figure 3.2. Example of locality (a) effect of moving the first control point, (b) effect of moving the second control point. 23 It is not necessary that the knot values be integer. If all interior knots are consecutive integeters, this type of knot vector is called a uniform knot vector. Relaxing a uniform knot vector to allow repeated interior knots an produces enhanced uniform knot vector [1]. The general knot vector with monotonically non-decreasing real knot values produces a nonuniform knot vector, for example [ O O 0 .3 .7 1 1 l ]. 3.1.2 Blending Functions As illustrated in the previous examples, the relationships between the knot vector, the control points, and the curve are that the curve is defined by the control points and the blending functions which are functions of the knots. The formal definition of blending functions are 1 if xi 5 t < x1+1 N1 1“) " 0 otherwise (t - x ) N (t) (k - t) N (t) ”1 km _ 1 i,k-1 + 1+1: i+1,k-1 . (3.3) xi+k-l ’ xi xi+k ' x1+1 The x1 is the value of the ith position of the knot vector, for example: x-[oooorzaass 7 s 9 9 9 9]. i - 0 1 2 3 4 5 6 7 8 9 10 ll 12 l3 14 15 From equation (2.3),if xi - x1+1 , then Ni,l(t) - 0, which causes the 24 curve to lose the highest-order derivative continuity as mentioned in the last section. For each segment there is one Ni,1(t) that is nonzero. This nonzero function cascades down to the final blending functions associated with that segment as shown in the Figure 3.3. Note that the number of nonzero blending functions per segment is equal to the order of the curve.[ll] "1,1. .Ni-1,2 .Ni.2. ”W ’ “is? “ii-t Ni'ir‘ Ni'2:; Ni'1:; Ii'? Figure 3.3. Blending function cascade. A clear example of blending functions quoted from [2] is: Control points : (0,0) , (2,2) , (3,1) , (3,0) Order : k - 3 Knot vector : [ 0 0 0 1 2 2 2 ] l - 0 l 2 3 4 5 6. 3 From equation (2.1),the spline equation is P(t) - Z i 0 Pi Ni,3° There 25 are four blending functions NO,3’ N1’3, N2’3, and N3,3 that need to be formulated. Tracing from the bottom of the tree shown in Figure 3.4, we know which blending functions need to be computed. o o 1 it one I it l(t<2 o ‘ o s”... NI. N... AC. ,N.. N 3', \\ / \ / \ / \\ / \ 0/ edidz Iqtz I‘ll ”‘11, xbrzl \ / \ / \ / \ / N N N OJ 13 21 ”'31 Figure 3.4. Blending function tree. In this example, only two of N are not zero, which agrees with i,1 the fact that the curve will have two segments. The function tree used for determining the first segment is given in Figure 3.5 Therefore, the blending functions N N 3 , and N2 3 are nonzero and will be a 0,3 ’ 1 function of N1,2 and N2.2 into equation (3.2). Then one can get the following blending functions: found by substituting the corresponding values (x - t) N N1,2 - - 1 - t (3.4) x - x 3 2 26 Figure 3.5. Blending function tree - segment 1. (t - x ) N 2 2,1 N2,2 ' x _ x ' t (3.5) 3 2 (t - X ) N (x ; t) N No 3 - 1 012 + 3 1'2 - c2 - 2c + 1 (3.6) x2 ' x0 x3 ' x1 (t - x ) N (x - t) N N1 3 - 1 1'2 + a 2'2 - -1.5c2 + 2: (3.7) x3 ‘ x1 x4 ‘ x2 (t - x ) N (x - c) N N2 3 - 2 2'2 + 5 3'2 - .5c2 (3.8) xé'xz XS'XB Note that the convention of 0/0 - 0 is used when computing the blending functions. For the second segment the blending functions N1 3, N2 3, and N3 3 are nonzero and can be formulated by N2 2 and N2 3 as shown in Figure 3.6. These are computed in (x4 - t) N 27 a manner similar to the first segment as: 3,1 _ 2 _ t "a'x3 (t-x)N 3 3,1 _ t _ 1 “4"‘3 (t - x ) N (X ' t) N 1 1.2 + 4 2'2 - ,5c2 - 2t + 2 X3-X1 xa'xz (c - x ) N (x - t) N 2 2.2 + 5 3.2 - -1,5t2 + 4c - 2 xa'xz x5.x3 (II-X)" (X‘t)N 3 3.2 + 5 4'2 - t2 - 2t + 1. xS-x3 x6°xh (3.9) (3.10) (3.11) (3.12) (3.13) Figure 3.6. Blending function tree - segment 2. 28 Table 3.1 summarizes the blending functions computed above and Figure 3.7 gives a plot of the blending functions. After the blending functions have been determined, the spline equation of the first segment, 0 s t < 1, can be expressed as F t2 - 2t + 1 ‘ x(t) 0 2 3 3 2 - -1.St + 2: (3.14) y(t) 0 2 1 0 2 0.5t b O A and for the second segment, where 1 s t < 2, . o . x(t) 0 2 3 3 2 - 0.5: - 2t + 2 (3.15) y(t) 0 2 1 0 2 -1.St + 2t - 2 . t2 - 2: + 1 Table 3.1. Blending function table of the example. 1 Ni 3(1:), 0 s t < 1 N1,3(t)’ 1 s t < 2 o c2 - 2: + 1 o 1 -1.5c2 + 2t . 0.5:2 - 2: + 2 2 0.5:2 -1.5c2 + 4c - 2 3 0 t2 - 2t + 1 29 l . 0 d ...e I 8 t / I L 0 0 . I a, / Q‘ N 3 ‘ I/ 0.3 N 3.3 .7 e s ,p“ . m‘ A 9 O 3 1" ‘.§a. .z”' ‘Na, / e r. O ‘. ’ g e. I N g \. / h N \\ d .0. ./ u \ ’ u \ F f ‘. ” .\ \(\ u 0 ° “V I. . e I ‘X ‘ ‘1 n f a git 3% ’7 \ c o e i \ \ t d 'l x s\. I \\ i e, 4 I ‘. j r \\ o o e 2 "I ‘v. ’7 ’ ¥‘ I \‘ fl / ’ h. ‘\ \ 3 it I a .e. ‘ \ \ f x” / ‘s ‘\ ' 0 ° e...“ I .‘. . ' o.- I T I I. W I f I r I I I F“? ‘l 0.0 0.5 1.0 1.5 2.0 Pam-eta: - t Figure 3.7. Plot of blending functions. The plot of this two segment curve is shown in Figure 3.8 along with the polygon of control points; the segment joint is marked with an X. As demonstrated in the above example, when the blending functions for a segment of a B-spline are cmputed, only a subset of the knot vector is used. knot vector. This subset of the knot vector is called the effective The length of the effective knot vector is equal to 2(k- l), where k is the order. In above example, the length of the effective 30 4:? ate 1:! Figure 3.8. Third order B-spline. knot vector is 2(3-l)-4, which agrees with the computation of blending functions. The only knot values used in the computation of the blending functions of the first segment are x and x4, or [ O 0 1 2 ], 19 x2! x3! while the blending functions for the second segment use only the knot values x2, x3, x4, and x or [ 0 1 2 2 ]. 5’ 3.2 Rational B-splines A rational B-spline is one in which the control points are expressed in homogeneous coordinates. A point is given an additional coordinate which can be thought of as a scale factor. Instead of P(x,y) a point is represented as P'(wx, wy, w). The last coordinate is the 31 homogeneous variable. There are an infinite number of homogeneous representations of a single coordinate point, for example the coordinate x-l, y-2, or (1,2), can be represented by (1,2,1), (2,4,2), or (5,10,5). If the homogeneous variable is equal to 1, then the other variables represent the actual Cartesian coordinates. If the homogeneous variable, also called the weight in rational B-splines, is not equal to 1, then the transformation back to actual coordinates, P(x,y,l), is calculated by dividing each coordinate of P' by the calculated value of w. There is no effect on the blending functions when the control points are expressed in homogeneous coordinates, because the blending functions depend only on the knot vector. A point WP(t) along the spline in homogeneous coordinates can then be expressed as n time) - X NC?1 * N1 k(c), (3.16) i-O ’ where the WCP1 are the homogeneous coordinate values of the weighted control points. WP(t) represents wx(t), wy(t), and w(t), which is the vector of homogeneous coordinate values. The Cartesian coordinate values of points P(t) along the spline are then calculated as follows. wx(t) x(t) - -—————— (3.17) w(t) wy(t) y(t) ' ---- (3.18) w(t) Extending the earlier example,let the weights of control points 32 (0,0), (2,2), (3,1), and (3,0) be 1, 2, l, and 1, respectively, for example. The control points in weighted or homogeneous form are: '0 I (0.0.1):‘ '11 I (4.4.2); '11 I (3.1.1); P - (3,0,1). The computation of coordinates WP(t) for the first segment is then straightforward. wx(t) ' o a 3 c2 - 2 c + 1 wy(t) - o a 1 -1.5 c2 + 2 t w(t) . 1 2 1 L .s c2 . . 2 1 -4.5 t + 8 t - - 2 (3.19) -5.5 t + 8 t h -1.5 t2 + 2 t + 1 . -a.5 :2 + 8 c -s.5 :2 + 8 c Thus x(t) - 2 83d Y(t) ' 2 -1.5 t + 2 t + 1 -l.5 t + 2 t + 1 for 0 s t < 1. Notice that x(t) and y(t) are now ratios of polynomials in t. Similarly, we can calculate the WP(t) for the second segment. F . wx(t) a 3 3 .5 c2 - 2 c + 2 wy(t) - a 1 o -1.5 c2 + a t - 2 w(t) 2 1 1 _ c2 - 2 c + 1 . 33 .S t2 - 2 t + 5 _ 2 (3.20) S t - 4 t + 6 b5t2-2t+3‘ .5c2-2c+5 .5t2-4t+6 Thus x(t) - 2 and y(t) - 2 .St-2c+3 .5t-2t+3 for l s t < 2. The solid line curve in Figure 3.9 is the rational B-spline derived above and the dashed line is the nonrational B-spline from the previous example. Note how the double weight on the (2,2) coordinate pulled the curve toward it.[ll] Figure 3.9. Strictly rational B-spline. 34 Weighting each of the control points gives another degree of freedom to the curve per control point and allows the representation of curves that can not be fitted with nonrational B-splines, such as circular arcs derived from conic sections. The example shown above is restricted to 2-dimensional curves. The representation of curves in 3-dimension space (x(t), y(t), and z(t)) with B-splines can be achieved by simply adding another row to the control point matrix for the calculation of zw(t). 3.3 B-spline Surface The extension from the spline curve to the surface patch is done by adding a second parametric coordinate and an associated knot vector. The most commonly used surface representation is the four sided patch. Each point on the patch is a function of two parameters, s and t. One edge or side of the patch and its opposite side are chosen as curves that depend only on s and the other two edges are functions only of t. Both s-varying sides share the same knot vector, as do both t-varying sides. The s knot vector does not have to equal the t knot vector. They can be of different orders and have a different number of control points. If the s edges have m+1 control points and the t edges have n+1, then there will be a total of (m+1)o(n+l) control points. The control points joined together by straight lines form what is called a control point net. The general shape of the patch will mimic the net (more precisely, the surface is contained within the convex hull formed by the control points). 35 The general formula for determining the coordinates of a point on the B-spline surface is P(s,t) -130 1:0 M],h(s) - P1,] - Ni,k(t) (3.21) or in matrix form P(s,t)- M0,h(s) Ml,h(s) .... Mm,h(s) PPOO P01 .... Pogq. N0,k(t) I (3.22) P10 P11 .... Pln Nl,k(t) .ng _.njk.., J where 1.1 n+1 m+l N11 M11 is is is is is is is one the the the the one 0116 of the (n+1)-(m+l) control points, order of the curve in the t direction, order of the curve in the s direction, number of control points in the t direction, number of control points in the 3 direction, of the n+1 blending functions of order k, of the m+l blending functions of order h. Just as the B-spline curve consisted of segments, the Bospline surface consists of subpatches. The s edge B-splines have (m-h+2) segments and the t edge B-splines have (n-k+2) segments. Each subpatch is formed by a segment of the s edge paired with a segment of the t edge. Therefore, the number of subpatches is (m-h+2)-(n-k+2). Each 36 subpatch covers a range of s and a range of t. Example [2]: g_gdgg§ m+l-4, h-3 (quadratic), knot vector [ 0 0 O l 2 2 2 ] ;_§ggg§ n+l-2, k-2 (linear), knot vector [ 0 0 1 l ] There are two subpatches in this case, since there are two segments in the 3 direction and one in the t direction. One subpatch covers the range of s-O»1 and t-Oel, and the other covers the range of s-1~2 and t-Oel. There are altogether (n+1)-(m+l) - 8 control points, but each subpatch use only a subset of the control points. The number of control points that contribute to each subpatch is h-k, which is 6 in this example. The subpatch with the range of s-Oel and t-0~1 has the following blending functions. 2 ”0,3(8) - s - 23 + 1 N0,2(t) - 1 - t M1’3(s) — -1.5s2 + 23 N1,2(t) - c M2'3(s) - 0.532 M3,3(s) - 0 The coordinate of a surface point on this subpatch can then be calculated as ’ P P l P(s,t) - 32-25+l -1.532+25 0.532 00 01 l-t P10 P11 t (3.23) P20 P21 37 The first column elements of the control point matrix are the first three control points of one of the s edges. The first row elements of the control point matrix are the two control points on one t edge. The surface points for the other subpatch with the range of s-leZ and t-O~1 are computed by p a P P P(s,t)- 0.532-25+2 -1.532+as-4 32-23+l 0.5e2 1° 11 l-t P20 P21 t (3.24) lP3° P31 If we choose the control points given in Figure 3.10 for this example, we can generate the B-spline surface shown in Figure 3.11.[1l] More complex surfaces may be constructed if a weight is also assigned to each control point of the B—spline surface. To determine Figure 3.10. B-spline control point net. 38 We!” 5p, SUXKNCH 2 y ,. fl. Figure 3.11. B-spline surface. points on the surface, equation (3.16) is used in the homogeneous coordinate system as described above._ More examples of B-spline curves and surfaces can be found in [11] and in [1]. The next chapter will give a literature review of the work being done by others related to surface representation and evaluation. CHAPTER IV LITERATURE REVIEW A fundamental area of interest in computer graphics and computer- aided geometric design is the representation of a 3-space object whose shape is delineated by free-form surfaces. Various mathematical models as well as special purpose hardware have been developed to meet the requirement for implementation in an interactive computer-aided geometric design system. This chapter presents a review of techniques in use for the display of free-form surfaces. 4.1 Uniform Subdivision There are two stages involved in displaying parametrically defined surfaces in most existing system -- creating a polygonal approximation of the surface and displaying the approximation by use of the standard polygon display techniques [10]. Uniform subdivision is the simplest technique to produce the polygonal database for displaying the surface. In uniform subdivision, a single patch is divided uniformly in parameter space and the resulting mesh of evaluated points are treated as vertices of the approximating triangles, for example. The more subdivisions, the smoother the approximating surface. 39 40 Note that the vertices of the approximating triangle are used for interpolating the points on the patch. Yet the control points used in B-splines define the patch rather than interpolate the patch. This is the key difference between B-spline and piecewise linear approximation. The first attempt at displaying a bicubic surface, a surface with order equal to 4 in both the s and t directions, directly from the parametric mathematical definition of the surface (i.e., without polygonalizing) was done by Catmull [13]. Catmull subdivided the patch in parameter space down to the pixel basis. Then the subpatch was passed to displaying and shading routines. 4.2 Adaptive Subdivision Adaptive subdivision, proposed by Lane and Carpenter [14], adaptively divides a patch into polygons according to the flatness of the patch. A "flat" patch can be displayed as a single polygon while a curved patch is subdivided into smaller subpatches until each of the subpatches meets the criteria of the flatness test. Compared to uniform subdivision, adaptive subdivision generates ”enough" polygons to adequately represent the surface whereas the uniform subdivision spends a longer time to generate ”more than enough“ polygons [10]. All surface patches are represented in terms of an appropriate tensor product of Bernstein basis functions. Using Bernstein basis functions, curves are generated by interpolating their first and last control points. Then the points are assembled into quadrilaterals. The 41 flatness test is carried out on the quadrilaterals, which are further subdivided if necessary. The final subdivision produces polygons for display by a conventional routine [10]. In order to decide which portion is the most curved one of the segment or patch, the calculation of curvature is usually involved. Yet the evaluation of curvature is difficult and time-consuming. Van Book has since developed a method similar to the Lane-Carpenter method, but in which no calculation of curvature is necessary. Van Hook evaluated more than enough points on the surface to achieve the flatness test. This method was classified by Whitton [10] as a semi-adaptive subdivision. 4.3 Cox-deBoor Algorithm The most popular B-spline algorithm in many CAD/CAM systems is the Cox-deBoor algorithm [16,17]. It works well for point evaluation as well as the derivative evaluation of the point. The algorithm for formulating the blending functions presented by deBoor [17] is reproduced below: N(1,l) - l; for s - 1 to k-l begin DP(s) - t -t; 1+8 DM(s) - t-ti+l-s; 42 N(l,s+l) - O; for r - 1 to s begin M - N(r,s) / ( DP(r) + DM(s+l-r) ); N(r,s+l) - N(r,s+1) + DP(r) * M; N(r+1,s+1) - DM(s+l-r) * M; endfor, endfor. Where k is the order of the blending functions, t1 are the knots of the knot vector, and t1 5 t < t1+1. The B-spline is then evaluated by k-l P(t) - 2 CP . N (c), (4.1) i-O i i,k where Ni k is the kth column of N in the above algorithm. Note that N is a triangular matrix. To evaluate derivatives, the appropriate column of N would be used. The general routine given by.I deBoor for derivative evaluation is k- -1 me - - - - (k-j) Aim - Ni 1w(t). (4.2) 1-0 ' 43 where j is the order of derivative and 0 s j < k, N th i,k-j is the (k-j) column of N, and (0) _ A1 CPi’ (.1) _ (J-l) _ (J-l) A (A1 A 1 1 1- t (4.3) ) / (ti+k-j ' 1" 4.4 Pickelmann's Algorithm One problem embedded in the Cox-deBoor algorithm is that when the knot values of the knot vector become very large, the error will also increase. Knot vectors are, in Pickelmann's work [1], translated (or shifted) so that the parameter range being evaluated (the middle of the effective knot vector) is always [0,1). This translation both limits the number of possible effective knot vectors and has a beneficial effect on numerical accuracy of the blending function calculations. The CURBS algorithm proposed by Pickelmann [1] thus produces a three-to-one reduction in machine operations over the Cox-deBoor algorithm, for cubic splines. For higher order splines, the reduction ratio is even larger. From the last chapter, only the effective knot vector (a subset of the knot vector is used to compute any given blending function. Also, the length of the effective knot vector is 2(k-1). Pickelmann showed that a B-spline with an infinite number of possible knot vectors can be processed as several individual subsegments, each with one effective knot vector. In what follows, only enhanced uniform knot vectors will 44 be considered (note that Pickelmann showed that this is not actually a restriction on the class of all nonuniform rational B-splines; he derived an algorithm for converting an arbitrary NURB surface into one with an enhanced uniform knot vector). And for EURB knot vectors, using the knot vector translation just mentioned, the number of possible effective knot vectors is finite and equal to 2(2k-4). For a cubic, k-4, there are 2(2o4-4) or 16 possible effective knot vectors for a subsegment. For k-S, there are 64 possible knot vectors. Each of the 2(2k'4) knot vectors yields a unique set of blending functions. In order to easily determine which set of blending functions should be used for a given subsegment, each subsegment is given a label or pointer to the correct set of preformulated blending functions. The label is determined by a special subroutine in which the effective knot vector is compressed into a binary-encoded label. Then the shape of a subpatch is formed by the labels in both the s and t directions and the corresponding control points. To get a clear picture of the algorithm, an example is given as follows. W §_§ggg§ - number of control points : m+l - 4 order : h - 3 knot vector : [ O O 0 l 2 2 2 ] §_§gggg - number of control points : n+1 - 4 order : k - 3 knot vector : [ O O O 1 2 2 2 ] 45 Control point matrix : P00 P01 P02 P03 P10 P11 P12 P13 P20 P21 P22 P23 P30 P31 P32 P33 For k-3, there are 2(2.3-h) - 4 possible knot vectors. The above knot vector uses only two of them, [ 0 0 l 2 ] and [ 0 l 2 2 ], and the second effective knot vector can be translated to [ -l 0 l 1 ] which will have the same label as [ o 1 2 2 1. The label of [ 0 0‘1 2 1 is LABl - 2, and the label of [ -l 0 l 1 ] is LABZ - l. The way to label the effective knot vector was given in [1]. Since (n-k+2) - 2, we know there are two segments in both the s and t directions, which results altogether in four subpatches. The blending functions of one of the four subpatches can then be retrieved from the preformulated blending function basis according to LABl and LAB2. The labels and the corresponding control points used to form the subpatch are given in Table 4.1. Figure 4.1 shows the B-spline control net and Figure 4.2 gives the B-spline surface. Conceptually, the four subpatches have different parameter ranges .as shown in the above table. But for purposes of computation, subpatches can be treated individually as if each of them had the parameter range of s-Oel and t-Oel, so long as the "bookkepping" of the proper control point indices is maintained. 46 TABLE 4.1. Subpatch and corresponding control points subpatch label of s label of t control ponints P P P s _ 0‘1 00 01 02 LABl LABl P P P t _ 0+1 10 11 12 P20 P21 P22 P P P s _ 1‘2 10 11 12 LABZ LABl P P P t _ 041 20 21 22 P30 P31 P32 P P P s _ 0‘1 01 02 03 LABl LABZ P P P t _ 1+2 11 12 13 P21 P22 P23 P P P s _ 1‘2 11 12 13 LAB2 LABZ P P P t _ 1‘2 21 22 23 P31 P32 P33 47 Figure 4.1. B—spline control point net [11]. Figure 4.2. B-spline surface [11]. CHAPTER V SURFACE EVALUATION AND SHADING This chapter describes the surface evaluation algorithm for both nonrational B-splines and rational B-splines. The computation of the derivative and the normal vector of the points on the B-spline surface are given. Finally, based on the normal of the surface, the concept of shading is introduced and the shaded image of the B-spline surface is visualized. 5.1 Evaluation Algorithm From equation (3.20) we know a point on the B-spline patch can be represented in matrix form as P(s,t) - [BF(8)] ' [CP] ' [BF(t)] (5 1) where CP is the control point matrix with size (m+l)x(n+l), and BF(s) and BF(t) are the blending function vectors of s and t, respectively. In the last chapter, we showed that the computation of the points on a subpatch can be treated individually such that the point on a subpatch can now be defined as P'(S.t) - [BF(S)] ' [09'] ° [BF(t)] (5 2) 48 49 where CP' is the corresponding control point matrix with size hxk, and each element of BF(s) and of BF(t) is a polynomial of s and of t, respectively. For a bicubic surface, h-kn4, we may rewrite equation (5.2) as 23 ’1‘ P'(s,t) - [ 1 s s s ] M4x4 CP'4x4 N4x4 . (5.3) t t2 btad u4x4 and N4x4 are the blending coefficient matrices by which the blending function is formulated. This assumes that all the blending coefficient matrices have been preformulated according to the labels determined from the effective knot vector. Note that the formulation of the blending functions is done only once and the results are stored for later use. The algorithm, as stated below, also assumes that the subsegment labels are calculated or read in with the control points that define the surface. The algorithm requires the following information for evaluating a subpatch: . 1) corresponding control point matrix for the subpatch (CP'); 2) subpatch labels in s and in t direction (LABS and LABT); 3) orders of the subpatch in s and in t direction (h and k); 4) parametric value (between 0.0 and 1.0) (s and t); 5) the value of rational indicator (RATNOT). 50 The major steps of the algorithm for evaluating a subpatch are: Step 1: Step 2: Step 3: Step 4: Step 5: Use LABS and h to retrieve the proper set of blending coefficient matrix, uhxh' Similarly use LABT and k to retrieve kak. Construct the a vector [ l s ... sh.1 ]; construct the t vector [ 1 t ... tk"1 ]. Evaluate blending functions h- [BF(s)l-[ls..-s ll-Iflhxh]; k-l T [BF(t)] - [ kak ] s [ 1 t ... t ] Calculate homogeneous points WX - [BF(S)] - [CP’(X)l - [BF(t)]; WY - [BF(S)] ° [CP'(y)l ° [BF(t)]; WZ - [BF(S)l ° [CP'(2)] - [BF(t)]; and if it is a strictly rational B-spline, then W ' [BF(S)] ° [CP'(V)] ' [BF(t)]- Calculate the three-dimensional point if strictly rational X - wX/W: Y - wY/W; z - w2/w. 51 5.2 The Derivative and Normal Vectors If the derivatives of the point with respect to the parametric variable are required, the algorithm must be modified. The appropriate derivative of the a vector and t vector are taken and used in step 3 to evaluate an alternative set of blending functions. This set of blending function derivatives is then used in step 4. An example to get the derivative of bicubic patch in both the s and t directions is given below. 2 P 1 1 3.113.? ' l 0 1 23 39 ] M4x4 CP 4x4 N4x4 t (5.4) t2 b t3 d - 1 ‘2 3 M CP' N ' o - S 5 §£1§251 [ 3 3 3 1 4x4 4x4 4x4 1 ( ° ) 2:: D 3:2 d If the surface is nonrational, then aX/as, aY/as, and 62/83 can be obtained by substituting CP'(x), CP'(y), and CP'(z), respectively. The derivatives in the t direction can be derived similarly by applying equation (5.5). If the surface is strictly rational, then the chain rule must be employed to replace step 5 to calculate the derivative. For example, to calculate aX/as it would be necessary to use 52 dX__§.LliXAil_.L-_M__ -.&H.- (5.6) as as W as as The tangent vector to parametric curve P - P(s,te), where to is a constant, is a multiple of the vector aP/as - ( aX/as aY/as aZ/as ). Similarly, the tangent vector to the curve P - P(se, t) is a multiple of aP/at - ( aX/at aY/at 62/8t ). The tangent plane at the intersection of these curves at P(se, to) contains these two tangent vectors, so that the normal to the surface is a multiple of their vector product (see Figure 5.1). The unit normal vector n is then given by Figure 5.1. Normal vector and tangent vectors of a point on the surface [18]. 53 X n ' | aP/ae x aP/ac | i j k where ( aP/as x aP/at ) - 8X/as BY/as 32/33 - ai + bj + ck BX/at av/at 82/8t and | aP/as x aP/ac | - (a2 + b2 + c2)“. 5:3 Shading The process of taking a three-dimensional description of a scene and generating a two-dimensional array (typically 1024x1024) of intensities (pixels) that will be displayed on a CRT is called rendering. The description of a scene may be modeled by polygons, parametric patches, or splines. Generally, rendering of a scene also needs information about the (one or more) light sources, such as the intensity, direction, and model of the light sources. Many alternative models for light sources and their interactions with surfaces are in common usage. Rendering begins by transforming the objects into the eye- coordinate system, or what is called viewing transformation (clipping, perspective transformation also included) in which objects outside the field of view are clipped away (see Figure 1.1). The remaining objects must be compared to decide which objects, and/or portions of objects, are visible from the view point. Then the hidden surface removal 54 problem is taken into account. Three popular hidden surface removing algorithms are the z-buffer, priority, and scan-line algorithms [20]. The next step in visualizing an image, after hidden surfaces have been removed, is to shade the visible surfaces taking into account the light sources, surface characteristics, and the positions and orientations of the surfaces and sources. Next, we examine the interaction of different components of point light sources shining on the surfaces of objects. The light reflected from a surface can be modeled in a common simplification with two components, diffuse and specular. For diffuse reflection, scattering of light is assumed to be equally in all directions, so that the surfaces appear to have the same brightness from all viewing angles. Ideal specular surface, however, reradiates light in only one direction, the reflected light direction. For real objects, the reflected light contains both diffuse and specular components. For example, illuminate an apple with a bright light; the highlight is caused by specular reflection, while the light reflected from the rest of the apple is caused by diffuse reflection. In order to generate realistic images, both components need to be taken into consideration. In Figure 5.2, E is unit vector that points to the eye while L is unit vector to the light source. E' indicates the reflected eye direction and N is the unit normal vector at the point P. It is very simple to compute the diffuse component. According to Lambert's Law, the diffuse component can be calculated by N-L, the inner product of N and L. The computation of the specular component, however, is much more 55 Figure 5.2 The geometry of shading [12] complicated. An approximation of the specular component proposed by Phong [21] took the form "(L)'Cosn(a) (5.7) where t is the incident angle and a is the angle between E' and L. W(t) is used to model the change of the ratio of incident light to reflected light as the incident angle changes. In practice, according to [20], W(L) has been ignored by most implementers. The order of cos(a), n, is the shininess factor. When L and E' in the same direction, i.e. a-O, cosn(a) reaches maximum, and it will decrease very fast in the off-axis direction as n increase. 56 More complex methods to calculate the specular component can be found in [19]. The modeling of a third component, termed ambient light, is given also in [19]. Once we know how to shade a point, we can shade a surface according to the same principle. Rather than going through the extra shading computation for every point to be displayed (i.e., pixel), several approximation methods have been proposed [21,22]. The drawback of these approximation methods is that they do not generally represent the original surface precisely. If high-speed hardware was available to reduce the computation cost of shading for each point, more precise visualizing of a surface would then be possible. Based on the B-splines and the algorithm introduced in previous chapters, a VLSI chip architecture is proposed in the next chapter which performs a crucial part of the work to achieve real-time shading, now a bottleneck of interactive computer graphics. CHAPTER VI ARCHITECTURE DESIGN The architecture of a computational system can be characterized in many ways depending on the designer's view as well as the design level. The architecture here is collectively defined by a specification of the functional capabilities of its physical components, the logical structure of their interconnections, the nature of the information flow, and the control mechanism. In the section on system specification, the design objectives and criteria as well as special functional configuration requirements will be presented. The functional units and the clocking scheme used in the design are discussed in detail. The data flow is then manipulated in such a way that maximum system throughput is achieved. The architecture which meets the system specification is then presented in block diagram form. The algorithm of surface evaluation is expressed in the form of a program written in RTL (Register Transfer Language). 57 58 6.1 System Specification 6.1.1 Design Objective The objective is to design a basic VLSI chip architecture dedicated to the fixed-point computation of the surface evaluation for a host graphics processor. This chip will work as a coprocessor to the host. It will be activated whenever a surface evaluation request is invoked; otherwise, it remains in a quiescent mode. The following criteria have been established, based on the application requirement analysis, to guide in making various design decisions. CMOS technology with a feature size of 1.5pm or below and a clock of up to ZOMHz are to be used. 0 Blending coefficient matrices will be stored in ROM and the blending functions will be generated on chip in order to avoid possible I/O bottlenecks. o In order to simplify the overall design, the surface to be evaluated is assumed to be defined by a bicubic patch. 0 Intra-functional 'unit pipelining is to be used in order to achieve higher system throughput. 0 Minimum cycle time and minimum interconnections are sought, with the former being a major concern. In the design of the SE (Surface Evaluator) chip, the three following rules from Mead and Conway [23], which are adequate to guarantee the correct functioning of a bus-oriented chip design, are 59 rigidly followed: Rule 1: The system is driven by a two-phase, nonoverlapped clock; input of phase one is from phase two output and vice- versa. Rule 2: Functional units are timeless C/L (Combinational Logic); operations on data are separated by the two-phase clock. Rule 3: A standard bus scheme is used; data transfer through the bus occurs during phase one, and the bus is precharged during phase two. 6.1.2 I/O Specification The input to the proposed SE chip consists of LABS and LABT, values of both a and t, an information bit for indicating rational or nonrational, and the three (four if rational) 4-by-4 control point matrices. The output of the SE chip is the three-dimensional point (x, y, 2). If the normal vector at this particular point is desired, further calculations are needed. All of the information, except for the rational indication bit, are 32-bit, two's complement, fixed-point representations. Necessary pre-scaling and post-scaling of the data are assumed to be done by the host processor. Communication of the control point matrices is the bottleneck of the SE chip. Efficient pipelining of the input stage and the functional units of the SE chip in order to obtain the highest throughput, is, therefore, the major design endeavor. 60 6.2 Architecture Development 6.2.1 Functional Units The elementary arithmetic operations needed for the surface evaluation calculation are multiplication, addition, and division. The schematic block diagram of a Baugh-Wboley two's complement array multiplier [24] is shown in Figure 6.1. It is the basic structure of a multiply-and-add cell (MAC). J. be ‘4 b: a For i.) ' 0. 1. 2. 3 and (i,1) - (4, 4) I 05.33 osj$3 “ego “350 “2‘0 ‘1 1’0 ‘050 ,b zb‘ - .M o o o o 1.0, ‘ a. b. 1 z “:1 Q , o mimic We 0 f l Figure 6.1. The block diagram of a S-by-S Baugh-Wooley two's complement array multiplier [24]. 61 The first step of the surface evaluation calculation is to form the blending functions of s and of t. If we compute the blending function of s, for example, by the multiplication of vector [1 s 32 s3] and the corresponding blending coefficient matrix chosen according to LABS, we need to calculate the square and the cube of s beforehand. This requires at least two more cycles to perform. However, there is an alternative way of forming the blending functions, which does not require computing the square and the cube in advance. If the MAC, represented in block form shown in Figure 6.2(a), is used, then the polynomial of a variable x with order equal to 3 can be obtained by using three MAC's as shown in Figure 6.2(b). Calculation of the first derivative of a polynomial of x with order - 3 can then be performed by the block diagram shown in Figure 6.2(c). Based on this approach, Figure 6.3 gives the data flow and the structure to calculate the blending function of s as well as its first derivative. The blending function of t and its first derivative can be obtained similarly. Note that in Figure 6.3 there is a need to multiply by 2 and 3 (*2 and *3) in the forming of the first derivative. Actually, no multiplication is necessary here. Multiplication by 2 can be achieved by just shifting the input data one bit left and multiplication by 3 can be obtained by adding three duplicates of the operand. The reason for adding the 4-stage delay is that piplined MAC will has 4 segments as will be covered shortly. In the evaluation process, many matrix-vector multiplications are involved. After multiplying elements of the vector [1 s 52 $3] with 62 (a) P(x) =(( DBFT(3) DBFT(4) Figure 6.3. The block diagram for calculating the blending functions for evaluating the point and the first derivative. 64 the corresponding elements of a row (or a column) of the blending coefficient matrix, we need to sum up the products to get the final result. The MAC can be used to achieve matrix-vector multiplication as given in Figure 6.4. This requires four MAC cycles to get the final result; a time cost that is too high. However, we may perform the multiplications concurrently and then add the products in one step. In so doing, we need a special multi-operand adder. A four-operand adder formed by carry-save adder (CSA) trees is given in Figure 6.5. Note that the '~' on the carry output lines indicates that the carries are shifted left one bit with a zero entering from the right end before they are fed into the CSA inputs at the next level. The partial carry and partial sum from the last stage of the CSA is then feed into a carry lookahead adder. The functional block diagram of a full carry lookahead adder is shown in Fig. 6.6 and the details of each unit are given in Figures 6.7 and 6.8. X1—‘ — - . ~~= ts—iI—‘m gnaw ...... ‘ of X and Y Figure 6.4. Block diagram for calculation of the inner product of X and Y. 65 Figure 6.5. A four-operand adder formed by CSA tree. LkhchvI-I. III Carry Generate/Propagate Unit. 8-bit CLA Unit c4: C- Summe tion Unit C-$(——-(A+I)+C. Figure 6.6. The functional block diagram of a full carry lookahead adder. l AHA—"+1 Figure 6.7. Logic circuits for realizing the carry-generate, carry-propagate, and sum functions. 67 P6, P6,, PG,PG, '?.f:bi£_CL-_A_U2£ C1 Ca C2 C1 Co Figure 6.8. The schematic logic circuit of a h-bit carry lookahead (CLA) unit. The last requried functional unit is a divider. There are three types of cellular array dividers: restoring division, nonrestoring division, and nonrestoring with carry lookahead array dividers. The area cost of the CLA dividers is higher than the other two array dividers, but on the average it is 5 times faster [25]. Since in the SE chip speed is the major concern, the nonrestoring CLA array divider is 68 chosen for the calculation of division in the evaluation of a rational surface. Figure 6.9 shows the diagram of a carry lookahead array divider and Figure 6.10 gives the three basic types of logic cells used in Figure 6.9 [25]. N0 5‘ ”I " ”I ‘ N1 - ”a 0mm N - No. ~,~,~,~,~,~.A,~. o 0 I o o l Dmsov u - Do. 0,0,1“). Ououenl o - on. 0‘ 919,0. aIEJIIEIINII ”' : ‘I a: fl WNW; :‘TCV—EE .flflflk Figure 6.9. A carry lookahead array divider with 8-bit divisor in two's complement representation [25]. 69 S1,” Cf. O‘a‘s \’ II OSIsn m “'1 i tK' I I I I l I 1 1c! l " | I .._1 q 4:K‘ ..--q -_——J ----J C,‘ (Ea. 8.55! (C) Figure 6.10. Three basic types of cells to be used in constructing the division array of Figure 6.9. (a) The A cell located at the jth digit position of the ith row; (b) the 8 cell located at sign position of the ith row; and (c) the CLA cell for the ith row [25]. 70 6.2.2 Timing and Pipelining In the synchronous discipline of design, tasks are accomplished through a logical sequence of events with each event realized under the constraint of the physical timing of transporting information (electrical signals) from one point to another. This dual meaning of timing is bound by a system-wide clock which serves both as the sequence reference and as the time reference. Hence the design of the clocking scheme has paramount important to the system's correct functioning and performance [26]. The general form of a data path and the corresponding two-phase clock scheme are illustrated in Figure 6.11 and 6.12. Since the evaluation process involves repeated vector-matrix multiplications, it is of great advantage to have a multiplier and an adder form a computational pipeline. This is because the piped operation will save bus bandwidth and increase the overall throughput as well as eliminate the need to store temporary results. From the timing diagram of the two-phase clocking scheme, it is clear that the minimum clock period is the sum of the maximum C/L delays of phase-one and phase-two plus some small propagation overhead and preset time. Intuitively, the C/L delay due to the multiplier may constitute the largest delay component in the minimum clock period calculation and thus one may further be motivated to introduce additional "subpipelining" into the multiplier to shorten the clock period. 71 Phasel 0P1 Phase2 0P2 Phasel i I i I I a I I I I I I I l ——a R C/L R C/L R Figure 6.11. The general form of a data path [23]. r————————-(flock perkxl-————————1 Phase] s/f 1\_.__.__ Phase2 1 \ lax delay lax delay”)l of C/L of c L a .— odd / ...,,‘I .- Deuq'ux . lfi1metthmm Deuw'un Phasel Phase2 for Phasez Pnuun.thne for Phasel Figure 6.12. The two-phase clocking scheme [26]. A pipelined four-segemnt multiplier is given in Figure 6.13 with its clocking scheme [26]. Note that the clocking scheme used here is slightly different from the general single phase latches as shown in R -—-> Dynamic Raghner l-stage dehy nun; ...... l_____ ——a mu I——-a mu ———1 mu Q a; _, 9.19.: ___I¢: Q. R BIN”: in -3InPYa-9I R -aIMPY3 ...... ...J L........J..____J (n) 25; Cycle Time ._/ \__. 1DU I as. R I«)IMI’Y4 4 fl: Is—e. Worst case path delay Figure 5.13. (b) clock scheme. Ihuty'ICyniha 131nm: (a) Pipelined four-segment multiplier and (b) its R 9c 73 Figure 6.12. When using the clocking scheme in Figure 6.12, each C/L section is actually activated during only half of a clock cycle. However, the clocking scheme in Figure 6.13 allows the C/L sections to operate in a single phase and thus allows a shorter clock period; the price is a slight increase in chip area because of the use of dynamic registers. The advantage of pipelined multiplication of the blending functions and control point matrix is in more efficient computation. The data flow and the interconnection between adder and multiplier are illustrated in Figure 6.14. The adder and four four-segment multipliers form a whole pipeline for calculation of the inner product. Note that the above discussion of pipelining doesn't include the divider. It should be segmented into 10 segments for the sake of synchronizing with the system clock. 6.2.3 Storage Scheme The 3E chip architecture takes advantage of parallelism and pipelining, but two key problems have not yet been discussed: I/O and the data storage scheme. In the case of evaluation of bicubic rational surfaces, four 4-by-4 control point matrices need to be fed into the SE chip. Because of the constraint on the number of I/O pins, it is impossible to input the data in a few cycles. But one thing should be noted: as long as the values of s and t and the labels are given, the 1:5 PX(21) t4 PX(11) t3 PV<41> t2 Hun/(31) t1 PV(21) 1:0 PV(11) Figure 6.14. BFTQ) 74 PX(22) PX<12> PV(42) PV(32) PV<22> PV<12) BFT(2) 4-0PERAND PX<23) PX<13) was) PV<33> was) was) BFT(3) PX(24) PX(14) PV<44) PIA/(34) PV<24> PV<14> EFT“) t5 PVl t6 PVZ t7 PV3 t8 PV4 t9 PXl 1:10 PX2 Dataflow of vector-matrix multiplication. 7S formation of blending functions can be overlapped with the input of control point matrices. By so doing, the I/O bottleneck can be reduced if not eliminated. The question remaining is how to store the data in order to support the pipelined process. One straightforward way is to store the control point matrices in registers. The area cost of using registers is quite high and the control of that many registers is complicated. An alternative is to store the data in RAM (Random Access Memory), which is easier to control and occupies much less chip area. In the SE design, the latter is used to store the control point matrices. As stated in the last section, the inner product multiplications are executed concurrently. In order to provide the necessary data for parallel multiplications, the RAM is divided into four modules and each module stores one fourth of the three (four, if rational) control point matrices. But row accessing as well as column accessing are necessary for the matrix-vector multiplication. In order to access either a row or a column of the control point matrices in one cycle, a special skewed storage scheme is used. Let S (stride or skewing distance) be the distance measured in columns that each array row has been shifted relative to the row above it. Let H be the total number of memory modules, m(a ) be the memory 1.1 module number of element a and q(a ) be the local address within 13 ' 13 that module. Then the storage scheme can be represented by the following equations: 76 m(aij) - (i-S + j) mod N 4(611) - i (6.1) (6.2) The storage pattern used in the SE chip is a skewed storage scheme with 8-1 and M—4; an example is given in Figure 6.15. A multiplexing (a) (b) Figure 6.15. Skewed storage pattern with (a) S-0, and (b) S-l. 77 INPUT UUTPUT 01 00 n1 D2 03 n4 0 0 A11 A12 A13 A14 0 1 A21 A22 A23 .A24 _ 'A31 A32 A33 A34 A41 A42 A43 A44 HH ...0 Figure 6.16. Skewed storage with output multiplexing. 78 output circuit of the skewed storage module is shown in Figure 6.16. The table in the figure gives an example of row accessing. If column accessing is desired, proper addressing must be applied to each module in order to get the appropriate elements. 6.3 Architecture of SE Chip 6.3.1 Architecture Synthesis After the presentation of the details of the functional units, clock scheme, and storage scheme, a simplified block diagram of the SE chip can now be drawn as shown in Figure 6.17. When the SB chip is activated by the host processor, one of the two main routines in the control memory will become active according to the rational/nonrational indication bit received from the system control input. One routine is in charge of rational surface evaluation and the other takes care of the nonrational case. The first four input items received by the I/O circuit are the values of s, t, LABS, and LABT, which will be stored in the register file. As long as these four items are known, the formation of blending functions of s and t can be performed in the arithmetic unit by retrieving the proper blending coefficient matrices according to LABS and LABT. At the same time, the I/O circuit will store the incoming control point matrices in RAM based on the skewed storage scheme. The register keeps the blending function vectors, the intermediate output of 79 System ‘ System Data Control input Input/ Output Q TEX m/Wiicmcm 7f '_ L \U/ _- m (com-nor. 77’ point matrices) CONTROL MEMORY II? ii A son I Am 1001c mm (mm ——\, .______7/ com . rumors) :I mm“ IJ/KJ— x(‘I. Fig. 6.17. Simplified diagram of SE chip. 80 the matrix-vector multiplication, and the final result. All calculations, including formation of blending function, matrix-vector multiplication, and division, are performed in the arithmetic unit. According to the functional dependency, the arithmetic unit can be subdivided into two parts: division unit and multiplying-and-adding (M&A) unit. The data flow of the division unit is trivial. It receives data WX, WY, W2, and W (if rational) from the register file and then sends the division results X, Y, and 2 back to the register file. Data flow and interconnection of the M&A unit, however, is much more complicated. The design of the MfiA unit must enable calculation of both blending functions and matrix-vector multiplication with a minimum number of functional units. To meet this requirement, the circuit diagram shown in Figure 6.18 is developed. The interconnection of the M&A unit is determined by only one control signal, MODE. When MODE is low, the M&A unit acts as the circuit showed in Figure 6.3. All the 0th input bits of the multiplexers will be selected, thus the blending functions as well as the first derivatives of the blending functions, can be computed. If MODE becomes high, the M&A unit acts as four parallel multipliers with a four-operand adder to sum up the inner-product, as showed in Figure 6.14. For simplicity, Figure 6.18 shows only the interconnection of the M&A unit with MODE-O. When MODE-l, one of the inputs of the four active MAC's is from RAM and the others are from the register file. Note also that the multiplexing is now implicitly embedded in the four-operand 81 .us22 A: REGISTERZ : Bit<63z0>: DYNREG : Bit<63=0>: LATCHl : Bit<31:O>: LATCHZ : Bit<63:O>: RAM : Bit<31=0>: ROM : Bit<31:0>: var /* basic functional elements 1"/ of MACl upr1_1, MPYZ_1. MPY3_1, MPY4_1, MPYS_1, ADDER or 311:263 : o>: Dl,D2,D3,D4,D5,D6,D7,D8,09,DlO or 112213, MPY2_2, MPY3_2, MPY4_2, MPYS 2 ' MPY1_3. MPY2_3, MPY3_3, MPY4_3, MPYS_3, /* /* storage elements */ BLD(16,4,4) OF ROM: CPM(4,4,4) OF RAM: IT(4,4) OF RAM: RBLDlZp4) OF RAM: MPY1_4 or MPY2_4 or MPY3_4 MPY4_4 MPY5_4 4-operand Adder */ OF OF OF Bit<63:O>: Bit<63=0>; Bit<63:0>: Bit<63=0>: Bit<63=0>: Bit<63:0>: /* /* /* /* /* /* intermediate product term */ segments segments segments segments segments /* blending coefficient matrices */ /* control point matrices */ /* for storing blending functions */ of MACZ of MAC3 of MAC4 of MACS RBLDD(2,4) OF RAM: /* first derivatives of blending functions */ /* dynamic registers and latches */ M1D3, M203, M3D3, M4D3, M503 OF LATCHZ: /* 3-stage delay within MAC's */ /* l-stage delay within MACl and MAC2 */ MlDlo DLY8 OF LATCHI: LAD3 OF LATCHl: M2Dl OF LATCHZ: DLY4 OF LATCHl: /* 4-stage delay unit */ /* 8-stage delay unit */ /* 3-stage delay */ */ */ */ */ */ /* segments of Dividor */ Ll_1, L142 OF DYNREG: /* input latches of MACl */ L2_l, L2_2 OF DYNREG: /* input latches of MACZ */ L3_l, L3_2 OF DYNREG; /* input latches of MAC3 */ L4_1, L4_2 OF DYNREG; /* input latches of MAC4 */ LS_l, L5_2 OF DYNREG; /* input latches of MACS */ LAl, LAZ, LA3, LA4 OF DYNREG: /* input latches of Adder */ LDl, LDZ OF DYNREG; /* input latches of Dividor */ /* registers and control signal */ INPUT, OUTPUT OF REGISTERl: /* input and output buffer */ RS, RT OF REGISTERI: /* registers for parameter s and t */ R(4) OF REGISTERZ; /* registers for WX, WY,WZ, and W */ LABS, LABT OF Bit<3:0>; /* labels of s and of t */ RAT OF Bit<0:o>: /* l for rational, 0 for nonrational */ PATCH OF Bit<0:0>; /* l for patch evaluation, 0 for point evaluation */ ' Begin /* the routine for nonrational spline point evaluation */ Cycle (0): if RAT-O then begin (1): RS<-INPUT: (2): RT<-INPUT: (3): LABS<-INPUT: (4): (5): (6): (7): (8): (9): (10): (ll): (12): (13): (14): (15): (16): /* blending functions (17): (18): (19): (20): 99 LABT<-INPUT; CPM(l,l,l)<-INPUT, Ll_l<-BLD(LABS,1,1), MlD3<-BLD(LABS:1,2), DL4<-BLD(LABS,1,3), DL8<-BLD(LABS,1,4), Ll_2<-RS, LAD3<-BLD(LABS,1,1): CPM(2.1.1)<-INPUT. L1_l<-BLD(LABS,2,1), M1D3<-BLD(LABS,2,2), DL4<-BLD(LABS,2,3), DL8<-BLD(LABS,2,4), Ll_2<-RS, LAD3<~BLD(LABS,2,1): CPM(3' 1’ 1)<-INPUT' L1_l<-BLD (LABS: 3' 1) ' M1D3<-BLD (LABS, 3, 2), DL4<-BLD (LABS, 3, 3) r DL8<-BLD(LABS,3,4), Ll _2<-RS, LAD3<-BLD(LABS,3,1): CPM(1,1,2)<-INPUT, Ll_l<-BLD(LABS,4, 1), M1D3<-BLD(LABS,4,2), DL4<-BLD(LABS, 4, 3), DL8<-BLD(LABS,4,4), L1_2<-RS, LAl<-O, LA2<-LAD3, LA3<-LAD3, LA4<-LAD3, LAD3<-BLD (LABS, 4, 1) o' CPM(2,1,2)<-INPUT, L4 l<-ADDER, L4_2<-RS, M4D3<-M1Dl, L2_1<-MPYl_4, L2_2<-RS LA1<-O, LA2<-LAD3, LA3<-LAD3, LA4<-LAD3; CPM(3,l,2)<-INPUT, L4 _l<-ADDER, L4 _2<-RS, M4DB<-M1Dl, L2_ l<-MPY1_4, L2 _2<-RS- LAl<-0, LA2<-LAD3, LA3<-LAD3, LA4<-LAD3: CPM(l,l,3)<-INPUT, L4 _l<-ADDER, L4 _2<-RS, M4D3<-M1Dl, L2_1<-MPY1_4, L2 _2<-RS- LA1<-O, LA2<-LAD3, LA3<-LAD3, LA4<-LAD3: CPM(2,1,3)<- INPUT, L4 _l<-ADDER, L4_2<-RS, M4D3<-M1Dl, L2_ l<-MPY1_4, L2_2<-RS: CPM(3,1,3)<-INPUT, L3_l<-MPY2_4, L3_2<-RS, LS_l<-MPY4_4, L5_2<-RS, MSD3<~M2D1: CPM(l,l,4)<-INPUT, L3 _1<-MPY2 _4, L3 _2<-RS, L$_l<-MPY4_4, LS_2<-RS, M5D3<-M2Dl: CPM(2,1,4)<-INPUT, L3 _l<-MPY2 _4, L3 _2<-RS, L5_1<-MPY4_4, LS _2<-RS, MSD3<~M2D1: CPM(3, l, 4)<- INPUT, L3 _l<-MPY2_4, L3_2<-RS, L$_l<-MPY4_4, LS_2<-RS, MSDB<-M2Dl; of s ------------------------------------ */ CPM(l,2,2)<-INPUT, RBLD(1,1)<-MPY3_4, RBLDD(1,1)<-MPY5_4: CPM(2,2,2)<-INPUT. RBLD(1,2)<—MPY3_4, RBLDD(l,2)<-MPYS_4: CPM(3,2,2)<-INPUT, RBLD(1,3)<-MPY3_4, RBLDD(1,3)<-MPY5_4: CPM(l,2,3)<-INPUT, RBLD(1,4)<-MPY3_4, RBLDD(l,4)<-MPY5_4: /* ............................................................ */ (23): (24): : CPM(2,2,3)<-INPUT, : CPM(33233)<-INPUT' Ll_l<-BLD(LABT,1,1), MlD3<-BLD(LABT,1,2), DL4<-BLD(LABT, 1, 3), DL8<-BLD(LABT,1,4), L1 _2<-RT, LAD3<~BLD(LABT,1,1); L1 _l<-BLD(LABT, 2, 1), MlD3<-BLD(LABT,2,2), DL4<-BLD(LABT, 2, 3). DL8<-BLD(LABT,2,4), L1 _2<-RT, LAD3<-BLD(LABT,2,1); CPM(l, 2, 4)<- INPUT, L1 _l<-BLD(LABT, 3, 1), M1D3<-BLD(LABT, 3, 2), BL4<-BLD(LABT, 3 3), DL8<-BLD(LABT, 3, 4), Ll_2<-RT, LAD3<-BLD(LABT,3,1); CPM(2,2,4)<-INPUT, Ll_l<-BLD(LABT,4,1), M1D3<-BLD(LABT,4,2). DL4<-3LD(LABT,4.3). DL8<-BLD(LABT,4,4), Ll_2<-RT, LAl<-O, LA2<-LAD3, LA3<-LADS, LA4<-LAD3, LAD3<~BLD(LABT,4,1); (25): CPM(3,2,4)<-INPUT, L4_l<-ADDER, L4_2<-RT, M4D3<-MlDl, L2_1<-MPY1_4,L2_2<—RTLAl<-O, LA2<-LAD3, LA3<-LAD3, LA4<-LAD3: (26): CPM(1,2,1)<-INPUT, L4 _1<-ADDER, L4_2<-RT, M4D3<-M1Dl, L2_ l<-MPY1_4, L2_2<-RTLA1<-O, LA2<-LAD3, LA3<—LAD3, LA4<-LAD3: (27): CPM(2,2,1)<-INPUT, L4_l<-ADDER, L4_2<-RT, M4DS<-M1Dl, L2_l<-MPYl_4, L2_2<-RTLAl<-O, LA2<-LAD3, LA3<-LAD3, LA4<-LAD3: (28): CPM(3,2,1)<- INPUT, L4 _l<—ADEER, L4 _2<—RT, M4D3<-M1Dl, L2_1<-MPY1_4, L2 _2<-RT: (29): CPM(l,3,3)<-INPUT, L3 _1<-MPY2 _4, L3 _2<-RT, L5_1<-MPY4_4, L5 _2<-RT, MSD3<-M2D1: (30): CPM(Z, 3, 3)<-INPUT, L3 _l<-MPY2 _4, L3 _2<-RT, L5_ l<-MPY4 _4, L5 _2<-RT, M5D3<-M2Dl; (31): CPM(3, 3, 3)<-INPUT, L3_ l<-MPY2 J, L3 _2<-RT, L5_l<-MPY4_4, L5 _2<-RT, M5D3<-M2Dl; (32): CPM(l,3,4)<-INPUT, L3_l<-MPY2_4, L3_2<-RT, L5_l<-MPY4_4, L5_2<-RT, MSDB<-M2D1: /* blending functions of t --------------------------------------- */ (33): CPM(2,3,4)<-INPUT, RBLD(2,1)<-MPY3_4, RBLDD(2,1)<-MPY5_4: (34): CPM(3,3,4)<-INPUT, RBLD(2,2)<-MPY3_4, RBLDD(2,2)<-MPY5_4: (35): CPM(l,3,l)<-INPUT, RBLD(2,3)<-MPY3_4, RBLDD(2,3)<-MPY5_4: (36): CPM(2,3,1)<-INPUT, RBLD(2,4)<-MP23_4, RBLDD(2,4)<-MPY5_4; /* ............................................................... */ (37): CPM(3,3,1)<-INPUT (38): CPM(l,3,2)<-INPUT (39): CPM(2,3,2)<-INPUT (40): CPM(3,3,2)<-INPUT (41): CPM(l,4,4)<-INPUT (42): CPM(2,4,4)<-INPUT (43): CPM(3,4,4)<-INPUT (44): CPM(l,4,1)<-INPUT (45): CPM(2,4,1)<-INPUT (46): CPM(3,4,1)<-INPUT (47): CPM(1,4,2)<-INPUT (48): CPM(2,4,2)<-INPUT (49): CPM(3,4,2)<-INPUT (50): CPM(1,4,3)<-INPUT (51): CPM(2,4,3)<-INPUT (52): CPM(3,4,3)<-INPUT (53): Ll_l<-CPM(l,1,l), L1_2<-RBLD(1,1), M103<-O, L2_l<-CPM(l,l,2), L2_2<-RBLD(1,2), M2D3<-O, L4_1<-CPM(1,1,3), L4_2<-RBLD(1,3), M4D3<-O, L5_1<-CPM(1,1,4), L5_2<-RBLD(1,4), MSD3<-O: (54): Ll_1<-CPM(1,2,2), L1_2<-RBLD(1,1), MlD3<-O, L2_1<-CPM(1,2,3), L2_2<-RBLD(1,2), MZD3<-O, L4_l<-CPM(1,2,4), L4_2<-RBLD(1,3), M4D3<-0, L5_1<-CPM(1,2,1), L5_2<-RBLD(l,4), M5D3<-0: (55): Ll_l<-CPM(1,3,3), Ll_2<-RBLD(1,1), M103<-0, L2_1<-CPM(1,3,4), L2_2<-RBLD(1,2), M2D3<-O, L4_1<-CPM(1,3,1), L4_2<-RBLD(1,3), M4D3<-O, 100 (56): (S7): (58): (59): (60): (61): (62): (63): (64): (65): (66): 101 L1_1<-CPM(1,4,4), L1_2<-RBLD(1,1), MlD3<-0, L4_1<-CPM(1,4,2), L4_2<-RBLD(1,3), M4D3<-0, L5 1<-CPM(1,4,3), L5 _2<-RBLD(1, 4), M5D3<-0: L1 1<-CPM(2,1,1), L1:2<-RBLD(1,1), MlD3<-Or L2 1<-CPM(2,1,2), L2- _2<-RBLD(1, 2), M203<-0, L4 1<-CPM(2,1,3), L4:2<-RBLD(1, 3), M4D3<-0, L5 1<-CPM(2,1,4), L3_2<-RBLD(1,4), M5D3<-0, LA1<~MPY1_4, LA2<-MPY2_4, m3<~m>x4_4, LA4<-MPY5_4; L1_1<-CPM(2,2,2), L1 _2<-RBLD(1, 1), M1D3<-0, L2_1<-CPM(2,2,3), L2 _2<-RBLD(1, 2), M203<-0, L4_1<-CPM(2,2,4), L4 :,2<-RBLD(1 3), M4D3<-0, L5_1<-CPM(2,2,1), L5:2<-RBLD(1, 4), M5D3<-0, LA1<-MPY1_4, LA2<-Mp¥2_4, LA3<-MPY4_4, LA4<-MPY5_4 , IT (1, 1) <-ADDER; L1 _1<-CPM(2, 3, 3), L1 _2<-RBLD(1, 1), MlD3<-0, L2- _1<-CPM(2, 3,4), L2- _2<-RBLD(1, 2), M2D3<- -0, L4-_1<-CPM(2,3,1), L4:2<-RBLD(1, 3), M4D3<-0: LS- _1<-CPM(2, 3, 2), LS_2<-RBLD(1,4), M5D3<-0, LA1<-MPY1_4, LAZ<~MPY2 _4, LA3<-MPY4 _4, LA4<-MPY5_4, 12(1, 2)<-ADDER: L1_1<~CPM(2,4,4), L1_2<-RBLD(1,1), MlD3<-0, L2_1<-CPM(2,4,1), L2 _2<-RBLD(1, 2), M2D3<-0, L4_1<-CPM(2,4,2), L4 :2<-RBLD(1, 3), M4D3<-0, L5_1<-CPM(2,4,3), L5- _2<-RBLD(1,4), M5D3<-0, m1<~upr1_4, LA2<-MPY2 _4, mourn _4, LA4<-MPY5_4, 11(1, 3)<-ADDER: L1__1<-c1>t4(3.1,1), L1_2<-RBLD(1,1), MlD3<-O, L2_1<-CPM(3, 1,2), L2 _2<-RBLD(1,2), MZD3<-0, L4:1<-CPM(3, 1,3), L4 :2<-RBLD(1, 3), M4D3<- -0, L5- _1<-CPM(3, 1, 4), L5- _2<-RBLD(1, 4), M5D3<- -0, LA1<-MPY1_4, LA2<‘MPY2 _4, LA3<-MPY4 _4, LA4<-MPY5 :4, IT(1, 4)<-ADDER: L1_1<-CPM(3, 2,2), L1_2<-RBLD(1,1), M1D3<-0, L2- _1<-CPM(3, 2, 3), L2 :2<-RBLD(1, 2), M2D3<-0, L4- _1<-CPM(3, 2, 4), L4 :2<-RBLD(1, 3), M4D3<-0, L5__1<-CPM(3,2,1), LS- _2<-RBLD(1, 4), M5D3<-O, LA1<-MPY1 _4, LA2<-MPY2 _4, LA3<-MPY4_4, LA4<-MPY5- _4, IT(2,1)<-ADDER: L1_1<-CPM(3, 3, 3), L1_2<-RBLD(1,1), M1D3<-0, L2- _1<-CPM(3, 3,4), LZ— _2<-RBLD(1, 2), MZD3<-0, L4 :1<-CPM(3, 3, 1), L4 :2<-RBLD(1, 3), M4D3<-O, L5- _1<-CPM(3, 3, 2), LS- _2<-RBLD(1, 4), M5D3<- -O, LA1<-MPY1_4, LA2<-MPY2 _4, LA3<-MPY4 _4, LA4<-MPY5- _4, IT(2, 2)<-ADDER; L1_1<-CPM(3, 4,4), L1_2<-RBLD(1,1), MlD3<-Oo L2-_1<-CPM(3, 4,1), L2- _2<-RBLD(1, 2), MZD3<-0, L4 :1<-CPM(3,4, 2), L4 :2<-RBLD(1, 3), M4D3<-0, L5 :1<-CPM(3, 4,3), L5- _2<-RBLD(1,4), MSD3<- -0, LAl<-MPY1_4, LA2<-MPY2 _4, LA3<-MPY4 _4, LA4<-MPY5- _4, IT(2, 3)<-ADDBR: LA1<—MPY1_4, LA2<-MPY2_4, LA3<-MPY4_4, LA4<-MP‘1'5_4 , IT (2 , 4) <-ADDER: LA1<-MPY1_4, LA2<-MPY2_4, LA3<-MPY4_4, (67): LA1<-MPY1_4 I LA4<-MPYS_4, 102 IT(3,1)<-ADDER: LA2<-MPYZ_4, LA3<-MPY4_4, IT(3,2)<-ADDER: (68): LA1<-MPYL_4, LA2<~MPY2_4, LA3<-MPY4_4, LA4<~MPY5_4, IT(3,3)<-ADDER: (69): IT(3,4)<—ADDER: (70): L1_1<-IT(1,1), L1_2<-RBLD(2,1), MID3<-O, L2_1<-IT(1,2), L2 _2<-RBLD(2, 2), M203<-0, L4_1<-IT(1,3), L4 _2<-RBLD(2, 3), M4D3<-0, L5_1<-IT(1,4), L5 _2<-RBLD(2, 4), MSDB<-0: (71): L1_1<-IT(2,1), L1:2<-RBLD(2,1), M1D3<-0, L2_1<-IT(2,2), L2 :2<-RBLD(2, 2), M2D3<-O, L4_1<-IT(2,3), L4 :2<-RBLD(2, 3), M4D3<-0, L5_1<—IT(2,4), LS :2<~RBLD(2, 4), MSD3<-O; (72): LL_1<-IT(3,1), L1_2<-RBLD(2, 1), M1D3<-0, L2_1<-IT(3,2), L2 _2<-RBLD(2, 2), M2D3<-0, L4_1<-IT(3,3), L4 :2<-RBLD(2, 3), M403<-0, L$_1<-IT(3,4), L5 _2<-RBLD(2, 4), M503<-0; (73): (74): LA1<-MPY1_4, LA2<-MPY2_4, LA3<—MPY4_4, LA4<-MPY5_4: (75): LA1<-MPY1_4, LAZ<~MPY2_4, LA3<—MPY4_4, LA4<-MPY5_4, R(2)<-ADDER: (76): LA1<-MPY1_4, LA2<-MPY2_4, LA3<-MPY4 _4, LA4<~MPY$_4, R(3)<-ADDER, OUTPUT<-R(2); (77): R(4)<-ADDER, OUTPUT<-R(3): (78): OUTPUT<-R(4): (79): it (PATCH-O or (3-1 and t-1)) then stop; /* stop here*/ else begin /* input new parameter values */ (80): RS<-INPUT; (81): RT<-INPUT: (82): L1_1<-BLD(LABS,1,1), M103<-BLD(LABS,1,2), DL4<-BLD(LABS, 1,3), DL8<-BLD(LABS,1,4), L1_2<-RS, LAD3<-BLD(LABS,1,1): (83): L1_1<-BLD(LABS,2,1), M1D3<-BLD(LABS,2,2), DL4<-BLD(LABS, 2, 3), DL8<—BLD(LABS,2,4), L1_2<—RS, LAD3<-BLD(LABS,2,1): (84): L1_1<-BLD(LABS,3,1), MlDB<-BLD(LABS,3,2), DL4<-BLD(LABS,3,3), DL8<-BLD(LABS,3,4), L1_2<-RS, LAD3<~BLD(LABS,3,1): (8S): L1_1<-BLD(LABS,4,1), MlD3<-BLD(LABS,4,2), DL4<-BLD(LABS,4,3), DL8<-BLD(LABS,4,4), L1_2<-RS, LAD3<~BLD(LABS,4,1), LA1<-o, LA2<-LAD3, LA3<-LADB, LA4<-LAD3: (86): L1_1<-BLD(LABT,1,1), M103<-BLD(LABT,1,2), DL4<-BLD(LABT,1,3), DL8<-BLD(LABT,1,4), L1_2<-RT, LAD3<-BLD(LABT,1,1), LA1<-O, LA2<-LAD3, LA3<-LAD3, LA4<-LAD3, L4_ 1<-ADDER, L4 _2<-RS, M4D3<-M1D1, L2_ 1<-MPY1_ 4, L2 _2<-RS,: (87): L1_1<-BLD(LABT,-2,1), M1D3<-BLD(LABT, 2, 2), (88): DL4<-BLD(LABT, 2, 3): DL8<-BLD(LABT, 2, 4): L1_2<-RT, LAD3<-BLD(LABT,2,1), LA1<-O, LA2<-LAD3, LA3<-LAD3, LA4<-LAD3, L4 _1<-ADDER, L4 _2<-RS, M4D3<-M1Dl, L2_ l<-MPY1_4, L2 _2<-RS: L1_1<-BLD(LABTT3,1), MlD3<-BLD(LABT, 3,2), (89): (90): (91): (92): (93): (94): '(95): (96): <97): (9a): (99): (100): (101): 103 DL4<-BLD(LABT,3,3), DL8<-BLD(LABT,3,4), L1_2<-RT, LAD3<-BLD(LABT,3,1), LA1<-o, LA2<—LAD3, LA3<-LAD3, LA4<-LAD3, L4_l<-ADDER, L4_2<-RS, M4D3<-MlDl, L2 _1<-u921_4, L2_2<-as: L1_ l<-BLD(LABT,4,1), M1D3<-BLD(LABT,4, 2), DL4<-3LD(LABT, 4, 3), DL8<-BLD(LABT, 4, 4), Ll_2<-RT, LAD3<-BLD(LABT,4,1), LA1<-o, LA2<-LAD3, LA3<-LA03, LA4<-LAD3, L4 _1<-ADDER, L4_2<-Rs, M4D3<~M1D1, L2_ 1<-MPY1_4, L2 _2<-RS: LA1<-o, LA2<-LAD3, LA3<-LA93, LA4<-LAD3, L3_1<-MPY2_4, L3_2<-RS, L$_l<-MPY4_4, Ls_2<-Rs, M503<~uzn1, LA1<-o, LA2<-LAD3,LA3<-LAD3, LA4<-LAD3, L4_1<-ADDER, L4_2<-RT, M403<-M1D1, L2_1<-MPY1_4, L2_2<-RT: LA1<-O, LA2<-LAD3, L3_1<-MPY2_4, L4_1<-ADDER, L2_2<-RT: LAI<-0, LA2<‘LAD3' L3_1<-MPY2_4, MSD3<-M2D1, L4_1<-ADDER' L2 -2<-RT; L3:1<-MPY2_4, MSD3<~M2D1, L4_1<-ADDER, L2 -2<-RT: L3:1<-MPY2 J, M5D3<-M201, L3_1<-MPY2 J, M5D3<~M2D1, RBLD(1, L3_1<-MPY2_4, M5D3<-M2D1, L3_1<-MPY2 J, MSD3<-M2D1, RBLD(1, RBLD(2, 2)<-MPY3- J, RBLD(2, 3)<-MPY3- _4, RBLD(2,4)<-MPY3_4, end else: end: /* the routine for rational spline point else begin (1): (2): (3): (4): (5): (6): (7): RS<-INPUT: RT<-INPUT: LABS<-INPUT: LABT<-INPUT: CPM(1,1,1)<-INPUT, M1D3<-BLD(LABS,1,2) DL8<-BLD(LABS,1,4), CPM(1,1,2)<-INPUT, MlDB<-BLD(LABS,2,2) DL8<—BLD(LABS,2,4), CPM(l,l,3)<-INPUT, L3 _2<-RS ' LAl<:o ' L4_2<-AT, M4D3<-M1D1, m1<:o ' LAI<:O ' L4_2<-RT, M4D3<-MlDl, RBLD(1, RBLD(1, LA3<-LAD3, LA4<-LAD3, L5_1<-MPY4_4, L5_2<—RS, LA2<-LAD3,LA3<-LAD3, LA4<-LAD3, L2_l<-MPY1_4, m3<-LAD3 ' m4<-LAD3 ' L5_1<-MPY4_4, L5 _2<—RS, LA2<-LAD3, LA3<-LAD3, LA4<-LADB, L2_l<-MPY1_4, L5_ l<-MPY4 _4, LS _2<-RS, LA2<-LADB, LA3<-LAD3, LA4<-LAD3, L2_l<-MPYl_4, 1) <‘MPY3_ 4 ' LS_2<-RT, mLDD(1,1)<-Mpys_4: L5_1<-MPY4 _4, L5 _2<-RT, 2)<-MPY3 J, ABLBDu, 2)<-MPYS_ 4,- LS_ l<-MPY4 _4, LS _2<-RT, 3)<-MPY3 _4, ABLED(1,3‘)’<-MPYL4; L5_ l<-MPY4 _4, LS _2<-RT, 4)<-MPY3 J, RBLDD(1, 4)<-MPYL 4: RBLDD(2,1)<-MPY5_4: ABLDD(2,2)<-Mpys_4: RBLDD(2,3)<-MPYS_4: RBLDD(2,4)<-MPY5_4, go to (53): evaluation */ L1_l<-BLD(LABS,1,1), , DL4<-BLD(LABs,1,3), Ll_2<-RS, LAD3<-BLD(LABS,l,l): L1_1<-BLD(LABs,2,1), , DL4<-BLD(LABS,2,3), L1_2<-RS, LAD3<-BLD(LABS,2,1): L1_l<-BLD(LABS,3,1), /* blending functions (8): (9): (10): (11): (12): (13): (14): (15): (16): (17): (18): (19): (20): (23): (24): (25): (26): (27): 104 MID3<-BLD (LABS, 3, 2) , DL4<-BLD (LABS, 3, 3) , DLB<¢BLD(LABS,3,4), Ll_2<-RS, LAD3<¢BLD(LABS,3,1): CPM(1,1'4)<-INPUT, L1_1<-BLD(LABStqll-)I M1D3<-BLD (LABS, 4, 2) , DL4<-BLD (LABS: 4, 3) , DL8<-BLD(LABS,4,4), Ll_2<-RS, LAl<-0, LA2<-LAD3, LA3<-LAD3, LA4<-LAD3, LAD3<-BLD(LABS,4,1); CPM(1,2,2)<'INPUT, L4_1<-ADDER, L4_2<-R3, M4D3<-MlDl, L2_l<-MPY1_4, L2_2<-RS LA1<-0, LA2<-LAD3, LA3<-LAD3, LA4<-LAD3: CPM(1,2,3)<-INPUT, L4_1<-ADDER, L4_2<-RS, M4D3<-M101, L2_1<-MPY1_4, L2_2<-RS LA1<-o, LA2<-LAD3, LA3<-LAD3, LA4<-LAD3; CPM(1,2,4)<-INPUT, L4_1<-ADDER, L4_2<-RS, M4D3<-MlDl, L2_l<-MPY1_4, L2_2<-As LA1<-o, LA2<-LAD3, LA3<-LAD3, LA4<'LAD33 CPM(1,2,1)<‘INPUT, L4_1<-ADDER, L4_2<-RS, M4DB<-M1Dl, L2_1<-MPYl_4, L2_2<—RS: CPM(1,3,3)<-INPUT, L3_1<-Mpyz_4, L3_2<-RS, Ls_1<-MPY4_4, L5_2<-RS, MSD3<-M2D1: CPM(l,3,4)<-INPUT, L3_1<-MPY2_4, L3_2<-RS, Ls_1<4upy4_4, Ls_2<-As, M5D3<-M2D1; CPM(1,3,1)<-INPUT, L3_l<-MPY2_4, L3_2<-RS, L5_1<-MPY4_4, L5_2<-RS, M5D3<-M2D1: CPM(1,3,2)<-INPUT, L3_1<-MPY2_4, L3_;<-RS, L5_l<-MPY4_4, L5_2<-Rs, MSD3<-M2D1: of s ------------------------------------ */ CPM(1,4,4)<-INPUT, RBLD(l,1)<-MPY3_4, RBLDD(1,1)<-MPY5_4: CPM(1,4,1)<-INPUT. RBLD(1,2)<-MPY3_4, RBLDD(1,2)<-MPY5_4: CPM(1,4,2)<-INPUT, RBLD(1,3)<-MPY3_4, RBLDD(1,3)<-MPYS_4: CPM(1,4,3)<-INPUT, RBLD(1,4)<-MPY3_4, RBLDD(1,4)<-MPYS_4: MlDB<-BLD(LABT,1,2), BL4<~BLD