A Methodology for Behavioral-Level Switching Activity Estimation in CMOS Circuits

By

Ronnie Lee Wright

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Electrical and Computer Engineering

1999

ABSTRACT

A Methodology for Behavioral-Level Switching Activity Estimation in CMOS Circuits

By

Ronnie Lee Wright

The demand for computer-aided design (CAD) tools to accurately perform power analysis on high-level design specifications has become increasingly important. Two major factors are responsible for promoting this research and development: the emergence of low power as a key VLSI design parameter, and the integrated circuit design industry's increased migration from hardware-design methodologies based on schematics or simple programmable logic device (PLD) languages to methodologies based on high-level design specifications described by hardware description languages. Such CAD tools will enable the development of power efficient digital circuits within a process that offers increased design flexibility, design reuse, and lower cost. A CAD tool's ability to accurately compute the internal power dissipated by a circuit heavily depends on how well it estimates switching activity and network capacitance. The switching activity is a key factor used in the calculation of dynamic power dissipation for CMOS circuits; it represents the probability or frequency at which power-consuming gate output transitions take place.

The objective of this research is to develop a more accurate and cost-effective technique for computing the switching activity and dynamic power dissipation of behavioral-level design specifications described in VHDL (VHSIC Hardware Description Language). Unlike some gate- or circuit-level statistical or probabilistic estimation schemes, this new technique operates at the behavioral level of design abstraction without considering technology or statistical circuit characterizations. This new approach accepts a user-specified depth-accuracy parameter that is responsible for controlling the accuracy of the switching activity estimate at the expense of time and memory. The developed techniques and algorithms have been implemented in a program called the Behavioral-Level Activity and Power Estimator (BLAPE). Results and benchmark comparisons with other power analysis CAD tools are given to validate the application and effectiveness of the new approach.

Copyright © by Ronnie Lee Wright, 1999

To my grandmother, Mrs. Lillie Mae Heller

ACKNOWLEDGMENTS

First, giving honor to God, who is the head of my life.
If it were not for God and my savior, Jesus Christ, the completion of this research effort would not have been possible. I thank God for giving me the ability and strength to persist and weather the storm.

I would like to thank my hero and grandmother, Mrs. Lillie Mae Heller, for inspiring me to become an engineer. She was brave and fought in the struggles that made this and many other opportunities possible. I would like to thank my grandfather, Mr. Willard Ervin Heller (expired May 18, 1995), for being a good provider, a great father figure, and a very positive influence in my life. As a young man my grandfather pushed for excellence in all of my schoolwork; his insistence initiated the possibility of undergraduate studies and ensured the completion of advanced graduate studies. I would like to thank my uncle and father figure, Dr. Oscar Lee Wright, for serving as an excellent male role model and demonstrating that hard work and determination are the key elements to success. I would like to thank my mother, Ms. Burnice Marie Wright, for passing on the ability to be academically successful and constantly reminding me that no problem was too difficult for me to solve. I would like to thank my youngest aunt, Mrs. Marneta Lynn Griffin, for her support in xeroxing documents and papers, as well as occasional proofreading of my writings. I would like to thank all my friends and relatives that placed me in their prayers and offered their support. I would even like to thank those who encouraged me to quit when times were difficult. I would like to thank Dr. Christophe Fiorio of the Technische Universitat in Berlin, Germany, for his development and support of the algorithm2e.sty LaTeX style file.

I would like to give a special thanks to my major dissertation advisor, Dr. Michael A. Shanblatt. Dr. Shanblatt demonstrated excellence in advising and assisted in helping me improve my research skills. He believed in me and the research I was conducting. Also, Dr. Shanblatt was always very professional, timely, and an expert at obtaining funding for my graduate studies and research. Additionally, I would like to thank Dr. Shanblatt's family for being understanding; there were many weekends, evenings, and nights in which I telephoned Dr. Shanblatt at home to discuss research.

I would like to thank my dissertation committee: Dr. Michael A. Shanblatt, Dr. Bruce Kim, Dr. Diane T. Rover, Dr. Anthony S. Wojcik, Dr. Richard J. Enbody, and Dr. Byron C. Drachman, for serving on the committee and providing constructive criticisms to improve the quality of my research. I would also like to give a special thanks to Dr. Wojcik for providing papers and other references to assist in the research.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
1 Introduction
  1.1 Background
    1.1.1 Electronic Design Automation Process
    1.1.2 Design Representations
    1.1.3 Design Optimization
    1.1.4 Design Phases
    1.1.5 Steps Within the Design Phase
  1.2 Motivation
  1.3 Low Power Design
    1.3.1 Sources of Power Dissipation
    1.3.2 Low Power Optimization Techniques
      1.3.2.1 Supply Voltage Reduction
      1.3.2.2 Physical Capacitance Reduction
      1.3.2.3 Switching Activity Reduction
      1.3.2.4 Structural Optimization
      1.3.2.5 System and Architectural Level Optimizations
  1.4 HDL-Based Design
  1.5 Problem Statement
  1.6 Dissertation Overview
2 Switching Activity Estimation
  2.1 Calculation of Switching Activity
    2.1.1 Input Pattern Dependence
    2.1.2 Glitch Activity
    2.1.3 Delay Model
    2.1.4 Logic Function
    2.1.5 Circuit Structure
  2.2 Previous Work
    2.2.1 Statistical Methods
    2.2.2 Probabilistic Methods
    2.2.3 High-Level Methods
3 Behavioral Representations of Switching Functions
  3.1 Switching Algebra Background
  3.2 Truth Tables
    3.2.1 Logic Operations
    3.2.2 Output Enumeration
  3.3 Boolean Expressions
  3.4 Binary Decision Diagrams
  3.5 Behavioral VHDL Specifications
4 Structural Representations of Switching Functions
  4.1 Schematics
  4.2 Netlists
  4.3 Structural VHDL Specifications
  4.4 Connective Binary Decision Diagrams
    4.4.1 Motivation
    4.4.2 Overview
    4.4.3 Minimized Scalable BDDs
      4.4.3.1 MSBDD Generation Routines
    4.4.4 CBDD Definitions
    4.4.5 Analysis of Connective-BDDs
    4.4.6 Implementation
    4.4.7 Advantages and Disadvantages
    4.4.8 Results
    4.4.9 Summary
5 Behavioral-Level Switching Activity Estimation
  5.1 Methodology Overview
    5.1.1 Methodology Assumptions
    5.1.2 Methodology Outline
  5.2 Methodology Task Decomposition
    5.2.1 Transformation of VHDL into Boolean Equations
    5.2.2 Decomposition of Boolean Equations
      5.2.2.1 Implicit Structure Representation
      5.2.2.2 Mapped Structure Representation
    5.2.3 Building the CBDD Representation
      5.2.3.1 CBDD Selection Rationale
      5.2.3.2 Generation of CBDD Graph
  5.3 Signal Probability Computation
    5.3.1 Overview
      5.3.1.1 Network Levelization
      5.3.1.2 Disjoint Post-Order Boolean Equation Generation
      5.3.1.3 IPR Cubeset Generation
      5.3.1.4 Computation of Signal Probability
      5.3.1.5 SPCA Component Interconnection
  5.4 Switching Activity and Power Computation
  5.5 Experimental Results
    5.5.1 Experiment 1: 64-Bit Adder Benchmark
    5.5.2 Experiment 2: Arithmetic Benchmarks
    5.5.3 Experiment 3: ISCAS-85 Benchmarks
    5.5.4 Experiment 4: Nonredundant ISCAS-85 Benchmarks
    5.5.5 Experiment 5: MCNC Synth89 Benchmarks
    5.5.6 Remarks
6 Behavioral-Level Visualization of Switching Activity
  6.1 Activity Viewer Tool
    6.1.1 Input Transformation Process
    6.1.2 Activity Viewer Results
  6.2 Future Work and Summary
7 Conclusions
  7.1 Contributions
  7.2 Future Work
  7.3 Impact of Contributions
APPENDICES
A Definitions and Formulas
B Application of BLAPE to 4-bit Booth Multiplier
  B.1 High-Level VHDL Specification Input
  B.2 Boolean Equation Generation
  B.3 Generate Implicit Structural Specification
  B.4 Network Levelization
  B.5 SIS Activity and Power Estimation
  B.6 BLAPE Activity and Power Estimation
BIBLIOGRAPHY

LIST OF TABLES

2.1 Boolean expression signal probabilities
2.2 Effects of uncorrelated inputs
2.3 Effects of spatially correlated inputs
2.4 Effects of temporal input correlations
2.5 Effects of spatio-temporal input correlations
2.6 Two-input AND, OR, XOR truth table
2.7 Two-input AND, OR, XOR signal probability table
2.8 Switching activities for two-input AND, OR, XOR gates
3.1 Huntington's postulates
3.2 Truth tables for NOT, AND, OR logic operations
3.3 Truth table for full-adder carry-out bit
4.1 Benchmark Results
5.1 Post-order Boolean equations for FA carry-out bit
5.2 Output from Generate_IPR_Cubeset procedure
5.3 Signal probability computation for full-adder carry-out bit
5.4 Symbol definitions
5.5 Procedure running times
5.6 64-bit adder size/power estimates
5.7 Power/Accuracy estimates of arithmetic benchmarks
5.8 Time estimates of arithmetic benchmarks
5.9 ISCAS-85 power/accuracy benchmarks
5.10 ISCAS-85 time benchmarks
5.11 Nonredundant ISCAS-85 power/accuracy benchmarks
5.12 Nonredundant ISCAS-85 time benchmarks
5.13 MCNC Synth89 power/accuracy benchmarks
5.14 MCNC Synth89 time benchmarks
5.15 Benchmarks not available in SIS

LIST OF FIGURES

1.1 EDA process
1.2 Design phase
1.3 Short-circuit current
1.4 Capacitive current
1.5 Standby current
1.6 Leakage current
1.7 Path balancing
1.8 Technology decomposition
2.1 Glitch example
2.2 Reconvergent fanout circuit example
2.3 Sequential circuit model
2.4 Symbolic sequential circuit model
2.5 k-unrolling of next state logic
2.6 m-expanded network with m=2
3.1 BDD of full-adder carry-out bit
3.2 BDD with ordering 1
3.3 BDD with ordering 2
3.4 Behavioral VHDL model of FA carry-out bit
4.1 Schematics of basic logic gates
4.2 Logic diagram for FA carry-out bit
4.3 Gate-level BLIF netlist for FA carry-out bit
4.4 Structural VHDL model of FA carry-out bit
4.5 Minimized-scalable BDDs
4.6 Definition of CBDD
4.7 CBDD of FA carry-out bit
5.1 BLAPE methodology diagram
5.2 Behavioral VHDL specification for full-adder
5.3 Full-adder Boolean equations
5.4 Netlist for implicit full-adder structure
5.5 Logic diagram for implicit full-adder structure
5.6 Netlist for mapped full-adder structure
5.7 Logic diagram for mapped full-adder structure
5.8 Implicit structural CBDD full-adder
5.9 Mapped structural CBDD full-adder
5.10 SPCA structural overview
5.11 DAG of FA carry-out bit network
5.12 Carry-out bit adjacency list and levelization
5.13 Expression tree for full-adder carry-out bit
5.14 BLAPE implicit structural FA activity/power estimates
5.15 BLAPE mapped structural FA activity/power estimates
5.16 SIS implicit structural FA activity/power estimates
5.17 SIS mapped structural FA activity/power estimates
5.18 Power v. k (64-bit adder)
5.19 Accuracy v. k (64-bit adder)
5.20 Time v. k (64-bit adder)
5.21 Average accuracy v. k (All benchmarks)
6.1 Activity color mapping
6.2 Bar view
6.3 Level view
6.4 Booth multiplier (4-bit) level view
6.5 Booth multiplier (4-bit) bar view

CHAPTER 1

Introduction

The goal of accurately estimating dynamic power at the behavioral level of design abstraction has recently received increased attention. The demand for accurate behavioral-level computer-aided design (CAD) tools for power analysis is due to the IC industry's large-scale migration to HDL-based hardware design methodologies, and the emergence of power consumption as a key VLSI design parameter.

The estimates generated by many behavioral-level power analysis CAD tools contain inaccuracies stemming from a lack of specific information which is determined or specified at deeper levels (i.e., the circuit and physical levels) within the electronic design automation (EDA) process. Despite these inaccuracies, HDL-based hardware design methodologies are necessary for efficient hardware design. Currently, the benefits of behavioral-level design outweigh the inaccuracies found in the estimates of behavioral-level power analysis CAD tools.
The focus of this research is to develop enhanced tools which provide additional accuracy for behavioral-level design. The research presented in this dissertation outlines a new technique which provides improved accuracy of switching activity and power estimates using behavioral-level design specifications described in VHDL. Before discussing the details of this research, a review of the EDA process and the motivation for this research topic is given in the following sections.

1.1 Background

1.1.1 Electronic Design Automation Process

Today's complex circuit designs require computer support for virtually all aspects of design. Computer-based design automation (DA) tools make the design of very large or complex circuits feasible. Designs which are too large or complex for manual design use such processes for improved quality (performance and reliability), reduced product cost, and shortened design time. Given a specification of an abstract object, a computer-based DA system generates the physical design automatically and verifies that the design satisfies its requirements specification. The design process followed by DA systems can be viewed as a sequence of transformations on the following design representations: behavioral, structural, and physical, at various levels of abstraction.

1.1.2 Design Representations

- Behavioral representations describe a circuit's function. Behavior can be described in a functional or procedural fashion. Emphasis is placed on what the design does, not how it is built. Designs are viewed as one or more black boxes with sets of inputs and outputs, and a set of functions describing the behavior of each output in terms of the inputs over time.

- Structural representations describe the composition of circuits in terms of cells (abstractions of circuit element definitions) and components (abstractions of instances of circuit elements) and the interconnection among these components. Structural descriptions include block diagrams, schematic drawings, and netlists.

- Physical representations are characterized by information used to manufacture and fabricate the physical system. They are concerned with binding the structural design space to silicon. Physical information includes geometric layout data such as transistor location and wire routing.

1.1.3 Design Optimization

When designing complex integrated circuits, removing false design paths is as important as executing the correct design procedures. The removal of false design paths is accomplished by analyzing a number of design variations and choosing the one that best meets the system's requirement specifications. The earlier one optimizes a design or removes an unsatisfactory design variation, the less effort one exerts in pursuing an alternative design that does not meet requirements.

1.1.4 Design Phases

The electronic design automation process is a top-down approach to designing large and complex integrated circuits. The EDA process starts with a system specification and continues through a series of abstraction level transformations, resulting in separate design phases (Figure 1.1). Each succeeding level adds more specific or detailed design information. The process is briefly described below.

- Design Specification considers the following: 1) application of the system, 2) performance requirements, 3) system architecture, 4) external interfaces and protocols, and 5) manufacturing costs.

- Behavioral Design is synthesized to meet specifications.
The result is a behavioral representation such as Boolean expressions, differential equations, instruction set descriptions, algorithms, or flowcharts. Behavioral simulation is the method of analysis.

Figure 1.1. EDA process (design flow from requirements and design specification through behavioral, logic, circuit, and physical design to fabrication, with behavioral simulation, logic simulation, circuit analysis, and extraction/verification at the corresponding phases).

- Logic Design is concerned with the logic structure that implements the behavioral design. The design representation may be a register-transfer-level (RTL) description, schematic description, logic diagram, or netlist of gates. For analysis the representations are simulated at the transistor, gate, and register-transfer levels. Validation takes place by comparing results from the logic-level and behavioral-level simulations.

- Circuit Design is concerned with the electrical laws that govern the detailed behavior of the basic elements such as transistors, resistors, capacitors, and inductors. Transistors are sized to meet signal delay requirements. Analysis is performed using circuit and timing simulations.

- Physical Design is concerned with the transformation of the structural representation from the previous phase into the geometric shapes (and layout cells) representing details of the fabrication process. Additionally, the placement of cells and routing are important concerns of the physical design stage.

1.1.5 Steps Within the Design Phase

The process of removing a false design path is governed by the following operations: synthesis, analysis, and verification, illustrated in Figure 1.2.

- Synthesis derives a new design representation based on the representation of a previous stage.

- Analysis evaluates the correctness of a design representation against its requirements.

- Verification provides a formal process for demonstrating the equivalence of two design representations under specified conditions.

Figure 1.2. Design phase (synthesis, analysis, and verification steps, with accept/reject paths to the upper and lower design levels).

1.2 Motivation

The emergence of low power as a critical VLSI design parameter, combined with the popularity and large-scale movement to HDL-based hardware design methodologies, provides the motivation for this research effort. Additionally, the overwhelming inaccuracies in previous behavioral-level power analysis CAD tools present a significant problem, whose elimination will benefit various aspects of the electronic design automation process. Discussions on low power design and HDL-based design are presented next.

1.3 Low Power Design

Reducing power consumption has emerged as a critical design parameter for digital VLSI systems [1]. The trend has been to develop methodologies and techniques which maintain a circuit's throughput and area constraints while achieving some desired level of power efficiency. The power efficiency revolution was initiated by the introduction of high-throughput portable electronic devices, such as laptop computers, portable televisions, camcorders, and wireless communications systems. For these devices it is desirable to support high-speed computation with complex functionality, while more efficiently using minimal size and minimal weight batteries. Advances in battery technology have also promoted the design of power efficient circuits.
It is anticipated that battery lifetimes will increase to about 90-110 watt-hours/kilogram over the next five years [1]. If low power design techniques are not considered, then high-throughput portable electronic devices will suffer from short operation times due to limited battery lifetimes or become burdened by heavier battery packs. Other factors which motivate the design of power efficient circuits include chip packaging costs, cooling, and reliability.

1.3.1 Sources of Power Dissipation

Power dissipation in digital CMOS circuits can be classified into two categories: static and dynamic dissipation. The static and dynamic power dissipations are determined by the way in which the individual MOS transistors circulate current [2]. There are four main currents which are related to power dissipation in CMOS circuits: short-circuit current, capacitive current, leakage current, and standby current.

The short-circuit current arises when both NMOS and PMOS transistors are simultaneously conducting current from the supply to ground during an input transition. The capacitive current is present when charging and discharging of a capacitance takes place to switch the output state of the logic device. The short-circuit and capacitive currents are illustrated in Figures 1.3 and 1.4 for a CMOS inverter. Short-circuit and capacitive currents are known as dynamic currents because they result from the switching which arises when an input transition takes place. The magnitude of these currents is on the order of microamperes and milliamperes.

Figure 1.3. Short-circuit current.    Figure 1.4. Capacitive current.

The standby current represents current which is continuously drawn from the supply to ground. The leakage current, which is a property of the fabrication technology, consists of reverse bias current in the parasitic diodes, which are formed by source-drain diffusions and p-well or n-well diffusions. The leakage current contributed by the reverse biased parasitic diodes can be described by the diode equation

    i_D = I_0 (e^(qV/kT) - 1),

where I_0 is the reverse saturation current, q is the charge of an electron, V is the voltage across the diode's pn junction, k is Boltzmann's constant, and T is the temperature of the device material in degrees Kelvin.

An additional component of leakage current is due to subthreshold conduction current, which is generated by the inversion charges that are produced for gate voltages below the threshold voltage. The standby and leakage currents, shown in Figures 1.5 and 1.6 for a CMOS inverter, are known as static currents. These currents result only from applying power to the device. The leakage and standby currents are very small, on the order of nanoamperes and microamperes, respectively.

Figure 1.5. Standby current.    Figure 1.6. Leakage current.

The leakage and standby currents are responsible for the static power dissipation component in digital CMOS circuits. The static power dissipation quantity is the result of summing the leakage and standby currents to form a static current, and multiplying this current by the supply voltage (V_dd).
The equations are

    I_Static = I_Leakage + I_Standby
    P_Static = I_Static · V_dd.

The dynamic power dissipation due to short-circuit current is given by Equation 1.1, where α represents the switching activity factor or the probability that a power-consuming transition occurs, Q_SC is defined as the amount of charge transferred per transition, f_CLK is the clock frequency, and V_dd is the supply voltage. The short-circuit power dissipation contributes about 10 percent to the total power dissipation.

    P_Short = α · Q_SC · f_CLK · V_dd                                     (1.1)

The dynamic power dissipation due to charging/discharging of node capacitances is referred to as switching activity power or switched-capacitance power. This power component is given by Equation 1.2, where C_L represents the node capacitance, and the other components are as defined previously. The switched-capacitance power dissipation contributes about 90 percent to the total power dissipation.

    P_Switch = (1/2) · α · C_L · f_CLK · V_dd²                            (1.2)

The switched-capacitance power is much larger than the short-circuit power dissipation due to the low rate of occurrence in which both NMOS and PMOS transistors of the CMOS inverter are simultaneously on. The dynamic power dissipation quantity is the result of summing the short-circuit and switched-capacitance power dissipations, given by

    P_Dynamic = P_Short + P_Switch.                                       (1.3)

When all of the power dissipation components are combined, the total power dissipation is

    P_Total = P_Dynamic + P_Static                                        (1.4)

or

    P_Total = (1/2) · α · C_L · f_CLK · V_dd² + α · Q_SC · f_CLK · V_dd + I_Static · V_dd.   (1.5)

The most dominant component is the switched-capacitance power dissipation, which accounts for 90 percent of the power dissipation in CMOS circuits [3]. Most of the research in the area of low power design is concerned with reducing the switched-capacitance power dissipation component because of its dominance.

1.3.2 Low Power Optimization Techniques

The majority of low power circuit research targets the dynamic power component as a means of reducing power. The switched-capacitance power, which is the power consumed in CMOS circuits caused by switching currents, is the primary focus because it is responsible for 90 percent of the total power dissipation. The switched-capacitance power for a single gate is given by Equation 1.2. The supply voltage (V_dd), physical capacitance (C_L), and switching activity factor (α) are the parameters most targeted for optimization within the switched-capacitance power component. Other methods for switched-capacitance power minimization will also be addressed in this discussion. Reducing the supply voltage is most attractive because of its quadratic relationship to power. A factor of two decrease in the supply voltage will yield a factor of four decrease in the switched-capacitance power. However, as supply voltage is lowered, circuit delays increase, leading to reduced system performance. For V_dd > V_t, delays increase with decreasing voltage. This is expressed by

    Delay ∝ V_dd / (V_dd - V_t)²,

where V_t is the threshold voltage, or the voltage level at which the transistor conducts.

1.3.2.1 Supply Voltage Reduction

One approach taken to scale or reduce supply voltage without sacrificing throughput was reported in [1]. In this approach the threshold voltage (V_t) is reduced, allowing the supply voltage to be scaled down without loss of speed. Reducing both supply and threshold voltages by some small amount maintains the operational behavior of the MOS transistors, without decreases in device currents.
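The trade-off between the quadratic power savings of Equation 1.2 and the delay penalty of the relation above can be illustrated numerically. The following Python sketch is an illustrative aid added here, not part of the original methodology; the activity factor, load capacitance, clock frequency, and voltage pairs are arbitrary example values chosen only to show the trend.

    # Illustrative sketch: relative switched-capacitance power (Equation 1.2)
    # and the delay relation Delay ~ Vdd/(Vdd - Vt)^2 at a few assumed operating
    # points.  All parameter values below are arbitrary examples.

    def p_switch(alpha, c_load, f_clk, vdd):
        """Switched-capacitance power, Equation 1.2."""
        return 0.5 * alpha * c_load * f_clk * vdd ** 2

    def relative_delay(vdd, vt):
        """Delay is proportional to Vdd / (Vdd - Vt)^2 for Vdd > Vt."""
        return vdd / (vdd - vt) ** 2

    alpha, c_load, f_clk = 0.25, 50e-15, 100e6   # assumed activity, 50 fF node, 100 MHz
    for vdd, vt in [(5.0, 0.8), (3.3, 0.8), (3.3, 0.5), (2.5, 0.5)]:
        print("Vdd=%.1f V  Vt=%.1f V  Pswitch=%.2e W  relative delay=%.2f"
              % (vdd, vt, p_switch(alpha, c_load, f_clk, vdd), relative_delay(vdd, vt)))

Lowering V_dd alone cuts the switched-capacitance power quadratically but inflates the relative delay; lowering V_t alongside V_dd recovers most of the speed, which is the essence of the approach in [1].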
The limitations in lowering the threshold voltage are due to the requirement to retain adequate noise margins and control of increased subthreshold leakage currents. In [4], a study was conducted on the effect of reducing the supply voltage for a variety of different logic circuits, containing from 56 to 44,000 transistors. The results of the study indicated that a speed penalty was incurred for all circuits and that delays drastically increased as the supply voltage approached the sum of the threshold voltages of the devices. Shen et al. [5] suggested a method of designing lower power circuits, where the initial task is to build the circuit to be as fast as possible, regardless of the area and power. Finally, the supply voltage is decreased to lower power dissipation.

1.3.2.2 Physical Capacitance Reduction

A technique used for minimizing switched-capacitance power dissipation involves lowering or reducing the circuit's physical capacitance. Power dissipation is dependent upon the physical capacitance associated with each of the individual gates in the circuit. Methods for reducing physical capacitance include using less logic, smaller transistor sizing, and shorter wirelength. Techniques for optimizing logic include resource sharing and improved logic equation minimization methods. The reduction or scaling of transistor size reduces physical capacitances, but also reduces the current drive capability of the transistor, resulting in slower circuits [3]. Lowering physical capacitance by means of using shorter wiring can be achieved by improved placement and routing strategies, which attempt to optimally locate and position logic blocks such that minimal wiring is used for block interconnections. Optimal wiring strategies result in reducing physical capacitance and I²R power losses. Improvements in the placement and routing areas are discussed in [6], where a new implementation of the simulated annealing algorithm is presented. The new algorithm makes use of an improved cost function which includes an enhanced overlap penalty function, combined with the inclusion of a timing path penalty function. These improvements allow for the optimized placement and routing of rectilinearly shaped macro cells and have produced modest reductions in chip area and total wire length.

1.3.2.3 Switching Activity Reduction

Minimizing the switching activity factor is another means for optimizing power dissipation in CMOS circuits. The switching activity is dependent upon the logic function implemented and the primary input signals to the circuit. The switching activity α at node x is defined as the fraction of time the node performs a transition within a clock period. Switching activity may be reduced at the algorithmic and architecture levels.

One method for reducing switching activity involves optimizing the number representation [7]. For certain signal processing applications, a change from the two's complement number representation to the sign-magnitude number representation gave modest improvements. It was found that the two's complement representation was subject to higher switching activity when the sign of input signal values changed. This is due to the fact that sign extension causes many of the most significant bits (MSBs) to toggle. In the case of the sign-magnitude representation, only a single bit toggles when the input signal changes sign, resulting in a reduction of switching activity for some of the MSBs.
A method to reduce the switching activity factor by means of path balancing is illustrated in Figure 1.7. Path balancing reduces glitches or spurious transition activity in combinational logic circuits. Glitching is reduced if paths in the circuit that converge at certain gates all have roughly equal lengths (delays) [5]. Balancing path delays leads to nearly simultaneous switching on the input signals to a gate, and thus eliminates possible hazards at the output of the gate. This equalization of path delays in a circuit results in a reduction of spurious gate output transitions which reduces switching activity, thus lowering power dissipation. Path balancing can take place before technology mapping by selective collapsing and logic decomposition, or after technology mapping by delay insertion and pin reordering.

Figure 1.7. Path balancing.

A switching activity reduction method has been proposed for finite state machines (FSMs), where an encoding of states was the focus of attention [8]. This method uses a hypercube embedding technique and generates state encodings such that the sum of bit toggles between each pair of states multiplied by the encoding affinity between states is minimized. By reducing the number of bit toggles in each state transition, the switching activity in the combinational logic which determines the next state and output functions is reduced.

1.3.2.4 Structural Optimization

Techniques which consider a design's structure and component interconnection also offer reductions in switching activity. Various combinations and arrangements of specific logic gates and subcircuits lead to improved power reduction for certain designs. Sobelman et al. [9] proposed a power dissipation reduction technique for multiplier circuits. This method makes use of a self-timed evaluate signal, such that each carry-save or carry-propagate adder within the array triggers only after all of its inputs have stabilized. This technique avoids spurious switching of internal nodes so that the average power dissipation is minimized. Moshnyaga et al. [10] proposed another multiplier power dissipation improvement. The goal of this method is to lower the switching activity per operation by reducing the number of active elements in the adding array. This method incorporates the use of 4-2 compressors which lower the number of propagation stages in the adding array. The 4-2 compressors have five inputs and three outputs which can compress four partial products into two. The compressor has the same logic function as that of a carry-save adder constructed by two serial full adders, but uses fewer transistors and has 25% less propagation delay.

Tsui et al. [11] discuss a technology decomposition technique which converts a set of Boolean equations, or network, to another network consisting of only AND and INVERTER gates. This technique minimizes the total switching activity in the final two-input AND tree using a zero-delay model. The goal of the procedure is to apply high-activity inputs into the tree composition at the latest possible stage. Consider Figure 1.8 for example, where P(x_i) is the signal probability of a primary input or internal signal and E_sw(g) represents the switching activity of a gate. In this technology decomposition example, the highest activity signal (d) is applied last in the tree decomposition of configuration A, thus yielding a lower switching activity than the equivalent logic in configuration B. The placement of higher activity signals in later stages of the tree decomposition limits the switching activity experienced by internal gates and lowers the number of transitions taking place at the gate's output.

Figure 1.8. Technology decomposition. With P(a) = 0.3, P(b) = 0.4, P(c) = 0.5, and P(d) = 0.7, configuration A gives E_sw(g) = P(ab) + P(abc) + P(abcd) = 0.222, while configuration B gives E_sw(g) = P(ab) + P(cd) + P(abcd) = 0.512.
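The two values quoted in Figure 1.8 can be reproduced with a short calculation. The Python sketch below is an illustrative aid, not part of the cited technique; it simply evaluates the zero-delay activity proxy used in the figure, namely the sum of the internal AND-gate signal probabilities under input independence, using the probabilities given in the figure.

    # Zero-delay activity proxy for a two-input AND tree: sum the signal
    # probabilities of the internal gate outputs, assuming independent inputs.
    # The probabilities are those given in Figure 1.8.

    p = {"a": 0.3, "b": 0.4, "c": 0.5, "d": 0.7}

    def and_prob(*signals):
        prob = 1.0
        for s in signals:
            prob *= p[s]
        return prob

    # Configuration A: ((a AND b) AND c) AND d  ->  internal nodes ab, abc, abcd
    config_a = and_prob("a", "b") + and_prob("a", "b", "c") + and_prob("a", "b", "c", "d")

    # Configuration B: (a AND b) AND (c AND d) ->  internal nodes ab, cd, abcd
    config_b = and_prob("a", "b") + and_prob("c", "d") + and_prob("a", "b", "c", "d")

    print(config_a)   # 0.222 (configuration A)
    print(config_b)   # 0.512 (configuration B)

Because signal d has the highest probability, deferring it to the last AND stage (configuration A) keeps the intermediate products small, which is exactly the ordering rule the technique exploits.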
1.3.2.5 System and Architectural Level Optimizations

Other techniques for power dissipation minimization which have received attention are power-down methods, precomputation, don't care optimization, low-power software optimization, and adiabatic switching. In the case of the power-down approach, blocks of logic not involved in the present computation are automatically turned off to save power. Methods for detecting and disabling unused blocks, as well as scheduling algorithms which maximize the "shut-down" period of execution units, are discussed in [1, 12]. When certain conditions are satisfied, modules are disabled, thus eliminating any switching activity and power dissipation. In the precomputation technique the idea is to selectively precompute the output logic values of the circuits one clock cycle before they are required, and then use the precomputed values to reduce internal switching activity in succeeding cycles [13]. The technique for don't care optimization reported in [14] explores the Boolean space in an effort to identify minterms highly appropriate for influencing switching activity. A partitioning of the don't care set into regions strongly and weakly influential upon switching activity is performed. This variance is exploited to bias area optimization towards reduced power dissipation.

Methods for the measurement of power dissipation for the software component of an embedded system have received some attention. For some time, application-specific software running on dedicated microprocessor- or microcontroller-based systems has been optimized for size and speed. In [15], an approach for estimating the power cost of embedded software is presented. This experimental approach involves the measurement of the amount of current drawn by the microprocessor or microcontroller when instructions are executed. The energy cost of the instruction is the observed average current value multiplied by the number of cycles taken by the instruction. By reordering several sequences of instructions, the average current for the execution of a program was reduced. The goal of this method is to assign energy costs to each instruction and to generate power-efficient software by selecting and ordering instructions such that the overall program energy cost is minimized. Athas et al. [16] explore a technique for low-power CMOS design by means of constructing combinational and sequential adiabatic switching logic circuits. The concept behind adiabatic switching logic is to recycle signal energies stored in circuit capacitances instead of allowing the energies to dissipate in the form of heat.

1.4 HDL-Based Design

Many IC manufacturers are changing their hardware design methodologies to an HDL-based design approach. The primary benefit of the HDL-based design approach is higher productivity [17]. Increased efficiency is achieved when using an HDL by the use of available cores (e.g., UARTs, bus interfaces, microcontrollers) to incorporate functions into the design. Another example of HDL design efficiency is exhibited in its ability to facilitate reuse of already designed components.
HDLs promote hierarchical design, allowing one to build more flexibility into lower level building blocks. The ability to define components at the structural or behavioral level of abstraction is a key feature of an HDL. This results in shorter product design completion times. Additionally, designers have less difficulty re-targeting HDL-based designs to different programmable-logic families than with schematic-based designs. This advantage allows a developer to quickly compare the device cost, performance, and other factors with the product lines of various vendors.

To summarize, HDL-based hardware design methodologies promote design reuse, component definition, design flexibility, and shortened design time. These advantages translate into higher productivity and reduced cost.

1.5 Problem Statement

The research of this dissertation addresses the power dissipation problem. The objective is to find a more accurate and low-cost method of computing switching activity and power dissipation for behavioral-level designs described in VHDL. The main goal is to achieve an accurate or reasonably approximate estimate of switching activity with minimal computational complexity and memory resources. Additionally, the research is concerned with identifying and locating high activity circuit nodes through the development and implementation of new activity visualization tools.

The use of the proposed estimator and visualization tool will assist circuit designers in the development of power-efficient designs. IC designs will greatly benefit from the use of the proposed techniques in terms of design flexibility, time, cost, and reuse. The research is centered on the use of VHDL because of its widespread use in industry and academic research environments for the development, simulation and testing of high-level and implementation-free designs. VHDL also supports behavioral and structural level design. It allows the development of digital systems based on functional descriptions or component interconnections.

A majority of switching activity estimation techniques target gate- and circuit-level design descriptions. Very few techniques exist for accurately computing switching activity at higher levels of design abstraction. On average, most high-level switching activity and power estimates contain about 12% error [18]. The goal of this research is to improve the accuracy of high-level switching activity and power estimators and to be competitive in terms of accuracy, CPU time and memory resources with respect to gate- and circuit-level estimators.

1.6 Dissertation Overview

Chapter 2 describes the switching activity estimation problem. It provides background and fundamental information detailing the factors that complicate the accurate calculation of switching activity for CMOS circuits. Additionally, a survey of current and past switching activity estimation techniques, along with a description of their advantages and disadvantages, is given.

Chapter 3 provides a discussion of the topic of behavioral-level design. Specifically, this chapter addresses some implementation-free or behavioral methods for representing switching functions. Methods such as truth tables, Boolean equations, Binary Decision Diagrams (BDDs), and VHDL behavioral specifications are described.

In Chapter 4, the concept of structural methods for representing switching functions is reviewed.
The most common structural representation, the gate-level netlist, is illustrated, along with an overview of VHDL structural specifications. Additionally, an introduction and overview of the Connective Binary Decision Diagram (CBDD) is presented.

In Chapter 5, an overview of the new methodology for accurately estimating the switching activity of behavioral-level designs described in VHDL is given. The specific assumptions and constraints that make the new technique a success are highlighted. Additionally, the algorithms and the intermediate transformations are discussed. To demonstrate the effectiveness of the new approach, results and benchmark comparisons with other power analysis CAD tools are given.

Chapter 6 presents a new visualization tool that identifies and highlights the power-hungry areas of the circuit design. A description of the tool's input transformation process as well as an overview of the activity views is given. A set of examples is used to highlight the usefulness of the new tools.

Chapter 7 contains conclusions and a discussion of future enhancements that could improve the behavioral-level activity and power estimator methodology and supporting algorithms.

Appendix A contains definitions and theorems used throughout the dissertation. Finally, Appendix B contains a walk-through of the BLAPE algorithm when applied to a 4-bit Booth multiplier, described in VHDL.

CHAPTER 2

Switching Activity Estimation

The switching activity factor, α, is a key component of the switched-capacitance power dissipation model, described by Equation 1.2. Dynamic power is dissipated whenever any internal switching occurs. Thus, switching activity can be determined by evaluating and summing the 0 → 1 and 1 → 0 transition probabilities at the specified node.

2.1 Calculation of Switching Activity

Switching activity, denoted E_sw(x) or α(x), is defined as the probability that the logic signal at node x experiences a change in its logic state. The most common method of calculating switching activity involves the use of the signal probability, P_s(x), which is the probability that the signal at node x is equal to logic 1. The evaluation of signal probabilities for inputs and simple Boolean expressions is described in Table 2.1. Upon determining a node's signal probability, P_s(x), the calculation of the node's switching activity is given by Equation 2.1. A derivation of the switching activity calculation model is provided in Appendix A.

    E_sw(x) = 2 · (1 - P_s(x)) · P_s(x)                                   (2.1)

    Function                 Signal Probability                   Assumptions
    A                        P_s(A)                               --
    A'                       1 - P_s(A)                           --
    A · B                    P_s(A) · P_s(B)                      A independent of B
    A + B                    P_s(A) + P_s(B) - P_s(A · B)         A independent of B
    x1 · x2 · ... · xn       product of P_s(x_i), i = 1..n        all x_i independent
    x1 + x2 + ... + xn       1 - product of (1 - P_s(x_i))        all x_i independent

Table 2.1. Boolean expression signal probabilities.

The calculation of switching activity is difficult because it depends on circuit input streams, various circuit parameters, and technology-dependent factors which may be unavailable or difficult to characterize. Some of these factors include input pattern dependence, glitch activity, delay model, logic function, and circuit structure. The following sections discuss details of switching activity computations which are based on statistical or probabilistic concepts. Appendix A contains definitions of terms and concepts used in the following sections.
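To make the use of Table 2.1 and Equation 2.1 concrete, the short Python sketch below is offered as an illustrative aid (it is not part of the original text). It computes the exact signal probability of the full-adder carry-out function, cout = ab + ac + bc, which is used as a running example elsewhere in this dissertation, by enumerating input combinations weighted by independent input probabilities, and then applies Equation 2.1; the input probability values are arbitrary examples.

    # Illustrative sketch: exact signal probability and switching activity
    # (Equation 2.1) for the full-adder carry-out  cout = a*b + a*c + b*c.
    from itertools import product

    def carry_out(a, b, c):
        return (a and b) or (a and c) or (b and c)

    def signal_probability(func, input_probs):
        """Sum Pr(input vector) over all vectors for which func evaluates to 1."""
        prob = 0.0
        for bits in product([0, 1], repeat=len(input_probs)):
            weight = 1.0
            for bit, p in zip(bits, input_probs):
                weight *= p if bit else (1.0 - p)
            if func(*bits):
                prob += weight
        return prob

    probs = [0.5, 0.5, 0.5]               # independent inputs, P_s = 1/2 each
    ps = signal_probability(carry_out, probs)
    esw = 2.0 * (1.0 - ps) * ps           # Equation 2.1
    print(ps, esw)                        # 0.5 and 0.5 for these inputs

Note that exhaustive enumeration sidesteps the independence assumptions of Table 2.1 for internal expressions; the difficulties that arise when those assumptions are applied blindly are the subject of the following sections.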
2.1.1 Input Pattern Dependence

The switching activity, E_sw(g), at gate output g depends on the input signal probabilities, the gate's logic function, as well as spatial and temporal input pattern dependencies. For example, consider a two-input NAND gate with independent inputs (x1, x2), such that their signal probabilities are each 1/2. From Table 2.2, it is apparent that 6 out of 16 input patterns result in a gate output transition; therefore E_sw(g) = 6/16 = 3/8. When spatial correlation conditions are applied to (x1, x2) such that (0,0) and (1,1) are the only input patterns, the NAND gate output switches twice out of the possible four input arrangements and therefore E_sw(g) = 1/2 (Table 2.3). Given that a temporal correlation governs the inputs, where each 0 applied to input x1 is followed by a 1, while each 1 applied to input x2 is followed by a 0, 4 output transitions out of 9 input patterns take place, with E_sw(g) = 4/9 (Table 2.4). If spatio-temporal correlation conditions are governing the input dependence, such that x2 changes exactly when x1 changes, then E_sw(g) = 1/4 (Table 2.5).

    x1      x2      NAND
    0→0     0→0     1→1
    0→0     0→1     1→1
    0→1     0→0     1→1
    0→1     0→1     1→0
    0→0     1→0     1→1
    0→0     1→1     1→1
    0→1     1→0     1→1
    0→1     1→1     1→0
    1→0     0→0     1→1
    1→0     0→1     1→1
    1→1     0→0     1→1
    1→1     0→1     1→0
    1→0     1→0     0→1
    1→0     1→1     0→1
    1→1     1→0     0→1
    1→1     1→1     0→0

Table 2.2. Effects of uncorrelated inputs.

    x1      x2      NAND
    0→0     0→0     1→1
    0→1     0→1     1→0
    1→0     1→0     0→1
    1→1     1→1     0→0

Table 2.3. Effects of spatially correlated inputs.

    x1      x2      NAND
    0→1     0→0     1→1
    0→1     0→1     1→0
    0→1     1→0     1→1
    1→0     0→0     1→1
    1→0     0→1     1→1
    1→0     1→0     0→1
    1→1     0→0     1→1
    1→1     0→1     1→0
    1→1     1→0     0→1

Table 2.4. Effects of temporal input correlations.

    x1      x2      NAND
    0→0     0→0     1→1
    0→0     1→1     1→1
    0→1     0→1     1→0
    0→1     1→0     1→1
    1→0     1→0     0→1
    1→0     0→1     1→1
    1→1     0→0     1→1
    1→1     1→1     0→0

Table 2.5. Effects of spatio-temporal input correlations.
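The tabulated counts can be verified by enumeration. The Python sketch below is an illustrative aid, not part of the original text; it counts NAND output toggles over the admissible sets of input transitions for the uncorrelated case (Table 2.2) and the spatially correlated case (Table 2.3).

    # Illustrative sketch: count NAND output transitions over a set of
    # admissible (x1, x2) transition pairs, as in Tables 2.2 and 2.3.

    def nand(a, b):
        return 1 - (a & b)

    def switching_counts(transition_pairs):
        """transition_pairs: list of (x1_old, x1_new, x2_old, x2_new) tuples."""
        toggles = sum(nand(x1o, x2o) != nand(x1n, x2n)
                      for x1o, x1n, x2o, x2n in transition_pairs)
        return toggles, len(transition_pairs)

    # Table 2.2: uncorrelated inputs -- all 16 transition combinations occur.
    uncorrelated = [(a, b, c, d) for a in (0, 1) for b in (0, 1)
                                 for c in (0, 1) for d in (0, 1)]
    print(switching_counts(uncorrelated))   # (6, 16)  ->  E_sw = 3/8

    # Table 2.3: spatially correlated inputs -- only patterns (0,0) and (1,1).
    spatial = [(a, b, a, b) for a in (0, 1) for b in (0, 1)]
    print(switching_counts(spatial))        # (2, 4)   ->  E_sw = 1/2

The temporally and spatio-temporally correlated cases of Tables 2.4 and 2.5 follow the same pattern, with the admissible transition list restricted according to the stated correlation rules.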
2.1.2 Glitch Activity

Spurious transitions at node outputs represent an additional component of the switching activity. The spurious transitions, known as glitches or glitch activity, are unwanted transitions that occur before a node settles to its final steady-state value. The simple circuit and waveform shown in Figure 2.1 demonstrate the effects of glitch activity. The output y of the circuit should always remain at logic one, but due to internal delays injected by the inverter (denoted by the inverter's output signal), a transient drop to logic zero occurs at the output. Glitches are apparent at the architectural level in static designs due to finite propagation delays from one logic block to the next block, resulting in a node having multiple transitions in a single clock cycle before settling to the correct logic level [4]. Each of the spurious transitions consumes power and may account for as much as 10 to 40 percent of the switching activity power loss in typical combinational logic circuits [3]. It was reported by Shen et al. [5] that glitches accounted for 20 percent of the power dissipation over a range of circuits, and circuits such as combinational adders were subject to 70 percent or more power dissipation due to glitch activity.

Figure 2.1. Glitch example.

2.1.3 Delay Model

The delay model used for switching activity estimation is an important factor. Models such as zero delay and real delay make different assumptions concerning the propagation of the signals throughout the circuit. In the zero delay model it is assumed that all changes at the circuit inputs reach the internal gates of the circuit instantaneously, which allows for a glitch-free representation of the circuit. In the case of the real delay model, each gate in the circuit is given a delay. This, in turn, may cause internal and output nodes to experience multiple transitions during a single input transition. In many circuits, glitch activity accounts for a large percentage of the switching activity, thus making the calculation of the switching activity more difficult. The difficulties arise in the determination of the glitch locations.

2.1.4 Logic Function

The logic function implemented by a circuit directly influences the switching activity. The logic function of a gate determines the probability that the present value of the gate will differ from the previous value. For example, the truth table (Table 2.6) provides the logic input/output conditions for two-input AND, OR, and XOR gates. To mimic the logic in this truth table from a probability perspective, the signal probability table (Table 2.7) uses p to denote the probability that an input signal equals logic 1, or P(x_i = 1), and uses q, which is 1 - p, to denote the probability that an input signal equals logic 0, or P(x_i = 0). The signal probability for each logic gate output is computed by summing the probability permutation entries for the input combinations that produce a logic 1 at the gate's output in the truth table. The resulting switching activities for each two-input gate (Table 2.8) are computed by applying Equation 2.1 to the signal probabilities.

    x1   x2   AND   OR   XOR
    0    0    0     0    0
    0    1    0     1    1
    1    0    0     1    1
    1    1    1     1    0

Table 2.6. Two-input AND, OR, XOR truth table.

    P_s(x1)   P_s(x2)   AND     OR      XOR
    q         q         q²      q²      q²
    q         p         qp      (qp)    (qp)
    p         q         pq      (pq)    (pq)
    p         p         (p²)    (p²)    p²

Table 2.7. Two-input AND, OR, XOR signal probability table (parenthesized entries correspond to input combinations for which the gate output is logic 1).

    g      P_s(g)     E_sw(g)
    AND    p²         2p²(1 - p²)
    OR     1 - q²     2q²(1 - q²)
    XOR    2pq        4pq(1 - 2pq)

Table 2.8. Switching activities for two-input AND, OR, XOR gates.
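The closed-form entries of Table 2.8 are easy to check numerically. The Python sketch below is an illustrative aid, not part of the original text; the input probability p is an arbitrary example value, and the checks simply confirm that the tabulated E_sw expressions agree with Equation 2.1 applied to the tabulated P_s expressions.

    # Illustrative check of Table 2.8 at an assumed input probability p,
    # for two-input gates with independent, identically distributed inputs.
    p = 0.3
    q = 1.0 - p

    gates = {
        "AND": p * p,            # P_s entries from Table 2.8
        "OR":  1.0 - q * q,
        "XOR": 2.0 * p * q,
    }
    for name, ps in gates.items():
        esw = 2.0 * ps * (1.0 - ps)      # Equation 2.1
        print("%s: Ps=%.4f  Esw=%.4f" % (name, ps, esw))

    # Closed forms from Table 2.8 agree with Equation 2.1:
    assert abs(2 * p**2 * (1 - p**2)  - 2 * gates["AND"] * (1 - gates["AND"])) < 1e-12
    assert abs(2 * q**2 * (1 - q**2)  - 2 * gates["OR"]  * (1 - gates["OR"]))  < 1e-12
    assert abs(4 * p * q * (1 - 2*p*q) - 2 * gates["XOR"] * (1 - gates["XOR"])) < 1e-12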
2.1.5 Circuit Structure

The circuit structure may also cause difficulties in computing the switching activity when reconvergent fanout nodes are considered. This problem is more challenging because the internal signals may be correlated, potentially requiring a large amount of computational effort and memory usage. Some power estimation techniques ignore these fanout correlations. Approximations may be used to improve accuracy while shortening the execution time of the simulation.

The simple circuit depicted in Figure 2.2 contains reconvergent fanout. The gate outputs, M1 and M2, are correlated due to their common dependence on input x2. The correlation results in a signal probability overestimate (Equation 2.4) at output F when assuming M1 and M2 are independent. This common inaccuracy is attributed to a statistical inequivalence of two Boolean expression forms (Equations 2.2 and 2.3) which equivalently represent the output F. The correct signal probability (Equation 2.5) for node F is obtained by minimizing node F's Boolean expression to Equation 2.3, followed by applying the OR signal probability calculation given in Table 2.1. In the reconvergent fanout example below, all input signal probabilities are assumed to be P_s(x_i) = 1/2.

Figure 2.2. Reconvergent fanout circuit example (M1 = x1 + x2, M2 = x2 + x3, F = M1 + M2).

    f1 = M1 + M2 = (x1 + x2) + (x2 + x3)                                  (2.2)

    f2 = x1 + x2 + x3                                                     (2.3)

    P_s(f1) = P_s(M1 + M2)
            = 1 - (1 - P_s(M1)) (1 - P_s(M2))
            = 1 - (1 - P_s(x1 + x2)) (1 - P_s(x2 + x3))
            = 15/16   (Overestimate)                                      (2.4)

    P_s(f2) = 1 - (1 - P_s(x1)) (1 - P_s(x2)) (1 - P_s(x3))
            = 7/8     (Correct)                                           (2.5)

    E_sw(f2) = 2 · P_s(f2) · (1 - P_s(f2))
             = 2 · (7/8) · (1/8)
             = 14/64 = 0.219
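The gap between Equations 2.4 and 2.5 can be reproduced directly. The Python sketch below is an illustrative aid, not part of the original text; it computes the exact signal probability of F by enumeration and compares it with the value obtained when the reconvergent fanout correlation between M1 and M2 is ignored, using the same P_s(x_i) = 1/2 inputs as above.

    # Illustrative sketch: exact vs. independence-assuming signal probability
    # for the reconvergent fanout example of Figure 2.2
    # (M1 = x1 + x2, M2 = x2 + x3, F = M1 + M2), with P_s(xi) = 1/2.
    from itertools import product

    def ps_or(*probs):
        """OR rule from Table 2.1, valid only for independent operands."""
        result = 1.0
        for p in probs:
            result *= (1.0 - p)
        return 1.0 - result

    p = 0.5
    ps_m1 = ps_or(p, p)                  # P_s(x1 + x2) = 3/4
    ps_m2 = ps_or(p, p)                  # P_s(x2 + x3) = 3/4
    wrong = ps_or(ps_m1, ps_m2)          # treats M1, M2 as independent -> 15/16

    # Exact value, obtained by enumerating the eight input combinations.
    exact = sum(1 for x1, x2, x3 in product([0, 1], repeat=3)
                if (x1 or x2) or (x2 or x3)) / 8.0      # 7/8

    esw = 2.0 * exact * (1.0 - exact)
    print(wrong, exact, esw)             # 0.9375  0.875  0.21875

The error comes entirely from reusing x2 in both M1 and M2; once the expression is first minimized to Equation 2.3, the independence assumption of Table 2.1 holds and the OR rule gives the correct 7/8.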
The Monte Carlo simulation method may not converge for circuits which do not have normal power distributions. Non-convergence may also occur when T is too small. Moreover, the Monte Carlo method does not support sequential circuits, since it must wait for a setup time T_MAX, where T_MAX is the longest delay along any path; for sequential circuits T_MAX ≈ ∞. The setup time is the time required before the beginning of a sample interval to guarantee stationarity of the circuit's internal transition process. A process is called stationary if its distribution functions or certain expected values are invariant with respect to time [21]. When this technique was applied to ISCAS-85 benchmark circuits, the maximum error for a 5% accuracy and 99% confidence level was greater than 5% for only 1% of the cases [19].

Stamoulis proposed a Monte Carlo approach for estimating switching activity in sequential circuits based on the analysis of paths in the state transition graph (STG) [22]. The simulation is performed at the gate level and obtains an accuracy within 10% of the actual switching probability value for a particular node with a 95% confidence level. Using the circuit model described by Figure 2.3, the approach focuses on accurately determining the switching probabilities at the flip-flop output nodes.

Figure 2.3. Sequential circuit model (combinational logic with primary inputs and outputs; the next-state lines feed clocked latches whose outputs form the present-state lines).

The following assumptions are made: 1) the circuit can be in any of the states with equal probability upon power-up, 2) the latch outputs are glitch-free, and 3) all latches reach steady state before the next state enters the combinational logic. These assumptions allow for the estimation of the power dissipation of the latches separately from the rest of the circuit (i.e., the combinational part), and constrain the number of latch output transitions to at most one per clock cycle. By the definition of initializable circuits, the stochastic processes which describe the state of the flip-flops in the time domain are stationary. States that are N clock ticks apart become independent as N → ∞. This ensures that the node processes are also mean ergodic. The mean ergodicity property refers to a stationary random process X(t) whose ensemble averages [μ_X]_T satisfy 1) lim_{T→∞} E{[μ_X]_T} = μ_X, and 2) lim_{T→∞} Var{[μ_X]_T} = 0 [21].

By using the "path" notion in the STG, long time-consuming simulations are avoided. Pathwise averages are computed using sample simulations with different initial conditions. The estimation process is performed in two steps. First, the pathwise transition probabilities are estimated using Monte Carlo simulation to determine the minimum number of path samples which will contribute to an error ε with confidence (1 - α) × 100%. Second, the average switching probability estimate is computed over all paths, with the switching probability estimate of each path being one measurement. The results of this technique when applied to ISCAS-89 benchmark circuits indicate switching probabilities can be estimated with 5% accuracy at a 95% confidence level [22].

Xakellis et al. [23] proposed a switching activity estimation approach which efficiently estimates the transition density at all circuit nodes. This technique improves upon the approach proposed in [19] by eliminating the statistical sampling at single gates, which requires a large number of input patterns for convergence. The convergence problem is overcome by classifying nodes into low-density and regular-density categories and applying absolute error bounds on low-density nodes instead of percentage error bounds. This is done by establishing a threshold to classify low- and regular-density nodes.
A node with a transition density value less than the threshold is a low-density node; nodes with a transition density equal to or above the threshold are regular-density nodes. The advantage of this method is that it allows the desired accuracy error bounds to be specified by the user. The user also supplies, for every input node, the transition density and the input probability, the fraction of time the circuit input signal is high. If the circuit input probability is not specified, it is assigned a default value of 1/2. Next, a random number generator is used to generate the logic input waveforms which drive the simulator. For a given period T, the number of transitions at each node is counted. This process is repeated N times to form an average transition density at each node. This approach, just as in [19], requires a normal distribution of the node densities to establish a stopping criterion with confidence (1 - α) × 100%. Additionally, for synchronous-mode simulation, the input signals are assumed to be Markov. The speed of the algorithm is strongly affected by the user-specified threshold, which determines the stopping criterion for the two categories of nodes. Convergence speeds are improved while sacrificing accuracy only at low-density nodes. When tested on a variety of ISCAS-85 benchmark circuits, it was found that over 95% of regular node transition density values have less than 5% error. Low-density nodes performed well; over 95% of the low-density node transition values were found to be less than the specified absolute error [23].
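The two-class error policy of [23] reduces to a small amount of bookkeeping. The sketch below is illustrative only; the node names, threshold, and error bounds are invented for the example. Nodes whose density estimate falls below the threshold are checked against an absolute bound, while the remaining nodes are checked against a percentage bound.

def classify(densities, threshold=0.05):
    """Split node names into low-density and regular-density sets."""
    low = {n for n, d in densities.items() if d < threshold}
    return low, set(densities) - low

def converged(node, estimate, half_width, low_nodes,
              rel_bound=0.05, abs_bound=0.01):
    """Regular nodes use a percentage bound; low-density nodes an absolute one."""
    if node in low_nodes:
        return half_width <= abs_bound
    return half_width <= rel_bound * estimate

# hypothetical running estimates: node -> (density estimate, confidence half-width)
estimates = {"n1": (0.40, 0.015), "n2": (0.02, 0.008), "n3": (0.30, 0.020)}
low, regular = classify({n: e for n, (e, _) in estimates.items()})
for node, (est, hw) in estimates.items():
    print(node, "done" if converged(node, est, hw, low) else "keep sampling")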
Another Monte Carlo simulation technique was proposed by Najm et al. [24], in which a method for estimating the switching activity at the latch outputs of sequential circuits is discussed. Similar to [23], this approach makes use of up-front user-specified accuracy information. The algorithm runs until the specified accuracy is achieved. This technique applies a number of randomly generated input vectors to the circuit and collects statistics at the latch outputs using zero-delay logic simulation. When computing the state line probabilities (signal probabilities of the latch outputs), the Monte Carlo approach is used for estimating N, the number of iterations necessary to achieve the user-specified error tolerance ε and confidence level α [23, 25]. The approach makes two assumptions: 1) the sequential circuit is a non-decomposable FSM, and 2) the state of the machine at time K becomes independent of its initial state as K → ∞. Assumption 2 implies that the FSM is aperiodic in that it does not cycle through a repetitive pattern of states. The method requires that two simulation runs be performed, with each run starting in a different initial state, say (X0, X1). The signal probabilities P_K(x_i | X0) and P_K(x_i | X1) are computed for increasing values of K. Both simulation run estimates should converge to P(x_i), the true signal probability of that state line. When both measures remain within a window of ±ε for three consecutive time instants, the node is assumed to have reached convergence. When all nodes have converged, the simulation is complete, and the average of the last P_K(x_i | X0) and P_K(x_i | X1) values is reported as the signal probability P(x_i) for each x_i. An important feature of this technique is that no assumptions about the FSM behavior (Markov or otherwise) or state line independence are made. Results for this approach on sequential circuits with 1452 flip-flops required 4.6 hours of simulation time (SUN Sparc10) to achieve an error tolerance of 5% with confidence 95% [24].

2.2.2 Probabilistic Methods

Ghosh et al. [26] proposed a probabilistic method for the estimation of average switching activity in combinational and sequential circuits. The method considers temporal correlation at the internal nodes and outputs, but requires that the primary inputs be uncorrelated. The method automatically computes switching rates and correlations among flip-flop outputs of FSMs. The behavior of the primary inputs is described in terms of their transition probabilities. The general delay model is used to correctly compute Boolean conditions by compensating for glitch activity. For the signal x, the transition probabilities are p^x_{ij}, which denote the probability that the signal x will experience a transition from state i to state j; for example, p^x_{10} is the probability that x(k) = 1 and x(k+1) = 0. The signal probability of node x in terms of transition probabilities is P_s(x) = P^x_1 = p^x_{10} + p^x_{11}. These signal probabilities are propagated from the inputs to internal and output nodes by summing the probabilities which disjointly cover the Boolean function in terms of its primary inputs. For example, if the function g = xȳz + x̄yz + x̄ȳz̄, then P(g = 1), or P_g, equals p(x)p(ȳ)p(z) + p(x̄)p(y)p(z) + p(x̄)p(ȳ)p(z̄). The transition probabilities are propagated from the inputs to internal and output nodes by finding the signal probability of the XOR of the Boolean function of each node in two consecutive time frames. For node y, P_s(y) = P_s(y(t) ⊕ y(t+1)). For a two-input AND gate, where y = x1 AND x2, P_s(y) = P_s[x1(t)x2(t) ⊕ x1(t+1)x2(t+1)]; expanding this exclusive-OR as a disjoint cover expresses P_s(y) as a sum of products of the input transition probabilities p^{x1}_{ij} · p^{x2}_{kl}. Binary decision diagrams (BDDs) are used for the calculation of the signal probabilities for each of the Boolean functions [27]. This is simpler than the enumeration of disjoint covers for each Boolean function. The exact signal probabilities can be computed by performing a linear traversal of the BDD representation of a logic function [28, 29]. To account for gate delays, a symbolic simulation method is used to generate a multiple-output function that represents the total switching activity over any possible input vector pair.

The method proposed by Ghosh et al. [26] may also be applied to FSMs. This technique accounts for correlation by transforming the conventional FSM structure (Figure 2.3) into a new structure (Figure 2.4) containing symbolic simulation equations which represent internal and output nodes in the next-state logic. For the present state, the gate output's switching activity can only be determined by primary inputs. To compute the transition probabilities, the static probabilities for the present state are used. The next-state logic generates Boolean equations which model correlation between the present and next states, thus computing the transition probabilities automatically, which accounts for the correlation between transitions. This method assumes that present state lines are uncorrelated and that upon power-up the FSM can be in any of its 2^N states, where N is the number of flip-flops.

Figure 2.4. Symbolic sequential circuit model (next-state logic followed by symbolic simulation equations, producing transition probabilities from I(t), I(t+1), P(t), and P(t+1)).
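The transition-probability view of a gate can be made concrete with a short enumeration. The sketch below is an illustration rather than the procedure of [26]: the input transition probabilities are invented example values, the inputs are assumed independent, and the helper names are ad hoc. The gate's transition probabilities are obtained by combining the consecutive-cycle value pairs of its inputs, after which the switching measure P_s(y(t) ⊕ y(t+1)) and the signal probability p_10 + p_11 fall out directly.

from itertools import product

# hypothetical input transition probabilities: p[(i, j)] = P(x(t) = i, x(t+1) = j)
px1 = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
px2 = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}

def and_transition_probs(p1, p2):
    """Transition probabilities of y = x1 AND x2, assuming independent inputs."""
    py = {(0, 0): 0.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.0}
    for (a0, a1), (b0, b1) in product(p1, p2):
        y0, y1 = a0 & b0, a1 & b1
        py[(y0, y1)] += p1[(a0, a1)] * p2[(b0, b1)]
    return py

py = and_transition_probs(px1, px2)
switching = py[(0, 1)] + py[(1, 0)]        # P_s(y(t) XOR y(t+1))
signal_prob = py[(1, 0)] + py[(1, 1)]      # P_s(y) = p_10 + p_11, as in the text
print(py, switching, signal_prob)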
In [29] a new measure of activity, the transition density, is proposed. The transition density may be defined as the average switching rate at a node. This method uses a stochastic model of the logic signals in which the density values of the primary inputs are propagated to internal and output nodes. The transition density at node x is defined as D(x) = lim_{T→∞} n_x(T)/T, where n_x(T) is the number of transitions of the signal x(t) at node x in the interval (-T/2, T/2]. The propagation algorithm involves a single pass over the circuit and computes the transition density at all nodes. The primary inputs are considered to be spatially independent and strict-sense stationary (SSS). A random process X(t) is called strict-sense stationary if all of the distribution functions describing the process are invariant under a translation of time [21]. The transition density at each node y is defined as

D(y) = Σ_{i=1}^{n} P(∂y/∂x_i) · D(x_i),                                 (2.7)

where ∂y/∂x_i represents the Boolean difference of the function y with respect to signal x_i and P(·) represents the equilibrium probability. Since the input signals are SSS, the output will have the same statistics as its inputs. The zero-delay model is assumed, and the algorithm is limited to combinational circuits only. The algorithm considers the circuit to be an interconnection of logic modules, where each module represents a Boolean function based on the zero-delay model. The drawback to the approach concerns the input independence requirement. If the circuit topology includes reconvergent fanout and feedback, then internal nodes could become correlated, thus possibly destroying the independence property. The propagation of density and probability proceeds on a per-module basis from the primary inputs to the primary outputs. Next, the equilibrium probability P(∂y/∂x_i) is evaluated for each node in a module, on a per-module basis, using the BDD [30]. This approach has a high cost in time and memory usage because the propagation algorithm must interact with the BDD, which may grow exponentially with respect to the number and order of the circuit inputs.

In [31] an improvement in the accuracy of circuit activity measurement is discussed. This technique offers a more efficient mechanism for computing the Boolean difference probabilities at each node of the circuit, which are necessary for the estimation of transition density [24]. In addition, this method allows measurements to be made in a pattern-independent manner. The algorithm partitions the combinational circuit, which is modeled as a directed acyclic graph, with the goal of maximizing the number of correlated nodes within each partition.

The partitioning aspect of the algorithm prevents overestimation and underestimation inaccuracies in the density simulation which may occur from the correlation of fanout inputs in the circuit. To compute the Boolean difference probabilities for each node in a given circuit, this method constructs an ordered BDD (OBDD) by using the APPLY and RESTRICT procedures at each node [27]. The APPLY procedure provides the basic method for creating the representation of a function according to the operators in a Boolean expression or logic gate network. The RESTRICT procedure transforms the graph representing a function f into a graph representing the function f|_{x_i = b} for specified values of i and b.
The efficiency of the computation for producing maximally reduced OBDDs is improved according to the gate operation needed. The development of a new operation, the DIFFERENCE operation, generates Boolean difference functions by applying algebraic operations to other functions. The DIFFERENCE operation is implemented as a sequence of APPLY operations. The partitioning aspect of the technique allows for an improvement in accuracy as the size of the partition grows. However, as the size of the OBDD grows, the algorithm slows down. The OBDD can grow exponentially with the number of inputs. For this reason the number of inputs is chosen as a parameter to determine partition size. Next, a breadth-first search is applied at every primary input such that each partition consists of a single output with k variables, where the variables may be primary outputs or inputs. The circuit partitioning procedure stops when each node belongs to a partition which contains k or fewer input variables. The purpose of the circuit partitioning is to keep each partition small enough to achieve accurate results. The partitions are then placed in a partition set, stored in the order of their formation. The transition density is computed for each partition by computing the Boolean difference probability. The drawback to the technique is the rapid growth of the OBDD, which leads to memory overflow problems. Experimental results report that for 50 combinational circuits, with up to 20 inputs, this technique required 40% fewer nodes than the straightforward way of computing Boolean difference equations.
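Because each partition is limited to k or fewer input variables, the Boolean difference probabilities inside a partition can even be obtained by exhaustive enumeration. The sketch below is a toy stand-in for the OBDD-based DIFFERENCE machinery of [31]; the function, input probabilities, and densities are example values, and the helper names are mine. It computes P(∂f/∂x_i) by comparing the two cofactors of f, then applies Equation 2.7 to propagate the input densities to the partition output.

from itertools import product

def boolean_difference_prob(f, n_inputs, i, input_probs):
    """P(df/dx_i) for f over n_inputs variables, by exhaustive enumeration."""
    prob = 0.0
    for bits in product([0, 1], repeat=n_inputs):
        if bits[i] == 1:
            continue                       # visit each setting of the other inputs once
        lo = list(bits)
        hi = list(bits)
        hi[i] = 1
        if f(*lo) != f(*hi):               # Boolean difference is 1 here
            w = 1.0                        # weight by the other inputs only
            for j, b in enumerate(bits):
                if j != i:
                    w *= input_probs[j] if b else (1 - input_probs[j])
            prob += w
    return prob

def density(f, n_inputs, input_probs, input_densities):
    """Equation 2.7: D(y) = sum_i P(df/dx_i) * D(x_i)."""
    return sum(boolean_difference_prob(f, n_inputs, i, input_probs) * input_densities[i]
               for i in range(n_inputs))

# example partition: the full-adder carry-out, c_out = ab + c_in(a + b)
cout = lambda a, b, c: (a & b) | (c & (a | b))
print(density(cout, 3, [0.5, 0.5, 0.5], [0.2, 0.2, 0.2]))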
The FSM accepts as input data- path status flags and produces control signals which enable register transfers in the datapath. DFG edges that cross control step boundaries denote state transitions in the FSM. Each control step corresponds to at least one state in the FSM. Using the data collected in the profiling phase, the PDSS system determines an estimate for aggregate switching activity (ASA). The ASA is the sum of the switching activities in the datapath and the controller. The switching activity associated with 43 pr- (‘01. Ili-s F n ma Sit; COI‘: it }, Elfin fewer nodes then the straight-forward way of computing Boolean difference equations. In two simultaneous publications by Tsui et al. [32] and Monteiro et al. [33], exact and approximate methods for estimating switching activity in FSM’s are discussed. These techniques also closely model the work presented in [26] and share the same sequential circuit and correlation decomposition structures (Figures 2.3 and 2.4). Both techniques are almost identical with a few minor differences. To begin with, the calculation of switching activity for sequential circuits must consider the probability of the circuit being in any Of its 2N states, where N is the number of flip-flops. Further, to arrive at an accurate estimate of switching activity, the values for the present state line probabilities, steady-state probabilities, and signal probabilities must be known. One method of exactly computing the steady-state probabilities is to find a solu- tion to the 2” linear system of Chapman-Kolmogorov equations. Next, present state line probabilities are computed from the steady-state probabilities and the signal probability is computed by generating the Boolean function’s disjoint covering. The computational cost of computing the steady-state probabilities from the Chapman- Kolmogorov system of equations is very high, but produces the exact state probabili— ties. The exact solution involves the system 7r = P7r or (I — P)1r = 0, where the vector 7r contains the steady-state probabilities. P is referred to as the transition probability matrix, which is derived from the state transition graph (STG). To determine 7r, the steady-state probability vector, the null-space of the system (I — P)7r = 0 must be computed. As the number of states grows exponentially with the number of flip-flOps it becomes impossible to build an STG for large sequential machines and thus the exact method cannot be applied. A more time-efficient but less accurate approach searches for an iterative solution to a nonlinear system of N equations [32, 33]. This system of nonlinear equations represents the next state logic preceding the symbolic combinational logic block in Figure 2.4. The next and present state line probabilities have the following form: 39 Slit] of. ili'i] Ih] nsl = f1(i1..iM,Psl..PsN) (2.8) 71.32 = f2(i1..iM,Psl..PsN) TLSN = fN(l1..lA[,P81..PSN) and P(TlSl) = P(f1(’ll..’l:M,P81..PSN)) (2.9) P(nsg) '2 P(f2(i1..lM,P81..PSN)) a...) = P(f~(i1--iMaPSI--P3N)) where P(nsl) = P(nsl = 1) and P(ps,) = P(ns,) = p,- for 1 < i < N. The present state line probabilities are applied to a nonlinear function g and a nonlinear system of equations is given by yl : Pl _ 91(p1ap27°"apN) : 0 (2'10) P2 — 92(P1,P2, -o-,PN) = 0 312 yN : pN — gN(p13p29°"3[)N) : 0- The nonlinear system of equations may be denoted as Y(P) =2 O or P = G (P) An iterative solution can be obtained by the use of the Newton-Raphson method for the system Y(P) = O [33]. 
Since the nonlinear solutions do not capture correlation between state line probabilities, minor inaccuracies are incurred. To improve accuracy, an unrolling of the next state logic network is performed (Figure 2.5) [32]. The signal probabilities are approximated by unrolling the next state logic k times, where k is a user-specified parameter. Usually an unrolling of the next state logic improves the accuracy of the results. For k = 3 the average error was reported to be only 1.5% [32]. As k increases, the time consumption increases, along with a decrease in average error.

Figure 2.5. k-unrolling of the next state logic (signal probability feedback; k is a user-defined limit).

The correlation accuracy improvement (m-expanded network) involves modifying the next state logic by selecting m-tuples of the present state lines, separated by one clock cycle, and computing probabilities for each combination of the m-tuple pairs (Figure 2.6) [33]. These probability values are fed into the combinational logic block.

Figure 2.6. m-expanded network with m = 2.

Using the ISCAS-85 benchmark circuits, the m-expanded network method with m = 2 obtained an average error less than 4.1%, and for m = 4 the reported average error was less than 3.6% [33].

Cheng et al. [34] proposed a method to increase the speed of power dissipation estimation by applying topological analysis to the circuit using the concept of supergates. The supergate of a node in a combinational circuit is the minimal subcircuit whose inputs are logically independent. The algorithm transforms a circuit to an undirected graph and solves the problem of finding supergates by finding articulation points in the undirected graph. An articulation point is defined as a node whose removal disconnects the graph. The supergate concept allows for the partitioning of BDDs, where each partition boundary is defined by its logically independent inputs. When the supergate concept is applied to most BDD-based power estimation techniques, for select benchmark circuits, CPU time and memory usage are reduced by as much as 86.2 and 94.0 percent, respectively [34].

2.2.3 High-Level Methods

A profile-driven approach to low power behavioral synthesis is presented in Katkoori et al. [35, 18]. The method presented is known as the Profile Driven Synthesis System (PDSS). Given a behavioral design specification, a set of input vectors, a parameterized module library, and user-specified constraints such as area and speed, the switching activity is estimated for the design. Each library module is characterized by its average switching activity per input vector and consists of units such as adders, registers, muxes, etc. The behavioral specification can be written in a hardware description language such as VHDL. The synthesized design consists of interacting datapath and controller components. The datapath is composed of modules from the module library and the controller is a finite state machine (FSM) implemented as a PLA/microprogram.

The PDSS accepts the behavioral VHDL specification as input and extracts a data flow graph (DFG). The DFG is passed through a profiler, where operations and carriers (edges of the DFG) are collected.
In more detail, carriers are defined as data flow edges that cross a control step boundary, which denotes a value that needs to be stored in a register. The goal of the profiling phase is to gather the following: 1) the number of times a node is executed for a given profiling stimulus, 2) the number of times each edge is traversed during execution, and 3) the number of times the edge value changes. Upon the completion of the profiling phase, four additional phases are entered: 1) scheduling and performance estimation, 2) register optimization, 3) interconnect optimization, and 4) controller generation.

During the scheduling and performance estimation phase, operations in the DFG are assigned to control steps and various operation nodes are bound to specific modules selected from the module library. The schedule is acceptable when the estimates of the area and clock period satisfy the user-specified constraints. The register optimization phase groups carriers such that no two carriers in the same group are simultaneously active. The interconnect optimization phase involves the assignment of interconnect paths to each value transfer in the DFG. The controller generation phase produces a finite state machine description. The FSM accepts as input datapath status flags and produces control signals which enable register transfers in the datapath. DFG edges that cross control step boundaries denote state transitions in the FSM. Each control step corresponds to at least one state in the FSM.

Using the data collected in the profiling phase, the PDSS system determines an estimate for aggregate switching activity (ASA). The ASA is the sum of the switching activities in the datapath and the controller. The switching activity associated with the datapath is computed by summing the switching activities determined for the following: combinational operators, registers, and interconnection elements such as buses, multiplexers, and wires. The switching activity contributed by the controller is determined by analyzing the PLA structure used to implement the controller's FSM. The activity concerning the PLA structure is the sum of the PLA's input and output plane switching activities. The experimental results of the method indicate that the estimated switching activity deviates by less than 10% [35].

Landman et al. [36] present techniques for accurately estimating power consumption based on a high-level description of the system architecture. This approach, based on stochastic modeling of bus statistics, achieves the accuracy associated with gate-level estimation tools. Algorithmic and architectural estimation techniques based on high-level statistics such as mean, variance, and autocorrelation are developed using concepts from the gate-level techniques. While gate-level techniques focus on the power consumed by individual Boolean logic gates as a function of their input probabilities, this method considers module (adder, register, multiplier) power consumption in terms of input-word statistics. For a variety of input distributions, the techniques perform very well on real-world signals such as speech, music, and images, typically found in digital signal processing (DSP) applications. The results obtained from these techniques exhibit an estimation accuracy within 9.4% of the gate-level simulations [36].

In [37, 38], Nemani et al. and Najm, in similar publications, presented a digital IC power estimation technique that operates at the register-transfer level (RTL).
The estimator is based on using entropy as a measure of the average activity to be expected in the final circuit-level implementation. Entropy is a characterization of a random variable or random process. If x is a random Boolean variable with signal probability p, then the entropy of x is defined as H(x) = p log2(1/p) + (1 - p) log2(1/(1 - p)). The entropy can be expressed for discrete systems in terms of inputs and outputs. The high-level power estimation methodology for a combinational circuit block that is part of a synchronous sequential circuit involves the following steps: 1) run a structural RTL simulation of the sequential circuit to measure the input/output entropies of the combinational block; 2) from the input/output entropies, estimate the average node density, circuit area, and average power; 3) combine with latch and clock power to compute total average power. The area and average power estimates are computed via the input/output entropy estimates, where the area estimate A is proportional to the output entropy H(Y) and P_avg ∝ A × H̄, such that H(Y) is the output entropy and H̄ is the average input entropy. When this method was tested against a zero-delay model for 56 different ISCAS-85 benchmark circuits with sizes ranging from 100 to 22,000 gates, the error was reported to be less than 9% with a 90% confidence level [37].

In summary, the estimates generated by current high-level power and switching activity computation methods experience between 9% and 12% error on average. For some circuits containing large amounts of reconvergent fanout the error is as high as 80%. The error experienced by current high-level power and switching activity estimation techniques is too high. This dissertation presents a new methodology for improved switching activity estimation of behavioral-level designs described in VHDL.

CHAPTER 3

Behavioral Representations of Switching Functions

The fundamental building blocks of digital systems are switching functions. This chapter specifically addresses the use of Boolean switching functions. A review of terminology and notation used to describe the functional and mathematical models of Boolean switching functions when applied to behavioral design of digital systems is provided. Additionally, a discussion of various behavioral representations for switching functions is provided as well.

3.1 Switching Algebra Background

The mathematics which defines rules and operations for processing the binary set {0, 1} in a combinational digital system is called switching algebra, also known as two-valued Boolean algebra, developed by English mathematician George Boole in 1854. The most primitive operations applied to variables of the binary set (i.e., ∀x_i ∈ {0, 1}) are "·", "+", and "¯", which denote switching algebra operations, also called logic operations. A switching algebra must satisfy Huntington's postulates (Table 3.1), which define the basis of switching algebra theory [39].

Property      | Meaning
Closure       | x, y ∈ {0,1} → (x + y) ∈ {0,1};  x, y ∈ {0,1} → (x · y) ∈ {0,1}
Identity      | x + 0 = x;  x · 0 = 0;  x + 1 = 1;  x · 1 = x
Commutative   | x + y = y + x;  x · y = y · x
Distributive  | x + (y · z) = (x + y)(x + z);  x · (y + z) = (x · y) + (x · z)
Complement    | x + x̄ = 1;  x · x̄ = 0

Table 3.1. Huntington's postulates.

3.2 Truth Tables

The definitions of the three logic operators ("+", "·", "¯") can be deduced from Huntington's postulates using truth tables. A truth table is a mechanism for systematically enumerating the output or result of a logic operation for every possible Boolean input combination.
The truth table can be extended to enumerate the outputs for switching functions consisting of sets of logic operations and Boolean operands. A double vertical bar is used to separate truth table inputs from truth table outputs. The commonly used truth table convention places inputs on the left of the double vertical bar, with outputs placed to the right.

3.2.1 Logic Operations

The "¯" logic operation is called the complement, NOT, or invert operation, of which the term NOT is the most commonly used. The NOT operation is a unary logic operation, meaning that it operates on a single operand. When the NOT operation is applied to the Boolean operand x, the result is x̄. The truth table in Table 3.2(a) shows that the NOT operation changes 0 to 1 and changes 1 to 0.

x | x̄        x  y | x · y        x  y | x + y
0 | 1        0  0 |   0          0  0 |   0
1 | 0        0  1 |   0          0  1 |   1
             1  0 |   0          1  0 |   1
             1  1 |   1          1  1 |   1
(a) NOT      (b) AND             (c) OR

Table 3.2. Truth tables for NOT, AND, OR logic operations.

The "·" logic operation is called the AND or conjunction operation. The term AND is most commonly used. It is a binary operator, meaning that it takes two operands and generates 2^2 input combinations. The result after applying the AND operation to binary operands x and y is 1 only when x = y = 1; otherwise the result is 0. The truth table in Table 3.2(b) depicts the AND logic operation.

The "+" logic operation is called the OR or disjunction operation. The term OR is most commonly used and is classified as a binary operator. The result after applying the OR operation to binary operands x and y is 0 only when x = y = 0; otherwise the result is 1. The truth table in Table 3.2(c) depicts the OR logic operation.

The truth table technique is an implementation-independent method of representing switching functions. Truth tables are considered behavioral representations because no reference to technology, structure, or implementation is made. The truth table completely specifies a switching function's output for all possible (2^N) input combinations, where N is the number of Boolean inputs. Consider the 3-input switching function which represents the carry-out bit for a full-adder (FA). The FA carry-out bit input/output behavior is specified for all 2^3 input sequences (Table 3.3).

a  b  c_in | c_out
0  0   0   |  0
0  0   1   |  0
0  1   0   |  0
0  1   1   |  1
1  0   0   |  0
1  0   1   |  1
1  1   0   |  1
1  1   1   |  1

Table 3.3. Truth table for full-adder carry-out bit.

3.2.2 Output Enumeration

The outputs of switching functions are easily enumerated with truth tables. Each row of a fully specified truth table is referred to as a term or product. The term represents the input combination that determines the output of a switching function. Each term or product is logically an AND-product of input variable literals. A switching function consisting of N input variables has 2^N terms. The enumeration of terms utilizes the base-2 number system because switching algebra is defined over the two-valued domain {0, 1}. The expression for representing a term's decimal equivalent for an ordered n-bit sequence (x_{n-1}, x_{n-2}, ..., x_1, x_0) is

X = x_{n-1}·2^{n-1} + x_{n-2}·2^{n-2} + ··· + x_1·2^1 + x_0·2^0 = Σ_{i=0}^{n-1} x_i·2^i.    (3.1)
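Equation 3.1 is simple enough to exercise directly. The snippet below is illustrative only (the helper names are ad hoc); it numbers the rows of Table 3.3 using Equation 3.1 and collects the rows that produce a logic 1, yielding the list of term numbers that reappears as the minterm list in the next section.

def term_index(bits):
    """Equation 3.1: decimal equivalent of an ordered bit sequence (MSB first)."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value

# Table 3.3: full-adder carry-out, rows ordered (a, b, c_in)
carry_out = lambda a, b, c: (a & b) | (c & (a | b))

minterms = [term_index((a, b, c))
            for a in (0, 1) for b in (0, 1) for c in (0, 1)
            if carry_out(a, b, c) == 1]
print(minterms)            # [3, 5, 6, 7]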
3.3 Boolean Expressions

The full-adder's carry-out bit switching function described by the truth table in Table 3.3 has 2^3 terms, numbered from 0 to 7 using Equation 3.1. The terms which result in a switching function output of logic 1 are known as minterms. The carry-out bit switching function has four minterms and can be expressed in the following forms:

f(a, b, c_in) = 011 + 101 + 110 + 111                                   (3.2)
              = ā·b·c_in + a·b̄·c_in + a·b·c̄_in + a·b·c_in              (3.3)
              = m_3 + m_5 + m_6 + m_7                                   (3.4)
              = Σ m(3, 5, 6, 7)                                         (3.5)

Each of these forms represents what is known as a Boolean expression. The Boolean expression is an n-variable function of Boolean inputs whose output is Boolean, or f(x): {0,1}^n → {0,1}. Equations 3.2 and 3.3 represent the minterms of the carry-out bit switching function in binary and literal form, respectively. Each switching variable is expressed as uncomplemented or complemented for a 1 or 0, respectively, in the specified input combination. Equation 3.4 uses a shorthand notation for minterms, m_x. The subscript x denotes the associated row of the truth table, or the binary value of the input combination. Equation 3.5 is the most commonly used form for representing minterm lists; it is a more compact representation of the form in Equation 3.4. The switching function representations shown in Equations 3.2, 3.4, and 3.5 are known as canonical minterm or sum-of-products (SOP) expressions. After reducing Equation 3.3 using Huntington's postulates, the resulting Boolean expression is

f(a, b, c_in) = ab + c_in(a + b).                                       (3.6)

The reduced Boolean expression (Equation 3.6) is in SOP form, but it is not a canonical minterm expression because the product terms are not minterms. The canonical minterm expressions, as well as Boolean expressions, are implementation-independent methods for representing the input/output behavior of switching functions. Boolean expressions do not consider technology, circuit structure, or implementation of a digital design. The advantage of the canonical minterm expression representation over the truth table representation is size. Canonical minterm expressions are more compact than truth tables. Truth tables represent all 2^N input combinations, whereas canonical minterm expression representations list only the input combinations for which the switching function's output is logic 1.
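A short, purely illustrative check confirms that the reduced expression of Equation 3.6 covers exactly the minterms listed in Equation 3.5:

from itertools import product

reduced = lambda a, b, c: (a & b) | (c & (a | b))        # Equation 3.6
minterms = {3, 5, 6, 7}                                   # Equation 3.5

assert all((reduced(a, b, c) == 1) == ((a * 4 + b * 2 + c) in minterms)
           for a, b, c in product((0, 1), repeat=3))
print("Equations 3.5 and 3.6 agree on all eight input combinations")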
3.4 Binary Decision Diagrams

Probability-based power analysis tools depend heavily on Binary Decision Diagrams (BDDs) to determine signal activity. Historic uses of BDDs have been in the digital circuit design areas of synthesis, verification, and testing. The BDD is not a new concept. As early as 1959, Lee [40] introduced the concept of Binary Decision Programs and a set of rules to transform these programs into switching circuits. Later, in 1978, Akers [41] revisited the concept of BDDs by using the diagram as a means to define, analyze, and test large digital functions from an implementation-free perspective. It was in 1986 that Bryant [27] demonstrated the advantages of BDDs as a canonical representation. Bryant demonstrated that BDDs have two very useful properties. First, BDDs are canonical: given two circuits, they are equivalent if their BDDs are identical. Second, BDDs are effective at representing combinatorially large sets, which is useful in FSM equivalence checking and logic minimization.

BDDs represent a switching function as a directed acyclic graph (DAG). A graph consists of an interconnection of nodes (vertices) and edges (arcs). There are two node types: decision nodes and terminal nodes. Terminal nodes are characterized by not having outgoing edges which lead to children nodes, and they contain fixed values which possibly correspond to the output of a function. Decision nodes are characterized by having outgoing edges which lead to other decision nodes and terminal nodes. The decision node is labeled with a variable identifier and has one outgoing edge for each value this variable can assume. Since Boolean decisions are being made, the decision node variable identifiers can only assume the values of 0 and 1. The terminal node values will be fixed at either 0 or 1, and the possible edges are the 0-edge and the 1-edge. The 0-edge is chosen when the decision node variable assumes the value 0, and the 1-edge is chosen when the decision node variable takes on the value 1. Figure 3.1 is a BDD representing the full-adder (FA) carry-out bit switching function described by Equation 3.6 and Table 3.3. Typically in BDD graphs dotted arcs represent the 0-edges while the solid arcs represent 1-edges.

Figure 3.1. BDD of full-adder carry-out bit.

The BDD size is directly related to the number of nodes in the BDD's graph, which is controlled by the number of input variables and their ordering. One such BDD type, the Ordered Binary Decision Diagram (OBDD), addresses the BDD size issue by considering input variable ordering. The ordering of input variables determines the level at which each input will appear in the BDD's graph. In the OBDD the ordering remains the same for each path taken from the root (lowest order) node to a terminal node. Different input variable orderings lead to different BDDs, with each potentially having a different size. Consider the switching function given by Equation 3.7. An input variable ordering of a < d < b < c leads to the BDD displayed in Figure 3.2, while an input variable ordering of b < c < a < d leads to the BDD representation displayed in Figure 3.3.

f(a, b, c, d) = abc + bd + c̄d                                          (3.7)

Figure 3.2. BDD with ordering 1.

Figure 3.3. BDD with ordering 2.

Clearly the second ordering provides the more compact BDD representation; it has a smaller node count. Hence, a good input variable ordering will yield a more compact BDD representation with reasonable memory usage [42]. A modification to the OBDD is the Reduced-Ordered BDD (ROBDD). An initial ordering is given in the ROBDD, and the iterative identification and removal of isomorphic subgraphs and redundant nodes takes place [39]. The removal results in a BDD which is minimal for the given input variable ordering and canonical in form.

One such ROBDD implementation was developed by Brace et al. [30]. This ROBDD implementation made improvements in the if-then-else (ITE) operator, the hashing technique, and memory garbage collection. The results reported that an amortized memory cost of 22 bytes per node was achieved. Additionally, it was reported that the improvements yielded a faster, more memory-efficient ROBDD implementation than the original implementation presented in [27].
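The sensitivity to variable ordering can be checked mechanically. The sketch below is illustrative only and is not the ROBDD package of [30]; the helper names are ad hoc and the function is the example of Equation 3.7. It reduces an ordered decision tree by merging identical (variable, low, high) nodes and skipping redundant tests, then reports the decision-node count under the two orderings discussed above.

from itertools import product  # not required; shown only if you extend the example

def robdd_size(f, order):
    """Count decision nodes of the reduced, ordered BDD of f under the given order.

    f maps a dict {var: 0/1} to 0/1; nodes with identical (var, low, high) are
    merged, and a node whose two children are equal is skipped entirely.
    """
    unique = {}                      # (var, low, high) -> node id
    TRUE, FALSE = "1", "0"

    def build(level, assignment):
        if level == len(order):
            return TRUE if f(assignment) else FALSE
        var = order[level]
        low = build(level + 1, {**assignment, var: 0})
        high = build(level + 1, {**assignment, var: 1})
        if low == high:              # redundant test: skip the node
            return low
        key = (var, low, high)
        if key not in unique:
            unique[key] = "n%d" % len(unique)
        return unique[key]

    build(0, {})
    return len(unique)

# Equation 3.7: f = abc + bd + (not c)d
f = lambda v: (v["a"] & v["b"] & v["c"]) | (v["b"] & v["d"]) | ((1 - v["c"]) & v["d"])

print(robdd_size(f, ["a", "d", "b", "c"]))   # ordering 1 (larger node count)
print(robdd_size(f, ["b", "c", "a", "d"]))   # ordering 2 (smaller node count)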
3.5 Behavioral VHDL Specifications VHDL, short for VHSIC (Very High Speed IC) Hardware Description Language, is a formal notation for hardware description, standardized by the IEEE in 1987, that allows for, among other descriptions, the behavioral-level design of a digital system [44]. The VHDL behavioral specification supports the modeling of digital hardware using sequential statements similar to programming languages such as Ada, C, or Pascal. VHDL, just like the previously mentioned programming languages, contains loop, if-then-else, and assignment constructs. A VHDL model is comprised of entities and architectures. An entity defines the interface between a system and its environment, such as signals that flow into and out of the system or component. The architecture describes the functional nature of the digital system; it defines how the outputs respond to the inputs. A behavioral VHDL specification describing the carry-out bit of a full-adder (FA) is given in Figure 3.4, where the architecture closely resembles the Boolean 54 function defined by Equation 3.6. Behavioral VHDL specifications are high—level implementation-independent representations which describe the functional nature of a digital system without any reference to circuit structure or technology. ENTITY CARRY is port (A, B, Cin : IN BIT; Cout : OUT BIT); end CARRY; ARCHITECTURE Behavioral of CARRY is BEGIN -- Behavioral FA carry-out bit Cout <= ((A and B) or (Gin and (A or B))) after 10ns; END Behavioral; Figure 3.4. Behavioral VHDL model of FA carry-out bit. 55 -'-..‘1 CHAPTER 4 Structural Representations of Switching Functions The previously discussed behavioral representations describe only a Circuit’s function- ality in terms of truth tables, Boolean expressions, or BDDs. When detailed infor- mation, such as timing, speed, or area are needed, these representations are of little use. To effectively describe the input/output relationships of a circuit, a lower level of design abstraction is necessary. The logic— and circuit- design levels of abstraction within the EDA process (Figure 1.1) provide information which enable designers to obtain or calculate critical timing, speed, or area information. The logic- and circuit- design levels of abstraction yield structural representations. Given a logic- or circuit- level description, structural-level representations efl'ectively describe the design’s in- put/output relationships, as well as the internal connectivity of circuit elements and logic gates. This chapter deals with logic-level structural representations, including the modeling of switching functions in terms of schematics, netlists, structural VHDL specifications, and the Connective Binary Decision Diagram (CBDD). 56 4.1 Schematics Schematics or logic diagrams are a visual means of representing the structure of a digital circuit design. Schematics illustrate connective relationships between a circuit’s primary inputs/outputs, internal signals, and logic elements. Schematics display the logic elements and the wires used to interconnect these elements. The most basic logic elements used to perform the operations necessary to implement switching functions are the NOT-, AND-, and OR- gates. The schematics of the basic logic gates are illustrated in Figures 4.1, and they implement the behavior given in the truth tables of Table 3.2. -Dr- ID- (a) NUT (b) AND (c) 0R Figure 4.1. Schematics of basic logic gates. 
The interconnection and recombination of these basic logic gates support the design of more complex switching functions such as adders, ALUs, muxes, etc. By interconnecting an arrangement of OR and AND gates, a structural representation for the carry-out bit of the full-adder can be realized (Figure 4.2).

Figure 4.2. Logic diagram for FA carry-out bit.

Vendors such as ViewLogic, Xilinx, and Mentor Graphics develop schematic entry software systems which support the design of digital hardware. Schematic entry software systems provide a pre-characterized library of logic elements with the ability to place and wire these components according to a design's connective specification. A limitation of the schematic design approach is its difficulty supporting low-level and detailed views of large and complex designs. Large and complex designs are limited to system-level or hierarchical views.

4.2 Netlists

The structure of a circuit can be represented using a netlist. A netlist is a data structure that describes all components connected to each net, or to each internally produced signal, within a circuit. Industry standard formats exist for describing a netlist, enabling the easy transfer of designs between various design tools and vendors. Design tools may use netlist formats which represent circuit structure at various levels of design abstraction. For example, the SPICE tool considers the circuit level of design abstraction. SPICE netlists model a circuit as an interconnection of analog elements such as capacitors, resistors, and transistors. The Berkeley SIS tool considers the logic level of design abstraction. The widely used SIS netlist format, known as the Berkeley Logic Interchange Format (BLIF), models a digital circuit as an interconnection of predefined or user-defined logic elements, which includes, but is not limited to, N/AND, N/OR, NOT, or XOR gates. Consider Figure 4.3, which is a gate-level netlist that describes the structure of a full-adder carry-out bit. The gate-level netlist example directly corresponds to the logic diagram shown in Figure 4.2 and the Boolean expression given in Equation 3.6.

.inputs c_in a b
.outputs c_out
.gate and2 a=a b=b O=z1
.gate or2 a=a b=b O=z2
.gate and2 a=z2 b=c_in O=z3
.gate or2 a=z1 b=z3 O=c_out
.end

Figure 4.3. Gate-level BLIF netlist for FA carry-out bit.

In general, netlist representations describe the structure and connective relationships visualized by schematic entry systems and may serve as their input. Similar to schematic-based design, netlist-based design is limited by circuit size (number of nets). As the circuit size grows, the difficulty of debugging and isolating problems increases. Isolating circuit design problems given lower level circuit component granularity is a complex task when visually analyzing the netlist. Tools such as analyzers or simulators are needed to locate and assess low-level problems.

4.3 Structural VHDL Specifications

In VHDL, design architectures may be developed using a structural specification. Structural specifications express an architecture as a hierarchical arrangement of interconnected components. The interconnected components may be pre-defined library units or user-defined units. Structural-based VHDL design is powerful because it simplifies the development of complex systems.
Larger or more complex systems are easier to design because the high-level system architecture can be viewed as an interconnection of less complex black boxes or components, which can be modeled behaviorally or structurally. The interconnection of components is made possible through the use of signals, which are analogous to wires. A structural VHDL specification describing the carry-out bit of a full-adder (FA) is depicted in Figure 4.4, where the architecture closely resembles the logic diagram in Figure 4.2 and the gate-level netlist shown in Figure 4.3.

ENTITY CARRY is
  port (A, B, Cin : IN BIT;
        Cout : OUT BIT);
END CARRY;

ARCHITECTURE Structural of CARRY is
  -- declare internal components
  COMPONENT or_gate
    port (X, Y : IN BIT; Z : OUT BIT);
  END COMPONENT;
  COMPONENT and_gate
    port (X, Y : IN BIT; Z : OUT BIT);
  END COMPONENT;
  signal z1, z2, z3 : BIT;
BEGIN
  -- Structural FA carry-out bit
  -- instantiate declared components to form structure
  AND1 : and_gate port map (X=>A, Y=>B, Z=>z1);
  OR1  : or_gate  port map (X=>A, Y=>B, Z=>z2);
  AND2 : and_gate port map (X=>z2, Y=>Cin, Z=>z3);
  OR2  : or_gate  port map (X=>z1, Y=>z3, Z=>Cout);
END Structural;

Figure 4.4. Structural VHDL model of FA carry-out bit.

Structural VHDL specifications are very useful when a design is decomposed into a set of components, allowing a system to be viewed as an interconnection of library or user-defined black boxes. The structural VHDL design approach supports the interchanging of various off-the-shelf cores and other pre-defined components to help a designer meet system requirements.

4.4 Connective Binary Decision Diagrams

The Connective Binary Decision Diagram (CBDD) is a variant of the standard BDD. The CBDD is a major contribution of this research. It is entirely new and has some advantages over traditional BDD implementations. CBDDs model the structure present in a digital circuit's netlist description as well as the circuit's connective relationships.
As a result, for most circuits, CBDDs yield a more compact graph representation when compared to conventional BDD implementations. 61 4.4.3 Minimized Scalable BDDS A significant component used to construct the CBDD’S graph is the minimized- scalable BDD (MSBDD). MSBDDs are subgraphs which represent the behavior of a Circuit’s structural building blocks such as gates, muxes, adders, etc. The MSBDD concept is a contribution made by this research effort. Fundamentally, MSBDDs are Binary Decision Diagrams as discussed in [27], but are minimal in the number of nodes and edges used to construct the decision graph. (a) f(a.h) = NAND(a.b) (b) f(mb) = NOR(a.b) (C) HELD) - XOR(a.b) (d) f(a.h)=AND(a.b) (c) f(a.h)=0R(a.b) (f) f(a)==N()'l‘(a) Figure 4.5. Minimized-scalable BDDS. A revealing feature concerning the MSBDD is its ability to represent the behavior of a variable-input logic block with minimal size (node count). The MSBDD is de- signed to have a small number of decision nodes and only two terminal nodes for each logic block representation. The minimal size is achieved by utilizing unique MSBDD Generation Routines which recognize and exploit the output drive of the logic block being represented by the MSBDD. The output drive is a simplified set of decision 62 node interconnections which serves to minimized the logic block’s output path depth. For example, an AND gate is O-driven. The presence of a logic 0 at any AND input, regardless of fanin size, results in a logic 0 gate output. Likewise, a logic 1 at any OR gate input results in a logic 1 output. Therefore, OR gates are l-driven. The output drive feature considered by the MSBDD supports the construction of minimized de- cision diagrams for certain logic elements such as NOT, N AN D, AND, NOR, and OR gates, Figure 4.5. It is accomplished by increasing the number of direct connections or edges between decision nodes and terminal nodes within the MSBDD’s graph with the goal of minimizing the depth. 4.4.3.1 MSBDD Generation Routines The MSBDD generation routines are unique for each represented logic element. The main goal of each MSBDD generation routine is to produce an interconnection of nodes and edges which yield a decision diagram of limited size and depth. The CBDD implementation utilizes MSBDD generation routines which mimic the behavior of basic logic elements such as NOT, NAND, AND, NOR, and OR gates. Additionally, the CBDD implementation supports the inclusion of more complex logic structures. This is easily accomplished by adding a routine that generates an interconnection of nodes and edges which mimic the intended behavior. The Mk_NAND procedure illustrated in Algorithm 1 generates an MSBDD graph that represents the behavior of an n—input N AN D gate. This procedure interconnects the MSBDD graph with a CBDD graph in a manner that mimics the structure found in the input specification. The procedure utilizes three input parameters: a CBDD, an argument list, and fanin size. The CBDD’s graph models the functionality of the network described by an input netlist. The argument list contains the fanin names, and fanin size is the number of inputs to the NAND gate. 
  Procedure: Mk_NAND
    input  : G    - CBDD of network
    input  : n    - Fanin size
    input  : argv - Argument names
    output : M'   - MSBDD of n-input NAND logic gate
   1 begin
   2   for i ∈ 0..n do
   3     vlist[i] ← Mk_CBDD_Node_Set(G, argv[i])
   4   end
   5   for i ∈ 0..n do
   6     Add_Edge(G, vlist[i], ZERO_TERMINAL, ZERO_EDGE)
   7   end
   8   for i ∈ 0..n-1 do
   9     Add_Edge(G, vlist[i], vlist[i+1], ONE_EDGE)
  10   end
  11   Add_Edge(G, vlist[n-1], ONE_TERMINAL, ONE_EDGE)
  12   Add_Edge(G, ZERO_TERMINAL, OUT_NODE, VALUE_EDGE)
  13   Add_Edge(G, ONE_TERMINAL, OUT_NODE, VALUE_EDGE)
  14   M' ← Connect_Nodes(G, n, vlist, OUTPUT_NODE)
  15 end

Algorithm 1: Mk_NAND_msbdd

Lines 2-4 of the Mk_NAND procedure build an array of decision nodes, one for each argument. Lines 5-7 generate 0-edge connections between the decision nodes and a zero-terminal node. Lines 8-10 perform 1-edge connections between the decision nodes. Line 11 connects the last decision node to a one-terminal node. Using value-edges, lines 12 and 13 connect the zero- and one- terminals to the output node. Line 14 connects the newly created output node to the CBDD's graph and returns an MSBDD reference.

The Mk_CBDD_Node_Set, Add_Edge, and Connect_Nodes operations invoked by the Mk_NAND procedure run in O(1) time. These operations are invoked a maximum of 3n + 4 times, yielding a time complexity of O(3n + 4), where n is the number of inputs. Since a NAND gate is being modeled, n is usually small, resulting in n + 3 nodes and 2n + 2 edges. A similar routine exists for each of the remaining basic logic gates.

The CBDD package supports logic elements other than gates. The behavior of higher-level logic elements such as muxes, adders and encoders can be modeled by the construction of unique MSBDD structures.

4.4.4 CBDD Definitions

An advantage of the CBDD is its ability to maintain the circuit's structural input/output relationships and internal connectivity. The definition of the CBDD is based on the definitions of a DAG and a conventional BDD. Structural and connective relationships are achieved by altering the conventional BDD's definition to support additional node (vertex) and edge (arc) types and properties. The CBDD's formal definition is given and described by Definition 4.1 and Figure 4.6.

Definition 4.1 A Connective Binary Decision Diagram (CBDD) is a directed acyclic graph which is composed of an MSBDD set M, vertex set V and edge set E.

There are differences between the conventional BDD implementation and the CBDD. First, the conventional BDD graph contains only two node types, the decision and terminal node types; the CBDD's graph supports four node types, with each node serving a unique purpose. Second, the CBDD's graph supports an additional edge, the value-edge. Third, the CBDD's graph is an interconnection of subgraphs, known as Minimized-Scalable BDDs (MSBDDs). CBDDs utilize the 0- and 1- edges in the same manner as a conventional BDD. The traversal of a 0- or 1- edge indicates that the predecessor decision node variable is equal to 0 or 1, respectively. The value-edge is new to the CBDD's design; value-edges are used to model the propagation of internally produced signals. The CBDD's graph contains Output, Input, Internal and Terminal node types.
1. ∀ v ∈ V may be of type:
   1.1. Input
        1.1.1. followed by children via 0- and 1- edges
        1.1.2. assumes variable primary input value
        1.1.3. value(v) ∈ {0, 1}
   1.2. Internal
        1.2.1. followed by children via 0-, 1- and value- edges
        1.2.2. assumes variable internal output value
        1.2.3. value(v) ∈ {0, 1}
   1.3. Terminal
        1.3.1. value termination for internal and output nodes
        1.3.2. followed by children via value-edges
        1.3.3. value is fixed, value(v) ∈ {0, 1}
   1.4. Output
        1.4.1. absolute termination, not followed by children
        1.4.2. assumes variable primary output value
        1.4.3. value(v) ∈ {0, 1}
2. ∀ e ∈ E may be of type:
   2.1. 0-edge (1-edge)
        2.1.1. traversed when value(v) = 0 (1)
        2.1.2. outgoing to Input, Internal, and Terminal vertices
   2.2. value-edge
        2.2.1. propagates value(v) to Internal or Output vertices
        2.2.2. incoming only to terminal vertices
3. MSBDD set M elements:
   3.1. ∀ m ∈ M, vertex(m) ∈ V
   3.2. MSBDD terminals are connected to Internal or Output nodes via value-edges
   3.3. all MSBDDs represent a function on vertex v, f_v
   3.4. f_v = x̄_i · f_low(v) + x_i · f_high(v), where x_i is a decision variable

Figure 4.6. Definition of CBDD.

The concept of an Output node is new; it represents or contains the value of a circuit's primary output. Output nodes are preceded by fixed-valued nodes or Terminal nodes connected by value-edges, and are not succeeded by descendant nodes. The Input node is essentially a decision node, as in the conventional BDD; it represents the state of an input variable and may be followed by descendant nodes connected by 0-edges and 1-edges. The Internal node is a special decision node whose binary value results from the evaluation of a logic element represented by an MSBDD. Internal nodes are preceded by fixed-valued nodes or Terminals via value-edges and are succeeded by descendant Input nodes. The Internal node's value is propagated to the Input nodes of other logic element MSBDD representations by the use of value-edges. The Terminal nodes used in the CBDD's graph differ from those used by the conventional BDD. CBDD Terminal nodes carry a fixed binary value but have descendants; they are succeeded by Internal or Output nodes and connected by value-edges.

4.4.5 Analysis of Connective BDDs

Circuit size is a limiting factor for applications utilizing traditional BDD implementations. For large circuits, BDD graphs may become exponentially sized, resulting in CPU-intensive applications. The CBDD is an alternative graph structure which is useful for certain applications. This section provides background information concerning the CBDD's linear growth and functional rationalization. An explanation detailing the CBDD's improved circuit representation when compared to the traditional BDD is given.

Theorem 4.1 The CBDD's graph grows linearly with respect to the number of logic elements within the circuit.

Proof: Let M be the number of logic elements present in a circuit. Let n be the fanin size of a logic element. MSBDD graphs are constructed to grow linearly, with the largest MSBDD graph (the n-input XOR gate) having 2n + 3 nodes. The CBDD's graph is an interconnection of M MSBDD graphs, resulting in linear growth and a maximum size of M × (2n + 3) nodes.

Theorem 4.2 The minimized-scalable BDD (MSBDD) maintains the behavior of an individual logic element.

Proof: The MSBDD's graph contains a single 0-Terminal and a single 1-Terminal. If a post-order traversal from the 1-Terminal to the root node of the MSBDD's graph is performed, then a set of disjoint decision paths is generated. The AND-products of the decision nodes along the disjoint paths are minterms in the on-set of the logic element's Boolean switching function. The OR-sum of these disjoint minterms equals the sum-of-products that describes the switching behavior of the given logic element.
Theorem 4.3 The CBDD, an interconnection of MSBDDs, maintains the behavior of a circuit.

Proof: From Theorem 4.2, it was established that MSBDDs maintain the behavior of individual logic elements. The post-order traversal of an MSBDD's graph results in a Boolean switching function whose switching variables may be primary inputs or internally produced signals. The circuit's behavior, a collection of two-level Boolean switching functions for all circuit nodes, can be produced by performing recursive replacement of non-primary-input variables by their corresponding MSBDD switching functions until all non-primary-input variables have been replaced by primary input variables.

Theorem 4.4 The minimal MSBDD representation for n-input N/AND and N/OR logic functions contains at most n + 3 nodes and 2n + 2 edges.

Proof: The output of every n-input N/AND and N/OR logic function is directly controlled by the logic function's output drive. The output drive of a logic function is a single input which directly controls the function's output. The n-input N/AND logic functions are 0-driven because a single logic zero applied to any of the n inputs leads to immediate output values of 1/0. Likewise, the n-input N/OR logic functions are 1-driven because a single logic one applied to any of the n inputs leads to immediate output values of 0/1. Assuming that an MSBDD graph representation contains a maximum of two terminal vertices, the disjoint terms of the MSBDD's logic function are constructed from the AND-products of decision nodes within the MSBDD's graph. The terms have the form x̄_N · ∏_{i=0}^{N-1} x_i, where the x_i are decision nodes, 0 ≤ N ≤ n − 1, and x̄_N is the output-driven decision node that is directly connected to a terminal vertex. Let L equal the number of literals in a disjoint term of the logic function represented by the MSBDD, such that 1 ≤ L ≤ n. The number of edges necessary to construct a disjoint term in isolation and connect the term's output-driving decision node to one of the terminal vertices within the MSBDD's graph is 1 + (L − 1) = L edges; because consecutive terms share the decision-node chain built by their common leading literals, however, each term adds only its output-driving terminal edge and at most one new chaining edge. Additionally, two value-edges are used to connect the terminal vertices to the MSBDD output node. Given that K is the number of disjoint terms within the logic function, the total number of edges needed to minimally represent the MSBDD of an N/AND or N/OR logic function is

    EdgeCount = 2 + Σ_{i=1}^{K} (1 + (L_i − L_{i−1}))   with L_0 = 0, 1 ≤ L_i ≤ n, 1 ≤ K ≤ n
              = 2 + K + L_K
              = 2 + n + n                                for K = n and L_i = i
              = 2n + 2.

The minimal number of MSBDD nodes required to represent an N/AND or N/OR logic function is n (one decision node per input), plus the number of terminal vertices (usually 2), plus a single output node, for a minimum total of n + 3 nodes.

Assertion 4.1 For large circuits, CBDDs generally result in smaller and more compact representations when compared to traditional BDDs.

Explanation: Applications using traditional BDDs try to make logical sense of the behavior obtained from a given input specification for the purpose of producing an optimized set of Boolean equations and limiting the BDD's size (node count). The BDD's size is controlled by the number of circuit inputs along with the input variable ordering. Circuits containing many internal logic elements and a large number of inputs (N) may result in BDD graphs comprised of 2^N nodes along with N! input variable orderings. For many applications, the time necessary to explore all N!
input variable orderings is not available, so a non-optimal input variable ordering is generally chosen. Benchmark comparisons demonstrate that a non-optimal input variable ordering results in a BDD graph which is larger than the CBDD graph for the same circuit.

The CBDD does not analyze behavior or consider input variable ordering. For the logic elements encountered during the input of the design specification, the CBDD application interconnects the MSBDD subgraphs of the corresponding logic elements. According to Theorems 4.2 and 4.4, MSBDD subgraphs are unique and minimal in size for the given logic elements. From Theorem 4.3, the interconnection of MSBDD subgraphs results in a compact CBDD graph representing a circuit's behavior and connective relationships. For the reasons stated, the CBDD's graph maintains behavior, and for large circuits it is smaller than the graph of traditional BDDs.

4.4.6 Implementation

The application program that generates a CBDD reads a BLIF-formatted netlist file as input. The basic logic gates such as N/AND, N/OR, XOR, and NOT, when encountered during the input phase, are converted to minimized-scalable BDDs (MSBDDs). These MSBDDs represent the most reduced BDDs, in terms of total node count, for the given functional unit (logic gate) and size. Figure 4.5 displays a small selection of the MSBDDs. Once the MSBDDs are generated, their Internal node outputs are interconnected with the Input nodes of other MSBDDs according to the structure present in the netlist. The full-adder carry-out bit, structurally represented in Figures 4.2, 4.3, and 4.4, is represented by the CBDD in Figure 4.7.

Figure 4.7. CBDD of FA carry-out bit.

The CBDD of Figure 4.7 models the structure of the full-adder carry-out bit. The CBDD of the carry-out bit is an interconnection of MSBDDs which model the behavior and structural connectivity of two 2-input AND gates along with two 2-input OR gates, in accordance with the logic diagram of Figure 4.2. The MSBDDs for all logic gates are enclosed by labeled boxes featuring the gate name and size. The circuit's primary inputs, located outside of the MSBDD enclosures, are represented by boxes containing the input name. Primary inputs are connected to the MSBDD graphs by value-edges, depicted by heavy solid lines. Within the MSBDD enclosures, 0- and 1- edges are represented by dotted lines and solid lines, respectively. Input nodes are represented by circles enclosing the input name and are interconnected by 0- and 1- edges. Terminal nodes are represented by boxes containing a 0 or 1 value. Internal nodes are represented by circles containing an internally produced signal name; they are preceded by Terminal nodes connected by value-edges. Primary outputs are represented by double-dotted circles enclosing the output name; they are preceded by Terminal nodes.

4.4.7 Advantages and Disadvantages

In spite of the differences between the conventional BDD implementation and the CBDD, there are some very positive benefits to using the CBDD. First, the CBDD is not affected by input variable ordering because the internal logic elements of a digital circuit can be mapped to pre-defined MSBDDs, which are already minimal in size. Second, CBDD size (number of nodes) is directly related to the number of logic elements present in the circuit, so exponential growth will not occur.
Third, the CBDD's graph represents the entire circuit design, whereas the traditional BDD utilizes multiple graphs, one for each primary output and internally produced signal. Fourth, the CBDD's size grows linearly because its graph utilizes Internal decision nodes representing internally produced signals, instead of expressing the specified node in terms of primary inputs, which could result in exponential growth. The use of Internal nodes increases the sparseness of the CBDD's graph and provides a more compact representation of the circuit's behavior and structure.

The drawbacks of the CBDD include loss of canonicity. However, if needed, the equivalence of two circuits can be determined by performing functional simulations using their CBDDs, followed by a comparison of the results. Additionally, unlike traditional BDDs, the CBDD's graph does not have a single path from the root node to a terminal node. This is a disadvantage because several paths must be considered when determining a primary output's value. In some cases, however, CBDDs benefit from this drawback; they yield more compact graphs that are represented by far fewer nodes and edges than conventional BDDs. Thus, traversing additional paths in the CBDD is not costly because the maximum CBDD graph depth is less than the BDD graph depth. Lastly, in comparison to traditional BDD implementations, CBDDs generate a more compact representation for large circuits, but may produce larger than normal or poor representations for small circuits.

4.4.8 Results

The ISCAS-85 benchmark circuits were chosen for an experiment to compare the CBDD size to the size of traditional BDD implementations. The results after applying the CBDD implementation to the benchmark circuits are summarized in Table 4.1. Additionally, as a means of comparison, Table 4.1 provides the results of the implementations used by Brace and Shen in [30, 43] for the same benchmarks. The main quantity used in comparing the BDD implementations was the BDD node count. The BDD node count is used as a measure of performance for both CPU and memory utilization. The term Unable, used in Table 4.1, indicates that the corresponding decision diagram package was unable to provide a measurement for the given circuit due to memory limitations. In comparison to the BDD implementations by Brace and Shen, for almost every circuit of the benchmark suite, with the exception of Shen's FBD node measurement for circuit c7552, the CBDD resulted in a significant node savings. Given the sizes of the other circuits within the benchmark suite and their corresponding FBD sizes, it is very likely that Shen's reported FBD node count for circuit c7552 is an order of magnitude smaller than it should be.

  Circuit   #Inputs   #Outputs   ROBDD [30]   FBD [43]   CBDD
                                 #Nodes       #Nodes     #Nodes
  c432        36         7         30200        31195      2017
  c499        41        32         49786        33214      2899
  c880        60        26          7655         7761      4276
  c1355       41        32         39858        33214      6411
  c1908       33        25         12463        12734      9387
  c2670      233       140        Unable        57767     12886
  c3540       50        22        208947        88652     17874
  c5315      178       123         32193        26129     26078
  c6288       32        32        Unable       115607     29361
  c7552      207       108        Unable        19187     37774

Table 4.1. Benchmark Results.

The savings in node count exhibited by the CBDD implementation is due to the usage of Internal nodes, which represent the values of internally produced signals. The Internal nodes, along with value-edges, propagate the binary outputs of internal logic elements to subsequent MSBDD input nodes at deeper levels within the circuit structure.
The small size of the CBDD is due to the fact that CBDDs grow with respect to the number of functional units or logic elements present in the circuit's structure, not with the number of inputs or the input variable ordering.

4.4.9 Summary

This section has described the Connective BDD (CBDD), defined as a DAG interconnection of Minimized-Scalable BDDs. CBDDs are very economical in representing large circuits and maintain the circuit's structural and connective relationships. CBDDs represent circuits with far fewer nodes than previous BDD implementations. CBDDs have a few drawbacks, including loss of canonicity, multiple paths from the root to terminal nodes, and poor representation of small circuits. The main advantage of the CBDD is that it does not suffer from exponential growth when the number of inputs and interconnections grows.

CHAPTER 5

Behavioral-Level Switching Activity Estimation

The accurate estimation of switching activity is an important and necessary procedure for probabilistic power analysis tools. Behavioral-level power analysis tools have the disadvantage of not having detailed information concerning structure and technology. This disadvantage is combined with the challenging task of computing dynamic power and switching activity with improved accuracy from an implementation-independent perspective. There are very few design tools which provide power estimation for behavioral VHDL specifications. The average minimum error found in their power estimates is around 10%. Switching activity is a critical parameter needed to compute dynamic power dissipation (Equation 1.2) in CMOS circuits. A large portion of the error experienced by power analysis tools can be attributed to inaccurate switching activity (α) estimates due to factors such as reconvergent fanout and input/internal correlations.

This chapter presents a new technique which accurately performs high-level activity and power analysis. This approach operates at the behavioral level of design abstraction and uses behavioral VHDL specifications as input. The developed technique, and associated algorithms, have been implemented in a program called the Behavioral-Level Activity and Power Estimator (BLAPE). The details concerning the design and implementation are discussed, along with benchmark comparisons to demonstrate the effectiveness and capability of the new approach.

5.1 Methodology Overview

The focus of this section is to provide an overview of the assumptions, system architecture, and components used to implement the Behavioral-Level Activity and Power Estimator (BLAPE) system. A description of all major tasks which contribute to the process of accurately estimating switching activity is presented as well.

5.1.1 Methodology Assumptions

• Circuit designs must be synthesizable.
  Input to the BLAPE system is in the form of behavioral VHDL specifications. The VHDL language supports many constructs, such as for- and while- loops, which are convenient for simulation but may not result in a synthesizable design. The BLAPE system requires implementation-free and platform-unspecific design descriptions which represent real circuits that are mappable to some technology. This assumption supports the easy transfer of various benchmark VHDL specifications between BLAPE and other high-level power analysis tools.

• Circuit types are combinational.
  The circuits targeted for analysis must be combinational, where all circuit outputs are functions of their primary inputs.
This assumption simplifies the conversion of the input behavioral VHDL specification to a set of Boolean equations.

• Zero delay model.
  All gate or logic element operation times are zero. Upon a change to the input stream, it is assumed that steady-state logic results appear at the gate outputs instantaneously. This assumption simplifies the computation of signal probability, but does not support the computation of glitch activity.

• Uncorrelated primary inputs.
  Primary inputs are assumed to be spatially and temporally independent. This assumption allows the signal probability computation for each node to be a sum of disjoint input/internal signal probability products.

5.1.2 Methodology Outline

The BLAPE system is capable of operating on two input sources. It supports the use of behavioral/structural VHDL specifications as high-level input, along with a BLIF-formatted gate-level netlist as low-level input. The output generated by BLAPE consists of the switching activity (α) and node capacitance (C_L) for each net, along with a power estimate for the entire circuit. The system-level flow of the BLAPE implementation is given in Figure 5.1.

Figure 5.1. BLAPE methodology diagram. (Flow: the behavioral VHDL specification is transformed into Boolean equations, the equations are decomposed into a BLIF netlist, a CBDD representation is built, signal probabilities are computed, and switching activity and power estimates are produced; ISCAS and other benchmark circuits enter the flow directly as BLIF netlists.)

5.2 Methodology Task Decomposition

The Behavioral-Level Activity and Power Estimator system is composed of many individual tasks and sub-tasks. In this section a task decomposition of the methodology, together with a brief description of the major tasks, is given.

• Task 1 - Transform behavioral VHDL specification to Boolean equations.
  This task performs syntax checking and determines if the VHDL input specification is synthesizable. A data structure containing primary inputs, primary outputs, and a list of Boolean equations in sum-of-products form is generated when the VHDL input specification is synthesizable.

• Task 2 - Decompose Boolean equations.
  This task applies structure to the Boolean equations generated in Task 1 in two possible forms. First, implicit structure, which directly models a Boolean equation in sum-of-products form using the basic logic operations (NOT, AND, OR) only, is generated. Second, mapped structure, where the Boolean equations are transformed to a structural specification consisting of gates or logic elements defined by a user-specified library, is produced.

• Task 3 - Generate Connective BDD (CBDD).
  This task converts the structural specification generated in Task 2 to a CBDD. The CBDD maintains the design's original behavior and maps the input circuit's individual logic elements to nodes within the CBDD's graph.

• Task 4 - Determine signal probabilities.
  This task traverses the CBDD generated in Task 3 in an effort to collapse a node's Boolean expression, represented by a multi-level Boolean function, to a Boolean expression with a user-specified depth, not less than 2. The final Boolean expression contains disjoint product terms, whose indices are primary input and internal signal probabilities. The result is a node signal probability that is almost independent or free of reconvergent fanout.

• Task 5 - Estimate switching activities.
  This task computes the switching activity for a specified node given the signal probability provided in Task 4.

• Task 6 - Compute dynamic power.
This task determines the dynamic power of the entire circuit, utilizing the switching activity estimate (α) computed in Task 5, a computed estimate for node capacitance (C_L), and default values for supply voltage (V_dd) and clock frequency (f_clk).

5.2.1 Transformation of VHDL into Boolean Equations

The initial stage of the BLAPE methodology requires the use of a VHDL specification as input. The VHDL modeling style is assumed to be behavioral, although a structural VHDL specification is permitted. The focus of this transformation is to provide a systematic and efficient mechanism for transforming a behavioral VHDL specification into a set of Boolean equations.

After careful consideration, it was decided that a VHDL compiler was the best solution to this problem. Like Ada, C, and Pascal programming language compilers, the VHDL compiler would be responsible for syntax analysis, bounds checking, and other normal compiler tasks. The Altera MAX+PLUS II compiler (Version 9.01) was selected. Useful and necessary features supported by the selected compiler include determination of design synthesizability and generation of a data structure containing the circuit's primary inputs and outputs, combined with a list of Boolean equations describing the input/output relationships of all primary outputs and internal nets.

Similar to the compilers of other programming languages, the selected VHDL compiler supports directives and optimizations to control the outcome of the synthesized design. One such optimization is logic reduction, whereby a compact logic representation is generated by means of technology mapping and by the removal of redundant or unused logic. Invoking this logic reduction option will consistently generate the same reduced set of Boolean equations for various behaviorally equivalent designs. For the purpose of this research, the logic reduction feature was disabled. The rationale for this decision was to preserve the design's inherent logical structure, with the understanding that structural modifications to the logic represented in the VHDL specification directly affect the switching activity and the power dissipated by the design. In making this decision, the BLAPE system yields unique power and activity estimates for various designs that are behaviorally equivalent. Disabling logic minimization is accomplished by setting the synthesis style to WYSIWYG and setting the MINIMIZATION primitive to OFF.

Figures 5.2 and 5.3 depict a behavioral VHDL specification for a full-adder (FA) and the corresponding Boolean expressions generated by the Altera compiler. The Altera compiler generates a report file containing the Boolean equations shown in Figure 5.3. The Altera Boolean equation notation makes use of the basic (AND/OR) logic operations only. Each equation is represented by a list of product terms, where an & separates the literals of each term and each term appears on a single line separated by a # sign. All complemented literals are preceded by the ! character. These Boolean equations represent the behavior of the VHDL input specification and are eventually used to compute switching activity in later stages of the BLAPE implementation. Following the compilation of the VHDL specification, the generated Boolean equations are extracted from the report file. Next, the BLAPE implementation converts the Boolean equations to an implicit or mapped structural representation. The result is a gate-level netlist, discussed in the next section.
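As an illustration of the report-file notation just described, the short Python sketch below splits an equation of the Figure 5.3 form into its product terms. The function name and the standalone equation string are illustrative assumptions; the actual extraction code used by BLAPE is not shown in this chapter.

  # Minimal sketch: parse the report-file equation notation described above
  # (& = AND, # = OR, ! = complement) into a list of product terms.
  def parse_altera_equation(text):
      """Return a list of product terms; each term is a list of (signal, phase) pairs."""
      terms = []
      for product in text.split("#"):            # '#' separates product terms
          literals = []
          for lit in product.split("&"):         # '&' separates literals within a term
              lit = lit.strip()
              if not lit:
                  continue
              phase = not lit.startswith("!")    # '!' marks a complemented literal
              literals.append((lit.lstrip("!"), phase))
          if literals:
              terms.append(literals)
      return terms

  if __name__ == "__main__":
      # Carry-out equation of the full-adder, written in the report-file notation.
      c_out = "x & y # c_in & y # c_in & x"
      for term in parse_altera_equation(c_out):
          print(term)
      # [('x', True), ('y', True)]
      # [('c_in', True), ('y', True)]
      # [('c_in', True), ('x', True)]

Each returned term corresponds to one line of the report-file equation, which is the form the later decomposition and cubeset stages operate on.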
  ENTITY FA is
    port (x, y, c_in : IN BIT; sum, c_out : OUT BIT);
  END FA;

  ARCHITECTURE behav of FA is
  BEGIN  -- behav
    sum   <= (x xor y xor c_in);
    c_out <= (c_in and (x or y)) or (x and y);
  END behav;

Figure 5.2. Behavioral VHDL specification for full-adder.

  c_in : INPUT;
  x : INPUT;
  y : INPUT;
  c_out = _LC2_A1;
  sum = _LC1_A1;

  _LC1_A1 = LCELL( _EQ001);
  _EQ001 =  c_in &  x &  y
         #  c_in & !x & !y
         # !c_in & !x &  y
         # !c_in &  x & !y;

  _LC2_A1 = LCELL( _EQ002);
  _EQ002 =  x & y
         #  c_in & y
         #  c_in & x;

Figure 5.3. Full-adder Boolean equations.

5.2.2 Decomposition of Boolean Equations

Applying structure to a design which originates as an algorithm or set of Boolean expressions binds the behavioral representation to a selected set and arrangement of logic elements. Decomposing the circuit specification into various structures enables a designer to select a design which best satisfies the system-level power requirements. The decomposition stage of the BLAPE implementation results in an implicit or mapped structural representation, which is then transformed to a BLIF-formatted gate-level netlist. The BLIF format was selected because of its simplicity and compatibility with the Berkeley SIS design tool.

5.2.2.1 Implicit Structure Representation

The implicit structure representation mimics the two-level sum-of-products form of the Boolean expressions. This representation places connective restrictions on the circuit behavior, using the basic logic gates (NOT, AND, OR) only. Given the full-adder Boolean equations (Figure 5.3), the resulting implicit structure netlist and corresponding logic diagram are illustrated in Figures 5.4 and 5.5. In many instances, the resulting implicit structure representation does not yield the optimal structure because of unrealistic, although logically correct, logic elements which exceed input-count restrictions. For this reason, the implicit structure should be used as an initial or starting structural design solution, which can be improved iteratively by the use of technology mapping procedures as described in the next section [45].

  .inputs c_in x y
  .outputs c_out sum
  .gate not  a=c_in O=NOTc_in
  .gate not  a=y O=NOTy
  .gate not  a=x O=NOTx
  .gate buf1 a=_LC2_B1 O=c_out
  .gate buf1 a=_LC1_B1 O=sum
  .gate and3 a=NOTc_in b=x c=NOTy O=t0
  .gate and3 a=NOTc_in b=NOTx c=y O=t1
  .gate and3 a=c_in b=x c=y O=t2
  .gate and3 a=c_in b=NOTx c=NOTy O=t3
  .gate or4  a=t0 b=t1 c=t2 d=t3 O=_LC1_B1
  .gate and2 a=c_in b=x O=t4
  .gate and2 a=c_in b=y O=t5
  .gate and2 a=x b=y O=t6
  .gate or3  a=t4 b=t5 c=t6 O=_LC2_B1
  .end

Figure 5.4. Netlist for implicit full-adder structure.

Figure 5.5. Logic diagram for implicit full-adder structure.

5.2.2.2 Mapped Structure Representation

The mapped structure representation is a structural circuit specification composed of user-specified gates or logic elements. This specification is generated by an automated process known as technology mapping. The technology mapping process transforms an optimized set of technology-independent logic equations into a feasible circuit which satisfies certain area, delay, and power constraints. The role of technology mapping is neither to radically change the circuit's structure nor to reduce the circuit's depth along the critical path, but to perform the best gate selection for implementing the logic equations, given the system-level area, delay, and power criteria [39].
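To tie the implicit decomposition of Section 5.2.2.1 to the netlists handed to the technology mapper, the Python sketch below emits a Figure 5.4 style implicit-structure BLIF fragment from a list of product terms. The gate and pin naming conventions (not, andN, orN, NOT-prefixed nets, pins a, b, c, ... and O) follow Figure 5.4, but the helper itself is an illustrative sketch and is not part of the BLAPE implementation.

  # Sketch: emit an implicit-structure BLIF netlist (NOT/AND/OR gates only)
  # from a sum-of-products description of one primary output.
  def sop_to_blif(output, terms, inputs):
      """terms: list of product terms, each a list of (signal, phase) pairs."""
      lines = [".inputs " + " ".join(inputs), ".outputs " + output]
      inverted = sorted({sig for term in terms for sig, ph in term if not ph})
      for sig in inverted:                      # one inverter per complemented signal
          lines.append(f".gate not a={sig} O=NOT{sig}")
      term_nets = []
      for i, term in enumerate(terms):          # one AND gate per product term
          net = f"t{i}"
          pins = " ".join(f"{chr(ord('a') + j)}={sig if ph else 'NOT' + sig}"
                          for j, (sig, ph) in enumerate(term))
          lines.append(f".gate and{len(term)} {pins} O={net}")
          term_nets.append(net)
      pins = " ".join(f"{chr(ord('a') + j)}={net}" for j, net in enumerate(term_nets))
      lines.append(f".gate or{len(term_nets)} {pins} O={output}")  # OR of all terms
      lines.append(".end")
      return "\n".join(lines)

  if __name__ == "__main__":
      carry_terms = [[("c_in", True), ("x", True)],
                     [("c_in", True), ("y", True)],
                     [("x", True), ("y", True)]]
      print(sop_to_blif("c_out", carry_terms, ["c_in", "x", "y"]))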
For the purposes of this research, structural mapping is used as a means to generate a realistic circuit specification. As illustrated by Figure 1.8, various circuit structures or decompositions yield different switching activities, and hence different power dissipations. This stage of the BLAPE process supports the computation of improved switching activity estimates by producing a gate-level structural specification which closely resembles the desired circuit. As a result, the activity and power estimates are more meaningful and consistent with eventual circuit- or transistor-level switching activity and power approximations.

The BLAPE implementation utilizes the Berkeley SIS tool's technology mapping procedure. During the mapping stage of the BLAPE process, an implicit structural circuit specification is used as input to the SIS tool, along with a user-specified gate library. The map operator is applied, resulting in a behaviorally-equivalent circuit that can be optimized for area and delay. For example, given a gate library composed of NOT- and NAND- gates only, a technology mapping of the full-adder implicit structural representation, shown in Figure 5.4, results in the mapped structural representation displayed in Figure 5.6. Additionally, the corresponding logic diagram for the full-adder's mapped structure representation is given in Figure 5.7.

  .inputs c_in x y
  .outputs c_out sum
  .gate nand2 a=c_in b=y O=[236]
  .gate nand2 a=x b=y O=[238]
  .gate nand2 a=c_in b=x O=[234]
  .gate nand3 a=[236] b=[238] c=[234] O=c_out
  .gate not   a=y O=[182]
  .gate not   a=c_in O=[181]
  .gate nand3 a=[182] b=x c=[181] O=[226]
  .gate not   a=x O=[183]
  .gate nand3 a=y b=[181] c=[183] O=[228]
  .gate nand3 a=c_in b=x c=y O=[230]
  .gate nand3 a=[182] b=c_in c=[183] O=[232]
  .gate nand4 a=[226] b=[228] c=[230] d=[232] O=sum
  .end

Figure 5.6. Netlist for mapped full-adder structure.

Figure 5.7. Logic diagram for mapped full-adder structure.

5.2.3 Building the CBDD Representation

5.2.3.1 CBDD Selection Rationale

The Connective Binary Decision Diagram (CBDD) is a directed acyclic graph (DAG) which serves as a graph-based behavioral representation capable of modeling a circuit's input/output relationships and internal connectivity. The CBDD was selected due to its linear growth and preservation of circuit structure. Because it uses the CBDD, the BLAPE implementation is capable of processing larger circuits in comparison to the Berkeley SIS tool. Many CAD applications that use traditional BDDs, specifically probabilistic power analysis tools, are limited to modest-sized circuits.

CAD applications generally encompass a class of problems known as nondeterministic polynomial, or NP-complete, problems [39]. For this class of problems there are no known feasible or polynomial-time solutions. The solutions to such problems which use conventional BDDs are computationally expensive and easily capable of exhausting a typical workstation's memory system. The time complexity of algorithms which use decision diagrams is O(f(|V|, |E|)), where f is an operation performed on a graph such that |V| and |E| represent the sizes of the graph's vertex and edge sets, respectively. A reduction in both |V| and |E| is appealing to CAD applications as it reduces time complexity. An additional reason for selecting the CBDD for this research is its improved representation of large circuits. When compared to other conventional BDD implementations, using the ISCAS-85 benchmark circuits, the CBDD's graphs were significantly smaller.
An average size reduction of one order of magnitude was achieved [46].

5.2.3.2 Generation of CBDD Graph

The CBDD's graph is produced during the structural specification input phase. The structural specification is a BLIF-formatted netlist which may be generated by BLAPE or selected from the ISCAS-85 or other benchmark suites. Each logic gate, along with its inputs and output, is transformed into a minimized-scalable BDD (MSBDD). For the basic logic gates (N/AND, N/OR, XOR, NOT), there are routines (Algorithm 1) which construct the behavior of the specified gate. These routines build MSBDDs with the least number of nodes and edges for the specified fanin size. The node and edge data structures are placed into a lookup table of lists for the purpose of maintaining connectivity. During the input phase, an adjacency list is built. The adjacency list contains connection (wire) information, which describes node-to-node reachability and is used for network levelization and other tasks.

The CBDD is an edge-driven DAG rather than a vertex- or node-driven one. Traversals and graph inquiries rely on routines which search for and manipulate sets of edges. The precise modeling of signal flow is made possible by the CBDD's value-edge. Additionally, the value-edge is responsible for the linear growth of the CBDD's graph. The CBDD representations corresponding to the implicit and mapped full-adder structural representations shown in Figures 5.4 and 5.6 are illustrated in Figures 5.8 and 5.9, respectively. The mapped structural CBDD has fewer nodes and edges; it uses an interconnection of NOT and NAND gates only. Later sections of this chapter will demonstrate that these two full-adder structural representations yield different switching activities and different power dissipations.

Figure 5.8. Implicit structural CBDD full-adder.

Figure 5.9. Mapped structural CBDD full-adder.

5.3 Signal Probability Computation

This section presents the most important component of the BLAPE implementation, the Signal Probability Computation Algorithm (SPCA). An overview concerning the assumptions, origin and details of the SPCA is given. The SPCA was inspired by concepts associated with the fault analysis and digital circuit testability fields. The SPCA is an improvement over previously mentioned probabilistic and statistical approaches for computing signal probability because it considers the correlation of internal signals caused by reconvergent fanout.

In the fault analysis and digital circuit testability areas, reconvergent fanout is a large source of error when computing node controllability. The 1-controllability of an edge in the circuit graph is defined as the fraction of 1's in the truth table of its function; its 0-controllability is defined as one minus its 1-controllability [47]. A node's exact controllability computation requires the complete flattening of a multi-level expression to a two-level or sum-of-products expression comprised of primary inputs only. The flattening uncoils the correlated internal signals, resulting in an expression composed of independent primary inputs. This process offers an exact solution, but is hampered by an exponential time complexity, thus limiting the technique's usefulness to small- or modest-sized circuits.
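The cost of the exact approach just described can be made concrete with a small sketch: the Python fragment below computes an exact signal probability by enumerating all 2^N primary-input combinations, which is precisely the exponential flattening cost the SPCA is designed to avoid. The function, the dictionary-based circuit model, and the carry-out lambda are illustrative assumptions, not BLAPE code.

  # Brute-force sketch of exact signal probability: enumerate every input
  # assignment and weight each one by its probability of occurrence.
  from itertools import product

  def exact_signal_probability(fn, input_probs):
      """fn maps a dict of 0/1 input values to 0/1; input_probs maps name -> P(x = 1)."""
      names = sorted(input_probs)
      p_one = 0.0
      for values in product((0, 1), repeat=len(names)):   # 2^N assignments
          assignment = dict(zip(names, values))
          weight = 1.0
          for name, v in assignment.items():
              weight *= input_probs[name] if v else (1.0 - input_probs[name])
          p_one += weight * fn(assignment)
      return p_one

  if __name__ == "__main__":
      carry = lambda s: (s["a"] & s["b"]) | ((s["a"] | s["b"]) & s["c_in"])
      # With equiprobable inputs the carry-out is 1 in 4 of the 8 cases: P = 0.5.
      print(exact_signal_probability(carry, {"a": 0.5, "b": 0.5, "c_in": 0.5}))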
The exact computation of signal probability encounters the same dilemma experienced when computing exact controllability. Exact signal probability or controllability offers the best or most appropriate solution for a node's Boolean equation since each has zero error. When addressing a network as an interconnection of logic gates, error due to reconvergent fanout arises (Figure 2.2). Reconvergent fanout may cause an overestimate (Equation 2.3) or underestimate in the computation of signal probability, due to the statistical dependence of signals arriving at a gate. The SPCA approximates signal probability, which is less accurate than exact computation. The approximation scheme assumes that the dependence resulting from reconvergent fanout is reduced the further the distance between a gate's fanout and the resulting reconvergent fanin gate. This assumption is used by fault analysis heuristics that compute node controllabilities of large circuits [47]. The assumption allows the SPCA to treat reconvergent fanins beyond a certain depth (distance) as almost independent, or very close to statistically independent. As a result, the signal probability approximation approach is much faster and less complex than the exact approach.

5.3.1 Overview

The SPCA is responsible for cost-effectively computing the signal probability at each node in the network. The algorithm has four inputs: 1) node identifier, 2) CBDD, 3) input signal probabilities, and 4) depth-accuracy parameter (k). The input signal probabilities are a user-specified set of real numbers which represent the probability that each input is a logic one, or P(x_i = 1). The depth-accuracy parameter (k) is the depth or level to which the SPCA collapses or flattens the network. A larger depth-accuracy yields a more accurate signal probability estimate at the expense of time and memory resources. A structural overview of the SPCA is illustrated in Figure 5.10.

The SPCA involves the coordination of four main components: 1) Network Levelization, 2) Disjoint Post-Order Boolean Equation Generation, 3) IPR Cubeset Generation, and 4) Signal Probability Computation. The Network Levelization component is responsible for ordering the CBDD nodes according to their degree of logical independence. The Disjoint Post-Order Boolean Equation Generation component is responsible for traversing the CBDD from a given node with the intent of producing a disjoint post-ordered Boolean equation that is flattened to k (depth-accuracy) levels. The IPR Cubeset Generation component makes sense of the post-ordered Boolean equations; it converts the equations to a set of Integer Pair Representation (IPR) cubes, in infix form. The Signal Probability Computation component reads the IPR cubeset and approximates the given node's signal probability.

Figure 5.10. SPCA structural overview.

5.3.1.1 Network Levelization

Before the generation of Boolean equations takes place, each node in the network is leveled. Levelization is the process of ordering the internal/output nodes of the CBDD's graph in terms of their depth or distance from the primary inputs. The nodes at the lowest level of depth, level zero, are the primary input variables. The levelization process is managed by the Network_Levelization procedure (Algorithm 2); this process is necessary because the approximated signal probability of internal nodes is repeatedly used for future signal probability computations involving deeper
The levelization process is managed by the Network_Levelization procedure (Algorithm 2); this process is necessary because the approximated signal probability of internal nodes is repeatedly used for future signal probability computations involving deeper 94 nodes. The Network-Levelization procedure relies on a simple recursive algorithm which performs a depth-first search of the CBDD’s graph. The search runs in 0(n + e) time, where n and e are the number of nodes and edges of the CBDD’s graph, respectively. The depth-first search is a pOpular and well known search algorithm used for traversing graphs. Detailed analysis concerning the complexity of the depth- first search algorithm is reported in [48]. 95 Procedure: Network_Levelization input : v - CBDD node input : lev - Level of node 1 if lev > v.lev then 2 v.lev <— lev 3 end 4 a (— v.adj 5 while a 76 NULL do 6 Networkievelization(a.v, lev + 1) 7 a (— a.nea:t s end Algorithm 2: Levelize network procedure. Correctness of NetworkLevelization procedure Claim 5.1 Network-Levelization(v, lev) produces a dependency ordering of the nodes in a directed acyclic graph. Explanation: Assume that v is initially the DAG root node, and lev is initially set to zero. Let a E DAG. Upon the initial invocation of the Network_Levelization procedure all node levels are zero. Each invocation compares the node level, v.lev, to the stack argument, lev, at line 1. Line 2 updates v.lev (node v’s level) with the larger level value from the stack when the comparison condition of line 1 is satisfied. For every invocation of N etworkievelization, node variable a is assigned to the adjacency list of node variable v on line 4. The adjacency list assigned to variable a contains all descendents of node variable v. The nodes adjacent to node v are visited sequentially by the execution of lines 5—7. Each node having descendents experiences a recursive invocation of the N etwork_Levelization procedure at line 6. Each recursive invocation traverses to a deeper level within the CBDD’s graph. The traversal finds a deeper generation of node v’s lineage and results in an increment of the stack argument lev. If some node y experiences multiple recursive invocations, y.lev is only updated when 96 a larger level is encountered causing each node to be updated with the appropriate dependency level value. Recursive invocations terminate when all paths from the root node to all primary output nodes have been traversed. @ 3’42 . Figure 5.11. DAG of FA carry-out bit network. Node Adj acency List Level root a, b, c_in —1 a 21, 22 0 b 21, 22 0 c_in 23 0 21 c_out 1 22 23 1 23 c_out 2 c_out 3 Figure 5.12. Carry-out bit adjacency list and levelization. The inputs to the Network_Levelization procedure include: 1) CBDD node and 2) node level. The procedure is initially invoked with the root node of the given CBDD, 97 with the node level parameter set to zero. The procedure’s output is the level ordering for each of the CBDD’s internal and output nodes. Figure 5.12 shows an example levelization for the mapped full-adder carry-out bit network. The DAG modeling just the primary input, gate and primary output interconnections maintained by the CBDD is given in Figure 5.11. The network node adjacency list is illustrated in Figure 5.12. The results of the network levelization example demonstrate that primary inputs are the most independent nodes; they appear at level zero. Due to their dependence on other signals, internal signals and primary outputs appear at deeper levels of the network. 
5.3.1.2 Disjoint Post-Order Boolean Equation Generation

The generation of disjoint Boolean equations is a key task necessary for the computation of signal probability. The procedure described in this section produces a disjoint Boolean equation, where all terms are mutually exclusive, similar to the minterms of a truth table. The Post_Order procedure, Algorithm 3, is a simple recursive routine that ascends from some node (v) to a user-selected depth-accuracy (k) within a CBDD's graph. The goal of the Post_Order procedure is to construct a disjoint post-ordered Boolean equation composed only of literals that appear a maximum of k levels above node v's level within the CBDD's graph.

  Procedure: Post_Order
    input  : CBDD - CBDD of network
    input  : v    - Vertex
    input  : d    - Vertex depth
    input  : lcnt - Level counter
    output : POE_Stack - Post-ordered expression stack
   1 begin
   2   Edge_Set ← Get_In_Edges(CBDD, v)
   3   for ∀ E ∈ Edge_Set do
   4     if E.edge_type = ZERO_EDGE or E.edge_type = ONE_EDGE then
   5       tcnt ← tcnt + 1
   6       eV ← Get_Source_Vertex(CBDD, E.vertex)
   7       Sign ← (E.edge_type = ZERO_EDGE ? FALSE : TRUE)
   8       Val ← (E.edge_type = ZERO_EDGE ? -eV.idx : eV.idx)
   9       if eV.vertex_type = INPUT then
  10         Push(POE_Stack, Val)
  11       else
  12         if d < depth_accuracy(k) then
  13           T ← Get_Terminal_Vertex(CBDD, eV, Sign)
  14           Post_Order(CBDD, T, d + 1, 0)
  15         else
  16           Push(POE_Stack, Val)
  17         endif
  18       endif
  19       Post_Order(CBDD, eV, d, lcnt + 1)
  20       if lcnt > 0 then
  21         Push(POE_Stack, *)
  22       end
  23     end
  24     if tcnt > 0 then
  25       Push(POE_Stack, +)
  26     end
  27   end
  28 end

Algorithm 3: Post-order CBDD traversal.

The design of the Post_Order procedure is the result of modifications to the traditional post-order binary tree traversal. The post-order traversal was selected because it is a very effective means of circumnavigating a graph to produce meaningful expressions. The most common implementation of the post-order traversal algorithm is recursive, where all un-traversed node edges are visited first, followed last by a node visit (output). The result is an expression in postfix notation. Given the expression tree of Figure 5.13, describing the full-adder carry-out bit Boolean equation (Equation 3.6), the resulting postfix expression is ab*cab+*+.

Figure 5.13. Expression tree for full-adder carry-out bit.

Correctness of Post_Order procedure

The Post_Order procedure arguments include 1) a CBDD, 2) vertex v, 3) invocation depth d, and 4) node counter lcnt. Argument v is a node within the CBDD's graph, the start point of the traversal. Argument d is the procedure's invocation depth given node v. The lcnt argument tracks the number of Output nodes encountered
For each iteration of the edge set, line 4 checks if the given edge is a 0—edge or 1-edge. If the line 4 condition is satisfied, then lines 5-23 are encountered, otherwise lines 23—26 are encountered. Lines 24-26 Push the Boolean OR symbol to the output stack if at least one pair of post-order arguments have been previously added to the stack. The block between lines 5 and 23 allow the Post_Order procedure to ascend from a given node to its predecessor node. A counter tracking the number of post-order argument pairs per invocation is maintained on line 5. To assist with the ascending traversal from node v to its predecessor, line 6 lo- cates the head vertex of the current edge, by invoking the Get_Source_Vertex routine. Based on the current edge type, the sign variable is updated in line 7. The sign variable contains the Boolean phase of the head source vertex. The index value of head source vertex is determined on line 8. A negative index value indicates that the head vertex literal is complemented. If the head vertex models a primary input, then lines 9-10 add the head vertex index value of line 8 to the stack. Otherwise, the head 101 source vertex models an internal signal and is dealt with in lines 12-17. If the depth of the search is less than the depth-accuracy (k) parameter a recursive invocation takes place on line 14; this allows higher exploration into the CBDD’s graph. Line 13, invokes the Get-Terminal_Vertex routine which directs the upward search from the edge’s source vertex to a Terminal vertex in the preceding MSBDD. The Ter- minal is chosen based on the literal sign value. Lines 15—17 place the source node’s literal value on the output stack. Line 19 performs a second recursive invocation and increments the level counter argument. The second invocation allows for the upward traversal within the subgraph of the new MSBDD resulting from Line 13. Lines 20—22 build the Boolean product terms by pushing a Boolean AND symbol to output stack when the second literal (decision node) is encountered during the recursive invocation. Lines 24-26 push the Boolean OR operator symbol to the output stack just after the pairs of product terms encountered during the upward traversal from the Terminal node. The repeated interaction of lines 2-27 result in an output stack that contains product terms whose literals are positioned k levels or less above the specified input vertex argument within the CBDD’s graph. All terms are described using 2—input Boolean AND operators and each post-order pair of terms is followed by a Boolean OR symbol on the output stack. Analysis of Post_Order procedure Traditional post-order BDD traversals explore only two edges per node because of the binary nature of the BDD’s implementation. The Post-Order procedure for the CBDD must consider two edges per Input node and B edges per Internal node, where B is the maximum fanout of an internally produced signal. Let G be the number of logic elements in the circuit. According to the analysis of the Mk_N AN D procedure, a B—input MSBDD NAND representation has B + 3 nodes and 25 + 2 edges. The 102 fanout of the NAN D MSBDD’S Output node results in a maximum of B value-edges, yielding a maximum of 3B + 2 edges per logic element. The Post_Order procedure is recursively applied on a per node basis. The traversal encounters a maximum of Vmax nodes and Ema, edges which are C x (B + 3) and G x (3B + 2), respectively. 
For each recursive iteration of the Post_Order procedure, assume that the operations at lines 2-8 are always performed. These operations run in constant time, with a total constant time of O(7). The if-then-else block at lines 9-18 runs in O(4) time, by assuming that lines 9, 12, 13, and 14 always run, yielding the longest possible thread through that block. The recursive invocation on line 19 runs in O(1) time, and the if-then blocks at lines 20-22 and lines 24-26 both run in O(2) time, for a total constant time of O(4). The block of lines 3-27 therefore encloses a total constant time of O(7) + O(4) + O(5) = O(16). The search can be viewed as a traversal of all edges with visitation of all nodes in the ascending ancestry of the specified start node. The largest number of invocations of the Post_Order procedure for a given node is V_max + E_max, with each invocation costing a maximum of O(16) time. Thus, the worst case running time of the Post_Order procedure is O(16 × (V_max + E_max)). Given the user-selected depth-accuracy parameter, k, and the maximum logic element fanout, B, the overall worst case running time of the Post_Order procedure can be expressed as O(16 × (4B + 5) × G).

  Post-Order Boolean Equations
  Idx   Node    k=1          k=2                       k=3
  1     c_in    -            -                         -
  2     a       -            -                         -
  3     b       -            -                         -
  4     z1      32*          32*                       32*
  5     z2      23(-2)*+     23(-2)*+                  23(-2)*+
  6     z3      15*          123(-2)*+*                123(-2)*+*
  7     c_out   46(-4)*+     32*15*(-2)(-3)2*+*+       32*123(-2)*+*(-2)(-3)2*+*+

Table 5.1. Post-order Boolean equations for FA carry-out bit.

After invoking the Post_Order procedure with the full-adder carry-out bit CBDD, Figure 4.7, the resulting post-ordered Boolean equations are shown in Table 5.1 for k = 1, 2, and 3. By inspection, as the depth-accuracy parameter increases, the length of the post-order equations increases, as expected. When a deep network flattening (large k) is requested, nodes having large fanouts (many value-edges) contribute to a dramatic increase in the run time of the procedure. Thus, a deeper equation flattening requires longer execution times and, obviously, a greater demand on memory resources.

5.3.1.3 IPR Cubeset Generation

The IPR Cubeset Generation component is responsible for converting the post-ordered Boolean equations, generated by the Post_Order procedure, to a collection of IPR (Integer Pair Representation) cubes in infix form. IPR is a very compact Boolean equation representation system, developed by Diaz and Jimenez in [49]. A cube is similar to a product term; it may represent one or more minterms of a switching function. It consists of a pair of integers: the position and the expansion. The position integer represents the literals of the individual switching variables, and the expansion integer indicates whether a literal is in use or is a don't-care variable [49]. A cubeset is a list or collection of cubes. For the purposes of the SPCA, cubesets are used to efficiently represent the behavior of switching functions.

The Generate_IPR_Cubeset procedure, Algorithm 4, is responsible for generating an infix Boolean expression in terms of IPR cubes. The procedure's input argument is a post-ordered Boolean expression stack that is composed of Boolean operators (AND/OR) and literals only. It is assumed that all operators are binary, requiring two inputs. This assumption simplifies the conversion of postfix to infix notation because delimiters are not necessary.
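Before the procedure's listing, a small Python sketch of the integer-pair idea may help. The bit conventions used here (an expansion bit of 0 means the variable is in use, and a set position bit means an uncomplemented literal) are assumptions chosen to match the way Algorithm 5 later tests the pair; they are not necessarily the exact encoding of [49].

  # Sketch of an integer-pair (position, expansion) cube and its decoding.
  def make_cube(num_vars, literals):
      """literals: dict mapping variable index -> phase (True = uncomplemented)."""
      expansion = (1 << num_vars) - 1          # start with every variable a don't-care
      position = 0
      for idx, phase in literals.items():
          expansion &= ~(1 << idx)             # clear the bit: variable is in use
          if phase:
              position |= 1 << idx
      return position, expansion

  def cube_to_string(cube, names):
      position, expansion = cube
      parts = []
      for idx, name in enumerate(names):
          if not (expansion >> idx) & 1:       # in use when the expansion bit is 0
              parts.append(name if (position >> idx) & 1 else name + "'")
      return "".join(parts) or "1"

  if __name__ == "__main__":
      names = ["c_in", "a", "b"]               # index-to-signal mapping of Table 5.1
      cube = make_cube(3, {1: True, 2: False}) # the product term a.b'
      print(cube, cube_to_string(cube, names)) # -> (2, 1) ab'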
  Procedure: Generate_IPR_Cubeset
    input  : IN_Stack    - Input expression stack
    output : IPR_Cubeset - IPR cubeset
   1 begin
   2   while IN_Stack Not Empty do
   3     repeat
   4       item ← Pop(IN_Stack)
   5       if item.type = OPERATOR then
   6         Operator ← item
   7       else
   8         Push(Eval_Expr_Stack, item)
   9       endif
  10     until item = OPERATOR
  11     item1 ← Pop(Eval_Expr_Stack)
  12     item2 ← Pop(Eval_Expr_Stack)
  13     result ← Perform_IPR_Operation(Operator, item1, item2)
  14     Push(Eval_Expr_Stack, result)
  15   end
  16   IPR_Cubeset ← Pop(Eval_Expr_Stack)
  17 end

Algorithm 4: IPR cubeset generation.

Correctness of Generate_IPR_Cubeset procedure

Claim 5.3 The Generate_IPR_Cubeset procedure generates a Boolean expression in infix notation.

Explanation: The Generate_IPR_Cubeset procedure is invoked with a post-ordered Boolean expression input stack. Line 2 checks the length of the input stack. While the stack is not empty, the procedure enters lines 3-10, where it continuously removes items from the stack until an operator (AND/OR) is encountered. Line 4 performs the Pop command, which removes the next item from the stack. If the item is not an operator, then it must be a literal; the literal is Pushed onto the evaluation expression stack at line 8. If an operator is encountered by the check at line 5, then the item is assigned to the Operator variable at line 6. Following this assignment, the repeat-until loop is exited at line 10, with processing continuing at line 11. The repeat-until loop, lines 3-10, is iterated two times and loads two literals onto the evaluation expression stack, due to the post-order pairing of literals in the Post_Order procedure. Lines 11 and 12 remove the top two items from the evaluation expression stack. Line 13 invokes the Perform_IPR_Operation routine with the two items from the evaluation expression stack and the operator acquired at line 6. The Perform_IPR_Operation routine converts the item arguments to IPR cubes and performs a Boolean (AND/OR) IPR operation on the two item cubes. The result of the operation is Pushed onto the evaluation expression stack (line 14). The procedure returns to line 2 and tests the size of the input stack; while the input stack size remains nonzero, processing continues as previously described. When the input stack becomes empty, processing continues to line 16, where the final IPR cube infix Boolean expression is retrieved from the evaluation expression stack and returned to the invoking procedure.

Analysis of Generate_IPR_Cubeset procedure

The complexity of the Generate_IPR_Cubeset procedure is more difficult to compute than it appears. The duration of the outer while-loop, lines 2-15, must be determined; this duration is actually the size of the post-order input expression stack (in_size). The exact computation of such a value is circuit specific. However, the derivation of a worst case approximation for in_size is presented in the following text. Let B equal the maximum number of logic element inputs. Given the structure of a B-input MSBDD, the post-order expression size is determined by summing the number of decision node literals with the number of Boolean (AND/OR) operations encountered during the post-order traversals. For a single-level (k = 1) flattening of a typical MSBDD (N/AND, N/OR, XOR), a worst case approximation for in_size (the input stack size) given B inputs is computed as:
For multi-level circuits, the Post_Order procedure will expand the CBDD into an input stack expression whose size depends on k, the depth-accuracy parameter. Let L, the literal count for a B-input MSBDD, equal 927%. Let C, the number of Boolean operations for a B—input MSBDD post-order expression, equal fl—Ztg — 1. Let F, equal L+C or B2+B — 1. The resulting input stack expression size can be modeled using the following recurrence relation: in.si2e,c = L x in_si2ek_1 + C. After a few iterations of the recurrence relation the solution was determined to be in.si2e,c = F for k =1 2 L-F+C for 10:2 2 L-(L'F+C)"‘2+C for k>2 2 litre ((WgW’” — 1)'€-2 + 1) — 1 for k > 2 The condition on line 2 runs in constant time, 0(1). The Push/POp Operations at lines 4, 8, 11, 12, 14, and 16 run in constant time, 0(1). Lines 5—9 run in a maximum of 0(2) time. The PerformJPR-Operation routine runs in 0(G) time, where G is the number of circuit inputs plus the number of circuit logic elements. The while-loop, lines 2-15, executes a maximum of 152938 times. The repeat-until block, lines 3-10, always runs twice, with a total time cost of 0(8). Lines 11-14, result in a running time of 0(1) + 0(2) + 0(G) + 0(1) z 0(G + 4). The total running time for each iteration of the while-loop, lines 2-15, is 0(1) + 0(8) + 0(G + 107 4) z 0(G + 13). Given that the while-loop iterates %§L times, the worst case running time of the GenerateJPR_Cubeset procedure is 0 (% x (G + 13)) or o(§[é§t—5((fi‘—+?g‘ifl’i—1)k-2+1)—1]x(G+13)),fors>1andk>2. Step Input Expression Stack Evaluation Expression Stack 0 32*123(—2)t+*(—2)(—3)2#+*+ - l 2t123(—2)*+t(—2)(—3)2¢+*+ 3 2 *123(-2)*+*(—2)(—3)2*+#+ 32 3 123(—2)*+*(-2)(—3)2*+*+ (31:2) 4 23(—2)*+*(—2)(—3)2*+t+ (3*2)l 5 3(—2)*+*(—2)(—3)2t+*+ (3:2)12 6 (—2)*+*(——2)(—3)2*+*+ (3:2)123 7 t+*(—2)(—3)2*+*+ (3*2)123(—2) 8 +*(—2)(—3)2*+*+ (332)12(33(—2)) 9 *(—2)(—3)2 a: + t + (3 at 2) 1 (2 + (3 t (-—2))) 10 (—2)(—3)2*+¥+ (33:2) ((23: l) + (3t(—2)* 1)) ll (—3)2*+t+ (33:2) ((2t1) + (3*(—2)*1))(—2) l2 2e+t+ (31:2) ((231) + (3*(-—2)* 1)) (—2) (—-3) 13 at + 3+ (3 t 2) ((2 #1) + (3 t (—2) at: 1)) (—2) (—3) 2 14 + 3 + (3 t 2) ((2 *1) + (3 t (—2) a: 1)) (—2) ((—3) t 2) 15 2+ (3 t 2) ((2 *1) + (3 t (—2) at 1)) ((—2) + ((—3) It 2)) 16 + (3 at 2) (((—3) a: 2 It 1) + (3 at: (-—2) t 1)) 17 — (332)+((-3)t2*1)+(3t(—2)*1) Table 5.2. Output from GenerateJPR_Cubeset procedure. The GenerateJPR_Cubeset procedure returns an evaluation stack which contains the terms of a disjoint Boolean equation that is in sum-of-produets form. Each term is represented by an IPR cube. The full-adder carry-out bit primary output, c_out, pro- duced by the Post_Order procedure, Table 5.1, is used in an example to demonstrate the conversion of post-order Boolean equations to a sum-of-products equation in infix form. Table 5.2 shows the stages of the input expression stack (c_out) and the eval- uation expression stack as their contents are modified by the GenerateJPR_Cubeset Procedure’s postfix to infix conversion. If a node index to literal mapping, Table 5-1, is performed on the final evaluation expression stack output, the resulting infix expression for c_out is ab + abc + abc. The final expression is composed of disjoint teI‘rns only. 108 5.3.1.4 Computation of Signal Probability The Signal Probability Computation component is responsible for computing the sig- nal probability for a specified node within the CBDD’s graph. This task is im- plemented by the Compute_Signal_Probability procedure (Algorithm 5). 
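Before turning to the signal probability computation itself, the growth implied by the recurrence above is worth making concrete. The short sketch below simply iterates in_size_k = L * in_size_(k-1) + C, with L = (B^2 + B)/2 literals and C = L - 1 operators per expansion (so in_size_1 = B^2 + B - 1). The fanin values chosen are arbitrary; the point is only to show how quickly a deep flattening of wide-fanin logic inflates the input stack.

# Worst-case input stack size from the recurrence in the text:
#   in_size(1) = B^2 + B - 1,    in_size(k) = L * in_size(k-1) + C
# with L = (B^2 + B) / 2 and C = L - 1.  The fanin values below are arbitrary.

def in_size(fanin, k):
    L = (fanin * fanin + fanin) // 2
    C = L - 1
    size = fanin * fanin + fanin - 1          # k = 1
    for _ in range(k - 1):
        size = L * size + C
    return size

for k in range(1, 6):
    print(k, in_size(fanin=2, k=k), in_size(fanin=4, k=k))
# k = 5 already gives 485 tokens for 2-input logic and 199,999 for 4-input logic.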
Using the disjoint Boolean expression IPR cubes produced by the GenerateJPR-Cubeset pro- cedure, the ComputeSignaLProbability procedure determines the signal probability of each cube within the IPR cubeset and sums all IPR cube signal probabilities. The result is the signal probability for the node represented by the disjoint Boolean expression input. Procedure: Compute_Signal_Probability input : I PR.Cubeset - IPR Cubeset input : SP - Signal probability array output : Expr_SP - Expression signal probability 1 begin 2 for V C E I PR.Cubeset do 3 T_prob +— 1.00 4 for idx E 1..IPR.Cube_Length do 5 EXP (— (C.exp[idx] AND mask[idx]) 6 POS (— (C.pos[idx] AND mask[idx]) 7 if EXP = 0 then 8 I_prob (— (POS > 0 ? SP[idx].p : SP[idx].q) 9 T_prob (— T .prob * I .prob 10 end 11 end 12 Expr.SP <— Expr-SP + I_prob 13 end 14 end \ Algorithm 5: Signal probability computation procedure. 109 Correctness of Compute_Signal_Probability procedure Claim 5.4 The Compute_Signal_Probability procedure computes the signal probability for a disjoint Boolean expression. Explanation: The Compute_SignalProbability procedure advances through the list of IPR cubes (terms) within the outer for-loop, lines 2-13. Line 3 initializes the cube signal probability to 1. The inner for-loop, lines 4-11, iterates over the entire signal variable space for the current IPR cube. The goal at this point is to determine the variables used by the cube. Variables in use have a 0 in their corresponding bit position of the cube’s expansion integer. Line 5, using bitwise AND operations, determines if the variable is in use. Lines 6 determines the phase of the variable using bitwise AND operations. The if-block, lines 7-10, is entered if the variable is in use, otherwise processing continues within the inner for-loop for the next variable. Line 8 reads the signal probability for the corresponding variable from signal probability array based on the Boolean variable’s phase. Line 9 performs a multiplication of the variable’s signal probability with the current cube’s intermediate signal probability product. Upon termination of the inner for-loop, line 9 computes the signal probability of a cube in the following manner: P,(eubek) = mg! P,(v), where Vk is the set of signals in use by cubek. Line 12 performs a summatioric of the cube signal probabilities. Upon termination of the outer for-loop, line 12 computes the signal probability for the IPR cubeset as #cubes P,(eubeset)= Z P3(eube,-). i=1 110 Analysis of Compute_Signal_Probability procedure The running time of the Compute_Signal_Probability procedure heavily depends on the size of the IPR cubeset (number of terms). In the worst case the maximum number of terms is 2N , where N is the number of circuit inputs. The system supports a user- specified depth-accuracy (k) parameter which limits the maximum number of terms when k is less than the depth of the input circuit. Given that the circuit is composed of B-input logic elements, the worst case number of terms is approximately equal to Bk“, where B is the fanin size of the logic element. The outer for-loop, lines 2-13, will iterate a maximum of B"+1 times. Line 3 runs in constant time, 0(1). Lines 5—6 run in constant time, 0(2). The if-block, lines 7-10, in the worst case runs in 0(3) time. The duration of the inner for-loop is equal to number of logic elements plus the number of primary inputs within the circuit, designated as R. Therefore, the running time of the inner for-loop is 0(5R). Line 12 runs in constant time, 0(1). 
The worst case running time of the procedure is O((5R + 2) x B^(k+1)). From the analysis it is apparent that an increase in k dramatically affects the running time of the procedure. The more difficult tasks of computing signal probability were accomplished by the Post_Order and Generate_IPR_Cubeset procedures; the tasks performed by the Compute_Signal_Probability procedure are much easier to execute. An example which demonstrates the work performed by the Compute_Signal_Probability procedure is illustrated in Table 5.3. The IPR cubeset for the carry-out bit is used as input in this example and the signal probabilities for primary inputs a, b and c, are %, :4- and %, respectively.

Term Term Product PS (Term) ab :-: — 550 i' i ' i 3% as. iii — P, (cubeset) é
Table 5.3. Signal probability computation for full-adder carry-out bit.

5.3.1.5 SPCA Component Interconnection

The SPCA is an interconnection of procedures which flatten the input circuit's network to a user-selected depth and efficiently approximates signal probability for all internal and output nodes. Previous sections have discussed the operational and time complexity details of the individual SPCA components. The SPCA_Interconnect procedure, Algorithm 6, is responsible for the invocation and coordination of all SPCA component procedures. When invoked, the SPCA_Interconnect procedure coordinates the signal probability computation for an entire CBDD with user-selectable Boolean expression flattening.

Procedure: SPCA_Interconnect
input  : CBDD - DAG of input circuit
input  : PI_SP - Primary input signal probabilities
input  : k - Depth-accuracy
output : SP_Array - Signal probability array
1 begin
2     Network_Levelization(CBDD.root, 0)
3     for all MSBDD in CBDD do
4         Expr <- Post_Order(CBDD, MSBDD.output_vertex, k, 0)
5         IPR_Cubeset <- Generate_IPR_Cubeset(Expr)
6         Sp <- Compute_Signal_Probability(IPR_Cubeset, PI_SP)
7         SP_Array[MSBDD.output_vertex.idx] <- Sp
8     end
9 end
Algorithm 6: SPCA interconnection procedure.

Correctness of SPCA_Interconnect procedure

Claim 5.5 The SPCA_Interconnect procedure generates the signal probabilities for all nodes within a CBDD.

Explanation: Conventional methods for estimating signal probability of a circuit utilize traditional BDDs. The computation involves a post-order traversal of the BDDs for each net within the circuit, resulting in a Boolean expression. Assuming uncorrelated primary inputs and the zero-delay model, the conventional methods for estimating signal probability of a net modeled by a BDD utilize Equations 5.1, 5.2 and 5.3.

    P_s(term_j) = ∏_{i=1..m_j} P(x_i)                 (5.1)

    term_x ∩ term_y = ∅,  for all x ≠ y               (5.2)

    P_s(node_k) = ∑_{j=1..n_k} P_s(term_j)            (5.3)

• Equation 5.1 specifies that the term signal probability is the product of the term's input signal probabilities.
• Equation 5.2 specifies that all terms are disjoint or independent of one another.
• Equation 5.3 specifies that the node signal probability is the sum of the disjoint term signal probabilities.

The signal probability estimation method presented in this dissertation is correct and satisfies Equations 5.1, 5.2 and 5.3. The proposed method utilizes a CBDD with user-selectable Boolean expression flattening as opposed to a BDD. Theorem 4.3 validates the fact that a CBDD maintains a circuit's behavior. Equation 5.2 is satisfied by the application of Theorem 4.2, which validates that all paths from any decision node x_j to any decision node x_k are unique for j ≠ k, within an MSBDD.
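Equations 5.1-5.3 can be exercised directly on a small, hand-built disjoint cover. The sketch below is illustrative only: each term is written as a mapping from a variable to its required phase, the cover x1*x2 + !x1*x3 is disjoint because its two terms differ in x1, and the input probabilities are arbitrary.

# Node signal probability from a disjoint sum-of-products cover (Equations 5.1-5.3).
# Each term maps a variable to its phase (True = x, False = complemented x); the
# terms are assumed pairwise disjoint, as Equation 5.2 requires.

def term_probability(term, p_one):
    prob = 1.0
    for var, phase in term.items():           # Equation 5.1: product over the literals
        prob *= p_one[var] if phase else (1.0 - p_one[var])
    return prob

def node_probability(terms, p_one):
    return sum(term_probability(t, p_one) for t in terms)   # Equation 5.3

p_one = {'x1': 0.5, 'x2': 0.5, 'x3': 0.5}
cover = [{'x1': True, 'x2': True},            # x1 * x2
         {'x1': False, 'x3': True}]           # !x1 * x3
print(node_probability(cover, p_one))         # 0.25 + 0.25 = 0.5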
The generation of Boolean equations as a collection of disjoint IPR cubes is performed by the Post_Order and Generate_IPR_Cubeset procedures. Signal probability computation for each node within the CBDD is performed by the Compute_Signal_Probability procedure, which satisfies Equations 5.1 and 5.3. The SPCA_Interconnect procedure correctly generates the signal probability for all internal and primary output nodes within a CBDD.

The procedures invoked by the SPCA_Interconnect procedure are well defined and produce predictable outputs for their given inputs. The more important procedures, which perform searches/traversals, are based on well known search algorithms that have been proven to be consistent and correct. The overall interconnection of these procedures is logical and has been tested as a system. The system-level testing results are predictable and consistent with the results of conventional BDDs; this verifies that the SPCA_Interconnect procedure correctly computes signal probability for a circuit represented by a CBDD.

Analysis of SPCA_Interconnect procedure

The running time of the SPCA_Interconnect procedure depends on the worst case running times of all procedure invocations. Line 2, the invocation of Network_Levelization, is called once. The for-loop, lines 3-7, accounts for a majority of the SPCA_Interconnect running time. The duration of the for-loop is G. Line 7 runs in constant time, O(1), but is dominated by the time of the procedure invocations of lines 4-6. Using Tables 5.4 and 5.5, for B > 1 and k > 2 the worst case running time for the SPCA_Interconnect procedure is O(P1 + G x (P2 + P3 + P4)). From observation it is apparent that the worst case running times of the SPCA_Interconnect procedures have the following ranking: P1 << P2 << P4 << P3. The ranking indicates that the SPCA_Interconnect procedure can potentially spend most of its time in the invocation of the Generate_IPR_Cubeset procedure. This procedure is computationally expensive because the input expression stack can become very large and the literal to IPR cube conversion has a cost of (G + 13) per cube. The resulting worst case running time for the SPCA_Interconnect procedure is approximately O(G x P3), or

    O( (G/2) x [ ((B² + B)/2) x ( ((B² + B)²/2 - 1)^(k-2) + 1 ) - 1 ] x (G + 13) ).

    Symbol   Meaning
    N        Number of circuit inputs
    G        Number of circuit logic elements
    R        N + G
    B        Maximum logic element fanin
    |V|      CBDD vertex count
    |E|      CBDD edge count
    k        Depth-accuracy parameter

Table 5.4. Symbol definitions.

    Symbol   Procedure                     Worst Case Running Time
    P1       Network_Levelization          O(|V| + |E|)
    P2       Post_Order                    O(16 x (4B + 5) x G)
    P3       Generate_IPR_Cubeset          O( (1/2) x [ ((B² + B)/2) x ( ((B² + B)²/2 - 1)^(k-2) + 1 ) - 1 ] x (G + 13) )
    P4       Compute_Signal_Probability    O( (5R + 2) x B^(k+1) )

Table 5.5. Procedure running times.

5.4 Switching Activity and Power Computation

The final component of the BLAPE implementation involves the computation of switching activity and dynamic power. The complicated task of computing signal probability was performed by the Signal Probability Computation Algorithm (SPCA). Upon completion of the SPCA, a signal probability array is generated. The signal probability array is passed to a routine which computes switching activity using

    α(x) = 2 · P_s(x) · (1 - P_s(x)).

Switching activity is computed for each node in the network. After this computation, dynamic power is computed for each node using

    P_switch(x) = (1/2) · α(x) · C_L(x) · f_CLK · V_dd².

Next, the BLAPE implementation provides a power estimate for the entire circuit using ∑_{i=1..N} P_switch(x_i), where N is the number of nets in the circuit.
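Given a filled signal probability array, the two formulas above reduce the final step of BLAPE to a few operations per node. The sketch below is illustrative: the node names and signal probabilities are invented, and the capacitances assume that one "cap" unit in the listings of the next section corresponds to 0.01 pF, an inference chosen because it reproduces the printed per-node powers and not something the text states.

# Per-node switching activity and dynamic power, following
#   alpha(x)   = 2 * Ps(x) * (1 - Ps(x))
#   Pswitch(x) = 0.5 * alpha(x) * CL(x) * fclk * Vdd^2
# The node data is invented; capacitances assume 1 "cap" unit = 0.01 pF.

F_CLK = 20e6     # 20 MHz clock, as in the example listings
VDD = 5.0        # 5 V supply

def switching_activity(ps):
    return 2.0 * ps * (1.0 - ps)

def node_power(ps, cap_farads):
    return 0.5 * switching_activity(ps) * cap_farads * F_CLK * VDD ** 2

nodes = {                     # node -> (signal probability, load capacitance in F)
    'c_in': (0.500, 11 * 0.01e-12),
    't0':   (0.125, 5 * 0.01e-12),
}
total = 0.0
for name, (ps, cap) in nodes.items():
    p = node_power(ps, cap)
    total += p
    print(name, round(switching_activity(ps), 4), round(p * 1e6, 2), 'uW')
print('total', round(total * 1e6, 2), 'uW')    # 13.75 + 2.73 = 16.48 uW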
— The following examples (Figures 5.14 and 5.15) provide BLAPE generated activity and power output for both implicit and mapped structural representations (Figures 5.4 and 5.6) of the full-adder. As a means of comparison, SIS generated activity and power analysis is given in Figures 5.16 and 5.17 for the same structural specifications. SIS switching activity roundoff to two decimals places results in a 0.26% power estimate difference, when compared to BLAPE. 116 Node Activity Capacitance c_in 0.5000 cap=11 x 0.5000 cap=11 y 0.5000 cap=11 c_out 0.5000 cap=0 sum 0.5000 cap=0 NOTc_in 0.5000 cap=6 NOTy 0.5000 cap=6 NOTx 0.5000 cap=6 to 0.2188 cap=5 t1 0.2188 cap=5 t2 0.2188 cap=5 t3 0.2188 cap=5 _LC1_B1 0.5000 cap=2 t4 0.3750 cap=4 t5 0.3750 cap=4 t6 0.3750 cap=4 _LC2_B1 0.5000 cap=2 Power = 91.94 uW, assuming 20 MHz CLK, Vdd = 5V Figure 5.14. BLAPE implicit structural FA activity/ power estimates. Node Activity Capacitance c_in 0.5000 cap=11 x 0.5000 cap=11 y 0.5000 cap=11 c_out 0.4878 cap=1 sum 0.4851 cap=2 [236] 0.3750 cap=4 [238] 0.3750 cap=4 [234] 0.3750 cap=4 [182] 0.5000 cap=6 [181] 0.5000 cap=6 [226] 0.2188 cap=5 [183] 0.5000 cap=6 [228] 0.2188 cap=5 [230] 0.2188 cap=5 [232] 0.2188 cap=5 Power = 89.58 uW, assuming 20 MHz CLK, Vdd = 5V Figure 5.15. BLAPE mapped structural FA activity/ power estimates. 117 Node c_in Cap. = 11 Switch Prob. 0.50 Power = 13.8 Node 1 Cap. 11 Switch Prob. 0.50 Power = 13.8 Node y Cap. 11 Switch Prob. = 0.50 Power = 13.8 Node [3019] Cap. = 0 Switch Prob. = 0.50 Power = 0.0 Node [3020] Cap. = 0 Switch Prob. 0.50 Power = 0.0 Node NOTc_in Cap. = 6 Switch Prob. = 0.50 Power = 7.5 Node NOTy Cap. = 6 Switch Prob. = 0.50 Power = 7.5 Node NOTx Cap. = 6 Switch Prob. = 0.50 Power = 7.5 Node _LC2_81 Cap. = 2 Switch Prob. = 0.50 Power = 2.5 Node _LC1_B1 Cap. = 3 Switch Prob. = 0.50 Power = 3.8 Node t0 Cap. = 5 Switch Prob. = 0.22 Power = 2.7 Node t1 Cap. = 5 Switch Prob. = 0.22 Power = 2.7 Node t2 Cap. = 5 Switch Prob. = 0.22 Power = 2.7 Node t3 Cap. = 5 Switch Prob. = 0.22 Power = 2.7 Node t4 Cap. = 4 Switch Prob. = 0.38 Power = 3.8 Node t5 Cap. = 4 Switch Prob. = 0.38 Power = 3.8 Node t6 Cap. = 4 Switch Prob. = 0.38 Power = 3.8 Total Power: 92.187500 Figure 5.16. SIS implicit structural FA aetivity/ power estimates. Node c_in Cap. = 11 Switch Prob. 0.50 Power = 13.8 Node 1 Cap. = 11 Switch Prob. = 0.50 Power = 13.8 Node y Cap. = 11 Switch Prob. = 0.50 Power = 13.8 Node [3021] Cap. = 1 Switch Prob. = 0.50 Power 8 1.2 Node [3022] Cap. = 2 Switch Prob. = 0.50 Power = 2.5 Node [236] Cap. 8 4 Switch Prob. = 0.38 Power = 3.8 Node [238] Cap. = 4 Switch Prob. = 0.38 Power = 3.8 Node [234] Cap. = 4 Switch Prob. = 0.38 Power = 3.8 Node [182] Cap. = 6 Switch Prob. = 0.50 Power 8 7.5 Node [181] Cap. = 6 Switch Prob. = 0.50 Power = 7.5 Node [226] Cap. = 5 Switch Prob. = 0.22 Power = 2.7 .Node [183] Cap. = 6 Switch Prob. = 0.50 Power = 7.5 lNOde [228] Cap. = 5 Switch Prob. = 0.22 Power = 2.7 INOde [230] Cap. = 5 Switch Prob. = 0.22 Power = 2.7 lNOde [232] Cap. = 5 Switch Prob. = 0.22 Power = 2.7 Total Power : 89 . 687500 Figure 5.17. SIS mapped structural FA activity/ power estimates. 118 The BLAPE implementation uses a behavioral VHDL specification as its initial input. In an effort to provide realistic activity and power estimates for a given design, structural specifications at the gate-level are considered. Various structural specifi- cations yield various power estimates, as shown above. 
BLAPE’S ability to consider various structural specifications, given an initial VHDL behavioral specification, com- plements an HDL-based hardware design methodology by providing improved high- level power estimates. 5.5 Experimental Results This section presents experimental results generated by the BLAPE program. BLAPE results are compared to the results generated by the Berkeley SIS power estimator tool. The Berkeley SIS tool was chosen for comparison because it is a mature VLSI development system that is considered to be the standard for switching activity and dynamic power measurement. The Berkeley SIS tool computes exact signal proba- bility by traversing a traditional BDD with the intent of generating a 2-level disjoint Boolean equation for each net of the input circuit. Quantities such as power (aw) and time (s) are used as a means of comparison between the two tools. The circuits involved in the tool comparison are a selection of ISCAS-85 benchmarks, MCNC Synth89 benchmarks, and other arithmetic circuits. To simplify the exchange of benchmark circuits between tools, the input circuits are BLIF-formatted gate-level specifications. For BLAPE accuracy comparisons, it is assumed that the Berkeley SIS power estimator generates exact power estimates. Benchmarking was performed on a workstation with the following configuration: Intel Pentium II 350MHz (processor), 64MB (RAM), and Linux (operating system) with 330MB swap partition. Five experiments were performed in which power estimates of the BLAPE pro- 119 gram were compared to the estimates of the Berkeley SIS tool. Each experiment includes one or more benchmark circuits. Tables containing the SIS power estimate and computation time, along with the corresponding BLAPE measurements for vary- ing depth-accuracy are given. 5.5.1 Experiment 1 : 64—Bit Adder Benchmark In the first comparison, the BLAPE and SIS tools were used to perform power esti— mates for a 64—bit adder. The outcome of the comparison is given in Table 5.6. The 64—bit adder was selected because of its depth and high level of isolated reconvergent fanout. - #Inputs #Outputs #Gates #Levels #Nodes 128 65 315 128 4944 Tool uw CPU k %Err SIS 3030.8 1 .208 — — BLAPE 3030.9 1 .138 12 0.003% Table 5.6. 64—bit adder size/ power estimates. Subsequent runs of the BLAPE program were performed using the 64—bit adder as input. For each run, the depth accuracy parameter (k) was increased. The results of the subsequent runs are illustrated in Figures 5.18, 5.19, and 5.20. The results for the 64-bit adder indicate that for a small depth-accuracy (k < 3), execution times are short and BLAPE underestimates power by just 0.019%. For 3 < k < 5, execution times are a little longer and BLAPE overestimates power by 0.089%. As It increases (1‘3 > 5), BLAPE starts to converge to the exact power estimate produced by SIS. 120 For k = 12, a very reasonable power estimate is achieved in about the same time as SIS takes. The time execution (Figure 5.20) reveals an exponential growth pattern, attributed to increased depth-accuracy. Power v. k (644-bit Adder) I I I I I I I T I I I I I I I I I I 3033.6 3033.4 3033.2 3033.0 3032.8 3032.6 3032.4 3032.2 3032.0 3031 .8 Power W) 3031 .6 3031 .4 3031 .2 3031 .0 3030.8 —1 3030.6 --1 3030.4 .1 3030.2 1 1 1 1 1 1 1 1 1 1 1 I I I l 1 1 1 2 3 4 5 6 7 8 9 1011121314151617181920 depth-accuracy (k) Figure 5.18. Power v. k (64—bit adder). 121 Accuracy Time (s) 1 .0000 0.9998 0.9996 0.9994 0.9992 0. 9990 0.9988 0. 9986 Accuracy v. 
k (64-bit Adder) 1 1 1 1 1 1 1 1 1 1 1 1 l 1 1 1 1 1 1 2 3 4 5 6 7 8 91011121314151617181920 depth-accuracy (k) Figure 5.19. Accuracy v. k (64—bit adder). Time v. k (64-btt Adder) 24.0 *- 22.0 — 20.0 - 18.0 '- 16.0 >- 14.0 '- 12.0 - 10.0 - I I I I I I TI I I I I I I T I I I I I ‘ ‘ L L 1 1 1 1 L 1 1 1 I l 2 3 4 5 6 7 8 9 1011 121314151617181920 depth-accuracy (k) Figure 5.20. Time v. k (64-bit adder). 122 5.5.2 Experiment 2 : Arithmetic Benchmarks The arithmetic benchmark suite contains adder and multiplier circuits of various sizes. The arithmetic benchmark comparisons demonstrated the same power/ accuracy and performance trends observed with the 64—bit adder (Tables 5.7 and 5.8). Certain circuits, due to their depth and large internal fanouts, were not tested for k > 5. For the arithmetic class of circuits a depth-accuracy of k = 1 provided an average error of 0.19% along with an average runtime of 0.012 seconds. As It was increased the average power estimate error approached zero. For k = 3 there was a small increase in the average error to about 0.32%, which can be attributed to reconvergent fanout. This error is so small that it can be considered negligible. SIS BLP k: k=2 k=3 k: k=7 Circuit pm pm %Err 14w %Err pw %Err aw %Err uw %Err add8 349 349 0.20 349 0.20 349 0.17 349 0.06 349 0.00 add 16 732 732 0.08 732 0.08 733 0.18 732 0.07 732 0.03 add32 1498 1498 0.02 1498 0.02 1501 0.18 1499 0.09 1498 0.04 mult4 410 410 0.02 410 0.02 411 0.32 410 0.00 410 0.02 mult8 2274 2289 0.65 2289 0.65 2292 0.78 2287 0.58 Average 0.19 0.19 0.32 0.16 0.02 Table 5.7. Power/Accuracy estimates of arithmetic benchmarks. add16 add32 mult4 mult8 Table 5.8. Time estimates of arithmetic benchmarks. 0.10 0.10 0.01 1.70 0.00 0.02 0.00 0.03 123 0.00 0.03 0.00 0.10 0.01 0.03 0.01 0.31 0.01 0.06 0.16 10.46 0.02 0.09 3.80 5.5.3 Experiment 3 : ISCAS-85 Benchmarks The ISCAS-85 benchmark suite contains a variety of ALU, control, and decoder cir- cuits. Due to circuit depth and large internal fanouts, these circuits were not tested beyond a depth—accuracy of three. A few circuits exhibited increased error for an increase in the depth-accuracy parameter. This is attributed to reconvergent fanout occurring below the depth—accuracy level of a node and the propagation of this error to subsequent signal probability computations. The ISCAS-85 benchmark compar- isons demonstrated similar accuracy/ performance trends observed by the arithmetic benchmarks, but with greater average error. Given a depth-accuracy of k = 1 the ISCAS—85 benchmark comparison yields an average error of 3.13% along with an av- erage runtime of 0.22 seconds (Tables 5.9 and 5.10). As It is increased the average power estimate error slowly approaches zero. SIS BLP k = l k = 2 k = 3 Circuit uw pw %Err pw %Err pw %Err C1355 2411 2482 2.96 2375 1.49 2442 1.29 C17 34 34 0.00 34 0.00 34 0.00 c1908 2762 2736 0.94 2732 1.08 2740 0.81 C432 1062 1009 4.94 985 7.21 C499 1820 1811 0.46 1810 0.53 1809 0.60 C5315 12136 11915 1.82 11880 2.11 11871 2.19 C880 1655 1765 6.66 1651 0.25 1608 2.82 C2670 4459 4132 7.33 4402 1.26 Average 3.13 1.74 1.28 Table 5.9. ISCAS-85 power/accuracy benchmarks. SIS BLAPE k = l k = 2 k = 3 Circuit CPU CPU CPU CPU c1355 6.10 0.07 0.17 0.56 C17 0.00 0.00 0.00 0.00 c1908 1.80 0.05 0.23 0.99 c432 2.80 0.01 5.23 c499 2.60 0.03 0.08 0.29 c5315 17.30 1.36 3.71 13.07 c880 1.50 0.03 0.09 0.69 c2670 1.80 0.23 0.61 124 Table 5.10. ISCAS-85 time benchmarks. 
5.5.4 Experiment 4 : Nonredundant ISCAS-85 Benchmarks

The nonredundant ISCAS-85 benchmark suite contains a variety of ALU, control, and decoder circuits. These circuits are functionally equivalent to the regular ISCAS-85 benchmark circuit suite, except all redundant logic has been removed from the designs. A few circuits exhibited increased error for a larger depth-accuracy due to reconvergent fanout. Because of circuit depth and large internal fanouts, the circuits were not tested beyond a depth-accuracy of three. The nonredundant ISCAS-85 benchmark comparisons demonstrated the same power/accuracy and performance trends observed with the ISCAS-85 benchmarks, but the average error for k = 1 was smaller due to less logic and less internal reconvergence. Given a depth-accuracy of k = 1, the nonredundant ISCAS-85 benchmark comparison yields an average error of 2.29% along with an average runtime of 0.43 seconds (Tables 5.11 and 5.12). As the depth-accuracy is increased, the average power estimate error slowly approaches zero.

             SIS        BLAPE k = 1        k = 2              k = 3
    Circuit  µW         µW        %Err     µW        %Err     µW        %Err
    c1355nr  2384.4     2462.7    3.00     2357      1.00     2398      1.00
    c1908nr  3400.6     3365.9    1.00     3363.5    1.00     3369.6    1.00
    c2670nr  3814.3     3772.4    1.00     3759.8    1.00     3757.1    1.00
    c432nr   911.8      857.8     6.00     857.8     6.00     875.4     4.00
    c499nr   1628.8     1639.8    1.00     1639.8    1.00     1635.1    0.00
    c5315nr  11745.4    11510.5   2.00     11484     2.00     11471.3   2.00
    c7552nr  13975      13690.2   2.00     13682.8   2.00     13674.6   2.00
    Average                       2.29               2.00               1.57

Table 5.11. Nonredundant ISCAS-85 power/accuracy benchmarks.

             SIS      BLAPE k = 1   k = 2    k = 3
    Circuit  CPU      CPU           CPU      CPU
    c1355nr  5.60     0.06          0.14     0.40
    c1908nr  2.30     0.12          0.57     13.71
    c2670nr  2.30     0.16          0.37     0.83
    c432nr   2.40     0.01          0.54     1.87
    c499nr   2.40     0.01          0.03     1.58
    c5315nr  26.30    0.81          2.41     7.02
    c7552nr  2.20     1.88          5.39     20.51

Table 5.12. Nonredundant ISCAS-85 time benchmarks.

5.5.5 Experiment 5 : MCNC Synth89 Benchmarks

The Synth89 benchmark suite contains various multi-level circuits which control, compare and decode. The Synth89 benchmark comparisons demonstrated the same power/accuracy and performance trends observed with the previous benchmarks, but the average error and average runtime were smaller due to less internal reconvergence and smaller circuit size. Tables 5.13 and 5.14 show the power/accuracy and performance comparisons with the Berkeley SIS tool.

Table 5.13. MCNC Synth89 power/accuracy benchmarks. (The body of this table is not legible in the source.)
33:3 3:: :«3 :33 :«3 33% :.«3 33% %3 3%... %3 33% 33% «a: %3 «.«S %3 «.«S %3 «.«S «:3 3%: %3 3.2.: «.«S :33 %3 3% %3 33% %3 «.:% %3 «.:% %3 :.:% 3% :28 Emma 31 .mefi 31 Egg 8: knack: a: fig 81 3.1 _ “.28er t I a: I u: 3 I u: « I a: E ma. l: 127 SIS BLAPE k = 1 k = 2 k = 3 k = 5 k = 7 Circuit CPU CPU CPU CPU CPU CPU adr4 0.90 0.00 0.00 0.00 0.01 0.01 alul 0.10 0.00 0.00 0.00 0.00 0.00 alu2 0.10 0.00 0.00 0.01 0.01 0.01 alu3 0.10 0.00 0.00 0.01 0.01 0.02 QSymml 0.10 0.01 0.02 0.06 7.00 alu2—2 0.10 0.03 0.08 0.21 13.69 alu4 0.50 0.09 0.25 1.24 b1 0.00 0.00 0.00 0.00 0.00 0.00 C8 0.10 0.01 0.02 0.02 0.03 0.03 cc 0.00 0.00 0.00 0.00 0.00 0.00 cht 0.10 0.01 0.02 0.03 0.03 0.03 cm138a 0.00 0.00 0.00 0.00 0.00 0.00 cm150a 0.10 0.00 0.01 0.01 0.03 0.08 cm1628. 0.00 0.00 0.00 0.00 0.00 0.00 cm163a 0.00 0.00 0.00 0.00 0.00 0.00 comp 0.00 0.01 0.02 0.03 0.35 lal 0.10 0.01 0.01 0.02 0.03 0.03 pair 2.30 0.46 1.18 terml 0.10 0.03 0.09 0.20 tlarge 0.50 0.10 0.38 0.71 ttt2 0.10 0.01 0.03 0.04 0.07 0.08 X] 0.10 0.03 0.06 0.10 0.24 0.33 x4 1.10 0.05 0.10 0.18 0.57 0.89 z4ml 0.10 0.00 0.00 0.01 0.02 0.02 duke2 0.10 0.06 0.15 0.29 0.91 1.74 e64 0.20 0.08 0.10 0.11 0.11 0.11 064 0.10 0.01 0.06 8.47 vg2 0.01 0.01 0.02 0.04 0.06 0.06 Table 5.14. MCNC Synth89 time benchmarks. 5.5.6 Remarks BLAPE supports a selective or user-specified power/ accuracy. The results of the five experiments indicate that BLAPE provides a very reasonable power estimate for a depth-accuracy of k = 1. As the depth-accuracy is increased the average error in the power estimates decreases. The trends indicate that for large k the average error will approach zero (Equation 5.4), with an increase in runtime for deep circuits containing a large number of internal fanouts. The power/ accuracy trend for all benchmarks is illustrated in Figure 5.21. The BLAPE system provides an approximation of signal probability. BLAPE supports user-selectable accuracy via the depth-accuracy parameter. The behavior of 128 Accuracy v. k 1 .0000 l— -------------------------------------------------------------------------- — 0.9980 0. 9960 Accuracy 0.9940 0.9920 0. 9900 0. 9880 i- 1 l l J 1 l 4 5 6 7 depth—accuracy (k) Figure 5.21. Average accuracy v. k (All benchmarks). BLAPE’S accuracy is rationalized by the facts given below and demonstrated in the previously illustrated power/accuracy and time benchmark graphs. BLAPE’s Signal Probability Computation Accuracy 0 The signal probability of nodes represented by Boolean equations (sum-of- products) equals the sum of their disjoint term signal probabilities, given by Equation 5.3. o Nodes represented by 2-level Boolean expressions, where 2:,- € {PI} such that PI is the set of primary inputs, yield exact signal probability. 0 Nodes represented by n—level Boolean expressions, where x,- 6 {P1, IS} such that I S is the set of internal signals, yield approximate signal probability due to reconvergent fanout. 129 a As nodal equations are reduced in depth, approaching a 2-level expression, the error in their signal probability approximation approaches zero, described by equation 5.4. Error _ Approx -— Exact k —> Lmax _ Exact —+ 0 (5.4) The parameters in Equation 5.4 include 1) Lmam, the circuit depth, 2) k, the depth- accuracy parameter, 3) Approx, BLAPE’S signal probability approximation, and 4) Exact, the exact (SIS) signal probability. An additional feature of the BLAPE imple- mentation is its support of large circuits. 
Circuits which SIS was not able to process, due to memory limitations, were handled by BLAPE in a reasonable time (Table 5.15) zrcmt 5445 1 c3540nr 1 594 5713 0.36 5696 1.28 5685 3.34 c7552 3512 13397 1.02 13462 3.46 13581 16.56 Table 5.15. Benchmarks not available in SIS. BLAPE’S ability to handle larger circuits is due to the CBDD’s compact be- havior/structure representation. Reasonable accuracy is achieved at k = 3. When analyzing all circuits, BLAPE’s average error is just 1.22% to 0.21%, for k = 1 to 7. BLAPE achieves a dramatic improvement when compared to behavioral-level activity and power estimation tools. The error found in most behavioral-level power estima- tors about 10 - 12 percent on average and about 80% in the worst reported case. The approach taken by BLAPE is to analyze the high-level structural components from a logic perspective, by generating the disjoint Boolean equation for each of the Circuit’s internal and output signals. BLAPE allows the user to perform a trade-off of time 130 l . versus accuracy. A less accurate estimate can be computed fairly quick but an exact estimate can take longer, depending on the circuit size and the number of inputs. 131 CHAPTER 6 Behavioral-Level Visualization of Switching Activity This chapter introduces a new tool for the visualization of switching activity in CMOS circuits. The approach presented consists of analyzing post-mortem data collected by the simulation of switching functions. An estimation of the switching activity for each circuit partition is captured in one of two views. This tool serves as a switching activity profiler, which illuminates a Circuit’s activity hot-spots and provides the locations where power dissipation minimization techniques can best be applied. The new tool allows the designer of CMOS circuits to visualize the switching activity On a partitioned basis. The illuminated areas should be considered first in the application of power minimization techniques. 6.1 Activity Viewer Tool The Activity Viewer tool is a new mechanism which provides a partitioned and color- based viewing/profiling of the activity hot-spots within a digital circuit. Previous switching activity estimation tools used the traditional tabular representation form [29, 32, 50, 51] while others used the traditional two-axis graph approach as [19, 24, 37, 132 52]. The presentation of the results in the aforementioned references can be greatly improved where spatial / locality activity inter-relationships can also be captured. The Activity Viewer tool is an improvement to the traditional circuit activity estimate viewing approaches. The Activity Viewer tool is a more effective means of presenting circuit activity estimates. The use of color-based activity coding combined with partitioning provides a more insightful depiction of how the circuit behaves in response to input stimuli. Additionally, the partitioning aspect of the tool gives a clear indication of how the internal logic partitions are inter-related with respect to activity and input. For a mapping of color to switching activity (Figure 6.1). (Color results have been repro- duced in black and white for this dissertation). The Activity Viewer tool provides two Color Activity 0.8 < Esw(g) < 1.0 0.5 < Esw(g) < 0.7 0.3 < Esw(g) < 0.4 0.1 < Esmg) < 0.2 No Activity Figure 6.1. Activity color mapping. views: the Bar and Level Views, illustrated by Figures 6.2 and 6.3 respectively, which were generated with artificial partition and activity data. 
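The color coding of Figure 6.1 amounts to a small lookup from an activity estimate to a bucket. The sketch below is illustrative only: the bucket names are invented (the dissertation reproduces the colors in black and white), and because the printed ranges leave small gaps (for example between 0.4 and 0.5), the sketch simply uses the lower bound of each range as its threshold.

# Activity-to-color bucketing in the spirit of Figure 6.1.  Bucket names are
# invented stand-ins for the colors; thresholds are the lower bounds of the
# ranges listed in the figure.

BUCKETS = [                      # (lower bound on Esw(g), label)
    (0.8, 'level 4 (0.8 < Esw <= 1.0)'),
    (0.5, 'level 3 (0.5 < Esw <= 0.7)'),
    (0.3, 'level 2 (0.3 < Esw <= 0.4)'),
    (0.1, 'level 1 (0.1 < Esw <= 0.2)'),
]

def activity_color(esw):
    for bound, label in BUCKETS:
        if esw > bound:
            return label
    return 'no activity'

for value in (0.0, 0.15, 0.35, 0.6, 0.95):
    print(value, '->', activity_color(value))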
The Bar View provides a color-based two-axis graph of the Circuit’s switching activity on a partition basis. As the input stimulus changes, the Bar View captures the activity of each partition. For 133 5M2) Figure 6.2. Bar view. Level Figure 6.3. Level view. multi-level circuit designs, the Level View, illustrated in Figure 6.3, captures level- based partitioned switching activity behavior. This view allows the circuit designer to isolate and inspect the activity profile of a Circuit’s network on a per level basis. 134 6.1.1 Input Transformation Process The input to the Activity Viewer tool begins with a gate-level circuit netlist described in the Berkeley Logic Interchange Format (BLIF). The netlist is passed through the BLAPE tool where level and partition information is obtained. Next, input vectors are read into BLAPE followed by simulation of the circuit. Following simulation, the outputs of selected internal and primary output nodes are collected and passed to an intermediate file. This intermediate file is passed to the mkactdat tool, which performs switching activity estimation (simulation) assuming the zero delay model and input data independence. Each switching activity estimate is computed by es- timating the average change of the selected nodes’ simulation outputs. The activity estimate is based on a block of ten simulation outputs. This simulation output block is repeatedly updated upon receipt of next input vector. The level and partition information, along with the switching activity measurements, are used as the final input to the Activity Viewer tool. Once the final input data file is read, the view- ing of switching activity can be controlled by using the Play, Stop, Forward, and Reverse buttons located on the control dialog. The menu bar provides options to features which allow the user to toggle between the Bar and Level views, as well as to copy the current view to the Window’s clipboard memory. The latter feature serves as a window snapshot function and was used to generate the figures in this section. 6.1.2 Activity Viewer Results A 4-bit Booth multiplier circuit can be used to illustrate the utility of the Activity Viewer tool. The Booth multiplier is a two’s complement array multiplier which does not require recoding for the final two’s complement result. The Booth multiplier is composed of controlled add/subtract/shift (CASS) subcircuits. Each row of the Booth array is headed by a CTRL subcircuit, which controls the CASS subcircuits. Level 13 26 Figure 6.4. Booth multiplier(4-bit) level view. After transforming the Booth multiplier circuit into a BLIF netlist, the final cir- cuit resulted in a 27 level structure, illustrated by the Level View in Figure 6.4. The selected partitions consists of the basic array elements: the CTRL and CASS subcir- cuits. The above view displays low switching activity due to the small change in the input vectors. The low switching activity is due to minimal changes in the logic of the CTRL and CASS subcircuits caused by infrequent changes in the sign and number of 1’s in the input vectors. The Level View indicates very well the relative size of each partition and the level or distance from the primary inputs. The relative size of the partitions are preserved by extending the width of each partition. The width of each partition is computed by summing the number of gates in the individual sums of products. Figure 6.5 provides a Bar View of the Booth Multiplier, where partition size and location are ignored. 
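The Esw(g) values behind both views come from the block-based estimate described in Section 6.1.1: for each block of ten zero-delay simulation outputs, a node's activity is taken as the average change between consecutive samples. The sketch below is illustrative; the sample waveforms are invented, and the simple averaging of node estimates into one per-partition value is an assumption, since the text does not spell out how the partition figure is aggregated.

# Block-based switching activity estimate in the spirit of Section 6.1.1:
# the average change over the most recent block of ten simulation outputs.
# Sample waveforms and the per-partition averaging are invented for illustration.

BLOCK = 10

def block_activity(samples):
    """Fraction of consecutive sample pairs in the latest block that change value."""
    block = samples[-BLOCK:]
    changes = sum(a != b for a, b in zip(block, block[1:]))
    return changes / (len(block) - 1)

def partition_activity(node_samples):
    """Assumed aggregation: average the node estimates within one partition."""
    return sum(block_activity(s) for s in node_samples) / len(node_samples)

partition = [
    [0, 1, 0, 1, 0, 1, 0, 1, 0, 1],   # a busy node: 9 changes over 9 pairs
    [0, 0, 0, 0, 1, 1, 1, 1, 1, 1],   # a quiet node: 1 change over 9 pairs
]
print(partition_activity(partition))   # (1.0 + 1/9) / 2 ~= 0.556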
Each partition is placed on the horizontal axis and it’s switching activity for the given input block is indicated by color intensities. The color mappings are specified in Figure 6.1. The Bar View of the Booth multiplier is consistent with the Level View, displaying low circuit activities for the specified 136 input vectors. The switching activity for each partition of the level view matches the activity of the corresponding partition in bar view, but the level or distance from the primary inputs is considered. 1.0 0.9 0.8 0.7 Ewe) 0-6 E I 0.5 0.4 0.3 - - ' 0.2 I 0.I I 50 100 150 200 250 300 Partition Figure 6.5. Booth multiplier(4—bit) bar view. 6.2 Future Work and Summary Currently, the Activity Viewer tool is a Microsoft Windows based application program providing only two views. Future improvements may include the addition of new views that provide the display of total circuit switching activity and power, along with zoom in/out capabilities to isolate important areas of large circuits which have lost their viewing definition. Most importantly, the automation of the process of transforming the original Circuit’s file format to the final partition / activity file format can be considered. The Activity Viewer tool is a new visualization tool which provides color-based, partitioned-level viewing of switching activity in CMOS circuits. The tool provides 137 the circuit designer with switching activity profiling capability which illuminates the potential power-hungry hot-spots of the design. 138 CHAPTER 7 Conclusions A methodology for behavioral-level switching activity estimation in CMOS circuits has been presented. This chapter summarizes the contributions and future work of this research. 7.1 Contributions The research presented in this dissertation addresses the problem of computing switch- ing activity at the behavioral-level for CMOS circuits. The accurate computation of switching activity provides improved dynamic power estimates. An extensive liter- ature review was given detailing the background and importance of the research. Additionally, an overview of behavioral and structural representations of switching functions was presented. A formal problem definition along with its solution and associated constraints/restrictions was given. The problem’s solution was decom- posed into a set of tasks. Each task was described by an objective along with its implemented approach. The contributions of this research are the following: o A new decision diagram, the Connective Binary Decision Diagram (CBDD) has been introduced. The CBDD is a graph-based behavioral and structural rep- 139 l: " resentation for digital circuits. The CBDD advantages include maintenance of a Circuit’s structural and connective properties and linear growth. When com- pared to traditional BDD implementations, the CBDD demonstrated an average reduction in size of more than one order of magnitude for certain benchmark circuits. A set of procedures and a systematic approach for visiting each node of the CBDD’s graph and generating a disjoint Boolean equation in infix notation was developed. The procedures were verified using a series of test circuits which generated predictable CBDDs that have known disjoint Boolean equations. A technique for computing signal probability was developed. The techinque results in a signal probability approximation in which estimation error decreases as a depth-accuracy parameter is increased. 
A new methodology was developed for computing the switching activity and dynamic power dissipation of behavioral—level digital circuit designs described in VHDL. The implementation, called BLAPE, transforms a VHDL specification into a set of Boolean expressions. Then, the application of structure supports the computation of realistic switching activity and dynamic power estimates. A series of benchmark experiments were performed to validate the methodology and highlight the accuracy and performance of the BLAPE implementation as compared to the Berkeley SIS power estimator. Experiments were performed using circuits selected from an arithmetic circuit suite and the ISCAS-85, nonre- dundant ISCAS-85, and MCNC Synth89 benchmark suites. A new visualization tool for profiling/viewing the powephungry activity hot- spots within a circuit was developed. This tool is beneficial to circuit designers 140 because it identifies the location and switching activity of circuit nodes using various graphical views. 7 .2 Future Work The BLAPE implementation provides improved accuracy for high-level switching ac- tivity and power estimation. Future improvements and considerations in the following areas will improve BLAPE’S performance, accuracy, and usefulness. o The Connective Binary Decision Diagram (CBDD) should be more intelligent. The CBDD should be able to recognize redundant logic and not replicate this logic each time it is encountered. This improvement will produce a more com- pact graph, yielding smaller post-ordered Boolean equations and improving the performance. 0 BLAPE should provide support for sequential circuits. The modeling of sequen- tial circuits require additional considerations and modifications to the CBDD. This improvement will increase the usefulness of the BLAPE implementation. 0 Unit / Real delay models are needed to give more realistic power estimates. These delay models will yield longer program simulation runtimes, but offer switching activity and power estimates which are consistent with the results of circuit-level simulators. 0 Improved memory management for the BLAPE implementation is necessary. More efficient use of memory resources will allow the analysis of larger circuit designs. 141 7 .3 Impact of Contributions The research presented in this dissertation provides a solution to the problem of accu- rately estimating switching activity and dynamic power for high-level circuit designs described by VHDL behavioral specifications. The technique provides a significant improvement to existing behavioral-level activity and power estimators. The esti- mates generated by existing high—level techniques contain 10 to 12 percent error on average, with some estimates containing as much as 80 percent error. The application of the proposed technique, the BLAPE program, allows user-selectable accuracy of switching activity estimation, at the expense of time and memory resources. During a benchmark comparison containing 49 circuits, with the Berkeley SIS power esti- mator, the BLAPE program power estimates contained 1.22% average error for a depth-accuracy of k = 1 and 0.21% average error for k = 7. The BLAPE program combined with the use of the Activity Viewer tool provides an effective means of activity/power profiling at the behavioral-level. The combi- nation of the two tools identifies high-activity and power-hungry hot-spots within a circuit design. 
The ability to estimate switching activity and dynamic power at the behavioral-level will improve the quality of integrated circuits by providing early warning of power problems and reducing design time and cost.

APPENDICES

APPENDIX A

Definitions and Formulas

Definition A.1 Glitch Activity: The portion of the switching activity due to multiple gate output transitions in response to an input transition.

Definition A.2 Reconvergent fanout nodes: Circuit nodes that receive inputs from two paths that fan out from some other circuit node.

Definition A.3 Non-decomposable FSM: An FSM is said to be non-decomposable when every state of the machine is reachable from every other state in a finite number of cycles.

Definition A.4 Signal Probability P_s(x): The probability that the signal at node x evaluates to logic one [53].

Definition A.5 Transition Probability P_T(x): The probability that the logic signal at node x experiences a change in its logic state [53].

Switching Activity Derivation:

    E_sw(g) = P_{0->1}(g) + P_{1->0}(g)
            = P_s(g') · P_s(g) + P_s(g) · P_s(g')
            = 2 · P_s(g) · P_s(g')
            = 2 · P_s(g) · (1 - P_s(g)),

where g' denotes the complement of g.

Definition A.6 Spatial Correlation: Two logic signals x and y are spatially correlated if in the same time slot the state of one signal depends on the state of the other.

Definition A.7 Temporal Correlation: The dependence of the current state of a signal on its previous logic value.

Definition A.8 Boolean difference: ∂y/∂x = y|_{x=1} ⊕ y|_{x=0} = y(x) ⊕ y(x').

Definition A.9 Equilibrium probability: If x(t) is a logic signal (switching between 0 and 1), then its equilibrium probability is defined as P(x) = lim_{T->∞} (1/T) ∫_{-T/2}^{+T/2} x(t) dt.

Definition A.10 Transition Density: The average number of transitions per unit time experienced by a circuit node [53].

Remark: The density provides an effective measure of switching activity in logic circuits in the presence of any delay model.

Remark: If a logic signal x(t) makes n_x(T) transitions in a time interval of length T, then the transition density of x(t) is defined as D(x) := lim_{T->∞} n_x(T)/T.

Remark: If all correlations are ignored, so that the input signals are independent of one another in both space and time, then the signals are spatio-temporally independent, and the transition density is given by D(y) = ∑_i P(∂y/∂x_i) · D(x_i).

Remark: The relationship between transition density and transition probability is given by D(x) [remainder of formula not legible in the source].

Definition A.11 Articulation Point: A node whose removal disconnects the graph.

Definition A.12 The Boolean operators '+', '*', '-' represent the logic disjunction (OR), conjunction (AND), and inversion (NOT), respectively. By default, '*' may be denoted by a space.

Definition A.13 Primary inputs are Boolean variables of a circuit that depend on no other variables.

Definition A.14 Primary outputs are Boolean variables on which no other variable(s) depends.

Definition A.15 The depth of a circuit is the maximum number of nodes between any primary input and any primary output.

Definition A.16 Flattening or Collapsing is the action of reducing the circuit's depth to a desired level.

APPENDIX B

Application of BLAPE to 4-bit Booth Multiplier

This appendix presents a stage-by-stage view of the BLAPE implementation when applied to a 4-bit Booth multiplier design. The initial input specification is in VHDL, using the behavioral model for some components and the structural VHDL model for the high-level design. The representations for each intermediate stage are given.
B.1 High-Level VHDL Specification Input ENTITY cass IS port (pin, cin, ain, h, d : IN BIT; pout, cout : OUT BIT); END cass; ARCHITECTURE cass_arch 0F cass IS BEGIN pout <= pin XOR (ain AND h) XOR (cin AND h); cout <= ((pin XOR d) AND ((ain 0R cin) 0R (ain AND cin))); END cass_arch; ENTITY ctrl IS port (x2, x1 : IN BIT; h, d : OUT BIT); END ctrl; 146 ARCHITECTURE ctr1_arch OF ctrl IS BEGIN h <= x2 XOR x1; d <= x2 AND ( NOT(x1) ); END ctrl_arch; ENTITY booth4 IS port (a, x : IN BIT_VECTOR(3 DOWNTO O); p : OUT BIT_VECTOR(7 DOHNTO 0)); END booth4; ARCHITECTURE network OF booth4 IS COMPONENT ctrl port (12, 11 : IN BIT; h, d : OUT BIT); END COMPONENT; COMPONENT cass port (pin, cin, ain, h, d : IN BIT; pout, cout : OUT BIT); END COMPONENT; SIGNAL p : BIT_VECTOR(34 DOWNTO 0); SIGNAL c : BIT_VECTOR(34 DOHNTO 0); SIGNAL h : BIT_VECTOR(4 DOWNTO 0); SIGNAL d : BIT_VECTOR(4 DONNTO 0); SIGNAL zero : BIT; BEGIN zero <= ’0’; zero, 1(3), h(4), d(4) x(3), x(2), h(3), d(3) x(2), 1(1), h(2), d(2) 1(1), 1(0), h(l), d(l) x(O), zero, h(O), d(O) ctr1_4 : ctrl port MAP ctr1_3 : ctrl port HAP ctr1_2 : ctrl port MAP ctr1_1 : ctrl port MAP ctr1_0 : ctrl port MAP .0 A A A A A V V V V V no u cass_0 : cass port MAP (zero, zero, a(O), h(4), d(4), p(0), c(O) cass_1 : cass port HAP (zero, c(O), a(1), h(4), d(4), p(l), c(1) cass_2 : cass port HAP (zero, c(1), a(2), h(4), d(4), p(2), c(2) cass_3 : cass port MAP (zero, c(2), a(3), h(4), d(4), p(3), c(3) cass_4 : cass port MAP (zero, c(3), zero, h(4), d(4), p(4), c(4) cass_5 : cass port MAP (zero, zero, a(O), h(3), d(3), p(5), c(5) cass_6 : cass port MAP (p(0), c(5), a(1), h(3), d(3), p(6), c(6) cass_7 : cass port MAP (p(l), c(6), a(2), h(3), d(3), p(7), c(7) 147 VVVVVVVV cass_8 cass_9 cass_31 : cass : cass cass_10 : cass_11 : cass_12 : cass-13 : cass_14 : cass_15 : cass_16 : cass_17 : cass_18 : cass_19 : cass_20 : cass_21 : cass_22 : cass_23 : cass_24 : cass_25 : cass_26 : cass_27 : cass_28 : cass_29 : cass_30 : : cass cass_32 : cass_33 : cass_34 : C888 C888 C888 C888 C888 C888 C888 C888 0888 C888 C888 C888 C888 C888 C888 C888 C888 C888 C888 C888 C888 C888 C888 C888 port port port port port port port port port port port port port port port port port port port port port port port port port port port MAP MAP MAP MAP MAP MAP MAP MAP MAP MAP MAP MAP MAP MAP MAP MAP MAP MAP MAP MAP MAP MAP MAP MAP MAP MAP MAP p <= p(33 DOWNTO 26); END network; its inputs. (p(2), (p(3). (p(4), (zero, (p(5). (p(6). (p(7). (p(8). (p(9). (p(10), C(16), zero, h(2), d(2). p(17), C(17) ); c(7). C(8), C(9), zero, C(11): C(12): C(13): C(14): C(15): a(3). h(3), d(3), p(8). C(8) ); zero, h(3), d(3), p(9), C(9) )3 zero, h(3), d(3), p(10), C(10) ); a(O), h(2), d(2), p(11), C(11) ); a(1), h(2), d(2), p(12). a(2). h(2). d(2). p(13), a(3), h(2). d(2), p(14), zero, h(2), d(2). p(15). zero, h(2), d(2). p(16). C(12) )3 C(13) )3 C(14) ); C(15) ); C(16) ); (zero, zero, a(O), h(1), d(l), p(18), C(18) )3 (p(11). (p(12), (p(13). (p(l4): (p(15)9 (p(16). (p(17). C(18) C(19) C(20) C(21) C(22) C(23) C(24) . a(1), . a(2). . a(3). , zero, , zero, , zero, , zero, h(1), h(1), h(1), h(1), h(1), h(1), h(1), d(1), d(1), d(1). d(l), d(1). d(1), d(l), p(19). P(20): p(21): p(22). P(23)’ p(24). p(25): C(19) ) C(20) ) C(21) ) C(22) ) C(23) ) C(24) ) C(25) ) (zero, zero, a(O), h(O), d(O), p(26), C(26) ); (p(18). (p(19): (p(20)9 (p(21). (p(22), (p(23): (p(24). (P(25): C(26) C(27) C(28) C(29) C(30) C(31) C(32) C(33) 148 . a(1), . a(2). . a(3). , zero, , zero, , zero, , zero, , zero, h(O), h(O), h(O). 
h(O), h(O), h(O), h(O), h(O), B.2 Boolean Equation Generation d(O), d(O). d(O), d(O). d(O), d(O), d(O), d(O), p(27). p(28). P(29): p(30)9 p(31). p(32). p(33): P(34): C(27) ) C(28) ) C(29) ) C(30) ) C(31) ) C(32) ) C(33) ) C(34) ) The Booth multiplier is an array multiplier, composed of CASS and CTRL units. The CASS unit is responsible for performing arithmetic/shifts operations, based on The CTRL unit is responsible for producing the input signals to the CASS units. After compiling the each of the independent units (CASS, CTRL), the multiplier (Booth4) is compiled. The report file generated by the Altera MAX+PLUS II compiler contains input / output information as well as the Boolean equations given below. ** INPUTS ** Pin LC 44 - - 43 - - 42 - - 10 - - 84 - - 2 - .. 1 _ .- 11 - - tat OUTPUTS u: Pin LC 62 - - 29 - - 18 - - 17 - - 19 - - 25 - - 24 - - 23 - - ** EQUATIONS ** a0 : INPUT; a1 : INPUT; a2 : INPUT; a3 : INPUT; x0 : INPUT; xi : INPUT; x2 : INPUT; x3 : INPUT; -- Node name is ’pO’ -- Equation name is ’p0’, _LCl_C14; -- Node name is ’p1’ “- Equation name is ’pl’, type is output _L05_C6; ‘- Node name is ’p2’ Row www>>>aa Col Primitive Col Primitive INPUT INPUT INPUT INPUT INPUT INPUT INPUT INPUT OUTPUT OUTPUT OUTPUT OUTPUT OUTPUT OUTPUT OUTPUT OUTPUT type is output 149 0 0000000 0 0000000 0 0000000 1 HHHHHHH 00000000 00000000 11 11 10 9 14 8 13 10 INP FBK OUT FBK 00000000 INP FBK OUT FBK Name a0 a1 a2 a3 x0 x1 x2 :3 Name p0 p1 p2 p3 p4 p5 p6 p7 -- Equation name is ’p2’, type is output p2 = _LC5_A1; -- Node name is ’p3’ -- Equation name is ’p3’, type is output p3 = _LC3_A1; -- Node name is ’p4’ -- Equation name is ’p4’, type is output p4 = _LC7_A1; -- Node name is ’p5’ -- Equation name is ’p5’, type is output p5 = _LC7_B2; -- Node name is ’p6’ -- Equation name is ’p6’, type is output p6 = _LC5-B2; -- Node name is ’p7’ -- Equation name is ’p7’, type is output p7 = _LC3_B2; -- Node name is ’lcass:cass_6l:14’ -- Equation name is ’_LC4_A10’, type is buried _LC4_A10 = LCELL( -EQOOI); _EQOOI = a1 & x2 & !13 # a0 & x2 & x3 # a1 & !x2 & x3 # a0 & a1 & x3; -- Node name is ’lcasszcass_6|:23’ -- Equation name is ’_LC4_CS’, type is buried _LC4_C3 LCELL( _EQOO2); _EQOO2 = !a0 & a1 & !x2 & x3 # a0 & a1 & x2 & x3; -- Node name is ’lcasszcass_7l:12’ -- Equation name is ’_LC5_C3’, type is buried _LC5_C3 = LCELL( _EQOO3); _EQOOS = a1 & x2 & 13 # a1 & !a2 & 13 # a2 & x2 & !13 # !a1 & a2 & !x2 & x3; -- Node name is ’lcasszcass_7l:14’ -- Equation name is ’-LC1_C3’, type is buried _LC1_C3 = LCELL( _EQOO4); 150 _EQOO4 = _LC4_C3 & !_LC5_C3 & !x2 & # _LC4_C3 & !_LC5_C3 & x2 & # !_LC4_C3 & _LC5_C3 # _LC5_C3 & 12 & x3 # _LC5_C3 & !x2 & !x3; -- Node name is ’lcass:cass_7l:22’ -- Equation name is ’_LC6_CS’, type is _L06_03 = LCELL( _EQOOS); _EQOOS - _LC4_C3 # a2; -- Node name is ’lcasszcass_7|:23’ —- Equation name is ’_LC3_C3’, type is _LC3_C3 3 LCELL( _EQOOG); _EQOOG = a1 & _LC6_C3 & x2 & x3 # !a1 & _LC6_C3 & !x2 & x3; -- Node name is ’lcass:cass_8|:12’ -- Equation name is ’_LC5_A4’, type is _LC5_A4 - LCELL( -E0007); _E0007 8 a2 & x2 & 13 # a2 & !a3 & 13 # a3 A :2 & !13 8 !a2 a a3 & !x2 & x3; -- Node name is ’Icass:Cass_8|:13’ -- Equation name is ’_LC2_CS’, type is _LC2_C3 I LCELL( _EQOO8); _EQOOB 3 !a1 & _LC6_C3 & !12 k 13; -- Node name is ’lcasszcass_8|:22’ -- Equation name is ’_LC2_09’, type is _LC2_C9 = LCELL( _EQOOQ)3 -EQOOQ = _LC3_C3 # a3; -- Node name is ’Icasszcass-9l:13’ -- Equation name is ’_LC3_C9’, type is _LC3_C9 8 LCELL( -EQOlO); _EQOIO 3 !a2 & _LC2_C9 & !x2 & x3; -- Node name is ’lcasszcass_9l:14’ 
-- Equation name is ’_LC1_09’, type is _LC1_C9 = LCELL( _EQOII); _EQOIl 8 _LC3_C9 & !x3 # !a3 & _LC3_09 x3 !x3 buried buried buried buried buried buried buried 151 # a3 & !_LC3_C9 & x3; —- Node name is ’lcasszcass_12|:12’ -- Equation name is ’_LC7_A10’, type is buried _LC7_A10 LCELL( _EQOIZ); _EQOI2 = a0 & !_LCl_A12 & _LC5_A10 # a0 & !a1 & _LC5_A10 # a1 & _LCI_A12 A !_LC5_A10 # !aO & a1 & _LCI_A12; -- Node name is ’lcass:cass_12l:14’ -- Equation name is ’_L03_A10’, type is buried _LC3_A10 = LCELL( _E0013); _EQOl3 = !_LCl_A11 & _LC7_A10 # !a0 & _LC7_A10 # !_LC1_A12 & _LC7_A10 # a0 & _LC1_A11 & _LC1_A12 & !_LC7_A10; -- Node name is ’lcass:cass_12l:23’ -- Equation name is ’_LC6_A10’, type is buried _LC6_A10 = LCELL( -E0014); _EQOl4 = a0 & a1 & !_LC1_A11 & _LC5_A10 # a0 & _LC1_A11 & !_LC5_A10 # a1 & _LCl_A11 & !_LC5_A10 # !aO & a1 & _LC1_A11; -- Node name is ’Icasszcass_13l:14’ -- Equation name is ’_LC2_A10’, type is buried _LC2_A10 = LCELL( _EQOIS); _EQO15 = !_LCI_A12 & _LC4_A10 # !a2 & _LC4_A10 & !_LC6_A10 # a2 A -LC1_A12 & !_LC4_A10 & !_LC6_A10 # a2 & -LC4_A10 & _LC6_A10 # !a2 & _LCl_A12 & !_LC4_A10 & _LC6_A10; -- Node name is ’lcasszcass-13l:23’ -- Equation name is ’_LC1_A10’, type is buried _LC1_A10 = LCELL( _EQOIS); _EQ016 = !_LC1_A11 & _LC4_A10 & _LC6_A10 # a2 & !_LC1_A11 & _LC4_A10 # _LC1_A11 & !_LC4_A10 & _LC6_A10 # a2 & _LC1_A11 & !_LC4_A10; -- Node name is ’Icass:cass_14|:14’ -- Equation name is ’_LC1_A8’, type is buried _LC1_A8 = LCELL( -30017); _EQOI7 = a3 & _LC1_A10 & _LC1_C3 # !a3 & _LC1_A10 & _LC1_A12 & !_LC1_C3 152 # !_LC1_A12 A _LC1_C3 # !a3 A !_LCI_A10 A _LC1_C3 # a3 A !_LC1_A10 A _LC1_A12 A !_LC1_C3; -- Node name is ’lcass:cass_14l:23’ -- Equation name is ’_LC4_A4’, type is buried _LC4_A4 8 LCELL( _EQOIS); _EQ018 = _LC1_A10 A !_LC1_A11 A _LC1_C3 # a3 A !_LC1_A11 A _LC1_C3 # _LC1_A10 A _LC1_A11 A !_LC1_C3 # a3 A _LC1_A11 A !_LC1-C3; -- Node name is ’lcasszcass_15l:14’ -- Equation name is ’_LC2_A4’, type is buried _LC2_A4 LCELL( _E0019); _EQOIQ _LC2_C3 A !_LC4_A4 A !_LC5-A4 !_LC2_C3 A !_LC4_A4 A _LCS_A4 !_LCI_A12 A _LC2_C3 A !_LCS_A4 !_LCI_A12 A !_LC2_03 A _LC5_A4 _LC1_A12 A _LC2_C3 A _LC4_A4 A _LC5_A4 _LC1_A12 A !_LC2_C3 A _LC4_A4 A !_LC5_A4; %fififi‘tl -- Node name is ’lcass:cass_15|:19’ -- Equation name is ’_L03_A4’, type is buried _LC3_A4 = LCELL( _EQ020); _EQO20 = _LC2_C3 A !_LC5_A4 A !12 # !_LC2_C3 A _LC5_A4 A 112 # _LC2_CB A !_LCS_A4 A 11 # !_LC2-C3 A _LC5_A4 A 11 # _LC2_C3 A _LC5_A4 A !xl A 12 # !_LC2_C3 A !_LC5_A4 A !11 A 12; -- Node name is ’lcass:cass_16l:13’ -- Equation name is ’_LC1_A4’, type is buried _LC1_A4 = LCELL( _EQO21); _EQOZI = _LC3_A4 A _LC4_A4 A !11 A x2 # _LC3_A4 A _LC4_A4 A :1 A !x2; -- Node name is ’lcasszcass_19|:12’ -- Equation name is ’_L05_A5’, type is buried _L05_A5 = LCELL( _EQO22); _E0022 = a0 A _L01_A12 A !_LCI_A13 # a0 A !a1 A _LCl_A12 # a1 A !-LCI_A12 A _LC1_A13 # !a0 A a1 A _LC1_A13; -- Node name is ’lcass:cass_19l:14’ -- Equation name is ’_LC4_A5’, type is buried 153 _LC4_A5 _EQ023 LCELL( _EQO23); !_LC1_A13 A _LC5_A5 !_LC1_C7 A _LC5_A5 !aO A _LCS_A5 a0 A _LC1_A13 A _LC1_C7 A !_LC5_A5; %fifill -- Node name is ’lcasszcass_19|:23’ -- Equation name is ’_LC6_A5’, type is buried _LC6_A5 = LCELL( _EQO24); _EQO24 = a0 A a1 A _LCl_A12 A !_LC1_C7 # a0 A !_LC1_A12 A _LC1_C7 # al A !_LC1_A12 A _LCl_C7 # !aO A a1 A _LC1_CT; -- Node name is ’lcasszcass_20l:14’ -- Equation name is ’_LC3_A5’, type is buried _LC3_A5 = LCELL( _EQO25); _EQO25 = !_LC1_A13 A _LC3_A10 # !a2 A _LC3_A10 A !_L06_A5 # a2 A _LCI_A13 A !_LC3_A10 A !_LC6_A5 # a2 A _LC3_A10 A _L06_A5 
# !a2 A _LC1_A13 A !_LC3_A10 A _LC6_A5; -- Node name is ’lcass:Cass_20I:23’ -- Equation name is ’_LC7_A5’, type is buried _LC7_A5 = LCELL( _EQO26); _EQO26 a !_LC1_C7 A -LC3_A10 A -L06_A5 A a2 A !_LCl_C7 A _LC3-A10 # _LCI_C7 A !_LC3_A10 A _LC6_A5 # a2 A -LCl_C7 A !_LC3_A10; -- Node name is ’Icass:cass_21l:12’ -- Equation name is ’_LC2_A1’, type is buried _LC2_A1 = LCELL( _EQO27); _EQO27 = !a3 A _LC2_A10 # !_LC1_A13 A _LC2_A10 # a3 A _LC1_A13 A !_LC2_A10; -- Node name is ’lcasszcass_21|:13’ -- Equation name is ’_LC2_A5’, type is buried _LC2_A5 = LCELL( _EQO28); _EQO28 = _LC1_A13 A _LC7_A5; -- Node name is ’lcass:cass_21l:23’ -- Equation name is ’_LCI_A5’, type is buried _LC1_A5 = LCELL( _EQO29); _EQO29 = !_LC1_C7 A _LC2_A10 A _LC7_A5 154 # a3 A !_LC1_C7 A _LC2_A10 # _LC1_C7 A !_LC2_A10 A _LC7_A5 # a3 A _LC1_C7 A !_LC2_A10; -- Node name is ’lcasszcass_22|:14’ -- Equation name is ’_LC8_B2’, type is buried _LC8_B2 ' LCELL( _EQO30); _EQO3O = !_LC1_A5 A -LC1_A8 # _LC1_A8 A !_LC1_A13 # _LC1_A5 A !_LC1_A8 A _LC1_A13; -- Node name is ’lcass:cass_22l:23’ -- Equation name is ’_L01_B2’, type is buried _LC1_B2 8 LCELL( _EQOSi); _EQO31 - _LC1_A5 A _LC1_A8 A !_LC1_C7 # _LC1_A5 A !_LC1_A8 A _LC1_C7; -- Node name is ’lcasszcass_23|:14’ -- Equation name is ’-LC2_B2’, type is buried _LC2_B2 8 LCELL( _EQO32); _EQ032 8 !_LCl_B2 A _LC2_A4 # !_LC1_A13 A _LC2_A4 # _LC1_A13 A _LCl_B2 A !_LC2-A4; -- Node name is ’lcasszcass_24l:13’ -- Equation name is ’_LC4-B2’, type is buried -LC4_B2 - LCELL( -EQO33); -E0033 8 _LC1_A13 A _LC1_B2 A !_LCI_C7 A -LC2_A4 # _LC1-A13 A _L01-B2 A _LC1_C7 A !_LC2_A4; -- Node name is ’lcasszcass_26l:11’ -- Equation name is ’_LCl_Cl4’, type is buried __LCS-06 = LCELL( -EQO35); _E0035 8 !a0 A al A 10 # a0 A 110 A :1 # a1 A :0 A !xl # a0 A !a1 A 11; -- Node name is ’lcass:cass_27l:23’ -- Equation name is ’_LC4_A1’, type is buried -LC4_A1 = LCELL( -E0036); _EQO36 = a0 A x0 A :1 # a0 A a1 A 11 # a1 A 10 A :1 # !aO A a1 A 10; -- Node name is ’Icass:cass_28|:14’ 155 -- Equation name is ’_LCS_A1’, type is buried _LC5_A1 = LCELL( _EQO37); _EQO37 = _LC4_A5 A !xO # !a2 A !_LC4_A1 A _LC4_A5 # a2 A !_LC4_A1 A !_LC4_A5 A x0 # a2 A _LC4_A1 A _LC4_A5 # !a2 A _LC4_A1 A !_LC4_A5 A x0; -- Node name is ’lcasszcass_28l:23’ -- Equation name is ’_L08_A1’, type is buried _LC8_A1 = LCELL( -E0038); _EQOBB a2 A _LC4_A5 A !xO a2 A !_LC4_A5 A x0 _LC4_A1 A _LC4_A5 A !xO _LC4_A1 A !_LC4_A5 A 10; #%#II -- Node name is ’lcasszcass_29I:14’ -- Equation name is ’_LC3_A1’, type is buried _LC3_A1 = LCELL( _EQO39); _EQO39 = _LC3_A5 A !xO # !a3 A _LC3_A5 A !_LC8_A1 # a3 A !_LC3_A5 A !_LC8_A1 A x0 # a3 A _LC3_A5 A -LC8_A1 # !a3 A !_LC3_A5 A _LC8_A1 A x0; -- Node name is ’lcasszcass_29l:23’ -- Equation name is ’_LC6_A1’, type is buried _L06_A1 - LCELL( _EQO40); _EQO40 = a3 A _L03-A5 A !xO A -LC3_A5 A _LC8-A1 A !xO # a3 A !_L03_A5 A 10 # !_LCS_A5 A _LC8_A1 A 10; -- Node name is ’lcasszcass_30l:14’ -- Equation name is ’_LC7_A1’, type is buried _LC7_A1 = LCELL( _EQO41); _EQO41 = !_LC2_A1 A -LC2_A5 A !_L06_A1 # _LC2_A1 A !_LC2_A5 A !_LC6_A1 # !_LC2_A1 A _LC2_A5 A !x0 # _LC2_A1 A !_LC2_A5 A !xO # _LC2_A1 A _LC2_A5 A _LC6_A1 A :0 # !_LC2_A1 A !_LC2_A5 A _L06_A1 A 10; -- Node name is ’lcass:cass_30l:23’ -- Equation name is ’_LCl_A1’, type is buried _LC1_A1 = LCELL( -EQO42); _EQO42 = !_LC2_A1 A _LC2_A5 A _LC6_A1 A !xO 156 # _L02_A1 A !_LC2_A5 A _LC6_A1 A !xO # _LC2_A1 A _LC2_A5 A _LC6_A1 A x0 # !_LC2_A1 A !_LC2_A5 A _LC6_A1 A x0; -- Node name is ’lcasszcass_31|:14’ -- Equation name is ’_LC7_B2’, type is buried -LC7,B2 = LCELL( _EQO43); 
_EQ043 = !_LC1_A1 & _LC8_B2
       # _LC8_B2 & !x0
       # _LC1_A1 & !_LC8_B2 & x0;

-- Node name is '|cass:cass_32|:14'
-- Equation name is '_LC5_B2', type is buried
_LC5_B2 = LCELL( _EQ044);
_EQ044 = _LC2_B2 & _LC8_B2
       # !_LC1_A1 & _LC2_B2
       # _LC2_B2 & !x0
       # _LC1_A1 & !_LC2_B2 & !_LC8_B2 & x0;

-- Node name is '|cass:cass_33|:13'
-- Equation name is '_LC6_B2', type is buried
_LC6_B2 = LCELL( _EQ045);
_EQ045 = _LC1_A1 & !_LC2_B2 & !_LC8_B2 & x0;

-- Node name is '|cass:cass_33|:14'
-- Equation name is '_LC3_B2', type is buried
_LC3_B2 = LCELL( _EQ046);
_EQ046 = _LC1_A4 & !_LC1_C9 & _LC4_B2 & _LC6_B2
       # !_LC1_A4 & _LC1_C9 & _LC4_B2 & _LC6_B2
       # _LC1_A4 & _LC1_C9 & !_LC4_B2 & _LC6_B2
       # !_LC1_A4 & !_LC1_C9 & !_LC4_B2 & _LC6_B2
       # _LC1_A4 & _LC1_C9 & _LC4_B2 & !_LC6_B2
       # !_LC1_A4 & !_LC1_C9 & _LC4_B2 & !_LC6_B2
       # _LC1_A4 & !_LC1_C9 & !_LC4_B2 & !_LC6_B2
       # !_LC1_A4 & _LC1_C9 & !_LC4_B2 & !_LC6_B2;

-- Node name is '|ctrl:ctrl_1|:5'
-- Equation name is '_LC1_A13', type is buried
_LC1_A13 = LCELL( _EQ047);
_EQ047 = !x0 & x1
       # x0 & !x1;

-- Node name is '|ctrl:ctrl_1|:8'
-- Equation name is '_LC1_C7', type is buried
_LC1_C7 = LCELL( _EQ048);
_EQ048 = !x0 & x1;

-- Node name is '|ctrl:ctrl_2|:5'
-- Equation name is '_LC1_A12', type is buried
_LC1_A12 = LCELL( _EQ049);
_EQ049 = !x1 & x2
       # x1 & !x2;

-- Node name is '|ctrl:ctrl_2|:8'
-- Equation name is '_LC1_A11', type is buried
_LC1_A11 = LCELL( _EQ050);
_EQ050 = !x1 & x2;

-- Node name is '|ctrl:ctrl_3|:5'
-- Equation name is '_LC5_A10', type is buried
_LC5_A10 = LCELL( _EQ051);
_EQ051 = !x2 & x3
       # x2 & !x3;

** COMPILATION SETTINGS & TIMES **

B.3 Generate Implicit Structural Specification

Next, a BLAPE program extracts the Boolean equations from the report file and applies implicit structure. The resulting structural representation is a BLIF-formatted gate-level circuit specification.
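The conversion itself is mechanical: every sum-of-products equation becomes one AND gate per product term (with NOT gates for complemented literals) and one OR gate that drives the equation's output, which is exactly the gate pattern visible in the netlist that follows. The Python sketch below illustrates that step under stated assumptions; it is not the BLAPE source (which is not listed here), and the function name sop_to_blif, the t<n> term counter, and the inline inverter emission are illustrative choices (the generated netlist below instead emits each inverter once, up front).

def sop_to_blif(equation, start_index=0):
    """Turn 'OUT = a & !b # c & d;' into a list of BLIF .gate lines."""
    lhs, rhs = equation.rstrip().rstrip(';').split('=', 1)
    out = lhs.strip()
    terms = [t.strip() for t in rhs.split('#')]     # sum-of-products terms
    lines, term_outputs, idx = [], [], start_index

    for term in terms:
        literals = [lit.strip() for lit in term.split('&')]
        inputs = []
        for lit in literals:
            if lit.startswith('!'):                 # complemented literal -> NOT gate
                sig = lit[1:]
                lines.append(f".gate not a={sig} O=NOT{sig}")
                inputs.append(f"NOT{sig}")
            else:
                inputs.append(lit)
        if len(inputs) == 1:                        # single-literal term -> buffer
            lines.append(f".gate buf1 a={inputs[0]} O=t{idx}")
        else:                                       # product term -> AND gate
            pins = ' '.join(f"{chr(ord('a') + i)}={s}" for i, s in enumerate(inputs))
            lines.append(f".gate and{len(inputs)} {pins} O=t{idx}")
        term_outputs.append(f"t{idx}")
        idx += 1

    if len(term_outputs) == 1:                      # single term -> buffer to the output
        lines.append(f".gate buf1 a={term_outputs[0]} O={out}")
    else:                                           # sum of terms -> OR gate to the output
        pins = ' '.join(f"{chr(ord('a') + i)}={s}" for i, s in enumerate(term_outputs))
        lines.append(f".gate or{len(term_outputs)} {pins} O={out}")
    return lines, idx

if __name__ == "__main__":
    blif_lines, _ = sop_to_blif("_LC1_A13 = !x0 & x1 # x0 & !x1;", start_index=162)
    print("\n".join(blif_lines))

Applied to _EQ047 above with the intermediate counter started at 162, the sketch reproduces the t162/t163/or2 pattern that appears for _LC1_A13 in the generated netlist.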
.inputs a0 a1 a2 a3 10 11 x2 x3 .outputs p0 p1 p2 p3 p4 p5 p6 p7 .gate not a=-LC5_03 O=NOT_LCS_C3 .gate not a=_LC4_C3 O=NOT_LC4_C3 .gate not a=_LC3_CQ O=NOT_LC3_C9 .gate not a=_LCl_A12 O=NOT_L01_A12 .gate not a=_LC5_A10 O=NOT_LC5_A10 .gate not a=_LC1_A11 O=NOT_LC1_A11 .gate not a=_LC6_A10 O=NOT_L06_A10 .gate not a=-LC4_A10 O=NOT_LC4_A10 .gate not a=_LC1_C3 O=NOT_LC1_C3 .gate not a=_LCi_A10 O=NOT-LC1_A10 .gate not a=_LC4_A4 O=NOT_LC4_A4 .gate not a=_L05_A4 O=NOT_LC5_A4 .gate not a=_LC2_03 O=NOT_LC2_C3 158 I?“ _ IL .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate not not not not not not not not not not not not not not not not not not not not not not not not not not not not not not not not not buf1 buf1 buf1 buf1 buf1 buf1 buf1 buf1 and3 and3 and3 a=_LC1_A13 O=NOT_LCI_A13 a=_LC1_C7 a=_LC6_A5 a=_LC3_A10 O=NOT_LC3_A10 a=_LC2_A10 O=NOT_LC2_A10 a=_LCl_A5 a=_LC1_A8 a=_LC1_B2 a=_LC4_A1 a=_LC4_A5 a=_LC8_A1 a=_LC3_A5 a=_LC2_A1 a=_LC6-A1 a=_LC2_A5 a=_LC1_A1 a=_LC8_B2 a=_LC2_B2 a=_LC1-09 a=_LC1_A4 a=_LC4_B2 a=_LC6_B2 a=_LC2_A4 a=_LC5_A5 a=_LC7_A10 O=NOT_LC7_A10 a=a0 a=a1 a=a2 a=a3 a=xO a=x1 a=x2 a=13 O=NOT_LC1_C7 O=NOT_LC6_A5 O=NOT_LC1_A5 O=NOT_LC1-A8 O=NOT_LC1_BZ O=NOT_LC4_A1 O=NOT-LC4-A5 O=NOT_LC8_A1 O=NOT_LC3_A5 O=NOT_LC2_A1 O=NOT_LC6_A1 O=NOT_LC2_A5 O=NOT_LC1_A1 O=NOT_LC8_B2 O=NOT_LC2_B2 O=NOT_LCI-C9 O=NOT_LCI_A4 O=NOT_LC4_B2 O=NOT_LCG_B2 O=NOT_LC2_A4 O=NOT_LCS_A5 O=NOTa0 O=NOTa1 O=NOTa2 O=NOTa3 O=NOTxO =NOTx1 O=NOTx2 O=NOT13 a=_LC1_Cl4 O=p0 a=_LC5_C6 a=_LC5_A1 a=_LC3_A1 a=-LC7_A1 a=_LC7_B2 a=_LC5_B2 a=_LC3_82 a=a1 b=x2 a=a0 b=x2 O=p1 O=p2 O=p3 O=p4 O=p6 O=p7 c=NOTx3 O=t0 c=x3 0=t1 a=a1 b=NOTx2 c=x3 O=t2 .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate and3 a=a0 b=a1 c=x3 O=t3 or4 a=t0 b=t1 c=t2 d=t3 O=_LC4_A10 and4 a=NOTa0 b=a1 c=NOTx2 d=x3 O=t4 and4 a=a0 b=a1 c=x2 d=x3 O=t5 or2 a=t4 b=t5 O=_LC4_C3 and3 a=a1 b=x2 c=x3 O=t6 and3 a=a1 b=NOTa2 c=13 O=t7 and3 a=a2 b=x2 c=NOTx3 O=t8 and4 a=NOTa1 b=a2 c=NOTx2 d=x3 O=t9 or4 a=t6 b=t7 c=t8 d=t9 O=-L05_C3 and4 a=-LC4_C3 b=NOT-L05_C3 c=NOTx2 d=x3 O=t10 and4 a=_LC4_03 b=NOT_LC5_03 c=x2 d=NOTx3 O=t11 and2 a=NOT_LC4_03 b=_LC5_C3 O=t12 and3 a-_LC5_C3 b=x2 c=13 O=t13 and3 a-_LCS_C3 b=NOTx2 c=NOTx3 O=t14 or5 a=t10 b=t11 c=t12 d=t13 e=t14 O=_LC1_C3 or2 a=_LC4_03 b=a2 O=_LC6_CS and4 a=a1 b=_LCG_03 c=x2 d=13 O=t15 and4 a=NOTa1 b=-L06_03 c=NOTx2 d=x3 O=t16 or2 a=t15 b=t16 O=_L03_C3 and3 a=a2 b=x2 c=x3 O=t17 and3 a=a2 b=NOTa3 C813 O=t18 and3 a-a3 b-x2 c=NOTx3 O=t19 and4 a=NOTa2 b=a3 c=NOTx2 d=x3 O=t20 or4 a=t17 b=t18 c=t19 d=t20 O=_LC5_A4 and4 a-NOTal b=_LC6_C3 c=NOTx2 d-xa O=t21 buf1 a=t21 O=_LC2_03 or2 a=_L03_C3 b=a3 0=_LC2_C9 and4 a-NOTa2 b=_LC2_09 c=NOTx2 d=x3 O=t22 bufl a=t22 O=_LC3_09 and2 a=_LC3_C9 b=NOTxS O=t23 and2 a=NOTa3 b=_L03_C9 O=t24 and3 a=a3 b=NOT_LCS_C9 c=13 O=t25 or3 a=t23 b=t24 c=t25 O=_LC1_C9 and3 a=a0 b=NOT-LCl_A12 c=_LCS_A10 O=t26 and3 a=a0 b=NOTa1 c=_LC5_A10 O=t27 and3 a=a1 b=_LC1_A12 c=NOT_L05_A10 O=t28 and3 a=NOTa0 b=a1 c=_LC1_A12 O=t29 or4 a=t26 b=t27 c=t28 d=t29 O=_LC7_A10 and2 a=NOT_LCl_A11 b=_LC7_A10 0=t30 and2 a=NOTaO b=_LC7_A10 O=t31 and2 a=NOT_LC1_A12 b=_LC7_A10 O=t32 and4 a=a0 b=_LCl_A11 
c=_LCI_A12 d=NOT_LC7_A10 O=t33 or4 a=t30 b=t31 c=t32 d=t33 O=_L03_A10 and4 a=a0 b=a1 c=NOT_L01_A11 d=_LC$_A10 O=t34 and3 a=a0 b=_LCl_A11 c=NOT_LC5_A10 O=t35 and3 a=a1 b=_LCl_A11 c=NOT_LC5_A10 O=t36 160 .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate and3 a=NOTa0 b=a1 c=_LCI_A11 O=t37 or4 a=t34 b=t35 c=t36 d=t3? O=_LC6_A10 and2 a=NOT_LC1_A12 b=_LC4_A10 O=t38 and3 a=NOTa2 b=_LC4_A10 c=NOT_LC6_A10 O=t39 and4 a=a2 b=_LCl_A12 c=NOT_LC4_A10 d=NOT_LC6_A10 O=t40 and3 a=a2 b=_LC4_A10 c=_L06_A10 O=t41 and4 a=NOTa2 b=_LC1_A12 c=NOT_LC4_A10 d=-L06_A10 O=t42 or5 a=t38 b=t39 c=t40 d=t41 e=t42 O=_LC2_A10 and3 a=NOT_LCl_A11 b=_LC4_A10 c=_LC6_A10 O=t43 and3 a=a2 b=NOT_LCl_A11 c=_LC4_A10 O=t44 and3 a=_LC1_A11 b=NOT_LC4_A10 c=_LC6_A10 O=t45 and3 a=a2 b=_LCl_A11 c=NOT_LC4-A10 O=t46 or4 a=t43 b=t44 c=t45 d=t46 O=_LC1_A10 and3 a=a3 b=_LCl_A10 c=_LCl_C3 O=t47 and4 a=NOTa3 b=_LCi_A10 c=_LCl-A12 d=NOT_LC1-03 O=t48 and2 a=NOT_LCl_A12 b=_LC1_C3 O=t49 and3 a=NOTa3 b=NOT_LCl_A10 c=_LC1_C3 O=t50 and4 a=a3 b=NOT_LC1-A10 c=_LCI_A12 d=NOT_LCl_C3 O=t51 or5 a=t47 b=t48 c=t49 d=t50 e=t51 O=_LC1_A8 and3 a=_LC1_A10 b=NOT_LCi_A11 c=_LC1_03 O=t52 and3 a=a3 b=NOT_LC1_A11 c=_LC1_C3 O=t53 and3 a=_LCI_A10 b=_LC1_A11 c=NOT_LC1_03 O=t54 and3 a=a3 b=-LC1_A11 c=NOT_L01_C3 O=t55 or4 a=t52 b=t53 c=t54 d=t55 O=_LC4_A4 and3 a=_LC2_C3 b=NOT_LC4_A4 c=NOT_LC5_A4 O=t56 and3 a=NOT_LC2_C3 b=NOT_LC4_A4 c=_L05_A4 0=t57 and3 a=NOT_LC1_A12 b=_LC2_C3 c=NOT_LC5_A4 O=t58 and3 a=NOT_LCl_A12 b=NOT_LC2_03 c=_LC5_A4 O=t59 and4 a=_LC1_A12 b=_LC2_C3 c=_LC4_A4 d=_L05_A4 O=t60 and4 a=_LCl_A12 b=NOT-LC2_C3 c=_LC4_A4 d=NOT_LC5_A4 O=t61 or6 a=t56 b=t57 c=t58 d=t59 e=t60 f=t61 O=_LC2_A4 and3 a--LC2_C3 b=NOT_L05_A4 c=NOTx2 O=t62 and3 a=NOT_LC2-C3 b=_LCS_A4 c=NOTx2 0=t63 and3 a=_LC2_C3 b=NOT_LC5_A4 c=x1 O=t64 and3 a=NOT_LC2-C3 b=_LC5_A4 c=xl O=t65 and4 a=_LC2_C3 b=_L05_A4 c=NOTxl d=x2 O=t66 and4 a=NOT_LC2_C3 b=NOT_L05_A4 c=NOTx1 d=x2 O=t67 or6 a=t62 b=t63 c=t64 d=t65 e=t66 f=t67 O=_LC3_A4 and4 a-_LCB_A4 b=_LC4_A4 c=NOTxl d=x2 O=t68 and4 a=_LC3_A4 b=_LC4_A4 c=x1 d=NOTx2 O=t69 or2 a=t68 b=t69 O=_LCI_A4 and3 a=a0 b=_LCl_A12 c=NOT_LCI_A13 O=t70 and3 a=a0 b=NOTa1 c=-LC1_A12 O=t71 and3 a=a1 b=NOT_LCl_A12 c=_L01_A13 O=t72 and3 a=NOTa0 b=a1 c=_L01_A13 O=t73 or4 a=t70 b=t71 c=t72 d=t73 O=_LCS_A5 and2 a=NOT-LCl-A13 b=_LCS_A5 O=t74 161 .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate and2 a=NOT_LC1_C7 b=_LC5_A5 O=t75 and2 a=NOTa0 b=_LCS_A5 O=t76 and4 a=a0 b=_LC1_A13 c=_LC1_C7 d=NOT_LC5_A5 O=t77 or4 a=t74 b=t75 c=t76 d=t77 O=_LC4_A5 and4 a=a0 b=a1 c=_LC1_A12 d=NOT_LC1_C7 O=t78 and3 a=a0 b=NOT_LCl_A12 c=_LCl_C7 O=t79 and3 a=a1 b=NOT_LCl_A12 c=_LC1_C7 O=t80 and3 a=NOTa0 b=a1 c=_LC1_C7 O=t81 or4 a=t78 b=t79 c=t80 d=t81 O=_LC6_A5 and2 a=NOT_LC1-A13 b=_LC3_A10 O=t82 and3 a=NOTa2 b=_LC3_A10 c=NOT_L06_A5 O=t83 and4 =a2 b=_LCl_A13 c=NOT_LC3_A10 d=NOT_LC6_A5 O=t84 and3 a=a2 b=_LC3_A10 c=_L06_A5 O=t85 and4 a=NOTa2 b=_LC1_A13 c=NOT_LC3_A10 d=_LCG_A5 0=t86 or5 a-t82 b=t83 c=t84 d=t85 e=t86 O=_LC3_A5 and3 a=NOT_LC1_C7 b--LC3_A10 c=_L06_A5 0=t87 and3 a=a2 b=NOT_LC1_C7 c=_LC3_A10 O=t88 and3 a=_LCl_C7 b=NOT_LC3-A10 c=_LC6_A5 O=t89 and3 
a-a2 b=_LCl_C7 c=NOT_LC3_A10 O=t90 or4 a=t87 b=t88 c=t89 d=t90 O=-LC7_A5 and2 a=NOTa3 b=_LC2_A10 O=t91 and2 a=NOT_LC1_A13 b=_LC2_A10 O=t92 and3 a=a3 b=_LCl_A13 c=NOT_LC2_A10 O=t93 or3 a=t91 b-t92 c=t93 O=_LC2_A1 and2 a-_LC1_A13 b=-LC7_A5 O=t94 bufl a=t94 O=_LC2_A5 and3 a=NOT_LCl-C7 b-_LC2_A10 c=_LC7_A5 O=t95 and3 a=a3 b-NOT_L01_C7 c=-LC2_A10 O=t96 and3 a=_LCl-CT b=NOT-LC2_A10 c=_LC7_A5 O=t97 and3 a=a3 b-_LCI_C7 c=NOT_LC2_A10 O=t98 or4 a=t95 b=t96 c=t97 d=t98 O=_LC1_A5 and2 a=NOT_LCl-A5 b=_LC1_A8 O=t99 and2 a=_LC1_A8 b=NOT_LCI_A13 0=t100 and3 a=_LC1_A5 b=NOT_LC1_A8 c=_LC1_A13 O=t101 or3 a=t99 b=t100 C-t101 O=_L08_B2 and3 a=_LC1_A5 b=_LC1_A8 c=NOT_LC1_C7 O=t102 and3 a=_LC1_A5 b=NOT_LC1_A8 c=_LC1_C7 O=t103 or2 a=t102 b=t103 O=_LCI_B2 and2 a=NOT_LC1_B2 b=_LC2_A4 O=t104 and2 a=NOT-LCl_A13 b=_LC2_A4 O=t105 and3 a=_LC1_A13 b=_LCl_B2 c=NOT_LC2_A4 O=t106 or3 a=t104 b=t105 c=t106 O=_LC2_B2 and4 a=_LC1_A13 b=_LCl_B2 c=NOT_LCI_C7 d=_LC2_A4 O=t107 and4 a=-LC1_A13 b=_LC1_B2 c=_LC1_C7 d=NOT-LC2_A4 O=t108 or2 a=t107 b=t108 O=-LC4_B2 and2 a=a0 b=xO O=t109 buf1 a=t109 O=_LC1_Ci4 162 .L' .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate .gate and3 a=NOTa0 b=a1 c=x0 O=t110 and3 a=a0 b=NOTx0 c=xl O=t111 and3 a=a1 b=x0 c=NOTxl O=t112 and3 a=a0 b=NOTa1 c=xl O=t113 or4 a=t110 b=t111 c=t112 d=t113 O=_LC5_C6 and3 a=a0 b=xO c=x1 O=t114 and3 a=a0 b=a1 c=x1 O=t115 and3 a=a1 b=xO c=x1 O=t116 and3 a=NOTa0 b=a1 c=xo O=t117 or4 a=t114 b=t115 c=t116 d=t117 O=_LC4_A1 and2 a=_LC4_A5 b=NOTxO O=t118 and3 a=NOTa2 b=NOT_LC4_A1 c=_LC4_A5 O=t119 and4 a=a2 b=NOT-LC4_A1 c=NOT_LC4_A5 d=x0 O=t120 and3 a=a2 b=_LC4_A1 c=_LC4_A5 O=t121 and4 a=NOTa2 b=_LC4_A1 c=NOT_LC4_A5 d=xO O=t122 or5 a=t118 b=t119 c=t120 d=t121 e=t122 O=_LCS_A1 and3 a=a2 b=_LC4_A5 c=NOTxO O=t123 and3 a=a2 b=NOT_LC4_A5 c=xO 0=t124 and3 a=_LC4_A1 b=_LC4_A5 c=NOTxO O=t125 and3 a=_LC4_A1 b=NOT_LC4_A5 C310 O=t126 or4 a=t123 b=t124 c=t125 d=t126 O=_LC8_A1 and2 a=_LC3_A5 b=NOTxO O=t127 and3 a=NOTa3 b=_L03_A5 C8NOT_LC8_A1 O=t128 and4 a=a3 b=NOT_LC3_A5 c=NOT_LC8-A1 d=x0 O=t129 and3 a=a3 b=_LC3_A5 c=_LC8_A1 O=t130 and4 a=NOTa3 b=NOT_LCS_A5 c=_L08_A1 d=xO 0=t131 or5 a=t127 b=t128 c=t129 d=t130 e=t131 O=_LC3_A1 and3 a=a3 b=_LC3_A5 c=NOT10 O=t132 and3 a=_LC3_A5 b=_L08_A1 c=NOTxo O=t133 and3 a=a3 b=NOT_LC3-A5 c=x0 O=t134 and3 a=NOT_LC3-A5 b=_LCB_A1 c=xo O=t135 or4 a=t132 b=t133 c=t134 d=t135 O=_LC6_A1 and3 a=NOT_LC2_A1 b=_LC2_A5 c=NOT_LC6_A1 0=t136 and3 a=_LC2_A1 b=NOT_LC2_A5 c=NOT_LC6_A1 O=t137 and3 a=NOT-LC2_A1 b=_LC2_A5 c=NOTxO O=t138 and3 a=_LC2_A1 b=NOT_LC2-A5 c=NOTx0 O=t139 and4 a=_LC2_A1 b=_LC2_A5 c=_LC6_A1 d=xO O=t140 and4 a=NOT_LC2_A1 b=NOT_LC2_A5 c=_L06_A1 d=xO 0=t141 or6 a=t136 b=t137 c=t138 d=t139 e=t140 f=t141 O=_LC7_A1 and4 a=NOT_LC2_A1 b=_LC2_A5 c=_LC6_A1 d=NOTx0 O=t142 and4 a=_LC2_A1 b=NOT_LC2_A5 c=_LC6_A1 d=NOTx0 O=t143 and4 a=_LC2_A1 b=_LC2_A5 c=_LC6_A1 d=x0 O=t144 and4 a=NOT_LC2_A1 b=NOT_LC2_A5 c=_L06_A1 d=x0 O=t145 or4 a=t142 b=t143 c=t144 d=t145 O=_LC1_A1 and2 a=NOT_LC1_A1 b=_LCB_B2 O=t146 and2 a=-LC8_B2 b=NOTx0 O=t147 and3 a=_LC1_A1 b=NOT-LC8_B2 c=xo O=t148 163 .gate or3 a=t146 b=t147 c=t148 O=_LC7_B2 .gate and2 a=_LC2_B2 b=-L08_B2 O=t149 .gate and2 a=NOT_LCl_A1 b=_LC2_B2 O=t150 .gate and2 a=_LC2_B2 b=NOTxO O=t151 .gate and4 a=_LC1_A1 b=NOT_LC2_B2 c=NOT_LC8_B2 d=xO O=t152 .gate or4 a=t149 b=t150 c=t151 d=t152 O=_LC5_B2 .gate and4 
a=_LCl_A1 b=NOT_LC2_B2 c=NOT_L08_B2 d=xO O=t153 .gate bufl a=t153 O=_LC6_B2 .gate and4 a=_LC1_A4 b=NOT_LC1_C9 c=-LC4_B2 d=_LC6_B2 O=t154 .gate and4 a=NOT_LCl_A4 b=_LCl_C9 c=,LC4-B2 d=_LC6_B2 O=t155 .gate and4 a=_LCi_A4 b=_LCl_C9 c=NOT_LC4_B2 d=_LC6_B2 O=t156 .gate and4 a=NOT_LC1_A4 b=NOT_LCI_C9 c=NOT_LC4_B2 d=_LC6_B2 O=t157 .gate and4 a=_LC1_A4 b=_LC1_C9 c=_LC4_B2 d=NOT_LC6_B2 O=t158 .gate and4 a=NOT_LC1_A4 b=NOT_LC1_C9 c=_LC4_B2 d=NOT_LC6_B2 O=t159 .gate and4 a=_LC1_A4 b=NOT_LC1_C9 c=NOT_LC4_B2 d=NOT_LC6_B2 O=t160 .gate and4 a=NOT-LC1_A4 b=_LC1_C9 c=NOT_LC4_B2 d=NOT_LC6_B2 O=t161 .gate or8 a=t154 b=t155 c=t156 d=t157 e=t158 f=t159 g=t160 h=t161 O=_LC3_B2 .gate and2 a=NOTx0 b=xl O=t162 .gate and2 a=10 b=NOTxl O=t163 .gate or2 a=t162 b=t163 O=_L01_A13 .gate and2 a=NOTxO b=xl O=t164 .gate bufl a=t164 O=_L01_C7 .gate and2 a=NOTx1 b=x2 O=t165 .gate and2 a=x1 b=NOTx2 O=t166 .gate or2 a=t165 b=t166 O=_LCI_A12 .gate and2 a=NOTx1 b=x2 O=t167 .gate bufl a=t167 O=_LC1_A11 .gate and2 a=NOTx2 b=13 O=t168 .gate and2 a=x2 b-NOTx3 0=t169 .gate or2 a=t168 b=t169 O-_LCS-A10 .end B.4 Network Levelization The levelization of the network is given below. ( 0) NODE= o LEVEL= 0 NAME: ROOT ( 1) NODE= 1 LEVEL= 0 NAME: a0 ( 2) NODE= 2 LEVEL= 0 NAME: a1 ( 3) NODE= 3 LEVEL= 0 NAME: a2 ( 4) NODE= 4 LEVEL= 0 NAME: a3 ( 5) NODE= 5 LEVEL= 0 NAME: 10 ( 6) NODE= 6 LEVEL= 0 NAME: x1 164 AAA/\AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA/\AAAAAA 7) 8) 9) 10) 11) 12) 13) 14) 15) 16) 17) 18) 19) 20) 21) 22) 23) 24) 25) 26) 27) 28) 29) 30) 31) 32) 33) 34) 35) 36) 37) 38) 39) 40) 41) 42) 43) 44) 45) 46) 47) 48) 49) 50) 51) 52) 53) NODE= 7 NODE= 8 NODE= NODE8 NODE8 NODE8 NODE8 NODE8 NODE8 NODE= NODE8 NODE8 NODE8 NODE8 NODE8 86 NUDE-205 NODE8212 NODE8213 NODE8214 NODE8 63 NODE= 65 NODE8 68 NODE8 72 NODE8 73 NODE8 74 NODE8 87 NODE8 88 NODE8 89 NODE=206 NODE8207 NODE8208 NODE=209 NODE=210 NODE8215 NODE=271 NODE=272 NODE8274 NODE8276 NODE8277 NODE=279 NODE8281 NODE=282 NODE8 9 NODE= 67 NODE= 70 NODE= 75 NODE8 90 LEVEL= LEVEL= LEVEL8 LEVEL8 LEVEL8 LEVEL8 LEVEL= LEVEL= LEVEL8 LEVEL8 LEVEL= LEVEL8 LEVEL8 LEVEL= LEVEL8 LEVEL8 LEVEL8 LEVEL8 LEVEL= LEVEL8 LEVEL8 LEVEL8 LEVEL8 LEVEL8 LEVEL= wwwwwNNMNNMMMMNNNNNMNMNNNNNMHHHHHHHHHHHHHHHHHOO NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: 12 x3 NOTaO NOTa1 NOTa2 NOTa3 NOTXO NOTxl NOTx2 NOTx3 t1 t3 t5 t6 t17 t109 t114 t115 t116 t0 t2 t4 t7 t8 t9 t18 t19 t20 -LC1_Cl4 t110 t111 t112 t113 t117 t162 t163 t164 t165 t166 t167 t168 t169 p0 _LC4_A10 _LC4_C3 _LC5_C3 -LC5_A4 165 54) 55) 56) 57) 58) 59) 60) 61) 62) 63) 64) 65) 66) 67) 68) 69) 70) 71) 72) 73) 74) 75) 76) 77) 78) 79) 80) 81) 82) 83) 84) 85) 86) 87) 88) 89) 90) 91) 92) 93) 94) 95) 96) 97) 98) 99) 100) NODE=211 NODE=216 NODE=273 NODE=275 NODE=278 NODE=280 NODE=283 NODE= 10 NODE= NODE8 NODE8 NODE= NODE= NODE= NODE8 NODE= NODE= NODE= NUDE8 NODE8 NODE= 82 NODE8101 NODE8103 NODE8113 NODE8155 NODE=157 NODE8167 NODE= 76 NODE= 77 NODE8 78 NODE= 83 NODE= 84 NODE8 91 NODE=100 NODE8102 NODE=110 NODE=111 NODE=112 NODE8115 NODE=122 NODE=124 NODE8154 NODE=156 NODE=164 NODE=165 NODE=166 NODE= 81 LEVEL= LEVEL8 LEVEL8 LEVEL8 LEVEL8 LEVEL8 LEVEL8 LEVEL8 LEVEL8 LEVEL8 LEVEL8 LEVEL8 LEVEL= LEVEL8 LEVEL8 LEVEL8 LEVEL8 LEVEL8 LEVEL8 LEVEL8 LEVEL8 LEVEL8 LEVEL8 LEVEL8 LEVEL8 0:010:0101mmmmmmmmmmmmmmmppphpppppppbphphbpppwwwwwww NAME: NAME: NAME: 
NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: _LC5_C6 _LC4_A1 -LC1_A13 _LC1_C7 -LC1_A12 _LC1_A11 _LC5_A10 p1 NOT_LCS-C3 NOT_LC4_C3 NOT_LCI_A12 NOT_LCS_A10 NOT-LCl_A11 NOT_LC4_A10 NOT_LC5_A4 NOT_LCI_A13 NOT_LCI_C7 NOT_LC4_A1 t13 t14 -LC6_C3 t27 t29 t37 t71 t73 t81 t10 t11 t12 t15 t16 t21 t26 t28 t34 t35 t36 t38 t44 t46 t70 t72 t78 t79 t80 _LC1_C3 166 AAA/NAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA/\AAAAAAAAAA 101) 102) 103) 104) 105) 106) 107) 108) 109) 110) 111) 112) 113) 114) 115) 116) 117) 118) 119) 120) 121) 122) 123) 124) 125) 126) 127) 128) 129) 130) 131) 132) 133) 134) 135) 136) 137) 138) 139) 140) 141) 142) 143) 144) 145) 146) 147) NODE= 85 NODE= 92 NODE=104 NODE=114 NODE=158 NODE=168 NODE= 23 NODE= 25 NODE= 29 NODE= 32 NODE= 53 NODE= 54 NODE= 93 NODE=105 NODE=106 NODE=107 NODE=118 NODE=119 NODE=121 NODE=123 NODE8128 NODE=133 NODE=139 NODE=144 NODE=146 NODE=148 NODE=159 NODE=160 NODE=161 NODE= 94 NODE=108 NODE=116 NODE=117 NODE=125 NODE=135 NODE=140 NODE=145 NODE=147 NODE=149 NODE=162 NODE= 26 NODE= 95 NODE=109 NODE=120 NODE=126 NODE=127 NODE=132 LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL8 LEVEL= LEVEL= LEVEL= LEVEL8 LEVEL= LEVEL= LEVEL= LEVEL= LEVEL8 LEVEL= LEVEL= LEVEL8 LEVEL= LEVEL= LEVEL= LEVEL8 LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL8 LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= 000000000mmmmmmmmmflfl‘lflflflflflfl‘lflflflVNNNNNNNNN00000Q NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: _LC3_C3 _LC2_CS _LC7_A10 -LC6_A10 -LC5_A5 _LC6_A5 NOT_LC6_A10 NOT_LCI_C3 NOT_LC2_C3 NOT_LC6_A5 NOT_LC5_A5 NOT_LC7_A10 -LC2_C9 t30 t31 t32 t41 t42 t43 t45 t49 t53 t58 t62 t64 t66 t74 t75 t76 t22 t33 t39 t40 _LC1_A10 t55 t59 t63 t65 t67 t77 NOT_LC1_A10 _LC3_C9 _LC3_A10 -LC2_A10 t47 t48 t52 167 AAAAA/KINAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 148) 149) 150) 151) 152) 153) 154) 155) 156) 157) 158) 159) 160) 161) 162) 163) 164) 165) 166) 167) 168) 169) 170) 171) 172) 173) 174) 175) 176) 177) 178) 179) 180) 181) 182) 183) 184) 185) 186) 187) 188) 189) 190) 191) 192) 193) 194) NODE=134 NODE=150 NODE=163 NODE= 19 NODE= 33 NODE= 34 NODE= 39 NODE= 96 NODE= 97 NODE=129 NODE=130 NODE=136 NODE=169 NODE=170 NODE=172 NODE=175 NODE=176 NODE=180 NODE=181 NODE=187 NODE=217 NODE=218 NODE=220 NODE=223 NODE=225 NODE= 27 NODE= 98 NODE=131 NODE=141 NODE=142 NODE=151 NODE=152 NODE=171 NODE=173 NODE=177 NODE=178 NODE=182 NODE=189 NODE=219 NODE=221 NODE=224 NODE=226 NODE= 36 NODE= 99 NODE=137 NODE=138 NODE=153 LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL8 LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL8 LEVEL= LEVEL= LEVEL= LEVEL= LEVEL8 LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL8 LEVEL= LEVEL= LEVEL= LEVEL8 LEVEL8 LEVEL= 000 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 12 12 12 12 12 NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: 
NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: t54 _LC3_A4 _LC4_A5 NOT_LC3_C9 NOT_LC3_A10 NOT_LC2_A10 NOT_LC4_A5 t23 t24 t50 t51 -LC4_A4 t82 t83 t85 t87 t88 t91 t92 t96 t118 t119 t121 t123 t125 NOT_LC4_A4 t25 _LC1-A8 t60 t61 t68 t69 t84 t86 t89 t90 t93 t98 t120 t122 t124 t126 NOT_LCI_A8 -LC1-C9 t56 t57 _LCI_A4 168 AAA/\AAAAAAAAA/KAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 195) NODE=174 LEVEL= 12 NAME: _LC3_A5 196) NODE=179 LEVEL= 12 NAME: _LC7_A5 197) NODE=183 LEVEL8 12 NAME: _LC2_A1 198) NODE=192 LEVEL8 12 NAME: t100 199) NODE=222 LEVEL= 12 NAME: _LC5_A1 200) NODE=227 LEVEL8 12 NAME: _LC8_A1 201) NODE= 11 LEVEL= 13 NAME: p2 202) NODE8 40 LEVEL= 13 NAME: NOT_LC8_A1 203) NODE= 41 LEVEL8 13 NAME: NOT_LC3_A5 204) NODE= 42 LEVEL8 13 NAME: NOT_LC2_A1 205) NUDE8 48 LEVEL8 13 NAME: NOT_LCI_09 206) NODE= 49 LEVEL8 13 NAME: NOT_LC1_A4 207) NODE=143 LEVEL8 13 NAME: _LC2_A4 208) NODE=184 LEVEL8 13 NAME: t94 209) NODE8186 LEVEL8 13 NAME: t95 210) NODE=188 LEVEL8 13 NAME: t97 211) NODE=228 LEVEL8 13 NAME: t127 212) NODE=231 LEVEL8 13 NAME: t130 213) NODE=234 LEVEL8 13 NAME: t132 214) NODE=235 LEVEL8 13 NAME: t133 215) NODE8 52 LEVEL8 14 NAME: NOT_LC2_A4 216) NODE=185 LEVEL= 14 NAME: _LC2_A5 217) NODE8190 LEVEL8 14 NAME: _LC1_A5 218) NODE8199 LEVEL8 14 NAME: t105 219) NODE8229 LEVEL= 14 NAME: t128 220) NODE=230 LEVEL8 14 NAME: t129 221) NODE=232 LEVEL8 14 NAME: t131 222) NODE8236 LEVEL8 14 NAME: t134 223) NODE8237 LEVEL8 14 NAME: t135 224) NODE8 35 LEVEL8 15 NAME: NOT_LC1_A5 225) NODE= 44 LEVEL8 15 NAME: NOT_LC2_A5 226) NUDE-193 LEVEL= 15 NAME: t101 227) NODE=195 LEVEL8 15 NAME: t102 228) NODE=196 LEVEL8 15 NAME: t103 229) NODE=233 LEVEL8 15 NAME: _LC3_A1 230) NODE=238 LEVEL8 15 NAME: _LC6_A1 231) NODE=241 LEVEL8 15 NAME: t138 232) NODE= 12 LEVEL= 16 NAME: p3 233) NODE= 43 LEVEL8 16 NAME: NOT_LC6_A1 234) NODE=191 LEVEL8 16 NAME: t99 235) NODE8197 LEVEL8 16 NAME: _LC1_B2 236) NODE8242 LEVEL8 16 NAME: t139 237) NODE=243 LEVEL= 16 NAME: t140 238) NODE=244 LEVEL= 16 NAME: t141 239) NODE=246 LEVEL= 16 NAME: t142 240) NODE=247 LEVEL= 16 NAME: t143 241) NODE=248 LEVEL= 16 NAME: t144 169 AAA/KAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 242) 243) 244) 245) 246) 247) 248) 249) 250) 251) 252) 253) 254) 255) 256) 257) 258) 259) 260) 261) 262) 263) 264) 265) 266) 267) 268) 269) 270) 271) 272) 273) 274) 275) 276) 277) 278) 279) 280) 281) 282) 283) NODE=249 NODE= 37 NODE=194 NODE=200 NODE=202 NODE82O3 NODE=239 NODE=240 NODE=250 NODE= 45 NODE= 46 NODE=198 NODE=204 NODE=245 NODE=252 NODE8 13 NODE= 50 NODE8201 NODE8251 NODE=253 NODE= 47 NODE=254 NODE=255 NODE=256 NODE=257 NODE= 14 NODE=258 NODE=260 NODE=259 NODE=261 NODE8 15 NODE8 51 NODE8262 NODE=263 NODE=264 NODE=265 NODE=266 NODE=267 NODE=268 NODE=269 NODE8270 NODE8 16 LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL= LEVEL8 LEVEL= LEVEL= LEVEL8 LEVEL8 LEVEL= LEVEL8 LEVEL= LEVEL= LEVEL= LEVEL8 LEVEL8 LEVEL= LEVEL8 LEVEL= LEVEL8 LEVEL8 LEVEL8 LEVEL8 LEVEL8 LEVEL8 LEVEL8 LEVEL= LEVEL8 LEVEL8 LEVEL8 LEVEL8 LEVEL8 LEVEL8 LEVEL= LEVEL8 LEVEL= LEVEL8 LEVEL8 NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: NAME: t145 NOT_LC1_B2 _LC8_B2 t106 t107 t108 t136 t137 _LC1_A1 NOT_LC1_A1 NOT-LC8_B2 t104 _LC4_B2 _LC7_A1 t147 p4 NOT-LC4_B2 -LC2_B2 t146 t148 NOT_LC2_B2 _LC7_B2 t149 t150 t151 p5 t152 t153 -L05_B2 _LC6-B2 p6 
NOT-LC6_B2 t154 t155 t156 t157 t158 t159 t160 t161 _LC3_B2 p7 170 B.5 SIS Activity and Power Estimation The switching activity and power estimates provided by SIS are given below. Script started on Fri Jun 11 02:25:50 1999 Combinational power estimation, with Zero delay model. Network: bth4.b1f, Power 8 2292.5 uw assuming 20 MHz Clock and Vdd 8 5V sis> time elapse: 1.0 seconds, total: 1.0 seconds sis> power_print Node [3393] Cap. 8 Node [3394] Cap. 8 Switch Prob. 8 Node [3395] Cap. Switch Prob. 8 .29 Power 8 Node _LC5_C3 Cap. 8 11 Switch Prob. 0.47 Power 8 12.9 Node NOT_LC5_C3 Cap. 8 8 Switch Prob. 8 0.47 Power 8 9.4 Node -LC4_C3 Cap. 8 12 Switch Prob. 8 0.22 Power 8 6.6 Node NOT_LC4_C3 Cap. 8 2 Switch Prob. 8 0.22 Power 8 1.1 Node _LC3_C9 Cap. 8 5 Switch Prob. 8 0.12 Power 8 1.5 Node NOT_L03_C9 Cap. 8 3 Switch Prob. 8 0.12 Power 8 0.9 Node _LCI_A12 Cap. 8 46 Switch Prob. 8 0.50 Power 8 57.5 Node NOT_LC1_A12 Cap. 8 24 Switch Prob. 8 0.50 Power 8 30.0 Node _LC5_A10 Cap. 8 12 Switch Prob. 8 0.50 Power 8 15.0 Node NOT-LCS-A10 Cap. 8 9 Switch Prob. 8 0.50 Power 8 11.2 Node _LCI_A11 Cap. 8 26 Switch Prob. 8 0.38 Power 8 24.4 Node NOT_LC1_A11 Cap. 8 18 Switch Prob. 8 0.38 Power 8 16.9 Node _LC6_A10 Cap. 8 16 Switch Prob. 8 0.34 Power 8 13.7 Node NOT_L06_A10 Cap. 8 7 Switch Prob. 8 0.34 Power 8 6.0 Node _LC4_A10 Cap. 8 17 Switch Prob. 8 0.47 Power 8 19.9 Node NOT-LC4_A10 Cap. 8 14 Switch Prob. 8 0.47 Power 8 16.4 Node _LCI_C3 Cap. 8 17 Switch Prob. 8 0.47 Power 8 19.9 Node NOT_L01_03 Cap. 8 14 Switch Prob. 8 0.47 Power 8 16.4 Node _LC1_A10 Cap. 8 16 Switch Prob. 8 0.38 Power 8 15.0 .46 Power 8 .43 Power 8 Switch Prob. 8 Node aO Cap. 8 59 Switch Prob. 8 0.50 Power 8 73.8 Node a1 Cap. 8 75 Switch Prob. 8 0.50 Power 8 93.8 Node a2 Cap. 8 55 Switch Prob. 8 0.50 Power 8 68.8 Node a3 Cap. 8 48 Switch Prob. 8 0.50 Power 8 60.0 Node :0 Cap. 8 75 Switch Prob. 8 0.50 Power 8 93.8 Node :1 Cap. 8 32 Switch Prob. 8 0.50 Power 8 40.0 Node x2 Cap. 8 52 Switch Prob. 8 0.50 Power 8 65.0 Node :3 Cap. 8 66 Switch Prob. 8 0.50 Power 8 82.5 Node [3388] Cap. 8 0 Switch Prob. 8 0.38 Power 8 0.0 Node [3389] Cap. 8 0 Switch Prob. 8 0.47 Power 8 0.0 Node [3390] Cap. 8 0 Switch Prob. 8 0.49 Power 8 0.0 Node [3391] Cap. 8 0 Switch Prob. 8 0.50 Power 8 0.0 Node [3392] Cap. 8 0 Switch Prob. 8 0.49 Power 8 0.0 0 0 0.0 0 0 0.0 0 0 0.0 171 Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node NOT_LC1_A10 Cap. 8 7 Switch Prob. 8 0.38 Power 8 6.6 -LC4_A4 Cap. 8 19 Switch Prob. 8 0.38 Power 8 17.8 NOT-LC4_A4 Cap. 8 6 Switch Prob. 8 0.38 Power 8 5.6 _LC5_A4 Cap. 8 23 Switch Prob. 8 0.47 Power 8 27.0 NOT_LC5_A4 Cap. 8 20 Switch Prob. 8 0.47 Power 8 23.4 -LC2_03 Cap. 8 21 Switch Prob. 8 0.12 Power 8 6.2 NOT_LC2_C3 Cap. 8 20 Switch Prob. 8 0.12 Power 8 5.9 _LCI_A13 Cap. 8 39 Switch Prob. 8 0.50 Power 8 48.8 NOT-LC1_A13 Cap. 8 13 Switch Prob. 8 0.50 Power 8 16.2 _LC1-C7 Cap. 8 33 Switch Prob. 8 0.38 Power 8 30.9 NOT-LC1_C7 Cap. 8 25 Switch Prob. 8 0.38 Power 8 23.4 _LC6_A5 Cap. 8 16 Switch Prob. 8 0.34 Power 8 13.7 NOT-LC6_A5 Cap. 8 7 Switch Prob. 8 0.34 Power 8 6.0 _LC3_A10 Cap. 8 17 Switch Prob. 8 0.47 Power 8 19.9 NOT_LC3_A10 Cap. 8 14 Switch Prob. 8 0.47 Power 8 16.4 _LC2-A10 Cap. 8 13 Switch Prob. 8 0.49 Power 8 16.0 NOT_LC2_A10 Cap. 8 9 Switch Prob. 8 0.49 Power 8 11.1 Switch Prob. 8 0.40 Power 8 12.1 2 Switch Prob. 8 0.40 Power 8 2.0 Switch Prob. 
8 0.49 Power 8 12.3 6 Switch Prob. 8 0.49 Power 8 7.4 Switch Prob. 8 0.22 Power 8 7.1 2 Switch Prob. 8 0.22 Power 8 1.1 Switch Prob. 8 0.43 Power 8 17.2 0 -L01_A5 Cap. 8 12 NOT-LC1_A5 Cap. 8 -LC1_A8 Cap. 8 10 NOT_LCI_A8 Cap. 8 _LCI_B2 Cap. 8 13 NOT_L01_B2 Cap. 8 _LC4_A1 Cap. 8 16 NOT_LC4_A1 Cap. 8 _LC4_A5 Cap. 8 17 NOT_LC4-A5 Cap. 8 -LC8_A1 Cap. 8 16 NOT_LC8-A1 Cap. 8 _LC3_A5 Cap. 8 17 NOT_L03_A5 Cap. 8 7 Switch Prob. 0.43 Power 8 7.5 Switch Prob. 8 .47 Power 8 19.9 14 Switch Prob. 8 0.47 Power 8 16.4 Switch Prob. 8 0.44 Power 8 17.6 7 Switch Prob. 8 0.44 Power 8 7.7 Switch Prob. 8 0.49 Power 8 20.9 14 Switch Prob. 8 0.49 Power 8 17.2 _LC2_A1 Cap. 8 20 Switch Prob. 8 0.50 Power 8 24.9 NOT_LC2_A1 Cap. 8 18 Switch Prob. 8 0.50 Power 8 22.4 _L06_A1 Cap. 8 27 Switch Prob. 8 0.44 Power 8 29.9 6 Switch Prob. 8 0.44 Power 8 6.7 Switch Prob. 8 0.27 Power 8 13.0 18 Switch Prob. 8 0.27 Power 8 12.3 Switch Prob. 8 0.25 Power 8 8.7 4 Switch Prob. 8 0.25 Power 8 2.5 _LC8_B2 Cap. 8 8 Switch Prob. 8 0.48 Power 8 9.6 NOT_LC8_B2 Cap. 8 11 Switch Prob. 8 0.48 Power 8 _LC2_B2 Cap. 8 8 Switch Prob. 8 0.43 Power 8 8.7 NOT_LC2_B2 Cap. 8 8 Switch Prob. 8 0.43 Power 8 8.7 _LCl_C9 Cap. 8 18 Switch Prob. 8 0.30 Power 8 13.7 NOT_LC1_C9 Cap. 8 16 Switch Prob. 8 0.30 Power 8 12.2 _LCl_A4 Cap. 8 18 Switch Prob. 8 0.09 Power 8 4.0 NOT_LC1_A4 Cap. 8 16 Switch Prob. 8 0.09 Power 8 NOT_LC6_A1 Cap. = _LC2_A5 Cap. - 19 NOT_LC2_A5 Cap. = _LC1_A1 Cap. = 14 NOT_L01_A1 Cap. = 13.2 3.6 172 Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node _LC4_B2 Cap. 8 18 Switch Prob. 8 0.03 Power 8 1.4 NOT_LC4_B2 Cap. 8 16 Switch Prob. 8 0.03 Power _LC6_B2 Cap. 8 17 Switch Prob. 8 0.01 Power NOT_LC6_B2 Cap. 8 16 Switch Prob. 8 0.01 Power _LC2_A4 Cap. 8 12 Switch Prob. 8 0.44 Power NOT_LC2_A4 Cap. 8 7 Switch Prob. 8 0.44 Power _LC5_A5 Cap. 8 9 Switch Prob. 8 0.47 Power 8 10.5 NOT_LC5_A5 Cap. 8 4 Switch Prob. 8 0.47 Power _LC7_A10 Cap. 8 9 Switch Prob. 8 NOT_LC7_A10 Cap. 8 4 Switch Prob. NOTaO Cap. 8 26 Switch Prob. 8 NOTa1 Cap. 8 21 Switch Prob. 8 NOTa2 Cap. 8 32 Switch Prob. 8 NOTa3 Cap. 8 21 Switch Prob. 8 NOTxO Cap. 8 41 Switch Prob. 8 NOTxl Cap. 8 21 Switch Prob. 8 NOTx2 Cap. 8 48 Switch Prob. 8 NOTx3 Cap. 8 20 Switch Prob. 8 _LC1_Cl4 Cap. 8 1 Switch Prob. "00000000 0.47 Power 8 8 0.47 Power .50 Power 8 32. .50 Power 8 26. .50 Power 8 40. .50 Power 8 26. .50 Power 8 51. .50 Power 8 26. .50 Power 8 60. .50 Power 8 25. 0.38 Power 8 CMUIOOHMNMHHMIOMMN ONHOOQNNNMMN‘INNWWNthw 0.3 13. 8 7.6 - 4.7 1 . 0 OOONNMONO'IIIO Enioioio'sz'qén. 16. 1 7 1.2 8 2.0 2 7 2 7 _LC5_06 Cap. 8 3 Switch Prob. 8 0.47 Power 8 _L05_A1 Cap. 8 3 Switch Prob. 8 0.49 Power 8 _L03_A1 Cap. 8 3 Switch Prob. 8 0.50 Power 8 _LC7_A1 Cap. 8 4 Switch Prob. 8 0.49 Power 8 _LC7_B2 Cap. 8 2 Switch Prob. 8 0.46 Power 8 _L05_B2 Cap. 8 3 Switch Prob. 8 0.43 Power 8 _LC3_B2 Cap. 8 5 Switch Prob. 8 0.29 Power to Cap. 8 5 Switch Prob. 8 0.22 Power t1 Cap. 8 5 Switch Prob. 8 0.22 Power 8 t2 Cap. 8 5 Switch Prob. 8 0.22 Power t3 Cap. 8 5 Switch Prob. 8 0.22 Power t4 Cap. 8 4 Switch Prob. 8 0.12 Power 8 t5 Cap. 8 4 Switch Prob. 8 0.12 Power 8 t6 Cap. 8 5 Switch Prob. 8 0.22 Power 8 t7 Cap. 8 5 Switch Prob. 8 0.22 Power 8 t8 Cap. 8 5 Switch Prob. 8 0.22 Power 8 t9 Cap. 8 6 Switch Prob. 8 0.12 Power 8 t10 Cap. 8 6 Switch Prob. 8 0.06 Power 8 t11 Cap. 8 6 Switch Prob. 8 0.00 Power 8 t12 Cap. 8 5 Switch Prob. 
8 0.40 Power 8 t13 Cap. 8 5 Switch Prob. 8 0.22 Power 8 t14 Cap. 8 5 Switch Prob. 8 0.00 Power 8 _LC6_C3 Cap. 8 13 Switch Prob. 8 0.49 Power t15 Cap. 8 4 Switch Prob. 8 0.17 Power 8 t16 Cap. 8 4 Switch Prob. 8 0.12 Power 8 _LC3_C3 Cap. 8 3 Switch Prob. 8 0.26 Power t17 Cap. 8 5 Switch Prob. 8 0.22 Power 8 t18 Cap. 8 5 Switch Prob. 8 0.22 Power 8 173 1. 0 0. 1 I50] ‘1 2 3 Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node t19 Cap. 8 t20 Cap. 8 t21 Cap. 8 _LC2_C9 Cap. t22 Cap. 8 t23 Cap. 8 t24 Cap. 8 t25 Cap. 8 t26 Cap. 8 t27 Cap. 8 t28 Cap. 8 t29 Cap. 8 t30 Cap. 8 t31 Cap. 8 t32 Cap. 8 t33 Cap. 8 t34 Cap. 8 t35 Cap. 8 t36 Cap. 8 t37 Cap. 8 t38 Cap. 8 t39 Cap. 8 t40 Cap. 8 t41 Cap. 8 t42 Cap. 8 t43 Cap. 8 t44 Cap. 8 t45 Cap. 8 t46 Cap. 8 t47 Cap. 8 t48 Cap. 8 t49 Cap. 8 t50 Cap. 8 t51 Cap. 8 t52 Cap. 8 t53 Cap. 8 t54 Cap. 8 t55 Cap. 8 t56 Cap. 8 t57 Cap. 8 t58 Cap. 8 t59 Cap. 8 t60 Cap. 8 t61 Cap. 8 t62 Cap. 8 t63 Cap. 8 t64 Cap. 8 0101010101010101a1010101010101mmmmmmmmmmmmmmmmammmmmmm37:38-80:IIComa! Switch Switch Switch Prob. Prob. Prob. 5 Switch Prob. Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. 8 Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. 8 Prob. Prob. 8 Prob. Prob. Prob. Prob. Prob. 0000000000000000000000000000000000000000000ll000 .22 .12 .12 Power Power Power 0.49 Power .12 .00 .00 .30 .22 .22 .22 .22 .38 .22 .22 .12 .17 .12 .12 .12 .30 .17 .24 .17 .03 .17 .22 .06 .12 .16 .03 .30 .18 .24 .19 .22 .03 .12 .00 .32 .03 .28 .00 .06 .06 .17 .03 174 Power Power 8 Power Power 8 Power Power 8 Power Power Power Power Power Power Power Power Power Power Power 8 Power Power 8 Power Power Power Power Power Power Power Power Power Power Power 8 Power Power Power Power Power Power Power Power Power Power Power Power 8 Power 0M000w00h01-‘0NMCAMW0MH0MMOMODNODHHHMHMMhMMMNw000llCHM .8 escr 83¢: C>u>co c>cn p-lqih 01¢» a>cn C’Cfl a>~418 01:8 05:8 a>cn balm 512» 811434 84-4 44.4 c><3 c>¢o 0530 a>-q Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node t65 Cap. 8 t66 Cap. 8 t67 Cap. 8 _L03_A4 Cap. t68 Cap. 8 t69 Cap. 8 t70 Cap. 8 t71 Cap. 8 t72 Cap. 8 t73 Cap. 8 t74 Cap. 8 t75 Cap. 8 t76 Cap. 8 t77 Cap. 8 t78 Cap. 8 t79 Cap. 8 t80 Cap. 8 t81 Cap. 8 t82 Cap. 8 t83 Cap. 8 t84 Cap. 8 t85 Cap. 8 t86 Cap. 8 t8? Cap. 8 t88 Cap. 8 t89 Cap. 8 t90 Cap. 8 _LC7_A5 Cap. t91 Cap. 8 t92 Cap. 8 t93 Cap. 8 t94 Cap. 8 t95 Cap. 8 t96 Cap. 8 t97 Cap. 8 t98 Cap. 8 t99 Cap. 8 t100 Cap. 8 t101 Cap. 8 t102 Cap. 8 t103 Cap. 8 t104 Cap. 8 t105 Cap. 8 t106 Cap. 8 t107 Cap. 8 t108 Cap. 8 t109 Cap. 8 mpppppwwpbhmmmmmpppummmmmmmmmmmmmmmmmmmmmphucacao: Switch Switch Switch 11 Switch Prob. 
- Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch 10 Switch Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Switch Prob. 8 Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. 8 Prob. Prob. Prob. Prob. Prob. 00000000000000000000000 0000000000000000000 “000Mw0Hkubt-80MHMMOJW munmqopmo¢owmm~zflpbp 175 Power 8 3.6 Power 8 0.0 Power 8 3.3 Power Power Power Power Power Power Power Power Power Power Power Power Power Power Power Power Power Power .47 Power Power 8 Power Power Power Power Power Power Power Power Power Power Power Power Power Power Power Power Power Power Power Power Power Power .40 Power 8 Power 8 3. 0 0 2 2 2 2 2 4 2 1 2. 1 1 1 3 2 3 1 1 1 3 1 1 cocnostisi-sobo'mininmookz'q'q'q'qlz'q'mi» 3 2 1 2 3 0 1 4. 3. 1 1 0 3 2 0 0 0 1 4 4 4 4 1 3 8 5 0 4 0 4 3 9 7 5 2 1 9 Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node Node t110 t111 t112 t113 t114 t115 t116 t117 t118 t119 t120 t121 t122 t123 t124 t125 t126 t127 t128 t129 t130 t131 t132 t133 t134 t135 t136 t137 t138 t139 t140 t141 t142 t143 t144 t145 t146 t147 t148 t149 t150 t151 t152 t153 t154 t155 t156 Cap. Cap. Cap. Cap. 8 Cap. - Cap. 8 Cap. 8 Cap. 8 Cap. Cap. Cap. Cap. Cap. Cap. 8 Cap. Cap. 8 Cap. Cap. 8 Cap. Cap. 8 Cap. 8 Cap. Cap. 8 Cap. Cap. Cap. Cap. Cap. Cap. Cap. Cap. Cap. Cap. Cap. Cap. Cap. 8 Cap. - Cap. 8 Cap. Cap. - Cap. 8 Cap. 8 Cap. 8 Cap. 8 Cap. 8 Cap. - Cap. 8 mmmwoscncnmbpbmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Switch Prob. 8 Prob. Prob. 8 Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. Prob. - Prob. 8 Prob. Prob. Prob. Prob. 8 Prob. Prob. 8 Prob. Prob. 8 Prob. - Prob. 8 Prob. 8 Prob. - Prob. 8 Prob. 8 Prob. Prob. Prob. Prob. Prob. 8 Prob. Prob. 8 Prob. Prob. - Prob. 8 Prob. 8 Prob. Prob. 8 Prob. Prob. - Prob. 8 Prob. 8 Prob. 00000000000000000000000000000000000000000000000 .22 .22 .22 .22 .22 .22 .22 .22 .30 .19 .17 .14 .12 .17 .26 .06 .22 .34 .24 .16 .15 .10 .19 .10 .24 .19 .07 .36 .10 .27 .02 .12 .05 .09 .02 .12 .43 .29 .03 .27 .37 .25 .01 .01 .00 .00 .00 176 Power Power 8 Power Power Power 8 Power Power Power Power Power Power Power Power Power Power Power Power Power Power Power Power Power Power Power Power Power Power Power Power Power Power Power Power 8 Power Power 8 Power Power 8 Power 8 Power - Power 8 Power 8 Power 8 Power Power 8 Power - Power 8 Power ooooowpwowhpo~to~owupowwpMupmwpwowwwuwnwwmnnwwwto c><3 C>h8ld sac» 5.0310 o:cn10 01¢» a>ss.b oocntm ¢-c>cn.h 0530 ¢»c>co«q date.» one» alustn 81-4-8 «I-q-A ~a-q $337.18: Node t157 Cap. 8 6 Switch Prob. 8 0.00 Power — 0.0 Node t158 Cap. 8 6 Switch Prob. 8 0.00 Power 8 0.0 Node t159 Cap. 8 6 Switch Prob. 8 0.02 Power 8 0.3 Node t160 Cap. 8 6 Switch Prob. 8 0.03 Power 8 0.5 Node t161 Cap. 
8 6 Switch Prob. 8 0.25 Power 8 3.8 Node t162 Cap. 8 3 Switch Prob. 8 0.38 Power 8 2.8 Node t163 Cap. 8 3 Switch Prob. 8 0.38 Power 8 2.8 Node t164 Cap. 8 2 Switch Prob. 8 0.38 Power 8 1.9 Node t165 Cap. 8 3 Switch Prob. 8 0.38 Power -. 2.8 Node t166 Cap. 8 3 Switch Prob. 8 0.38 Power 8 2.8 Node t167 Cap. 8 2 Switch Prob. 8 0.38 Power 8 1.9 Node t168 Cap. 8 3 Switch Prob. 8 0.38 Power - 2.8 Node t169 Cap. 8 3 Switch Prob. 8 0.38 Power 8 2.8 Total Power: 2292 . 467728 B.6 BLAPE Activity and Power Estimation The switching activity and power estimates provided by BLAPE are given below. Activity Computation delay 8 0.03 Seconds Node Activity Capacitance a0 0.5000 cap859 a1 0.5000 cap875 a2 0 . 5000 cap855 a3 0 . 5000 cap848 x0 0 . 5000 cap875 11 0 . 5000 cap832 x2 0 . 5000 cap852 x3 0 . 5000 cap866 p0 0 . 3750 cap80 pl 0 . 4851 cap80 p2 0 . 4982 cap80 p3 0 . 4992 cap80 p4 0 . 4994 cap80 p5 0 . 4736 cap80 p6 O . 3864 cap80 p7 O . 4304 cap=0 NOT-LC5_C3 0 . 4672 cap88 NOT_LC4_C3 0 . 2129 cap=2 NOT_LC3_C9 O . 1318 cap83 NOT_LCI_A1 2 0 . 4922 cap824 N OT_LCS_A10 0 . 4922 cap89 177 NOT_LC1_A11 NOT_LC6_A10 NOT_LC4_A10 NOT_LCI_C3 NOT_LC1_A10 NOT_LC4_A4 NOT_LC5_A4 NOT_LC2_63 NOT_LC1_A13 NOT_LC1_C7 NOT_LC6_A5 NOT_LC3_A10 NOT_LC2_A10 NOT_LC1_A5 NOT_LC1_A8 NOT_LC1_B2 NOT_LC4_A1 NOT_LC4_A5 NOT_LC8_A1 NOT_LC3-A5 NOT_LC2-A1 NOT_LC6_A1 NOT_LC2_A5 NOT_LC1_A1 NOT_LC8_B2 NOT_LC2_B2 NOT_LC1_C9 NOT_LCl_A4 NOT_LC4_B2 NOT_LC6_B2 NOT_LC2_A4 NOT_LC5_A5 NOT_LC7_A10 NOTaO NOTa1 NOTa2 NOTa3 NOTxO NOTx1 NOTx2 NOTxS t0 t1 t2 t3 -LC4_A10 t4 0.3750 0.3811 0.4851 0.4978 0.4250 0.4493 0.4672 0.1303 0.4922 0.3750 0.3811 0.4901 0.4971 0.4587 .4999 .2831 .4851 .4901 .4736 .4951 .5000 .4687 .2629 .2908 .4960 .4918 .4082 .1377 .0690 .0344 .4975 .4758 0.4758 .5000 .5000 .5000 .5000 .5000 .5000 .5000 .5000 .2188 .2188 .2188 .2188 .4851 .1172 000000000000000000 00000000000000 cap818 cap87 cap814 cap814 cap87 cap=6 cap820 cap820 cap813 cap825 cap87 cap814 cap89 cap=2 cap=6 cap=2 cap87 cap814 cap87 cap814 cap818 cap=6 cap818 cap84 cap=11 cap88 cap816 cap816 cap816 cap816 cap87 cap84 cap84 cap826 cap821 cap832 cap821 cap841 cap821 cap848 cap820 cap=5 cap=5 cap=5 cap=5 cap816 cap83 178 t5 _LC4_C3 t6 t7 t8 t9 _LC5_03 t10 t11 t12 t13 t14 -LC1_C3 -LC6_C3 t15 t16 _LC3_03 t17 t18 t19 t20 -LCS_A4 t21 _LC2_C3 _LC2_C9 t22 -LC3_09 t23 t24 t25 _LC1_C9 t26 t27 t28 t29 _LC7_A10 t30 t31 t32 t33 _LC3_A10 ‘t34 t35 't36 ‘t37 _LCS_A10 1fi38 <3 C><3 C><3 c><3 c><3 c><3 c><3 c><3 c><3 c><> c><3 c><3 c><3 c><3 c><3 .1172 .2129 .2188 .2188 .2188 .1172 .4672 .0373 .0373 .4401 .1687 .1687 .4978 .4927 .1303 .1303 .2339 .2188 .2188 .2188 .1172 .4672 .1303 .1303 .4909 .1318 .1318 .0684 .0684 .3566 .4082 .2158 .1948 .2158 .1948 .4758 .4139 .3139 .3425 .0645 .4901 .1506 .1307 .1307 .1172 .3811 .3572 cap83 cap812 cap85 cap85 cap85 cap85 cap810 cap85 cap85 cap85 cap85 cap85 cap816 cap813 cap83 cap83 cap83 cap85 cap85 cap85 cap85 cap-22 cap-2 cap820 cap85 cap=2 cap84 cap84 cap84 cap84 cap818 cap85 cap85 cap85 cap85 cap88 cap85 cap85 cap85 cap85 cap816 cap85 cap85 cap85 cap85 cap815 cap85 t39 t40 t41 t42 _LC2_A10 t43 t44 t45 t46 _LC1_A10 t47 t48 t49 t50 t51 _LC1_A8 t52 t53 t54 t55 _LC4_A4 t56 t57 t58 t59 t60 t61 _LC2_A4 t62 t63 t64 t65 t66 t67 -LC3_A4 t68 t69 _LCl_A4 t70 t71 t72 t73 _LC5_A5 ‘t74 ‘t75 ‘t76 ‘t77 <3 C><3 C)<><><>¢D c><>¢3 c><3 c><><3 c:c><3 c><>¢3 c><><3 c><3 c>c><3 c><3 .2604 .1726 .1004 .0635 .4971 .1464 .2622 .0723 .1358 .4250 .1329 .0689 .3874 .2715 .1486 .4999 .1916 .2890 .0783 .1243 .4493 .0563 .3520 .0483 .3134 .0077 .1590 .4975 .0430 .2861 .0430 
.2861 .0129 .2494 .4939 .0729 .0729 .1377 .2158 .1948 .2158 .1948 .4758 .3425 .4139 .3139 .0645 cap85 cap85 cap85 cap85 cap812 cap85 cap85 cap85 cap85 cap815 cap85 cap85 cap85 cap85 cap85 cap89 cap85 cap85 cap85 cap85 cap818 cap85 cap85 cap85 cap85 cap85 cap85 cap-=10 cap85 cap85 cap85 cap85 cap85 cap85 cap89 cap83 cap83 cap818 cap85 cap85 cap85 cap85 cap88 cap85 cap85 cap85 cap85 180 -LC4_A5 fl? HS t80 t81 -LC6_A5 t82 t83 t84 t85 t86 -LC3_A5 t87 t88 t89 t90 _LC7_A5 t91 t92 t93 _LC2_A1 t94 _LC2_A5 t95 t96 t97 t98 _LC1_A5 t99 t100 t101 _LC8_B2 t102 t103 _LC1_B2 t104 t105 t106 _LC2-B2 t107 t108 _LC4_B2 t109 _LCI_C14 t110 t111 t112 cap816 cap85 cap85 cap85 cap85 cap815 cap85 cap85 cap85 cap85 cap85 cap816 cap85 cap85 cap85 cap85 cap89 cap84 cap84 cap84 cap820 cap=2 cap818 cap85 cap85 cap85 cap85 cap8 1 1 cap84 cap84 cap84 cap88 cap83 cap83 cap813 cap84 cap84 cap84 cap88 cap83 cap83 cap818 cap=2 cap80 cap85 cap85 cap85 181 1:31-08 - 1:". I J t113 0.2188 cap85 _L05-C6 0.4851 cap=2 t114 0.2188 cap85 t115 0.2188 cap85 t116 0.2188 cap85 t117 0.2188 cap85 -LC4_A1 0.4851 cap815 t118 0.4076 cap85 t119 0.2784 cap85 t120 0.1180 cap85 t121 0.2081 cap85 t122 0.0850 cap85 _LC5_A1 0.4982 cap=2 t123 0.2445 cap85 t124 0.1918 cap85 t125 0.2081 cap85 t126 0.1620 cap85 _LC8_A1 0.4736 cap815 t127 0.3986 cap85 t128 0.2809 cap85 t129 0.1289 cap85 t130 0.1892 cap85 t131 0.0829 cap85 _LC3_A1 0.4992 cap-2 t132 0.2371 cap85 t133 0.1892 cap85 t134 0.1998 cap85 t135 0.1583 cap85 _LC6_A1 0.4687 cap-26 t136 0.0929 cap85 t137 0.3875 cap85 t138 0.0751 cap85 t139 0.3320 cap85 t140 0.0286 cap85 t141 0.1463 cap=5 -LC7_A1 0.4994 cap=2 t142 0.0289 cap85 t143 0.1452 cap85 t144 0.0286 cap85 t145 0.1463 cap85 _LC1_A1 0.2908 cap813 t146 0.4947 cap84 t147 0.3963 cap84 t148 0.0772 cap84 _LC7_B2 0.4736 cap=2 t149 0.4257 cap=5 t150 0.4975 cap85 182 88881 t151 0.4050 cap85 t152 0.0344 cap85 -LC5_B2 0.3864 cap=2 t153 0.0344 cap=2 _LC6_B2 0.0344 cap816 t154 0.0001 cap85 t155 0.0003 cap85 t156 0.0007 cap85 t157 0.0221 cap=5 t158 0.0015 cap85 t159 0.0454 cap85 t160 0.0956 cap85 t161 0.3756 cap85 _LC3-B2 0.4304 cap=2 t162 0.3750 cap83 t163 0.3750 cap83 _LC1_A13 0.4922 cap839 t164 0.3750 cap=2 _LC1-C7 0.3750 cap80 t165 0.3750 cap83 t166 0.3750 cap83 _LCI_A12 0.4922 cap846 t167 0.3750 cap=2 _LCI,A11 0.3750 cap80 t168 0.3750 cap83 t169 0.3750 cap83 _LC5_A10 0.4922 cap812 Power 8 2289.34 uW assuming 20 MHz CLK, Vdd 8 5V The Bar and Level views of the 4-bit Booth multiplier are illustrated in Figures 6.5 and 6.4 The BLAPE implementation, using a depth-accuracy of 1 performs the activity and power estimation in just 0.03 seconds; SIS runs in 1.0 seconds. The percent error, when compared to SIS, is just 0.13%. 183 BIBLIOGRAPHY BIBLIOGRAPHY [1] M. Pedram. Power Minimization in IC Design: Principles and Applications. ACM Trans. on Design Automation of Electronic Systems, 1(1):3—-56, 1996. [2] N. Weste and K. Eshraghian. Principles of CM OS VLSI Design. Addison—Wesley Company, Reading, MA, 1985. [3] S. Devadas and S. Malik. A Survey of Optimization Techniques Targeting Low Power VLSI Circuits. In Proceedings of the 32nd Design Automation Conference, pages 242-247, June 1995. [4] A. P. Chandrakasan, S. Sheng, and R. W. Brodersen. Low-Power CMOS Design. IEEE Journal Of Solid-State Circuits, 27(4):472—484, April 1992. [5] A. Shen, A. Ghosh, S. Devadas, and K. Keutzer. On Average Power Dissipation and Random Pattern Testability of Combinational Logic Circuits. In Proceedings of the Int ’1 Conference on Computer-Aided Design, pages 402—407, November 1992. 
[6] W. Swartz and C. Sechen. New Algorithms for the Placement and Routing of Macro Cells. IEEE International Conference on Computer-Aided Design, pages 336-339, 1990.
[7] A. P. Chandrakasan, R. Allmon, A. Stratakos, and R. W. Brodersen. Design of Portable Systems. In Proceedings of CICC, pages 1211-1218, May 1994.
[8] D-S Chen, M. Sarrafzadeh, and G. K. H. Yeap. State Encoding of Finite State Machines for Low Power Design. In Proceedings of the International Symposium on Circuits and Systems, volume 3, pages 2309-2312, 1995.
[9] G. E. Sobelman and D. L. Raatz. Low-Power Multiplier Design Using Delayed Evaluation. In Proceedings of the International Symposium on Circuits and Systems, pages 1564-1566, 1995.
[10] V. G. Moshnyaga and K. Tamaru. A Comparative Study of Switching Activity Reduction Techniques for Design of Low-Power Multipliers. In Proceedings of the International Symposium on Circuits and Systems, pages 1560-1563, 1995.
[11] C-Y. Tsui, M. Pedram, and A. M. Despain. Technology Decomposition and Mapping Targeting Low Power Dissipation. In Proceedings of the 30th Design Automation Conference, pages 68-73, June 1993.
[12] J. Monteiro, S. Devadas, P. Ashar, and A. Mauskar. Scheduling Techniques to Enable Power Management. In Proceedings of the 33rd Design Automation Conference, pages 349-352, June 1996.
[13] M. Alidina, J. Monteiro, S. Devadas, A. Ghosh, and M. Papaefthymiou. Precomputation-Based Sequential Logic Optimization for Low Power. In Proceedings of the Int'l Conference on Computer-Aided Design, pages 74-81, November 1994.
[14] C. K. Lennard, P. Buch, and A. R. Newton. Logic Synthesis Using Power-Sensitive Don't Care Sets. In Proceedings of the ACM Int'l Symposium on Low Power Electronics & Design, August 1996.
[15] V. Tiwari, S. Malik, and A. Wolfe. Power Analysis of Embedded Software: A First Step Towards Software Power Minimization. IEEE Transactions on VLSI Systems, 2(4):437-445, December 1994.
[16] W. C. Athas, L. J. Svensson, J. G. Koller, N. Tzartzanis, and E. Y-C Chou. Low-Power Digital Systems Based on Adiabatic-Switching Principles. IEEE Transactions on VLSI Systems, 2(4):398-407, December 1994.
[17] D. Conner. Making the jump to HDL-based programmable-logic design. EDN Magazine, pages 181-187, September 1997.
[18] S. Katkoori, N. Kumar, and R. Vemuri. High Level Profiling Based Low Power Synthesis Technique. In Proceedings of the International Conference on Computer Design, pages 759-765, October 2-5, 1995.
[19] R. Burch, F. Najm, P. Yang, and T. Trick. A Monte Carlo Approach for Power Estimation. IEEE Transactions on VLSI Systems, 1(1):63-71, March 1993.
[20] S. M. Ross. Introduction to Probability Models. Academic Press Inc., San Diego, CA 92101, 5th edition, 1993.
[21] K. S. Shanmugan and A. M. Breipohl. Random Signals: Detection, Estimation and Data Analysis. John Wiley & Sons, Inc., New York, NY, 1st edition, 1988.
[22] G. I. Stamoulis. A Monte-Carlo Approach for the Accurate and Efficient Estimation of Average Transition Probabilities in Sequential Logic Circuits. In Proceedings of the IEEE 1996 Custom IC Conference, pages 221-224, May 1996.
[23] M. Xakellis and F. Najm. Statistical Estimation of the Switching Activity in Digital Circuits. In Proceedings of the 31st ACM/IEEE Design Automation Conference, pages 728-733, 1994.
[24] F. N. Najm, S. Goel, and I. N. Hajj. Power Estimation in Sequential Circuits. In Proceedings of the 32nd ACM/IEEE Design Automation Conference, pages 635-640, June 1995.
[25] F. N. Najm and M. G. Xakellis. Statistical Estimation of the Switching Activity in VLSI Circuits. VLSI Design, 1996.
[26] A. Ghosh, S. Devadas, K. Keutzer, and J. White. Estimation of Average Switching Activity in Combinational and Sequential Circuits. In Proceedings of the 29th Design Automation Conference, pages 253-259, June 1992.
[27] R. E. Bryant. Graph-Based Algorithms for Boolean Function Manipulation. IEEE Transactions on Computers, C-35:677-691, August 1986.
[28] S. Chakravarty. On the complexity of using BDDs for the synthesis and analysis of boolean circuits. In Proceedings of the 27th Annual Allerton Conference on Communication, Control and Computing, pages 730-739, 1989.
[29] F. Najm. Transition Density: A New Measure of Activity in Digital Circuits. IEEE Transactions on Computer-Aided Design, 12(2):310-323, February 1993.
[30] K. S. Brace, R. L. Rudell, and R. E. Bryant. Efficient Implementation of a BDD Package. In Proceedings of the 27th ACM/IEEE Design Automation Conference, pages 40-45, June 1990.
[31] Bhanu Kapoor. Improving the Accuracy of Circuit Activity Measurement. In Proceedings of the 31st ACM/IEEE Design Automation Conference, pages 734-739, 1994.
[32] C-Y. Tsui, M. Pedram, and A. M. Despain. Exact and Approximate Methods For Calculating Signal and Transition Probabilities in FSMs. In Proceedings of the 31st Design Automation Conference, pages 19-23, June 1994.
[33] J. Monteiro, S. Devadas, and B. Lin. A Methodology for Efficient Estimation of Switching Activity in Sequential Logic Circuits. In Proceedings of the 31st Design Automation Conference, pages 12-17, June 1994.
[34] D. I. Cheng, M. Marek-Sadowska, and K-T Cheng. Speeding Up Power Estimation by Topological Analysis. IEEE Custom Integrated Circuits Conference, pages 623-626, 1995.
[35] S. Katkoori, L. Rader, N. Kumar, and R. Vemuri. A Profile Driven Approach for Low Power Synthesis. In Proceedings of VLSI, pages 759-765, Japan, 1995.
[36] P. E. Landman and J. Rabaey. Power Estimation for High Level Synthesis. In Proceedings of the European Conference on Design Automation, pages 361-366, February 1993.
[37] M. Nemani and F. N. Najm. Towards a High-Level Power Estimation Capability. IEEE Transactions on Computer-Aided Design, 15(6):588-598, June 1996.
[38] F. Najm. Towards a high-level power estimation capability. In Proceedings of the 1995 International Symposium on Low Power Design, pages 87-92, April 23-26, 1995.
[39] Gary Hachtel and Fabio Somenzi. Logic Synthesis and Verification Algorithms. Kluwer Academic Publishers, Norwell, MA 02061, 1996.
[40] C. Y. Lee. Representation of Switching Circuits by Binary-Decision Programs. Bell System Technical Journal, 38:985-999, July 1959.
[41] S. B. Akers. Binary Decision Diagrams. IEEE Transactions on Computers, C-27(6):509-516, June 1978.
[42] R. E. Bryant. Binary Decision Diagrams and Beyond: Enabling Technologies for Formal Verification. International Conference on Computer-Aided Design, pages 236-243, November 1995.
[43] A. Shen, S. Devadas, and A. Ghosh. Probabilistic Manipulation of Boolean Functions Using Free Boolean Diagrams. IEEE Transactions on Computer-Aided Design, 14(1):87-95, January 1995.
[44] R. Camposano, L. F. Saunders, and R. M. Tabet. VHDL as Input for High-Level Synthesis. IEEE Design & Test of Computers, pages 43-49, March 1991.
[45] F. Mailhot and G. De Micheli. Technology mapping using Boolean matching. In European Conference on Design Automation, pages 180-185, March 1990.
[46] Ronnie L. Wright and Michael A. Shanblatt. Reducing BDD Size By Exploiting Structural Connectivity. In Proceedings of the 9th Great Lakes Symposium on VLSI (to appear), March 1999.
[47] Sharad C. Seth, Lilu Pan, and Vishwani D. Agrawal. PREDICT - Probabilistic Estimation of Digital Circuit Testability. IEEE International Symposium on Fault-Tolerant Computing, pages 220-225, June 1985.
[48] A. V. Aho, J. E. Hopcroft, and J. D. Ullman. The Design and Analysis of Computer Algorithms. Addison-Wesley Publishing Company, Reading, MA, 1st edition, 1974.
[49] A. Diaz, M. Jimenez, and M. Shanblatt. Integer Pair Representation of Binary Terms and Equations. In Proceedings of the 41st Midwest Symposium on Circuits and Systems, August 1998.
[50] J. Monteiro, S. Devadas, A. Ghosh, K. Keutzer, and J. White. Estimation of Average Switching Activity in Combinational Logic Circuits Using Symbolic Simulation. IEEE Transactions on Computer-Aided Design, 16(1):121-127, January 1997.
[51] H. Mehta, M. Borah, R. M. Owens, and M. J. Irwin. Accurate Estimation of Combinational Circuit Activity. In Proceedings of the 32nd ACM/IEEE Design Automation Conference, pages 618-622, June 1995.
[52] R. S. Martin and J. Knight. Power-Profiler: Optimizing ASIC's Power Consumption At The Behavioral Level. In Proceedings of the 32nd ACM/IEEE Design Automation Conference, pages 42-47, June 1995.
[53] F. N. Najm. Power Estimation Techniques for Integrated Circuits. IEEE/ACM International Conference on Computer-Aided Design, pages 492-499, 1995.