THESIS

Michigan State University

This is to certify that the thesis entitled EFFICIENT ALGORITHMS FOR GENERATING HIGH PRECISION BINARY LOGARITHMIC NUMBERS presented by Yi Wan has been accepted towards fulfillment of the requirements for the MS degree in Electrical Engineering.

Efficient Algorithms for Generating High Precision Binary Logarithmic Numbers

by

Yi Wan

A THESIS

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

MASTER OF SCIENCE

Department of Electrical Engineering

1997

Abstract

Efficient Algorithms for Generating Precise Binary Logarithmic Numbers

by Yi Wan

Two algorithms for fast evaluation of the binary logarithm are developed in this thesis. They provide advantages over other recent work on this problem.

The first algorithm, the Two-Layer Factorization Algorithm (TLFA), uses a division and a newly discovered nonlinear approximation method to improve on the often-quoted difference grouping algorithm and other recent work on this problem, in both speed and implementation structure.

The second algorithm, the Continuous Factorization Algorithm (CFA), is proposed to suit high-speed and high-precision conversions. It uses a continuous factorization process to reduce the problem to a division and employs the SRT division technique to enhance speed. The resulting conversion time is only about three additions, irrespective of word length.
The error analysis and implementation structure are developed for this algorithm.

ACKNOWLEDGMENT

I would like to thank Dr. Chin-Long Wey for his encouragement, support, and care, and for the many insights I gained during our discussions together.

Contents

1 Introduction 1
2 Background 6
3 Algorithm Development 15
  3.1 Two Layer Factorization Algorithm 15
  3.2 Continuous Factorization Algorithm 37
    3.2.1 CFA Approximation Error Analysis 41
    3.2.2 Restoring Approach 54
    3.2.3 Modified Restoring Approach 59
    3.2.4 Non-restoring Approach 63
    3.2.5 Modified Error Analysis and Hardware Implementation Structure 68
4 Conclusion 72
A Source code of simulation program for TLFA 1.1 (in Java) 74
B Source code of simulation program for TLFA 1.2 (in Java) 80

List of Figures

1 Graphs of y = x and y = log(1 + x) 6
2 The graphs of 2^z - 1 and 1 - log(2 - z) 20
3 Difference between 2^z - 1 and 1 - log(2 - z) 22
4 Radix-2 P-D plot 64
5 Block diagram of the implementation of Algorithm 2.3 71

List of Tables

1 TLFA ROM size comparison 18
2 Simulation result of TLFA 1 with n = 16 32
3 8-bit look-up table for log(1 + 2^-i) 52
4 Example 2.5 58
5 Example 2.5 62
6 Example 2.6 67
7 Truth table for q 70

1 Introduction

The logarithmic number system (LNS) is an attractive alternative to the conventional number system when data need to be manipulated at a very high rate over a wide data range. The LNS can simplify multiplication, division, root, and power operations [2]. When logarithms are used, multiplication and division are reduced to addition and subtraction, and power and root operations are reduced to multiplication and division.
On the other hand, addition and subtraction operations become more complex. Another major problem is deriving logarithms and anti-logarithms quickly and accurately enough to allow conversions to and from the conventional number representations. These conversions always involve slow approximations. Therefore, binary logarithms are useful mainly in arithmetic units dedicated to special applications, where very few conversions are required but many multiplications and divisions are executed, e.g., real-time digital filters [3].

The objective of this thesis is to develop efficient algorithms that convert the conventional number representation to the binary logarithmic representation. More specifically, given a non-zero binary number X, evaluate log X. Note that such an X can always be normalized as

X = 1.x1 x2 ... xn x 2^k

for some integer k, and then

log X = log(1 + .x1 x2 ... xn) + k

Therefore, this conversion problem can be re-stated as: given an n-bit fractional binary number

x = .x1 x2 ... xn

how to efficiently generate the precise binary logarithmic value of (1 + x), i.e.,

y = log(1 + x)    (1)

Here y = .y1 y2 ... yn is also an n-bit fractional binary number, i.e., y in [0, 1).

In the LNS, for logarithmic addition, let a = log(A) and b = log(B), i.e., A = 2^a and B = 2^b. Suppose A < B. Let r = A/B; then 0 <= r < 1 and

A + B = B(1 + A/B) = B(1 + r)

hence

log(A + B) = log[B(1 + r)] = log(B) + log(1 + r) = b + log(1 + r)

The problem of addition in the LNS then involves the following important step: given an n-bit binary number r in [0, 1), generate the binary logarithmic value of (1 + r). Therefore, the efficient algorithms developed for the conversion in Eq. 1 can also be used for the logarithmic addition problem.

Recently a number of new binary logarithmic conversion algorithms, e.g., [1][2][4][5], have been proposed. The existing conversion algorithms can be roughly classified into two categories: the look-up table approach and the computation approach.
The former approach adopts various approximation schemes, such as linear approximation methods, so that a moderate-size look-up table can be implemented [1]. The look-up table approach generally has the advantage of fast speed; however, the look-up table size often increases drastically as the word length increases. The latter approach computes the binary logarithm with multiplication and/or division operations [2][4]; however, the applicability of such an approach is limited by its slow operation speed. The trade-off between the two approaches is speed versus precision. Thus, a feasible solution is a combination of both approaches.

This study develops two efficient algorithms using a factorization scheme for binary logarithm conversion. The first algorithm, the Two-Layer Factorization Algorithm (TLFA), uses a factorization approach to reduce the look-up table size and employs a nonlinear approximation method to reduce the computational complexity. Simulation results on IEEE single precision (23-bit) conversion show that the algorithm requires only a reasonably small ROM. For handling long word lengths, such as IEEE double precision (53 bits), the second algorithm, the Continuous Factorization Algorithm (CFA), is developed. It uses a continuous factorization process to further reduce the look-up table size and incorporates the SRT division technique [2] to reduce the computational complexity.

In the next chapter, some recent conversion algorithms are briefly reviewed. Chapter 3 presents the proposed conversion algorithms with experimental results. Finally, a concluding remark is given in Chapter 4.

2 Background

This chapter briefly reviews some existing conversion algorithms.

The basic method in the look-up table approach is the use of the simple linear function y = x to approximate y = log(1 + x), as shown in Fig. 1 [1].
Figure 1: Graphs of y = x and y = log(1 + x)

In this scheme, if we let the approximation error be

Delta = log(1 + x) - x

then the maximal approximation error Delta_max = 0.086071 occurs at x = 0.442695. The number of different groups is defined by ceil(2^n * Delta_max), where ceil(*) is the ceiling function and n is the word length. Thus, 23 different groups are required for an 8-bit word length, and the conversion uses a ROM with 8 inputs and 5 outputs. However, the number of different groups increases exponentially as the word length increases. So, even though this algorithm is very fast, it is suitable only for very low conversion precision and hence has very limited use.

A multiplicative normalization algorithm is developed in [2], which uses a relatively simple convergence procedure for the conversion process. The recursive procedure is: at step i,

x_{i+1} = x_i * b_i  and  y_{i+1} = y_i - log(b_i)

The initial conditions are x_0 = x, where x is a fraction, and y_0 = 0. For ease of using a look-up table, each b_i is chosen as either 1 or (1 + 2^-i). It can be shown that if x_{n+1} = 1, then

y_{n+1} = log(x) = - sum_{k=1}^{n} log(b_k)

In fact, each y_i is a truncated conversion result of i-bit length. The algorithm uses a small look-up table to store the values log(1 + 2^-i); only the log(b_i), i = 1, 2, ..., n, need to be stored for n-bit word length. However, this algorithm needs to compute the multiplication x_i * b_i and the subtraction y_i - log(b_i). The former operation can be simplified to shift-add operations because b_i is either 1 or (1 + 2^-i); carry propagation is the speed-limiting factor in this approach. The latter operation can be improved using a carry-save adder.

An ATA (add-table look-up-add) method [4] uses a truncated Taylor series and a difference grouping method.
Suppose the function f is smooth and the 24-fraction-bit input is

X = x_0 + lambda*x_1 + lambda^2*x_2 + lambda^3*x_3

where lambda = 2^-6 and the x_i < 1, i = 0, 1, 2, 3, are each 6 bits in length. The Taylor series of f(X) is

f(X) = sum_{n=0}^{inf} f^(n)(x_0 + lambda*x_1) * (lambda^2*x_2 + lambda^3*x_3)^n / n!

which, after truncating the high-order terms and further expanding the second term, can be approximated by

f(X) ~= f(x_0 + lambda*x_1)
      + (lambda/2) {f(x_0 + lambda*x_1 + lambda*x_2) - f(x_0 + lambda*x_1 - lambda*x_2)}
      + (lambda^2/2) {f(x_0 + lambda*x_1 + lambda*x_3) - f(x_0 + lambda*x_1 - lambda*x_3)}
      + lambda^4 {(x_2^2/2) f^(2)(x_0) - (x_2^3/6) f^(3)(x_0)}

The computation of f needs only add/sub/shift operations, including at least 5 full-length and 4 half-length addition operations. The paper reports a conversion error of less than 2^-29, which means the last term

lambda^4 {(x_2^2/2) f^(2)(x_0) - (x_2^3/6) f^(3)(x_0)}

in the expression of f(X) cannot be omitted. In that case more look-up tables or multiplications are needed.

In [5], digit partition (DP) is used to divide a long word, and Iterative Difference by Linear Approximation (IDLA) is used to compute an exponential function needed in the conversion. Suppose that a fractional input x = .x0 x1 ... x22 of 23-bit length produces a 23-bit output y = .y0 y1 ... y22. The output is partitioned as

y = .y0 y1 ... y11 y12 ... y22 = .YZ

where Y represents the first 12 bits y0 y1 ... y11 and Z the remaining bits y12 y13 ... y22. By experiment or a simple analysis, Y is determined only by x0 x1 ... x12. Hence a ROM with 13 inputs can be used to store the first 12-bit output Y. The remaining 11-bit output Z = y12 y13 ... y22 is calculated as follows:

2^(0.0...0Z) = (1.x0 x1 ... x12) x 2^(-0.Y) + (0.0...0 x13 x14 ... x22) x 2^(-0.Y)

In the above expression, the term (1.x0 x1 ... x12) x 2^(-0.Y) can be obtained by a PLA with the 13-bit input x0 x1 ... x12, and the second term can be rewritten as

(0.0...0 x13 x14 ... x22) x 2^(-0.Y) = 2^(log(0.x13 x14 ... x22) - 13 - 0.Y)

and eventually

2^(0.0...0Z) = (1.x0 x1 ... x12) x 2^(-0.Y) + 2^-13 x 2^(log(0.x13 x14 ... x22) - 0.Y)
The output bits Z = y12 y13 ... y22 can then be evaluated by the IDLA algorithm. However, the exponential evaluation is a time-consuming process, and a full-length addition is needed to sum the right-hand side of the last equation. In addition, a logarithmic conversion is still needed to determine Z eventually.

A simple linear approximation plus a second-stage piecewise linear approximation is developed in [7]. Suppose y = y1 + y2 is a 23-bit fraction, where y1 is the first 11-bit part and y2 is the last 12-bit part of y. Then

log(1 + y) ~= y + E_y + Delta_E_y * y2

where

E_y = log(1 + y1) - y1  and  Delta_E_y = E_y(y1 + 2^-11) - E_y(y1)

This algorithm requires two look-up tables to store E_y and Delta_E_y. In addition, it also requires a 12-bit multiplication, which may degrade the speed performance.

A direct computation method for converting y = log(x) is proposed in [8]. Suppose

log x = y = y_{n-1} y_{n-2} ... y1 y0 . y_{-1} y_{-2} ...

then

x = 2^y = 2^(y_{n-1}*2^{n-1}) * 2^(y_{n-2}*2^{n-2}) * ... * 2^(y0) * 2^(y_{-1}*2^{-1}) * ...

The developed algorithm starts with i = n - 1 and does the following recursively: if x >= 2^(2^i), then y_i = 1 and x = x * 2^(-2^i); otherwise, y_i = 0. The procedure is repeated by setting i = i - 1. In this scheme, both the comparison and the multiplication can be operated efficiently for i >= 0: the former examines the 2^i-th bit of x, while the latter shifts x by 2^i bit positions. However, for i < 0, 2^i is a fraction and 2^(2^i) involves a root operation, which is considered too time-consuming. This problem can be solved by using a look-up table to store {2^(2^-1), 2^(2^-2), ...}, but a full-length subtraction is still needed for the comparison. Since 2^(-2^i) = 1/2^(2^i), a division is needed in the multiplication step x = x * 2^(-2^i). Again, to avoid this division another look-up table has to be used, and then a full-length multiplication has to be performed to get the product, which seriously degrades the speed performance.
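To make the reviewed recursions concrete, the multiplicative normalization scheme of [2] can be sketched in a few lines. Python is used here purely for illustration (the thesis's own simulations, in the appendices, are in Java), and the function name is ours; floating point stands in for the shift-add hardware:

```python
import math

def mult_normalization_log2(x, n):
    # Recursion reviewed from [2]: x_{i+1} = x_i * b_i, y_{i+1} = y_i - log2(b_i),
    # with b_i in {1, 1 + 2**-i}.  Driving x_i toward 1 leaves y = log2(x).
    xi, yi = x, 0.0
    for i in range(1, n + 1):
        bi = 1.0 + 2.0**-i
        if xi * bi <= 1.0:          # take b_i = 1 + 2**-i only if we do not overshoot 1
            xi *= bi                # in hardware: a shift by i positions and an add
            yi -= math.log2(bi)     # table entry log2(1 + 2**-i)
    return yi

# x is a fraction in [0.5, 1) so that the product can be driven to 1
print(mult_normalization_log2(0.7, 30), math.log2(0.7))
```

The multiplication and the table subtraction inside the loop are exactly the two operations whose carry propagation the text above identifies as the speed-limiting factor.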
As can be seen from the above review, current work on this conversion problem does not deal satisfactorily with both conversion speed and high precision, even though some approaches involve complicated structures. The work developed in the next chapter attempts to make further improvements on these problems.

3 Algorithm Development

This chapter presents the development of both the TLFA and the CFA for binary logarithmic conversion. The former uses a two-layer factorization approach, while the latter employs a continuous factorization process for long-word conversion. We first propose the TLFA, then develop the CFA.

3.1 Two Layer Factorization Algorithm

This section describes the development of an efficient algorithm that generates the precise binary logarithm log(1 + x) for an n-bit fractional binary number x = .x1 x2 ... xn. Without loss of generality, n is assumed to be an even number, i.e., n = 2m for some integer m. The term 1 + x can be factorized as

1 + x = 1 + .x1 x2 ... xn
      = 1 + .x1 x2 ... xm x_{m+1} x_{m+2} ... x_{2m}
     ~= (1 + .x1 x2 ... xm)(1 + .00...0 c1 c2 ... cm)
      = (1 + .x1 x2 ... xm)(1 + .c1 c2 ... cm * 2^-m)    (2)

Let

a  = .x1 x2 ... xm
b  = .x_{m+1} x_{m+2} ... x_{2m}
b' = b * 2^-m
c  = .c1 c2 ... cm
c' = c * 2^-m

Then (1 + x) can be expressed as

1 + x = 1 + a + b' = (1 + a)(1 + c')    (3)

By Eq. 3, c can be computed as

c = b / (1 + a)    (4)

and the logarithmic value of (1 + x) is derived as

log(1 + x) ~= log(1 + a) + log(1 + c')    (5)

The ROM look-up table approach has been an efficient way to generate binary logarithms. For generating the logarithm of an n-bit binary number, a look-up table can be implemented with a 2^n x n ROM.
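The two-layer identity of Eqs. 2-5 can be checked numerically. The following sketch (illustrative Python with a function name of our own; the thesis's simulations are in Java) splits the input bits, computes c by the division of Eq. 4, and confirms that with c so defined the two sides of Eq. 5 agree:

```python
import math

def tlfa_split_identity(x_bits):
    # Split the n = 2m bits of x into a (high half) and b (low half) as in Eq. 2,
    # compute c = b/(1 + a) by the division of Eq. 4, and return both sides of
    # Eq. 5.  With c computed exactly, (1 + a)(1 + c * 2**-m) = 1 + x exactly.
    m = len(x_bits) // 2
    a = sum(bit * 2.0**-(i + 1) for i, bit in enumerate(x_bits[:m]))
    b = sum(bit * 2.0**-(i + 1) for i, bit in enumerate(x_bits[m:]))
    c = b / (1.0 + a)                   # Eq. 4: the division TLFA will avoid
    x = a + b * 2.0**-m
    lhs = math.log2(1.0 + x)            # Eq. 1
    rhs = math.log2(1.0 + a) + math.log2(1.0 + c * 2.0**-m)   # Eq. 5
    return lhs, rhs

print(tlfa_split_identity([1, 0, 1, 1, 0, 1, 1, 0]))
```

The approximation sign in Eq. 5 appears only once c is itself approximated; the factorization identity is exact.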
For example, for n = 16 and m = 8, the ROM table size is reduced from 216 x 16, or 1M bits, to two 28 x 16, or a total of 8K bits. The reduction is significant. Table 1 summarizes the ROM size reduction for various values of n. 18 n m 2"-n 2-2m-n 10 5 10K 640 12 6 48K 1.5K 14 7 112K 3.5K 16 8 1M 8K 18 9 4.5M 18K 20 10 20M 40K Table 1: TLFA ROM size comparison However, the above approach requires the computation of c as in Eq. 4, which involves a slow division process. As a result, the speed improvement gained by the use of look-up tables for Eq. 5 may be offset by the slow division process. Therefore, attempts are made to get around the division in Eq. 4 to improve the speed performance. Taking advantage of the existing look-up table ROMl used for log(1 + .$1$2 . . . mm), the slow division process 19 can be simplified. More specifically, by Eq. 4, c can be expressed as _ b C_1+a _1+b 1 _1+a_1+a : 2108(1+b)—108(1+0) _ 2—log(1+a) 2 2H ‘ 2"“ (6) where A =log(1+ a.) and B = log(1+ b). Note that the values of both A and B can be quickly retrieved from ROMl. Therefore, the only problem remaining is the evaluation of the function 2‘. Interestingly, the functions y = (22 — 1) and y = l — log(2 — 2), as shown in Fig. 2, are very close to each other when 0 g 2 S 1. i.e., 2‘—1z1—log(2—z) or 2Z z 2 — log(2 — z) (7) In other words, the function 2‘ can be approximated by the nonlinear function 2 — log(2 — z) for all z 6 [0,1]. 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 20 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 Figure 2: The graphs of 2‘ — 1 and 1 -— log(2 -— z) 0.9 21 Since log(2 — z) = log[1 + (1 — 2)], its value can be found from the look-up table ROMl because (1 — z) E [0, 1] for all z 6 [0,1]. Fig. 3 plots the approximation error between 2z —— 1 and 1 — log(2 — z) for all z 6 [0,1]. Results show that the maximum error is 0.004198, which is much better than 0.086071 with a linear function approximation in [1]. 
Note that the maximum approximation error of 0.004198 is equivalent to 7-bit accuracy. To further reduce the maximum approximation error, we use the function

1 - log(2 - z) + 2^-12 + 2^-13    (8)

to approximate 2^z - 1. As such, the maximum approximation error can be reduced to 0.0038038, which is less than 2^-8. So m = 8 is the appropriate value suggested for this implementation.

Figure 3: Difference between 2^z - 1 and 1 - log(2 - z)

It should be mentioned that, since the approximation to 2^z in Eq. 7 is valid only for z in [0, 1], the exponents of both terms in Eq. 6 must range between 0 and 1. Notice that A and B in Eq. 6 are in the range [0, 1]. Hence Eq. 6 is re-written as

c = 2^(B-A) - 2^(1-A)/2         if A < B     (9a)

or

c = 2^(1+B-A)/2 - 2^(1-A)/2     if A >= B    (9b)

In Eq. 9a, by Eq. 7, the second term 2^(1-A) is approximated by

2^(1-A) ~= 2 - log(2 - (1 - A)) = 2 - log(1 + A) = 2 - A'    (10)

where A' = log(1 + A), and the first term 2^(B-A) is approximated by

2^(B-A) ~= 2 - log(2 - (B - A))

Let p = 1 - B + A; then p < 1 and

2 - log(2 - (B - A)) = 2 - log(1 + p)

Similarly, the first term 2^(1+B-A) in Eq. 9b is approximated by

2^(1+B-A) ~= 2 - log(2 - (1 + B - A)) = 2 - log(p)

where p >= 1 because A >= B in Eq. 9b. Denote p = p0 + pf, where p0 and pf are the integral and fractional parts of p, respectively. If p >= 1, then p0 = 1 and p - 1 = pf. On the other hand, if p < 1, then p0 = 0 and p = pf.

Simulation of TLFA 1 with n = 16 (Table 2) shows that all differences DIFF are at most 2^-13; this concludes that the developed algorithm achieves an accuracy of 13 bits. The two tables, ROM1 and ROM2, are used and each has a size of 2^8 x 16, or 4K bits. Note that log(1 + x) < 2x for all x > 0. It follows that the first 7 bits of the output of ROM2 must be 0, so a size of 2^8 x 9, or 2.25K bits, is actually enough for ROM2. It should be mentioned that the slow division process in Eq. 4 is avoided by approximating 2^z using the existing ROM table.
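As a sketch of how Eqs. 6-10 trade the division of Eq. 4 for table look-ups, the following illustrative Python mimics a ROM1 access with log2 and applies the 2^z approximation of Eq. 7; the residual it exposes is exactly the Eq. 7 approximation error discussed above (function names are ours):

```python
import math

def log1p2(t):
    # Stands in for a ROM1 access: log2(1 + t) for a fraction t in [0, 1)
    return math.log2(1.0 + t)

def two_to(z):
    # Eq. 7: 2**z approximated by 2 - log2(2 - z), valid for z in [0, 1];
    # log2(2 - z) = log2(1 + (1 - z)) is itself a ROM1 access
    return 2.0 - log1p2(1.0 - z)

def c_from_tables(a, b):
    # Eq. 9: c = b/(1 + a) evaluated without division, using only
    # table look-ups and the 2**z approximation
    A, B = log1p2(a), log1p2(b)
    if A < B:                                                  # Eq. 9a
        return two_to(B - A) - two_to(1.0 - A) / 2.0
    return two_to(1.0 + B - A) / 2.0 - two_to(1.0 - A) / 2.0   # Eq. 9b

a, b = 0.25, 0.75
print(c_from_tables(a, b), b / (1.0 + a))   # agree to within the Eq. 7 error
```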
However, the nonlinear approximation approach developed in this algorithm limits m to 8 and the accuracy of log(1 + x) to 13 bits. The accuracy can be improved either by selecting a better nonlinear approximation function or, alternatively, by using an additional ROM table for the values of 2^z, z in [0, 1). The first alternative seems difficult, due to the fact that any nonlinear approximation function to 2^z must itself be efficiently evaluable. In this study a connection between 2^z and the available logarithmic function was discovered, and hence there is not much hardware overhead involved; but in general it is not easy to find such a convenient approximation function. As discussed in Chapter 2, linear functions and truncated Taylor series have been tried before but are not very satisfactory, because of either low precision or the heavy computation involved in evaluating the approximation function.

The difference grouping technique can be used in the second alternative. Let ROM3 be the look-up table for the difference between 2^z - 1 and the nonlinear approximation function (8), with a size of 2^m x m bits. The approximation of 2^z can then be expressed as

2^z ~= 2 - log(2 - z) + Delta(z)  for z in [0, 1]    (12)

where Delta(z) is the residual error of the first-stage nonlinear approximation and can be stored in ROM3. The value c in Eq. 9 can then be calculated as

c = (1 - P) + Delta(1 - p) + (A' - Delta(1 - A))/2    if A < B
c = (-P + Delta(2 - p) + A' - Delta(1 - A))/2         if A >= B    (13)

where P = log(1 + pf). Algorithm 1.2 summarizes the procedure for this implementation. It has been developed and simulated for IEEE single precision conversion (23 bits) with three internal guard digits, making the total number of bits in a word 26, which corresponds to m = 13. In this case, ROM1 has size 2^13 x 26 bits; ROM2 has size 2^13 x 14 bits, since the first 12 bits of log(1 + c') are all 0; and ROM3 has size 2^13 x 6, since by the previous analysis Delta(z) is less than 2^-8 and hence only the last 5 bits plus a sign bit need to be stored.

Algorithm 1.2
Step 0. ROM1(x) and ROM2(c) are the same as in Algorithm 1.1; ROM3(x) stores 2^x - 2 + log(2 - x), with size 2^m x m.
Step 1. (Same as Algorithm 1.1)
Step 2. Calculate c from Eq. 13:
  2.1 p = 1 - B + A = p0 + pf
  2.2 P = ROM1(pf); A' = ROM1(A)
  2.3 IF p0 = 0 THEN c = 1 - P + ROM3(1 - p) - ROM3(1 - A)/2
      ELSE c = (-P + ROM3(2 - p) - ROM3(1 - A))/2
  2.4 c = c + A'/2
Step 3. (Same as Algorithm 1.1)

Simulation results show that the maximum conversion error is 2^-24 after truncating the last two guard digits. Note that the look-up table size can be further reduced by using a PLA and other look-up table compaction techniques. The use of a PLA in [1] reduces the size to about 16% of the original ROM size; a similar effect can be achieved by using a PLA in this scenario.

3.2 Continuous Factorization Algorithm

In the previous section, a nonlinear approximation method was proposed to reduce the size of the look-up ROM. The limitation of that scheme is that its reduction of the ROM size is only one-step, and hence it is not suitable for high-precision conversion. Limited by the look-up table size, a natural alternative for increasing conversion speed is parallel processing. The Continuous Factorization Algorithm (CFA) is proposed, which uses a look-up table and the SRT division technique [2] to achieve fast, high-precision conversion.

We first formulate the problem in this approach. For an n-bit fractional binary number x = .x1 x2 ... xn, we factorize 1 + x as

1 + x = 1 + .x1 x2 ... xn ~= (1 + .q1)(1 + .0q2) ... (1 + .0...0qn)    (14)

where each qi is either 0 or 1. Then log(1 + x) can be approximated as

log(1 + x) ~= log(1 + .q1) + log(1 + .0q2) + ... + log(1 + .0...0qn)    (15)
In principle the factorization process of 1 + a: can be carried out as follows. First 1+2: = 1+.x1z2...xn z (1 + .$1)(1+ .00.],2013 . . . 0.1,”) (16) where q1 = $1 and .a1,2a1,3...a1,n = %—f—" (17) In a similar way, 1 + .0a1,2a1,3 . . . a1,n 2 (1+ .0a1,2)(1 + .00a2‘3a2,4 . . .a2,,,) (18) where q2 = am, and .a2,3a2,4 . . .02," = .a1,3a1,4...a1,n (19) 1 + .0013 39 In general, at step j, j = 1,2,... ,n — 1, 1+ .00 . . . OajJHajJ-H . . . 07',” Q: (1+.00 . . . Dam-+1) X (1 '1' .00 . . . 0aj+1,j+2aj+1,j+3 . . . 0.341,") (20) where q,- = aj.j+1 and 'aj1j+2aj1j+3 ' ' ' ajin (21) .a '+1, 31.20. '+1, '+3 . . . a '+1,n : J I I I J 1+.00...0a.,-,,-+1 For example, consider n = 6 and :r = .111101, By Eq. 21, 1.111101 z (1.1) - (1.010011) q1=1 1.010011 5:: (1.01) . (1.000011) q2 :1 1.000011 m (1.000) . (1.000011) q3 = 0 1.000011 2 (1.0000) . (1.000011) 04 = 0 1.000011 z (1.00001) . (1.000001) qs = 1 1.000001 z (1.000001) - (1.000000) qe = 0 As a result, (1.1)(1.01)(1.000)(1.0000)(1.00001)(1.000001)=1.11101111 40 In the next section we analyze the approximation error and ways to reduce look-up table size and computation steps. In that section we give a sufficient condition that guarantees the conversion precision. Following that we first develop a parallel processing structure that reduces the computation steps from 0(n2) to 0(n), then we develop a more effective implementation algorithm which is based on the SRT division spirit. 41 3.2.1 CFA Approximation Error Analysis In this section we analyze the various approximation errors in the CFA conversion scheme which is based on the factorization process. We assume the number of fractional bits is n and use ulp (unit of last position) to denote 2‘". Also we assume that in any division if the word length of the divisor is less than that of the divident, then the divisor is truncated. This makes the hardware structure simpler. Theorem 1. Suppose the register has fractional length of n, then .0. . .Oak,k+1 . 
--ak,n (1+ .0...0qk) gives an n-bit fraction with error less than ulp. Proof. If g). = 0, then clearly the error is 0. If qk = 1, note that in the division process, the quotient bits are accurate until the last A: fractional 42 bits when the divisor 1 + .0 . . .qu becomes 1 due to truncation. Hence 0 - ' ° Oyn—k+lyn-k+2 - ° - yn '0 - - - Oyn—k+lyn—k+2 - - -yn error = — 1+.0...01 1 (.0 . . .Oy._..1y.-.+2 . . .y.)(.0 . . .01) 1 + .0...01 < 2“"4‘) - 2"c : 2_n = ulp Theorem 2. For any fixed n = 2m > 0, the approximation error in the following 1 + .0 . . . 0$m+1$m+2 . . . (L'Qm z (1+ zm+12-m-1)(1 + xm+22-m-2) . . . (1 + x2m2-2'") (22) is less than %ulp. Here each 2:,- is either 0 or 1. Proof. Clearly the maximum approximation error occurs when xm+k = 1 for all 1 S k g m. We assume this and proceed as follows: For 1 S k g m, let rk denote the remainder (error) of the product of the 43 first k terms on the right side of the Eq. 22, i.e., r), = (1+ 2""H)(1+ 2‘m‘2) . . . (1 + 2"""‘) m k —(1+.0...011...1) Note that rm equals the maximum error stated in the theorem. The first few rk’s are easy to compute. For example, 1+2—m-1=1+2‘""1+0 => T1=0 and (1+ 2-m-1)(1+ 2-m-2) = (1+ 2-m-1 + 2-m-2) + 2-2m—3 => T2 = 2-2m—3 It can also be checked that 7.3 : 2—2m-3 + 2—2m—4 + 2—2m-5 < 2—2m—2 So the theorem holds for m S 3. 44 To estimate the bound on rk in general, first note the following facts. Fact 1: 1+x0. Fact 2: e“ < 2:1: +1 for all :1: E (0, %]. Form2k24, Tk-Tk—i = [(1+2‘m‘1)(1+2'm‘2)...(1+2"")’° (1+ M] — [(1+ 2'm‘1)(1 + 2‘m’2) . . . (1 + 2"" "+1)1+ M] = (1 + 2—m—1)(1+ 2"”) . . . (1 + 2-""-’°+1)2-"'*—’c — 2”“ = 2-m-k[(1 + 2-m-1)(1+ 2-m-2) . . . (1 + 2-m—k“) — 1] 45 The term (1 + 2'm‘1)(1 + 2‘m’2) . . . (1 + 2‘m‘k“) is estimated as follows, (1 + 2-m-1)(1+ 2"“) . . . (1 + 2-m-k+1) 00 < H(1+ 2"“) 00 -—m—i <11e2 1'21 00 2—m—i 2 eg1 (by Fact 1) 2—m =e < Tm“ + 1 (by Fact 2) Thus 7-,, — rk_1= 2—m-’=[(1 + 2-m-1)(1+ 2‘m-2) . . . (1 + 2—m-k+1)— 1] < 2—m-k . 
2—m+1 = 2-2m-k-l-I 46 and it follows that for m 2 4, m Tm = 7'3 + :01: — Tic—1) k=4 00 < Ts +201: — Tic—1) k=4 m < T3 + Z 2-2m—k-H k=4 = T3 + 2-2m—2 < 2—2m—2 + 2—2m—2 < 2—2m—1 1 1:] Corollary 1. For any n = 2m-bit fraction factorization as in Eq. 15, only the first m steps are needed and Qm+k = am,m+ka 1 S k S m The error caused by this is less than ulp. Proof. By theorem 2, error < (1+.q1)(1+.0q2)...(1+.0...0qm)-rm < 2rm < ulp 47 El Theorem 3. For each n = 2m-bit fractional input 2:, at most m(m — 1)/2 quotient bit operations are needed and the maximal error resulting from factorization is less than n . ulp. Proof. From corollary 2, only m divisions are needed. Because of truncation at the end of each division, for 1 S k S m, only m — k quotient bit operations are needed. So the total number is i:k=m(m— 1)/2 k=1 Note that the error caused by air—,3 is at most §ulp < ulp and is less than 2ulp for each divisor 1 + .0 . . .qu, 2 S k g m. By theorem 2, total error < ulp + 2ulp(m —- 1) + ulp = 2m - ulp = n - ulp 1:] Corollary 2. The final conversion error due to the factorization process is 71-11! less than —31n2 . 48 Proof. Since the function log(1+x) is continuously differentiable for x 6 [0,1], for any x and y such that 0 g x < y S 1, dlog(1 + x) log(1 + y) -— log(1 + x) g Max dx x6(0,1) '(y—Ir) Cl Corollary 3. In order to achieve n-bit conversion accuracy, i.e., for each n-bit fractional input, the logarithmic conversion error is less than ulp, the following condition on the number of internal guarding digits g is sufficient. log(n + g) — log(ln 2) < g Proof. Note that the error caused by factorization is positive and the error caused by look-up table truncation is negative. So we can just restrict the error from factorization and error from look-up table truncation to be both less than ulp. From Corollary 2, we have "_l'gg-(Mg) < Q-n ln2 49 which leads to log(n + g) — log(ln 2) < g For the look-up table truncation error, it is less than (n + g) - ulp. 
So we have (n+g)2-(n+g) < 2-n which leads to log(n + g) < 9 By comparing the above two conditions we have log(n + g) — log(ln 2) < g (23) Example 2.1 Suppose n = 56, then by Eq. 23 we can choose 9 = 7 to guarantee the conversion accuracy. Note that the condition developed in this section is just sufficient and may not be optimal. Some estimates in the analysis can be further refined. For 50 example, it can be easily shown that not all qi’s can be 1 whenever n > 3; the bounds in the proof of Theorem 1 can also be lowered. Nevertheless, as shown in the above example, the number of guarding digits is not big, especially in long word case. So the result can be used in designing high precision conversion structure. The next two theorems are developed to reduce the size of the look-up table used to store log(1+ 2“), i = 1,2,. . .. Theorem 4. For each i > 1, log(1 + 2") is a fraction with (i-I) leading 0’s. Proof. From the identity log(1 + x) < we have —i ln2 log(1 + 2") < < 2““) 51 Theorem 5. Suppose n = 2m, then for k > m, 1 log(1+ 2"’°) z ~2- log(l + 24‘“) 1 Proof. From calculus technique, we have that for x E (0, 5], 1 5 log(1 + x) — log(1 + 9] = logv1+x—log(1+ £)] 2 \/1+x =log 1 1+5 1 _§ -1+c 2 =log(1—8( +2 x2) where0 D 2 0.5. The division 2Fj—1 : Fj—l DH — : Fj_1 D defines P = Fj_1. This implies that D is ranged between 0.5 and 1.0 and P is between -1.0 and 1.0. As shown in Fig. 4, two comparison constants cl = 0.2510 2 0.012 66 and c_1 = —0.2510 = 1.112 (2’s complement representation) can be used. Thus, the quotient digit qj is estimated by 1 If Fj_1 > C1 Qj = 0 If C_1 S Fj__1 S Cl (27) _1. lfFj_1< 6.1 The improved continuous factorization scheme is summarized in the following. Algorithm 2.3 Step 02 F0 = $311132 . . .13" and D0 =1 Step 1: For each 3' = 1,2,... ,n, 1.1 Estimate g, from Eq. 
27
  1.2 F_j = 2 F_{j-1} - q_j D_{j-1}
  1.3 D_j = D_{j-1} + q_j · 2^{-j} D_{j-1}

Example 2.7

Let us reconsider the previous example with this algorithm, as shown in Table 6.

    j   2F_{j-1}   D_{j-1}   q_j   F_j        D_j
    1   1.111010   1.000000    1   0.111010   1.100000
    2   1.110100   1.100000    1   0.010100   1.111000
    3   0.101000   1.111000    0   0.101000   1.111000
    4   1.010000   1.111000    1   0.101000   1.111111
    5   1.010000   1.111111   -1   0.101111   1.111100
    6   1.011110   1.111100    1   0.100010   1.111101

Table 6: Example 2.7

In this example D_6 = x. In general there can be some error, whose range is analyzed in subsection 3.2.1. Note that steps 1.2 and 1.3 in Algorithm 2.3 can be completed very fast by the use of a carry-save adder (CSA), which eliminates carry propagation and produces an approximate value whose accuracy determines the number of bits kept for a short-length full addition.

3.2.5 Modified Error Analysis and Hardware Implementation Structure

Because of the simplification involved in Algorithm 2.3, the error analysis in subsection 3.2.1 needs to be modified. That is done in this subsection, and a schematic structure implementing Algorithm 2.3 is also proposed.

Suppose the input word length is n bits and let g be the number of guarding digits needed to ensure a conversion accuracy of n bits as well. This means the internal register length is n + g. Let ulp = 2^{-(n+g)}. Note that in Algorithm 2.3 the only error is caused by the truncation in calculating D_j at each step j, which results in an error no larger than ulp. It then follows easily that we get exactly the same result as given by Corollary 3 in subsection 3.2.1, under the additional condition that log(1 + 2^{-j}) is stored in truncated form and log(1 - 2^{-j}) in round-up form.

In the following, the implementation structure of Algorithm 2.3 is developed. Suppose the input word length is n. Then the look-up table used to store log(1 ± 2^{-j}) will have a size of 2n² bits. As an example, if n = 64, then 2n² = 8K (bits).
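As a quick sanity check, Algorithm 2.3 can be sketched in ordinary floating point (the hardware, of course, uses fixed-point registers and CSAs). The class and method names below are illustrative only, and the digit selection applies the constants ±0.25 of Eq. 27 to the halved remainder F_{j-1}/2, one plausible reading of the P-D normalization; the input 1.953125 is 1.111101 in binary, consistent with D_6 in Table 6.

```java
public class CfaSketch {
    // Floating-point sketch of Algorithm 2.3 (CFA): returns log2(x) for
    // x in [1, 2) after n factorization steps.
    static double log2cfa(double x, int n) {
        double f = x - 1.0;   // F_0 = .x1 x2 ... xn, the fraction part of x
        double d = 1.0;       // D_0
        double log = 0.0;
        for (int j = 1; j <= n; j++) {
            // Quotient-digit selection (Eq. 27) on the halved remainder.
            int q = (f / 2 > 0.25) ? 1 : (f / 2 < -0.25) ? -1 : 0;
            f = 2 * f - q * d;                   // step 1.2
            d = d + q * Math.pow(2.0, -j) * d;   // step 1.3
            // Accumulate log2(1 + q*2^-j); hardware reads this from the
            // look-up table, here it is computed directly.
            log += Math.log(1.0 + q * Math.pow(2.0, -j)) / Math.log(2.0);
        }
        return log;   // D_n ~ x, so the accumulated sum ~ log2(x)
    }

    public static void main(String[] args) {
        double x = 1.953125;  // 1.111101 in binary
        System.out.println(log2cfa(x, 40));
        System.out.println(Math.log(x) / Math.log(2.0));
    }
}
```

Since F_j = 2^j (x - D_j) is an invariant of the recurrence and the selection keeps |F_j| bounded, D_n approaches x with error about 2^{-n}, matching the convergence behavior claimed in the text.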
It is interesting to note that, by the look-up table reduction technique developed in subsection 3.2.1 and the fact that log(1 - 2^{-j}) ≈ -log(1 + 2^{-j}), the look-up table size can be reduced to just n² bits, at the cost of a little hardware overhead.

Three CSAs are needed: one for the accumulation of

    Σ_{j=1}^{n} log(1 + q_j 2^{-j})

and two others for the factorization process, one for the dividend F_j and the other for the divisor D_j. Note that q_1 = x_1 is directly determined.

For the comparison of the partial remainder F_j with the comparison constants, note from the P-D plot that the allowable error is ¼. To ensure convergence of the algorithm we retain 4 fractional bits of the CSA output; together with the first integral digit and the sign bit, this requires a full addition of 6 bits for the comparison in step 1.1 of Algorithm 2.3. This addition can be implemented with a carry-look-ahead adder to improve speed, or with an ordinary full adder to save chip area if the carry propagation delay is bearable in the application.

Table 7: Truth table for the quotient digit q

After the result s a . f_1 f_2 f_3 f_4 of the full addition is produced, the quotient digit q can be determined. Table 7 shows the encoding of q and its logic function depending on s a . f_1 f_2 f_3. The logic functions of the sign and magnitude bits q_s and q_a are then derived as

    q_s = s (a̅ + f̅_1)
    q_a = (a ⊕ s) + (a ⊕ f_1)

The implementation structure is shown next.

Figure 5: Block diagram of the implementation of Algorithm 2.3

4 Conclusion

The logarithmic number system (LNS) is an attractive alternative to the conventional number system when data must be manipulated at a very high rate over a wide data range. The major problems in dealing with the logarithmic number system are deriving the logarithm and anti-logarithm quickly and accurately enough to allow conversion to and from the conventional number representation.
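The carry-save adders mentioned above all perform the same basic operation: a 3:2 compression that adds three operands into a redundant sum/carry pair with no carry propagation, so only the short 6-bit comparison needs a real addition. A minimal sketch of that compression on plain Java integers (the operand values are arbitrary examples, not taken from the text):

```java
public class CsaDemo {
    // One 3:2 carry-save compression: three operands in, a sum/carry
    // pair out, computed bit-position by bit-position with no carry
    // propagation across positions.
    static long[] csa(long a, long b, long c) {
        long sum = a ^ b ^ c;                            // per-bit sum
        long carry = ((a & b) | (a & c) | (b & c)) << 1; // saved carries
        return new long[] { sum, carry };
    }

    public static void main(String[] args) {
        long a = 0b111010, b = 0b010100, c = 0b000111;   // arbitrary operands
        long[] sc = csa(a, b, c);
        // The redundant pair still represents the exact total; a short
        // full addition (as in the step 1.1 comparison) resolves it.
        System.out.println(sc[0] + sc[1] == a + b + c);
    }
}
```

Running this prints true: the sum/carry pair carries the full value, which is why the algorithm can defer carry propagation to one final short addition.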
This thesis develops two efficient algorithms for binary logarithmic conversion: TLFA (Two Layer Factorization Algorithm) and CFA (Continuous Factorization Algorithm). The former uses a factorization approach to reduce the look-up table size and employs a nonlinear approximation method to reduce the computational complexity. Simulation results on IEEE single precision conversion (23 bits) show that the algorithm requires only one ROM table of 2^13 × 26 bits, one of 2^13 × 14 bits, and one of 2^13 × 5 bits, for a total of about 360K bits. This algorithm has advantages over other work on this problem in both speed and implementation structure, mainly due to the discovery of a convenient nonlinear approximation function. For handling long word length numbers, such as IEEE double precision (53 bits), CFA uses a continuous factorization process to further reduce the look-up table size and reduces the problem to a division, which can be implemented using the SRT division technique [2] to give a high-speed yet simple structure. More specifically, the conversion time is about three additions irrespective of word length.

The speed of SRT-based division is mainly determined by the complexity of quotient-digit selection [2, 15, 16]. To speed up the division process, one may reduce the number of iteration steps by increasing the radix β of the signed-digit number system used in the process. Selecting β = 2^m allows the generation of m quotient bits at each step, and the number of steps is reduced to ⌈n/m⌉, where n is the word length. However, the complexity of quotient-digit selection and remainder updating increases for high radices, offsetting the advantage of the reduced number of iterations [2]. Therefore, exploring the speed performance of this logarithmic conversion problem under a high-radix SRT division scheme is an interesting topic for future work.
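As a closing numeric check, the guarding-digit condition log(n + g) - log(ln 2) < g derived in Corollary 3 is easy to evaluate directly. The sketch below (class and method names are ours) searches for the smallest sufficient g and reproduces the choice g = 7 for n = 56 from Example 2.1:

```java
public class GuardDigitCheck {
    // Smallest g satisfying log2(n + g) - log2(ln 2) < g  (Corollary 3).
    static int guardDigits(int n) {
        double log2ln2 = Math.log(Math.log(2)) / Math.log(2); // log2(ln 2)
        int g = 1;
        while (Math.log(n + g) / Math.log(2) - log2ln2 >= g) {
            g++;
        }
        return g;
    }

    public static void main(String[] args) {
        System.out.println(guardDigits(56)); // prints 7, as in Example 2.1
    }
}
```

Because g appears on both sides of the inequality only logarithmically on the left, the loop terminates after a handful of iterations even for very long word lengths.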
Appendices

Appendix A  Source code of simulation program for TLFA 1.1 (in Java)

// This program simulates algorithm 1.1 for all
// possible 16-bit input combinations.
class misc {
    static int N = 8;
    static long one = (long) (Math.pow(2, N) + .5);
    static double e = Math.pow(2, -N);

    public static void toBinary(long x, int n) {
        int i;
        int[] a = new int[128];
        for (i = n - 1; i >= 0; i--) {
            long t = x & 1;
            x /= 2;
            a[i] = (int) t;
        }
        for (i = 0; i < n; i++)
            System.out.print(a[i]);
        System.out.println();
    }

    ...

            if (t2 >= 0)
                c = misc.expconv(t1) + one / 2 - t2;
            else
                c = misc.expconv(one + t1) / 2 - t2;
            y1 = misc.lt1(x, 0) + misc.lt2(c, 0);
            t = misc.lt(x * one + y);
            // misc.toBinary(y1, N);
            misc.toBinary(t, N);
            t1 = y1 - t;
            if (Math.abs(t1) > err)
                err = Math.abs(t1);
            // System.out.println(t1 + " " + err);
        }
        System.out.println("Maxerror=" + err);

References

[1] H.Y. Lo and Y. Aoki, "Generation of a precise binary logarithm with difference grouping programmable logic array," IEEE Trans. on Computers, C-34(8), pp. 681-691, 1985.

[2] I. Koren, Computer Arithmetic Algorithms, Prentice Hall, 1993.

[3] N.G. Kingsbury, "Digital filtering using logarithmic arithmetic," Electron. Lett., Vol. 7, pp. 56-58, 1971.

[4] W.F. Wong and E. Goto, "Fast evaluation of the elementary functions in single precision," IEEE Transactions on Computers, Vol. 44, No. 3, pp. 453-457, 1995.

[5] S.-C. Huang and L.-G. Chen, "The chip design of a 32-b logarithmic number system," Proc. IEEE International Symp. on Circuits and Systems, pp. 167-170, 1994.

[6] M.G. Arnold, T.A. Bailey, J.R. Cowles and M.D. Winkel, "Applying features of IEEE 754 to sign/logarithm arithmetic," IEEE Transactions on Computers, Vol. 41, No. 8, pp. 1040-1049, 1992.

[7] F.-S. Lai and C.-F. E. Wu, "A hybrid number system processor with geometric and complex arithmetic capabilities," IEEE Transactions on Computers, Vol. 40, No. 8, pp. 952-962, 1991.

[8] D.K. Kostopoulos, "An algorithm for the computation of binary logarithms," IEEE Transactions on Computers, Vol. 40, No. 11, pp. 1267-1270, 1991.

[9] I.
Koren and O. Zinaty, "Evaluating elementary functions in a numerical coprocessor based on rational approximations," IEEE Transactions on Computers, Vol. 39, No. 8, pp. 1030-1037, 1990.

[10] D.M. Lewis, "An architecture for addition and subtraction of long word length numbers in the logarithmic number system," IEEE Transactions on Computers, Vol. 39, No. 11, pp. 1325-1336, 1990.

[11] T. Stouraitis and F.J. Taylor, "Floating-point to logarithmic encoder error analysis," IEEE Transactions on Computers, Vol. 37, No. 7, pp. 858-863, 1989.

[12] T. Kurokawa, J. Payne and S. Lee, "Error analysis of recursive digital filters implemented with logarithmic number systems," IEEE Trans. Acoust., Speech, Signal Processing, Vol. ASSP-28, pp. 706-715, 1980.

[13] F.J. Taylor, "A hybrid floating-point logarithmic number system processor," IEEE Trans. Circuits and Systems, Vol. CAS-32, pp. 92-95, 1985.

[14] F.J. Taylor, R. Gill, J. Joseph and J. Radke, "A 20-bit logarithmic number system processor," IEEE Transactions on Computers, Vol. 37, pp. 190-199, Feb. 1988.

[15] T.-H. Pan, H.-S. Kay, Y. Chun and C.-L. Wey, "High-radix SRT division with speculation of quotient digits," Proc. IEEE International Conference on Computer Design: VLSI in Computers & Processors (ICCD'95), Austin, TX, pp. 479-482, Oct. 1995.

[16] C.-L. Wey and C.-P. Wang, "VLSI implementation of a fast radix-4 SRT division," Proc. of 39th Midwest Symp. on Circuits and Systems, Iowa, pp. 65-68, August 1996.