_ .Q~:I1\I_[?‘Q:I V- .‘ .
1 1:0 ILZIII I :II
. i,“ :. ‘. I I If”. I . 1.
1 UI '.'-I.'I I q. 4"
. a g

: :u (11;. i'l'z'; 5"): V“I):‘$li} I‘ t
4 . @13er 3:513”??? ”u“-

   

 

 
 
 
  

  
 

' 1" .LI; 'I:.'.I..I.I::I»" ' .g" Zi“‘.l‘.i‘I‘IrI:Ii"' 351133“? :‘L' -."r". 3 5516?: {ﬁ- §é"“ 5533’." I
. ‘ 'iI.:'I" q. .9 (90 !' 12::I“:|; ‘EthIA'L-Itﬂk-I #1:; “I. ' ﬁnﬂibf‘fﬁ? . '1! ' ”1| .
525,3”. ' IF‘I .5271; .9313; III‘ ‘1 If: 1-31" ‘ ”V :tiéh'llI I 1.233132?! Llflsizwﬁ ﬁﬁﬁzigtﬁ' ’17-:
I" ,- .« ﬁrm ".WII“ 22:: :50}..- J‘ 5: y‘zvm'; .
:2: ' I -II ~ ‘5 H m! . “hr-ii "32'! 4; ‘

     
 
 
 

'- {‘t
ﬂﬁé'f.
3-?"1

o

u'u v 1'
13‘s“. ii? "I": t

'12 ‘1“
:}"!: I I:

    

         
    

           

    

  
 
 
   
   

         
     
  

   
  
   

 
     
  
    
  

  
     
 
 

  
 
 
  
  
      
       
    

r 2"“. ‘
2. _ u 2';I:I'I 1‘: ‘.
.V I; [-I .3 If; I‘ll}? .EI'IL'I z.
‘A . - . 31:125.:1] I12[1'-I .‘ 2 - ' n
I . I ‘ . ' :‘J ‘ 19%;} 9‘ II‘ :‘E'IcII K1172“ EsII’I'I‘J-E Iii 1;};3‘:

f ' "17'III15'Q'IFII‘JI'I ””1.- “:11VI‘EJ‘III:“LIIEI}I‘;I'£ 1-“
:,I4.419I‘illtftslhhﬁ'3'ﬁuq}, . 11$." {I ' {£33 “'115‘I;I§.3::J.§Ig I
"a4 W) : ”‘5!"3‘ I91? I ”$111111? a‘”‘§'“~§ﬁzz' .,
knuwzr‘ﬁiit" I It w»?
11:22:“ v1.3.1?) "[92“! ‘M’II‘I ' ”13‘3“:
" .: I l 9 “I Igg‘siggys‘

wt:-

 
   
 

  
  
  

fob.

21.11%“
‘H i

.5.“ ‘ -
- n 22'.
- «€72:

- 5;
.{ﬂ
T 23;!
Win 3.

3.

my;

    
    

    
          
 
 
 
  
 

      
 

  
   
 
    
  

   

      
    
  

 

  
  
  
    
    
  
  

      
    

 
      
 
   
 
 
 
   

 

 

        

    
  

   

     

 

 
 
    

  
    
 

   

     

   

            

I” I '
, g. (M
II 3”." PN
." 1? I; ”a?
3; . . _..i»3‘;z::'4
.‘v‘ o R ‘ ‘ 1‘1"!" .
c .If
.“.E itimfvlm "ail-id" - "‘5'
.2; I. 93:3.“ c #313 'v':
I. I " r- I“: 22m" "
2 I .., 2....“ am; 51:44: ’5;
.. .. .. 'I:‘-I'v'. ' '
Il‘illoﬂ‘l“; $31!. i rho) Iii}. , , l‘z‘i‘t‘zi‘ .33 ‘ ,
I I 1:1'0' "r't'ﬂwé’: I I" ‘ '2‘ .:
“WW5 .:> _ .1 1,:
t!!!’3:itii‘:.¥§iinig 13:31:? ‘ ‘14?
. :2,” 19::S.-Iff.l:.pi§;:.é'i I 3
1.1 C :l':

   

:1’il"dl"‘ '
5 J“. 33*

    

o '23. 'I1

IV I”!

    
   

 

  

#32
’U.

  
   
     
   

,--

   

 
   
  

   
  

 

 

 

  
 

 

ii 15:.

or???

. E 1' “I If ‘ .
42:91:“!!! -- ..

.
. z:
.‘c; -

«a-
'1‘v<-
m .

 

1': -
v.
J V -
.-__
P
5:!
c _ ~
‘21:)».

‘.n ézglq, :2."
III .. .. I»;
It

'I‘I‘I "I"

‘ .Jt‘ﬂl
4:333
I‘M. ‘5.
. .
5;.‘6

.

\.

.24.

(I? ‘41”?
I’I IAll"
I!

 

I54; IIII. .'
| {13:11:};
' 1:21.}13'31134 , J '

":1. I: I.,,:-’:, .‘ ‘1‘;
,. Ila l"Mi'-!I"LIII II'EII’: I‘l‘I {I I: MW wiiiijwﬁﬁ“; 11134111
. i: I'E'I‘ﬂ’fIl: . ”m ”I “2"th HI." ‘1'”: "Iq XIII“: I’IIIV} “1111);: $534}

I'I IAIMII 1‘ 6|... . III III III. .416!“ LAW “2 . 3.11.5”

 

THESIS

I IIIIIIIIIIIIIIIIII‘ III/III IIIIIIII

LIBRARY “293 ”6" 969
Michigan State
University

 

 

 

 

This is to certify that the

thesis entitled

EFFICIENT ALGORITHMS FOR GENERATING
HIGH PRECISION BINARY LOGARITHMIC NUMBERS

presented by

Y1 Nan
has been accepted towards fulﬁllment

of the requirements for

MS degreein Electrical Eng

Major proesfé ﬁfréjy

 

Date BJ}0, {$7

0-7639 MS U i: an Afﬁrmative Action/Equal Opportunity Institution

 

 

PLACE IN RETURN BOX to remove this checkout from your record.
TO AVOID FINE return on or before date due.
MAY BE RECALLED with earlier due date if requested.

 

DATE DUE

DATE DUE

DATE DUE

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

ma mu

Efﬁcient Algorithms for Generating

High Precision Binary Logarithmic Numbers
by

Yi Wan

A THESIS

Submitted to
Michigan State University
in partial fulﬁllment of the requirements
for the degree of

MASTER OF SCIENCE

Department of Electrical Engineering
1997

Abstract

Efficient Algorithms for Generating
Precise Binary Logarithmic Numbers

by

Yi Wan

Two algorithms for fast evaluation of binary logarithm are developed in
this paper. They provide advantages over other recent work on this problem.
The ﬁrst algorithm — Two Layer Factorization Algorithm (TLFA), uses a di-
vision and a newly discovered nonlinear approximation method to provide an
improvement over the often quoted difference grouping algorithm and other
recent work on this problem, both in speed and implementation structure.
The second algorithm — Continuous Factorization Algorithm (CFA), is pro-
posed to suit high speed and high precision conversions. It uses a continuous
factorization process to reduce the problem to a division one and employs
the SRT division technique to enhance speed. The resulting conversion time
is only about three additions irrespective of word length. The error analysis

and implementation structure are developed for this algorithm.

ACKNOWLEDGMENT

I would like to thank Dr. Chin-Long Way
for his encouragement, support, and care, and
many of his insights I got during our discus-

sions together.

iii

Contents

1 Introduction 1
2 Background 6
3 Algorithm Development 15
3.1 Two Layer Factorization Algorithm ............... 15
3.2 Continuous Factorization Algorithm ............... 37
3.2.1 CFA Approximation Error Analysis ........... 41

3.2.2 Restoring Approach .................... 54

3.2.3 Modiﬁed Restoring Approach .............. 59

3.2.4 Non-restoring Approach ................. 63

3.2.5 Modiﬁed Error Analysis and Hardware Implementa-

tion Structure ....................... 68

4 Conclusion 72

A Source code of simulation program for TLFA 1.1 (in Java) 74

B Source code of simulation program for TLFA 1.2 (in Java) 80

iv

List of Figures

1 Graphs of y = a: and y = log(1 + x) ............... 6
2 The graphs of 22 — 1 and 1 — log(2 — z) ............. 20
3 Difference between 22 — 1 and 1 —— log(2 — z) .......... 22
4 Radix-2 P-D plot ......................... 64
5 Block diagram of the implementation of algorithm 2.3 ..... 71

List of Tables

1 TLFA ROM size comparison .................. 18
2 Simulation result of TLFA 1 with n = 16 ............ 32
3 8-bit look-up table for log(1 + 2“) ............... 52
4 Example 2.5 ........................... 58
5 Example 2.5 ............................ 62
6 Example 2.6 ............................ 67
7 Truth table for q ......................... 70

vi

1 Introduction

Logarithmic number system (LN S) is an attractive alternative to the con-
ventional number system when the data need to be manipulated at very high
rate over a wide data range. The LNIS can simplify multiplication, division,
root, and power operations [2]. When logarithm is used, multiplication and
division are reduced to addition and subtraction, and power and root opera-
tions are reduced to multiplication and division. On the other hand, addition
and subtraction operations become more complex. Another major problem is
deriving logarithms and anti-logarithms quickly and accurately enough to al-
low conversions to and from the conventional number representations. These
conversions always involve slow-speed approximations. Therefore, binary
logarithms can be useful only in arithmetic units dedicated to special appli-
cations, where very few conversions are required but many multiplications
and divisions are executed; e.g., real-time digital ﬁlters [3]. The objective
of this thesis is to develop efﬁcient algorithms that convert the conventional
number representation to binary logarithmic representation. More speciﬁ-
cally, given a non-zero binary number X, evaluate log X. Note that such an

X can always be normalized as

X = 1.x1$2...xn x 2"

for some integer k and then

logX = log(1 + 311:2 . . .513”) + k

Therefore, this conversion problem can be re-stated as:

Given an n-bit fractional binary number

x=.x1222...:1:n

how to efficiently generate the precise binary logarithmic value of (l + :c),

i.e.,

y = log(1 + 1:) (1)

Here y = .ylyg..y,, is also an n-bit fractional binary number, i.e., y E [0, 1).

In the LNS, for logarithmic addition, let

a = log(A) and b = log(B)

i.e., A = 2“ and B = 2b.

Suppose A < B. Let r = A/B, then 0 5 1' <1 and

A
A+B—A(1+§)

= A(1 +1")
hence

log(A + B) =log[A(1+ r)]
= log(A) + log(1 + r)

= a+log(1 +7')

The problem of addition in LNS then involves the following important step:

Given an n-bit binary number 7‘ E [0, 1), it is to generate the binary

logarithmic value of (1 + 7').

Therefore, the efﬁcient algorithms deve10ped for conversion in Eq. 1 can
also be used for the logarithmic addition problem. Recently a number of
new binary logarithmic conversion algorithms, e.g., [1][2] [4] [5], have been
proposed. The existing conversion algorithms can be roughly classiﬁed into
two categories: look-up table approach and computation approach. The

former approach adopts various approximation schemes such as linear

approximation methods so that moderate-size look-up table can be
implemented [1]. Look-up table approach generally has the advantage of
fast speed. However, the look-up table size often increases drastically as the
word length increases. The latter approach converts the binary logarithm
with multiplication and/ or division operations [2] [4] However, the
applicability of such approach is limited by its slow operation speed. The
trade-off between both approaches is the speed and precision. Thus, a
feasible solution is the combination of both approaches. This study
develops two efﬁcient algorithms using factorization scheme for binary
logarithm conversion. The ﬁrst algorithm — Two Layer Factorization
Algorithm (TLFA), uses a factorization approach to reduce the look-up
table size and employs a nonlinear approximation method to reduce the
computational complexity. Simulation results on IEEE single precision (23
bits) conversion shows that the algorithm requires only a reasonable
small-size ROM. For handling long word length numbers, such as IEEE
double precision (53 hits), the second algorithm — Continuous Factorization
Algorithm (CFA) is developed. It uses a continuous factorization process to
further reduce the look-up table size and incorporates the SRT division

technique [2] to reduce computational complexity.

In the next chapter, some recent conversion algorithms are brieﬂy reviewed.
Chapter 3 presents the proposed conversion algorithms with experimental

results. Finally, a concluding remark is given in Chapter 4.

2 Background

This chapter brieﬂy reviews some existing conversion algorithms. The basic
easy method in the look-up table approach is the use of a simple linear

function y = as to approximate y = log(1 + x), as shown in Fig. 1 [1].

 

 

 

 

1 1 I l r r r r l r
0.9 - -
0.8 - -
0.7 - ..
0.6 - _
0.5 r -
0.4 '- -
0.3 *- 1
0.2 - _
0.1 '- a

O l l 1 1 1 l 1 l l
0 01 02 0.3 04 0.5 06 07 08 09 1

Figure 1: Graphs of y = :r and y = log(1 + z)

In this scheme, if we let the approximation error

A=log(1+a:)—:r

then the maximal approximation error Am“ = 0.086071 occurs at

a: = 0.442695. The number of different groups is deﬁned by [2" - Amax],
where [*] is the ceiling function and n is the word length. Thus, it requires
23 different groups for 8-bit word length. The conversion uses a ROM with
8 inputs and 5 outputs. However, the number of different groups increases
exponentially as the word length increases. So, even though this algorithm
has very fast speed, it is only suitable for very low conversion precision and

hence has very limited use.

A multiplicative normalization algorithm is developed in [2] which uses a
relative simple convergence procedure for the conversion process. The

recursive procedure is:

At step 2',

$14.1 = 113; ° bi and yi+1 = yi —' log(bi)

The initial conditions are 2:0 = :1: where x is a fraction and yo = 0.

For the ease of using look-up table, each b,- is chosen as either 1 or (1 + 2").
It shows that if 33,,“ = 1, then

yn+1 = 108(3)

= - " log(bk)
k=1

In fact, each 3},- is a truncated conversion result of i-bit length. The
algorithm uses a small look-up table to store the values log(1 + 2“). In
addition, only the log(b,),z’ = 1, 2, . . . ,n, need to be stored for 272-bit word
length. However, this algorithm needs to compute the multiplication 3:, - b,-
and the subtraction y,- — log(b,). The former operation can be simpliﬁed
using shift-add operations because I),- is either 1’ or (1 + 2"). Carry

propagation is a speed-slowing factor in this approach. The latter operation

can be improved using carry-save adder.

An ATA (add-table look up-add) method [4] uses the truncated Taylor
series and a difference grouping method. Suppose the function f is smooth

and the 24-fraction-bit input

X = 1:0 + M21 + A2332 + A3$3

where A = 2’6 and 1:,- < 1, z' = 0, 1, 2, 3, are 6-bit in length each.

The Taylor series of f (X ) is

f(X) = i f(n)($o + A$1)(A2x2 + A3$3)n

n!

 

n=0
which, after truncating the high order terms and further expanding the

second term, can be approximated by
f(X) = f(zo + Aral) +
A
§{f($0 + A131 + A1112) — f(IIIo + /\$1 — AT2)} +
A2
—2—{f(zo + Ax1+ A333) — f(rro + Axl — Ax3)} +
2 3
A4{Ezf(2)($0) _ £sz3)($0)}
2 6
The computation of f needs only add/sub/ shift operations, including at

least 5 full-length and 4 half-length addition operations. The paper

mentions the conversion error of less than 2‘29, which means the last term
$2 2:3
mime.) — ﬁrm.»

in the expression of f (X ) cannot be omitted. In this case more look-up

tables or multiplications are needed.

 

10

In [5], digit partition (DP) is used to divide a long word and Iterative
Difference by Linear Approximation (IDLA) is used to compute an

exponential function needed in the conversion.

Suppose that a fractional input
(C = 0.3301131. . .1122
of 23-bit length produces a 23-bit output

y = O-yoy1 - - - 3122

The output is partitioned as

3/ = 0.yoy1...y11y12...y22

= 0.YZ

where Y represents the ﬁrst 12 bits yoyl . ..y11 and Z the remaining bits

y12y13 - - -?/22-

By experiment or simple analysis proof, Y is determined only by
$012] . . .2312. Hence a ROM with 13 inputs can be used to store the ﬁrst

12-bit output Y. The remaining 11-bit output Z = y12y13 . . . ygg is

11

calculated as follows.
2O'O"OZ = (1.1201131. . . {1712) X (2—0°Y) + (0.0 . . . 031311314 . . . (1322) X 2_0'Y

In the above expression, the term (1.$0$1 ...:r12) x (2”0'Y) can be obtained
by a PLA with 13-bit input $0231 . . .3312, and the second term can be

rewritten as
(0.00 _ . .033133314 . . 4322) x 2—0.Y = 210g(0.z13:t14...122)—13—0.Y
and eventually
20'0"“ = (1.x0z1...:1:12) x (TO-Y) + 2-13 x 210g(0-mzu-.-x22)—0.Y

The output bits Z = y12y13 . . . ygg can be evaluated by the IDLA algorithm.

However, the exponential evaluation is a time-consuming process, and a
full-length addition is needed for summing up the right-hand-side of the
last equation. In addition, we still need to perform a logarithmic conversion

to determine Z eventually.

A simple linear approximation plus second stage piecewise linear

approximation is developed in [7].

12

Suppose y = y1 + yg is a 23-bit fraction, where yl is the ﬁrst 11-bit part and

y2 is the last 12-bit part of y, then
log(1+y) zy+EyiAEy-y2

where

Ey 2108(11‘ 311) - .711

and

A311 = Edi/1+ 2’”) - Ey(y1)

This algorithm requires two look-up tables to store By and AEy. In
addition, it also requires a 12-bit multiplication which may degrade the

speed performance.

A direct computation method for converting y = log(z) is proposed in [8].

Suppose

logrr 2y

= yn—lyn—z ' ‘ 'ylyo-y—ly—z ' '°

13

then

a: = 23’
: 2y"-lyn-2'”y1yo.y_1y_2...

.2n-2

= 29n—1‘2n_1 , 2yn—2 , , , 2311-2 .2310 , 2y-1-2'1 _ _ ,

The developed algorithm starts with 2' = n — 1 and does the following

recursively

Ifa3222i, then y,=l, x=zx2‘2i.

Otherwise, y,- = 0.

The procedure is repeated by setting 2' = z' — 1. In this scheme, both the
comparison and the multiplication can be efﬁciently operated for 2' 2 0.
The former compares the 2‘th bit of :1), while the latter shifts at to the left
by 2i bits. However, for 2' < 0, 2i is a fraction and 22‘. involves root
Operation, which is considered too time consuming. This problem can be
solved by using look-up table to store {22-1, 22-2, . . . }. But a full length
subtraction is still needed here for comparison. Since 2‘? = 1/22‘, a
division is needed in the multiplication step :r = :c x 2‘”. Again to avoid

this division another look-up table has to be used. Then a full length

l4

multiplication has to be performed to get the product, which seriously

degrades the speed performance.

As can be seen from the above review, current work on this conversion
problem is not very satisfactory in dealing with conversion speed and high
precision even though some involve complicated structure. The work
developed in the next chapter attempts to make further improvement on

these problems.

3 Algorithm Development

This chapter presents the development of both TLFA and CPA for binary
logarithmic conversion. The former uses a two-layer factorization approach,
while the second algorithm employs a continuous factorization process for

long word conversion. We ﬁrst propose the TLFA, then develop the CFA.

3.1 Two Layer Factorization Algorithm

This section describes the development of an efﬁcient algorithm that
generates the precise binary logarithm log(1 + 1:) for an n-bit fractional
binary number x = 4231232 . . .13". Without loss of generality, n is assumed to
be an even number, i.e., n = 2m for some integer m. The term 1+x can be

factorized as

1+$=1+.2:1:r2...:z:,,
= l +.$1$2...$m$m+1$m+2...$2m
z (1 + “751172 . . . (Em)(1+ .00...0C162 . . . Cm)

: (1+.W$1$2$m)(1+ .61C2...Cm 2‘7") (2)

15

16

Let

a: .5L'11L‘2...$m
b=.xm+1xm+2...$2m
b'=b~2"'m

C: .C102...Cm
c'=c-2'm

then (1 + :r) can be expressed as

1+z=1+a+H

 

=(L+@u+w0 B)
By Eq. 3, c can be computed as
b
c _ 1 + a (4)

then the logarithmic value of (1+x) is derived as

log(1 + x) z log(1 + a) + log(1 + c’) (5)

ROM look-up table approach has been an efficient way to generate binary
logarithms. For generating the logarithm of an n-bit binary number, a

look-up table can be implemented with a 2" x n. ROM.

17

By Eq. 5, two look-up tables need to be used:

one table — ROMl, for

log(1 + 331272 . . .xm)

and the other — ROM2, for
log(1+.00...0c1c2...cm)

where each table is implemented with at most a 2'" x n ROM.

In other words, instead of using a 2" x n ROM in the conventional
implementation, this approach uses two 2’" x n ROMS, whose total size is

only a small fraction of the original one, namely, ”—14.

For example, for n = 16 and m = 8, the ROM table size is reduced from
216 x 16, or 1M bits, to two 28 x 16, or a total of 8K bits. The reduction is
signiﬁcant. Table 1 summarizes the ROM size reduction for various values

of n.

18

 

n m 2"-n 2-2m-n

 

10 5 10K 640
12 6 48K 1.5K

14 7 112K 3.5K

16 8 1M 8K
18 9 4.5M 18K
20 10 20M 40K

 

 

 

 

 

 

Table 1: TLFA ROM size comparison

However, the above approach requires the computation of c as in Eq. 4,
which involves a slow division process. As a result, the speed improvement
gained by the use of look-up tables for Eq. 5 may be offset by the slow
division process. Therefore, attempts are made to get around the division in
Eq. 4 to improve the speed performance. Taking advantage of the existing

look-up table ROMl used for log(1 + .$1$2 . . . mm), the slow division process

19

can be simpliﬁed. More speciﬁcally, by Eq. 4, c can be expressed as

 

 

_ b
C_1+a
_1+b 1
_1+a_1+a

: 2108(1+b)—108(1+0) _ 2—log(1+a)
2 2H ‘ 2"“ (6)

where A =log(1+ a.) and B = log(1+ b).

Note that the values of both A and B can be quickly retrieved from ROMl.

Therefore, the only problem remaining is the evaluation of the function 2‘.

Interestingly, the functions y = (22 — 1) and y = l — log(2 — 2), as shown in

Fig. 2, are very close to each other when 0 g 2 S 1. i.e.,
2‘—1z1—log(2—z)
or
2Z z 2 — log(2 — z) (7)

In other words, the function 2‘ can be approximated by the nonlinear

function 2 — log(2 — z) for all z 6 [0,1].

0.9

0.8

0.7

0.6

0.5

0.4

0.3

0.2

0.1

20

 

 

 

0.1

0.2 0.3 0.4 0.5 0.6 0.7 0.8

Figure 2: The graphs of 2‘ — 1 and 1 -— log(2 -— z)

0.9

 

21

Since

log(2 — z) = log[1 + (1 — 2)],

its value can be found from the look-up table ROMl because (1 — z) E [0, 1]

for all z 6 [0,1].

Fig. 3 plots the approximation error between 2z —— 1 and 1 — log(2 — z) for
all z 6 [0,1]. Results show that the maximum error is 0.004198, which is
much better than 0.086071 with a linear function approximation in [1].
Note that the maximum approximation error of 0.004198 is equivalent to 7
bit accuracy. To further reduce the maximum approximation error, we use

the function

1 — log(2 — z) + 2'12 + 2’13 (8)

to approximate 2‘ — 1.
As such, the maximum approximation error can be reduced to 0.0038038,
which is less than 2‘8. So m = 8 is the appropriate value suggested with

this implementation.

22

 

 

 

 

 

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Figure 3: Difference between 22 — 1 and 1 — log(2 — z)

23

It should be mentioned that, since the approximation to 2" in Eq. 7 is valid
only for all z 6 [0,1], the exponents in both terms of Eq. 6 must be ranged

between 0 and 1.

Notice that A and B as in Eq. 6 are in the range of [0,1]. Hence Eq. 6 is

re-written as
c = 2B-A — 21-A/2 if A < B (9a)
01‘
c = 21+B—A/2 — 21-A/2 if A 2 B (9b)

In Eq. 9a, by Eq. 7, the second term 21‘A is approximated by the following

21—A z 2 — log(1 + A)

=2—A’ am

where A’ = log(1 + A)

and the ﬁrst term 23’A is approximated by

zB-A z 2 — log(2 — (B — A))

24
Letp=1—B+A,thenp< 1 and
2 —1og(2 — (B — A)) = 2 — log(1 +p)
Similarly, the ﬁrst term 21+B‘A in Eq. 9b is approximated by

21+B'A z 2 — log(2 — (1+ B — A))
= 2 - log(1?)
where p Z 1 because A 2 B in Eq. 9b.
Denote p = pa + p f, where p0 and p, are the integral and fractional parts of
p respectively. If p Z 1, then p0 = 1 and p — l = p f. On the other hand, if

p<1,thenp0=0andp=pf.

InEq.9a,A<B,orp<1,

2B—A z 2 —log(1+p)

where P = log(1 -+— 1);).

25

From Eq. 10,

c = 2B-A — 21—4/2

z (2—P)—(2—A')/2

i.e.,

C: (1 —P)+A’/2 ifpo =0 (11a)

Similarly, for A 2 B, or p 2 1, in Eq. 9b,

214'13‘A a: 2 — log(p)

= 2 —log(1+pf)

= 2 — P
From Eq. 10,
c = 21+B-A/2 — 21—A/2
z (2 -— P)/2 — (2 — A')/2
i.e.,

c = (—P)/2 + A'/2 ifpo =1 (11b)

26

Note that the above simpliﬁcation makes the comparison of A and B
unnecessary in the implementation and hence saves a subtraction of length

m. Algorithm 1.1 summarizes the procedure described above.

27

 

 

Algorithm 1.1

Step 0.

Step 1.

Step 2.
2.1
2.2
2.3
2.4

Step 3.

Generate look-up tables, 2'" x n,
ROM1(x) for log(1.:r1232 . . .xm) and
ROM2(c) for log(1.00...001c2 . . -Cm)
a: = .1131232. ..zm$m+1xm+2 . . .332",

= a + b . 2""
c = £102 . . .cm
A = log(1+ a) = ROM1(a)
B = log(1 + b) = ROM1(b);
Calculate c from Eq. 11a or Eq. 11b
p=1—B+A=po+pf.
P = ROM1(pf); A’ = ROM1(A);
IF p0 = 0, THEN 0 =1— P; ELSE c = (—P)/2;
c = c + A’/2.
C = log(1 + c - 2"") = ROM2(c);
log(1+ x) =log(1+ a) +log(1+ c - 2"")

=A+C

 

 

28

The following examples illustrate the stepwise procedure of Algorithm 1.1.

Example 1.1

Consider a 16-bz't binary number a: = .10111011 11101010.

a = .10111011 and b = .11101010

From ROMI,

A = log(1+ a) = .11001011

A' = log(1 + A) = .11011000
and

B =log(1+ b) = .11110000
By Eq.11a,

p=1—log(1+b)+log(1+a)

= .11011011

Note that 130 = 0 and pf = p.

29

Since p0 = 0, c = .10001000.

From ROM2,

C =log(1+ c- 2"")
= ROM2(c)

2 .0000000011000100

This results in

log(1 + 3:) = .1100101101001110

The actual value is log(1 + :r) = .1100101101001101. The approximation

error is 1ulp (unit in the last position).

Example 1 .2

Consider again a 16-bit binary number a: = .10110100 01011011.

a = .10110100 and b = .01011011

30

From ROM],

A = log(1+ a) = .11000101

A' =log(1+ A) = .11010011
and

B =log(1+ b) = .01110000
By Eq. 11b,

p: 1 —log(1+b) +1og(1+a)

= 1.01010101

In this case p0 =1 and p; = p — 1.

Since p0 = 1, c = .00110100.

By ROM2,

C =log(1+ c - 2"")
= ROM2(c)

= .0000000001001011

31

This results in

log(1+ :c) = .1100010011110011

The actual value of log(1 + 2:) = .1100010011110101. The approximation

error is -2ulp.

To demonstrate the effectiveness of algorithm 1.1, approximation errors
have been analyzed by comparing the approximated value with the actual
value of log(1 + 2:). The simulation assumes that m = 8 and n = 16. Hence
the input fraction takes the form of .r = .x1x2 . . .5516. All possible (217 — 1)
combinations of $1332 . . .3316 are simulated for approximated values and
compared with the true values. Let DI FF denote the absolute value of the
difference between the approximated value and the actual conversion value.

Simulation results are shown in Table 2.

It can be seen that the maximum approximation error is 4ulp, or 2‘”. The
number of combinations with DI F F = 2‘14 is 27 out of (217 — 1), which is
0.041%. Because of nonlinear approximation error and look—up table

truncation, the computation of c as described in the algorithm can have an

32

 

 

 

 

 

 

 

DI F F(ulp) Number of Cases Percentage
0 22426 34.2
1 31388 47.9%
2 10417 15.9%
3 1278 1.95%
4 27 0.041%
2 5 0 O

 

 

 

 

 

Table 2: Simulation result of TLFA 1 with n = 16

error of up to 1-2ulp, this error can be magniﬁed by a factor of (1 + a) as in
Eq.3, which can then be further carried into the ﬁnal stage of taking

logarithm from the look-up tables.

Since none of the conversion results of these combinations produce
DI FF > 2‘”, in other words, all DI FF g 2‘13. This concludes that the

developed algorithm achieves an accuracy of 13 bits.

The two tables — ROMl and ROM2, are used and each has a size of

33
28 x 16, or 4k bits. Note that
log(1 + :13) < 22:

for all :1: > 0.
It follows that the ﬁrst 7 bits of the output of ROM2 must be 0. So the size

of 28 x 9, or 2.25k bits is actually enough for ROM2.

It should be mentioned that the slow division process in Eq. 4 is improved
by approximating 2’ using the existing ROM table. However, the nonlinear
approximation approach developed in this algorithm limits to m = 8 and
the accuracy of log(1 + z) to 13 bits. The accuracy can be improved by
either selecting better nonlinear approximation functions, or alternatively,

using additional ROM table for the values of 2’, where z E [0, 1).

The ﬁrst alternative seems difﬁcult due to the fact that any nonlinear
approximation function to the function 2’ itself need to be effectively
evaluated ﬁrst. In this study a connection between 2‘ and available
logarithmic function is discovered and hence there is not much hardware
overhead involved. But in general, it’s not easy to ﬁnd a very convenient

approximation function. As discussed in chapter 2, linear function and

34

truncated Taylor series have been tried before but are not very satisfactory
because of either low precision or heavy computation involved in the

evaluation of the approximation function.

Difference grouping technique can be used in the second alternative. Let
ROM3 be the look-up table for the difference between 22 — 1 and the
nonlinear approximation function (8) with the size of 2'" x m bits. Thus,

the approximation of 2‘ can be expressed as

2z z 2 -— log(2 — z) + A(z) for z 6 [0,1] (12)

where A(z) is the difference error by the ﬁrst stage nonlinear

approximation and can be stored in ROM3.

The value c in Eq. 10 can then be calculated as

(1—P)+A(1—p)+ﬂ§‘:—Al ingB
C: (13)

Algorithm 1.2 summarizes the procedure for this implementation. It has

been developed and simulated for IEEE single precision conversion (23 bits)

35

with three internal guarding digits, making the total number of bits in a

word to be 26, which corresponds to m=13. In this case, ROM1 has size

213 x 26 bits, ROM2 has size 213 x 14 bits since the ﬁrst 12 bits of log(1 + c’)

are all 0, and ROM3 has size 213 x 6 since, by previous analysis, A(z) is less

than 2‘8 and hence only the last 5 bits plus a sign bit need to be stored.

 

Algorithm 1.2

Step 0.

Step 1.
Step 2.
2.1
2.2

2.3

2.4

Step 3.

 

(ROM1(x) and ROM2(c) are the same as in Algorithm 1.1)
ROM3(x) for 2‘ — 2 + log(2 — :r) with size of 2'" x m
(Same as Algorithm 1.1)

Calculate c from Eq. 13

p=1—B+A=po+pf.

P = ROM1(pf); A’ = ROM1(A);

IF p0 = 0

THEN c =1— P + ROM3(1— p) + ROM1(1 — A)/2;
ELSE c = (—P + ROM3(2 — p) + ROM3(1— A))/2;

c = c + A’/2.

(Same as Algorithm 1.1)

 

 

36

Simulation results show that the maximum conversion error is 2‘24 after
truncating the last two guarding digits. Note that the look-up table size
can be further reduced by using PLA and other look-up table
compactiﬁcation techniques. The use of PLA in [1] reduces the size to
about 16% of the original ROM size, similar effect can also be achieved by

using PLA in this scenario.

37

3.2 Continuous Factorization Algorithm

In the previous section, nonlinear approximation method is proposed to
reduce the size of ROM for look-up table. The limitation of the scheme is
that its reduction of ROM size is only one-step and hence is not suitable for
high precision conversion. Limited by the look-up table size, a natural
alternative to increase conversion speed is to use parallel processing. The
Continuous Factorization Algorithm (CFA) is proposed that uses look-up
table and SRT division technique [2] to achieve fast high precision

conversion. We ﬁrst formulate the problem in this approach.

For an n-bit fractional binary number
a: =.:1:1:r:2...:r,,
we factorize 1 + a: as

1+1: = 1+.x1x2...xn
z (1+.q1)(l+.0q2)...(1+.0...0qn) (14)

where each q,- is either 0 or 1.

Then log(1 + x) can be approximated as

log(1+ x) zlog(1+.q1)+log(1 + .0q2) + . . . + log(1 + .0. . .an) (15)

38

Since each q,- is either 0 or 1, the term log(1 + .0. . . 0q,-) is either 0 or

log(1 + .0. . .01), which can be pre-stored in a look-up table. Therefore this
approach employs a look-up table of at most n x n bits. For example, if
n=64, then the ROM size is no more than 64 x 64 = 4K bits. The question

now is how to determine the qi’s.

In principle the factorization process of 1 + a: can be carried out as follows.

First

1+2: = 1+.x1z2...xn

z (1 + .$1)(1+ .00.],2013 . . . 0.1,”) (16)

where q1 = $1 and

 

.a1,2a1,3...a1,n = %—f—" (17)
In a similar way,
1 + .0a1,2a1,3 . . . a1,n 2 (1+ .0a1,2)(1 + .00a2‘3a2,4 . . .a2,,,) (18)
where q2 = am, and
.a2,3a2,4 . . .02," = .a1,3a1,4...a1,n (19)

1 + .0013

39

In general, at step j, j = 1,2,... ,n — 1,

1+ .00 . . . OajJHajJ-H . . . 07',” Q: (1+.00 . . . Dam-+1) X

(1 '1' .00 . . . 0aj+1,j+2aj+1,j+3 . . . 0.341,") (20)

where q,- = aj.j+1 and

'aj1j+2aj1j+3 ' ' ' ajin (21)

.a '+1, 31.20. '+1, '+3 . . . a '+1,n :
J I I I J 1+.00...0a.,-,,-+1

For example, consider n = 6 and :r = .111101, By Eq. 21,

1.111101 z (1.1) - (1.010011) q1=1
1.010011 5:: (1.01) . (1.000011) q2 :1
1.000011 m (1.000) . (1.000011) q3 = 0
1.000011 2 (1.0000) . (1.000011) 04 = 0
1.000011 z (1.00001) . (1.000001) qs = 1
1.000001 z (1.000001) - (1.000000) qe = 0

As a result,

(1.1)(1.01)(1.000)(1.0000)(1.00001)(1.000001)=1.11101111

40

In the next section we analyze the approximation error and ways to reduce
look-up table size and computation steps. In that section we give a
sufﬁcient condition that guarantees the conversion precision. Following that
we ﬁrst develop a parallel processing structure that reduces the
computation steps from 0(n2) to 0(n), then we develop a more effective

implementation algorithm which is based on the SRT division spirit.

41

3.2.1 CFA Approximation Error Analysis

In this section we analyze the various approximation errors in the CFA
conversion scheme which is based on the factorization process. We assume
the number of fractional bits is n and use ulp (unit of last position) to
denote 2‘". Also we assume that in any division if the word length of the
divisor is less than that of the divident, then the divisor is truncated. This

makes the hardware structure simpler.

Theorem 1. Suppose the register has fractional length of n, then

.0. . .Oak,k+1 . --ak,n
(1+ .0...0qk)

 

gives an n-bit fraction with error less than ulp.

Proof. If g). = 0, then clearly the error is 0. If qk = 1, note that in the

division process, the quotient bits are accurate until the last A: fractional

42

bits when the divisor 1 + .0 . . .qu becomes 1 due to truncation. Hence

0 - ' ° Oyn—k+lyn-k+2 - ° - yn '0 - - - Oyn—k+lyn—k+2 - - -yn
error = —

1+.0...01 1

(.0 . . .Oy._..1y.-.+2 . . .y.)(.0 . . .01)
1 + .0...01

 

 

 

< 2“"4‘) - 2"c
: 2_n

= ulp

Theorem 2. For any ﬁxed n = 2m > 0, the approximation error in the

following

1 + .0 . . . 0$m+1$m+2 . . . (L'Qm

z (1+ zm+12-m-1)(1 + xm+22-m-2) . . . (1 + x2m2-2'") (22)
is less than %ulp. Here each 2:,- is either 0 or 1.

Proof. Clearly the maximum approximation error occurs when xm+k = 1

for all 1 S k g m. We assume this and proceed as follows:

For 1 S k g m, let rk denote the remainder (error) of the product of the

43

ﬁrst k terms on the right side of the Eq. 22, i.e.,

r), = (1+ 2""H)(1+ 2‘m‘2) . . . (1 + 2"""‘)

m k

—(1+.0...011...1)

Note that rm equals the maximum error stated in the theorem.

The ﬁrst few rk’s are easy to compute. For example,

1+2—m-1=1+2‘""1+0
=> T1=0

and

(1+ 2-m-1)(1+ 2-m-2) = (1+ 2-m-1 + 2-m-2) + 2-2m—3

=> T2 = 2-2m—3

It can also be checked that

7.3 : 2—2m-3 + 2—2m—4 + 2—2m-5

< 2—2m—2

So the theorem holds for m S 3.

44

To estimate the bound on rk in general, ﬁrst note the following facts.

Fact 1: 1+x<e$ forallx>0.

Fact 2: e“ < 2:1: +1 for all :1: E (0, %].

Form2k24,

Tk-Tk—i

= [(1+2‘m‘1)(1+2'm‘2)...(1+2"")’° (1+ M]
— [(1+ 2'm‘1)(1 + 2‘m’2) . . . (1 + 2"" "+1)1+ M]

= (1 + 2—m—1)(1+ 2"”) . . . (1 + 2-""-’°+1)2-"'*—’c — 2”“

= 2-m-k[(1 + 2-m-1)(1+ 2-m-2) . . . (1 + 2-m—k“) — 1]

45

The term (1 + 2'm‘1)(1 + 2‘m’2) . . . (1 + 2‘m‘k“) is estimated as follows,

(1 + 2-m-1)(1+ 2"“) . . . (1 + 2-m-k+1)
00

< H(1+ 2"“)

00
-—m—i
<11e2
1'21

00 2—m—i
2 eg1 (by Fact 1)

2—m

=e

< Tm“ + 1 (by Fact 2)
Thus

7-,, — rk_1= 2—m-’=[(1 + 2-m-1)(1+ 2‘m-2) . . . (1 + 2—m-k+1)— 1]

< 2—m-k . 2—m+1

= 2-2m-k-l-I

46

and it follows that for m 2 4,
m
Tm = 7'3 + :01: — Tic—1)
k=4
00
< Ts +201: — Tic—1)
k=4

m
< T3 + Z 2-2m—k-H

k=4

= T3 + 2-2m—2
< 2—2m—2 + 2—2m—2
< 2—2m—1

1

1:]
Corollary 1. For any n = 2m-bit fraction factorization as in Eq. 15, only
the ﬁrst m steps are needed and
Qm+k = am,m+ka 1 S k S m
The error caused by this is less than ulp.

Proof. By theorem 2,

error < (1+.q1)(1+.0q2)...(1+.0...0qm)-rm
< 2rm

< ulp

47

El

Theorem 3. For each n = 2m-bit fractional input 2:, at most m(m — 1)/2
quotient bit operations are needed and the maximal error resulting from

factorization is less than n . ulp.

Proof. From corollary 2, only m divisions are needed. Because of
truncation at the end of each division, for 1 S k S m, only m — k quotient

bit operations are needed. So the total number is

i:k=m(m— 1)/2
k=1

Note that the error caused by air—,3 is at most §ulp < ulp and is less than

2ulp for each divisor 1 + .0 . . .qu, 2 S k g m.

By theorem 2,

total error < ulp + 2ulp(m —- 1) + ulp
= 2m - ulp
= n - ulp
1:]

Corollary 2. The ﬁnal conversion error due to the factorization process is

71-11!
less than —31n2 .

48

Proof. Since the function log(1+x) is continuously differentiable for

x 6 [0,1], for any x and y such that 0 g x < y S 1,

dlog(1 + x)

log(1 + y) -— log(1 + x) g Max dx

x6(0,1)

 

'(y—Ir)

 

 

 

 

Cl

Corollary 3. In order to achieve n-bit conversion accuracy, i.e., for each
n-bit fractional input, the logarithmic conversion error is less than ulp, the

following condition on the number of internal guarding digits g is sufﬁcient.

log(n + g) — log(ln 2) < g

Proof. Note that the error caused by factorization is positive and the error
caused by look-up table truncation is negative. So we can just restrict the
error from factorization and error from look-up table truncation to be both

less than ulp.

From Corollary 2, we have

"_l'gg-(Mg) < Q-n

ln2

49

which leads to

log(n + g) — log(ln 2) < g

For the look-up table truncation error, it is less than (n + g) - ulp. So we

have

(n+g)2-(n+g) < 2-n

which leads to

log(n + g) < 9

By comparing the above two conditions we have

log(n + g) — log(ln 2) < g (23)

Example 2.1

Suppose n = 56, then by Eq. 23 we can choose 9 = 7 to guarantee the

conversion accuracy.

Note that the condition developed in this section is just sufficient and may

not be optimal. Some estimates in the analysis can be further reﬁned. For

50

example, it can be easily shown that not all qi’s can be 1 whenever n > 3;
the bounds in the proof of Theorem 1 can also be lowered. Nevertheless, as
shown in the above example, the number of guarding digits is not big,
especially in long word case. So the result can be used in designing high

precision conversion structure.

The next two theorems are developed to reduce the size of the look-up table

used to store log(1+ 2“), i = 1,2,. . ..

Theorem 4. For each i > 1, log(1 + 2") is a fraction with (i-I) leading

0’s.

Proof. From the identity

log(1 + x) <

we have

—i

ln2

 

log(1 + 2") <

< 2““)

51

Theorem 5. Suppose n = 2m, then for k > m,

1
log(1+ 2"’°) z ~2- log(l + 24‘“)

1

Proof. From calculus technique, we have that for x E (0, 5],

1
5 log(1 + x) — log(1 + 9]

= logv1+x—log(1+ £)]

 

 

2
\/1+x
=log 1
1+5
1 _§
-1+c 2
=log(1—8( +2 x2) where0<c<x
5

 

 

12
<_
213

Then for x = 2"5 as in the theorem, we have that

1
510g(1+ 2"“) — log(1+ 2"“1) < 2‘2"—1

The implication of the above theorem is that only the ﬁrst half of the
look-up table for

log(1 + .0...01)

52

need to be stored. For the second half of the look-up table, the result is

only a shift of the middle entry.

Example 2.2

The following table for n = 8 shows that for k 2 5, log(1 + 2"“) is just a

right shift of log(1 + 2"”1), as shown in the previous theorem.

 

 

i log(1 + 2")
1 .10010101
2 .01010010
3 .00101011
4 .00010110
5 .00001011
6 .00000101
7 .00000010
8 .00000001

 

 

 

 

Table 3: 8-bit look-up table for log(1 + 2“)

53

From these two theorems we get the following corollary on the size of the

look-up table.

Corollary 4. For n = 2m word length conversion, the logarithmic look-up

table size can be as small as Egg—+12 bits.

Proof. This is a simple result from Theorems 4 and 5. El

Example 2.3

If n=64, then the look-up table size is at most

32-(3.32+1)
2

 

bits

which is less than 1.5K bits.

In the following subsections we ﬁrst propose a procedure based directly on
the factorization process described before. Then we incorporate the SRT

division technique to improve the speed.

54

3.2.2 Restoring Approach

To implement the factorization process described at the beginning of this

section
1+.x1332“.$n : (1 +41)“ + .OQQ)...(1+.0...0qn)

we can do n divisions with the divisor of the form 1 + 2‘j at each division
step j. Or we can do n multiplications to change the divisor at step j to be
of the form (1 + .q1)(1 + .0q2) . . . (1 + .0 . . .Oqj), which reduces the division

to a single comparison.

In the ﬁrst approach, because of the special form of the divisors, at each

division step j, after certain number of quotient bits are determined, the

next division process can start. The following example shows this in detail.

Example 2.4

Suppose a fraction

.amalg . . . = "$11132 . . .

At step 1, we do the division

.00.],2013 . . .

 

— .002,202’302,402,5 . . .
Lam

at step 2, we do the division

0002,3024 . . .
1.00.23

 

= .00a3,3a3,4a3,5a3,6 . . .

Step 2 doesn’t have to wait for the completion of step 1 to start. In fact,
after am in step 1 is determined, step 2 can be carried out. Each additional
determined quotient bit from step 1 immediately participates in step 2
division in a synchronized way. Similarly, step 3 can start after a3; is

determined in step 3.

The parallel processing structure illustrated by the above example can
increase speed dramatically in high precision situation. In fact, it can be
proved that the number of total quotient bit operations is reduced from
0(n2) to 0(n). However, this erases the recursive property of the algorithm

in the hardware implementation and causes large array-like structure. So

56

the trade-off here has to be weighed depending on the application

requirement.

In the rest of the section we develop the second implementation approach.
As will be seen in later subsections, it allows the design of high speed

implementation with still low hardware complexity.

We utilize the standard restoring division procedure [2] which uses
comparison and subtraction operations. Deﬁne the partial quotient DJ- and

partial remainder Rj, j = 1,2, . .. ,n, as

and

Rj = (1+.$1$2...$n) —Dj

with D0 = 0 and R0 = .x1x2 . . .xn.

Note that the choice of each q,- is to try to make DJ- to increase
monotonically from 1 to 1 + .x1x2 . . .xn which also corresponds to the

partial remainder Rj decreasing from .x1x2 . . .xn to 0. So at each step j, we

57

can ﬁrst try q,- = 1. If D,- turns out to be too big (Dj > 1 + .IL'1C172....’L'n or
R]- < 0), then we just reset q,- = 0 and try to choose a smaller factor
(1 + qj+12’(j+1)) and so on.

Since

R,- = (1 + .561$2...$n) — DJ-
: (1 +.x1x2...xn) — (1+.q1)(1+.0q2)..(.1+.0...0qj)
= (1 + .x1x2 . . .xn) — (1 + .q1)(1 + .0q2) . . . (1 + .0...0qj_1)
— (1 + .q1)(1 + .0q2) . . . (1 + .0. . .0q,_1) -q,2-J'

= j—1 — DH 'qu’ (24)

we have the following algorithm.

 

Algorithm 2.1

Step 0: Let R0 = .x1x2 . . .xn, D0 =1.
Step j: j=1,2,...,n
LEI. RJ' 1' Rj_1 — Dj__1 ° 2_j

If Rj 2 0111' = 1, D3“ = Dj_1+ Dj_12_j

 

 

If Rj < 0, (13' = 0, Rj = Rj_1, DJ' = Dj_1

 

58

Example 2.5

Table 4 shows the detailed steps for

R0 = x = .111101

implemented by algorithm 2.1.

 

j R1 (11' DJ“

 

1 .011101 1 1.1

2 .000101 1 1.111

3 .000101 0 1.111

4 .000101 0 1.111

5 .000010 1 1.111011

6 .000001 1 1.111100

 

 

 

 

 

 

Table 4: Example 2.5

Note that because of truncation D6 is different from x in the above example,

the error estimation is done in the previous section.

59

The restoring division nature of the procedure can be improved by a
non—restoring procedure with the use of signed digits. In fact, it will be
shown that the SRT division technique can be used to achieve the
computation complexity of 0(N) with still small chip area. In the rest of

this section the development is described.

3.2.3 Modiﬁed Restoring Approach

In this subsection we develop another version of algorithm 2.1 which is
more suitable for hardware design and also serves as a basis for the

development of SRT division techniques in the next subsection.

First note that in Algorithm 2.1 R, < 23'+1 for each j = 0,1,. . .. This
implies that for large value of j the ﬁrst j — 1 fractional bits in Rj are all 0.

Secondly, R,- is only involved in the computation of

R,- _ Dj . 2-(J'+1)

60

where D,- - 2'43“) has the ﬁrst 3' fractional bits as 0. So it can be seen that

the ﬁrst j-l fractional bits of 0’s in
R,- _ Dj . 2-(j+1)

doesn’t really contribute to the result. Plus considering the hardware
implementation structure which often uses shift registers in this situation,

we do the following modiﬁcation of algorithm 2.1.

Let F,- = 23R], then from Eq. 24,

= 2’(Rj-1 —‘ Dj—l '(IjQ-j)
= 2’Rj—1 - Dj—l 'CIj

= 2Fj_1 — Dj_1 ' Qj (25)

Thus we have the following algorithm which is more suited to hardware

implementation.

61

 

Algorithm 2.2

Step 0: F0 = .x1x2 . . .xn and Do =1
Step 1: For each 3' = 1,2,... ,n,
1.1 F,- = 2FJ-_1— Dj-1
1.2 If F]- 2 0, then {qj =1; D,- = D,-_1+ 2"ij_1}

1.3 else {qj 2' 0; F} = 2173-1; DJ‘ = Dj_1}

 

 

 

Example 2.6
Consider a fractional number x = .111101, the stepwise procedure of the

algorithm 2.2 is tabulated in Table 5.

Finally, (1.1)(1.01)(1.000)(1.0000)(1.00001)=1.11101111.

62

 

 

 

j 2Fj-1 Dj—l 91' F1 D1

1 1.111010 1.000000 1 0.111010 1.100000
2 1.110100 1.100000 1 0.010100 1.111000
3 0.101000 1.111000 0 0.101000 1.111000
4 1.010000 1.111000 0 1.010000 1.111000
5 10.100000 1.111000 1 0.110000 1.111011
6 1.100000 1.111011 0 1.100000 1.111011

 

Table 5: Example 2.5

 

63

3.2.4 Non-restoring Approach

For an n-bit factorization process, algorithm 2.2 requires 72 steps, where
each step needs two n-bit addition/subtraction operations. In the whole
process the most time consuming steps are the full length subtraction

FJ- = 2Fj_1 — Dj_1 and the full length addition Dj = Dj_1 + Dj_12‘j
because of carry propagation. Note that at each step j in algorithm 2.2 we

are essentially doing the following division problem

2153-1

Di—l = qJ- + . . . ' (26)

in which the dividend is 2FJ-_1 and the divisor is Di_1. The SRT division
technique was ﬁrst invented to avoid full length subtraction and make the
quotient bit selection rule be data independent [2]. In other words, we can
make a simpler selection rule which allows the dividend and the divisor to
be in a certain range instead of being exact and still guarantees the
convergence. The use of this technique solves the carry propagation
problems just mentioned. The price paid for this technique is that sign
digit has to be used, which means that in the radix—2 case, the quotient bit
qj has to be allowed to assume value from the set {-1,0,1}. This result in

the look-up table to be doubled to also include the values log(1 — 2‘1).

64

This increase is not signiﬁcant since the original look-up table has a very
small size. On the other hand, the speed performance is improved

dramatically, especially in long word length situation.

In the rest of this subsection the detail of the application of SRT division

technique is developed.

 

 

 

 

Figure 4: Radix-2 P-D plot

Let the quotient digit set be {-1, 0, 1}. For simplicity of notation, let “T”

denote “-1”. Consider the P-D plot in Fig. 4, to guarantee IP — qDl < D,

65

the ranges for q is
(-1+q)D __ P S (1 +q)D, where 0.5 S D < 1

i.e.,OSPS2Dforq=1;-DSP§Dforq=0;and —2DSPSOfor
q = —1. As a result, the ranges of the overlapping area 130,1 for both q = O
and q = 1 is: 0 S Pm S D, and the ranges of the overlapping area P_1,o for
both q = —1 and q = 0 is: —D g Pm g 0. Therefore, two comparison
constants, % and —%, can be chosen to determine q for divisor D between 0

and 1, and between 0 and -1 respectively.

In the implementation of Eq. 26, the divisor at each step j is 2173-1 and the

dividend Dj_1 is always in the range [1,2).

Thus, we consider D = 2121 so that 1 > D 2 0.5. The division

 

 

2Fj—1 : Fj—l
DH —
: Fj_1

D

deﬁnes P = Fj_1.
This implies that D is ranged between 0.5 and 1.0 and P is between -1.0

and 1.0. As shown in Fig. 4, two comparison constants cl = 0.2510 2 0.012

66

and c_1 = —0.2510 = 1.112 (2’s complement representation) can be used.

Thus, the quotient digit qj is estimated by

1 If Fj_1 > C1
Qj = 0 If C_1 S Fj__1 S Cl (27)

_1. lfFj_1< 6.1
The improved continuous factorization scheme is summarized in the

following.

 

Algorithm 2.3

Step 02 F0 = $311132 . . .13" and D0 =1
Step 1: For each 3' = 1,2,... ,n,
1.1 Estimate g, from Eq. 27

1.2 E] = 2Fj_1— qJ'DJ'_1

 

 

1.3 Dj = Dj_1 + Qj ' 2_ij_1

 

Example 2.7

 

67

 

217,.1

D,-_1

(13'

F1

Di

 

1.111010
1.110100
0.101000
1.010000
1.010000

1.011110

1.000000

1.100000

1.111000

1.111000

1.111111

1.111100

l—‘I

0.111010
0.010100
0.101000
0.101000
0.101111

0.100010

1.100000

1.111000

1.111000

1.111111

1.111100

1.111101

 

 

 

Table 6: Example 2.6

Let’s reconsider the previous example in this algorithm as shown in Table 6.
In this example D6 = x. In general there could be some error whose range

is analyzed in the section 3.2.1.

Note that steps 1.2 and 1.3 in algorithm 2.3 can be completed very fast by
the use of carry-save-adder (CSA), which eliminates carry propagation and
produces an approximate value, whose accuracy determines the number of

bits kept for a short length full addition.

68

3.2.5 Modiﬁed Error Analysis and Hardware Implementation

Structure

Because of the simplication involved in algorithm 2.3, the error analysis in
subsection 3.2.1 needs to be modiﬁed. It is done in this subsection and the
schematic structure to implement the developed algorithm 2.3 is also

proposed.

Suppose the input word length is n bits and let g be the number of
guarding digits needed to ensure conversion accuracy of also n bits. This
means the internal register length is n + g. Let ulp = 2‘("+9). Note that in
algorithm 2.3 the only error is caused by the truncation in calculating DJ- at
each step j, which result in an error no larger than ulp. It then follows
easily that we get exactly the same result as given by Corollary 3 in
subsection 3.2.1, under the additional condition that log(1 + 2") is stored

in truncated form and log(1 — 2”) in round-up form.

In the following the implementation structure of algorithm 2.3 is deve10ped.
Suppose the input word length is n, then the look-up table used to store

log(1 :1: 2’1) will have the size of 2n2. As an example, if n = 64, then

69

2n2 = 8K (bits). It’s interesting to note that by the look-up table reduction
technique developed in subsection 3.2.1, and the fact that

log(1 — 2:) z — log(1 + 1:), the look-up table size can be just n2, but this
involves a little hardware overhead. Three CSA’s are needed. one for the

addition of

Zlogu :l: qj2'j)

1:1

and two others for the factorization process. One is used for the dividend

F], the other for the divisor Dj.

Note that q1 = x1 is directly determined. For the comparison of partial
remainder F ,- with comparison constant 0, note from the P-D plot that the
allowable error is i. To ensure convergence in the algorithm, we need to
retain 4 fractional bits of the CSA for Dj, plus the ﬁrst integral digit and
sign bit, we need to do a full addition of 6 bits for the comparison step 2.1
in algorithm 2.3, which can be implemented by a carry-look-ahead adder to
improve speed or a usual full adder to save chip area if the carry

propagation delay is bearable in the application.

70

 

«9 a f1f2-f4 q q. q.

 

001***101

 

 

 

 

 

Table 7: Truth table for q

After the result sa.f1f2f3f4 of full addition is produced the quotient bit q
can be determined. Table 7 shows the encoding of q and its logic function
depending on sa.f1f2f3.

The logic functions of q, and q, are then derived as

‘13 = 3(6'1' 71)

qa=a€Bs+a€Bf1

The implementation structure is shown next.

71

Shift bits Shiftj bits

 

Figure 5: Block diagram of the implementation of algorithm 2.3

4 Conclusion

Logarithmic number system (LN S) is an attractive alternative to the
conventional number system when the data need to be manipulated at very
high rate over a wide data range. The major problems in dealing with the
logarithmic number system are to derive logarithm and anti-logarithm
quickly and accurately enough to allow conversion to and from the

conventional number representation.

This thesis study develops two efﬁcient algorithms for binary logarithmic
conversion : TLFA (Two Layer Factorization Algorithm) and CFA
(Continuous Factorization Algorithm). The former uses a factorization
approach to reduce the look-up table size and employs a nonlinear
approximation method to reduce the computational complexity. Simulation
results on IEEE single precision conversion (23 bits) show that the
algorithm requires only one ROM table of 213 x 26 bits, one of 213 x 14 bits,
and one of 213 x 5 bits, or a total of 360K bits. This algorithm has
advantages over other work on this problem in both speed and
implementation structure which is mainly due to the discovery of a

convenient nonlinear approximation function. For handling long word

72

73

length numbers, such as in IEEE double precision (53 bits), CFA uses a
continuous factorization process to further reduce the look-up table size
and reduces the problem to a division one which can be implemented using
the SRT division technique [2] to give a high-speed yet simple structure.
More speciﬁcally, the conversion time is about three additions irrespective

of word length.

The speed of SRT-based divisions is mainly determined by the complexity
of the quotient-digit selection [2, 15, 16]. To speed up the division process,
one may reduce the number of iteration steps by increasing the radix 6 of
the sign digit number system used in the process. Selecting ,6 = 2'" allows
the generation of m quotient bits at each step and the number of steps can
be reduced to [i] where n is the word length. However, the complexity of
the quotient bit selection and remainder updating increase for high radices,
offsetting the advantages of the reduction in the number of iterations [2].
Therefore, exploring the speed performance in this logarithmic conversion
problem using the high-radix SRT division scheme is an interesting topic

for future work.

Appendices

Appendix

A Source code of simulation program for

TLFA 1.1 (in Java)

// This program simulates algorithm 1.1 for all

// possible 16—bit input combinations.

class misc {
static int N=8;
static long one = (long) (Math.pow(2,N)+.5);

static double e=Math.pow(2,-N);

public static void toBinary(long x, int n) {
int 1;
int [J a=new int [128];
for (i=n-1; i>=0; i--) {

74

75

long t=x&1;

a[i]= (int) t;

}

for(i=0; i<n; i++)
System.out.print(aIil);

System.out.print(" ");

public static long expconv(long x) {
double yd=Math.pow(2,x*e)-1;

return (long)(yd/e+ 5);

public static long lt1(long x, int tag) {

double xd

(double) x * e;

double yd Math.log(1+xd)/Math.1og(2);
long y = (long) (yd/e/e+.5);

switch(tag) {

76

case 0: return y;
case 1: return (y*2/one+1)/2;
case 2: return y%one;

};

return -1;

public static long 1t2(long x, int tag) {

double xd (double)x*e*e;

double yd Math.log(1+xd)/Math.log(2);
.long y = (long) (yd/e/e+.5);
switch(tag) {

case 0: return y;

case 1: return y/one;

case 2: return yZone;

};

return -1;

77

public static long 1t(long x) {

double xd (double)x*e*e;

double yd

Math.log(1+xd)/Math.1og(2);
long y = (long) (yd/e/e+ 5);

return y;

class 8 {
public static void main(String[] args) {
int N=8;
int[] diff=new inth];

long c,C,x,y,A,Aprime,B,p,pf,P,r,t,t1,cnt=0,err=0;

long one (long) (Math.pow(2,N)+.5);

one*2;

long two
double xd,yd;

double e=1/(double) one;

78

for (x=0; x<one; x++) {
A = misc.1t1(x,1);
Aprime = misc.1t1(A,1);

for (y=0; y<one; y++) {

B = misc.lt1(y,1);
p = one-B+A;
pf=p%one;

P = misc.1t1(pf,1);
if (p<one) c = (one-P)+Aprime/2;
else c = -P/2 + Aprime/2;

C = misc.lt2(c,0);

r = misc.1t1(x,0)+C;
t = misc.lt(x*one+y);
t1 = r-t,

diff[(int)Hath.abs(t1)l++;

79

for (int i=0; i<N; i++)

System.out.println(i + " " + diffEiJ);

B Source code of simulation program for

TLFA 1.2 (in Java)

// This program simulates algorithm 1.2 in IEEE single precision.

class misc {
static int N=13;
static long one = (long) (Math.pow(2,N)+.5);

static double e=Math.pow(2,-N);

public static void toBinary(long x, int n) {
int 1;
int [J a=new int [128];
for (i=n-1; i>=O; i--) {
long t=x&1;
x /= 2;
a[i]= (int) t;
}

for(i=0; i<n; i++)

80

81

System.out.print(aIiJ);

System.out.print(" ");

public static long expconv(long x) {
double yd=Math.pow(2,x*e)-1;

return (long)(yd/e+.5);

public static long lt1(long x, int tag) {

double xd (double)x*e;

double yd Math.log(1+xd)/Math.log(2);
long y = (long) (yd/e/e+.5);
switch(tag) {

case 0: return y;

case 1: return y/one;

case 2: return y%one;

};

return -1;

82

public static long lt2(long x, int tag) {

double xd

(double)x*e*e;

double yd Math.log(1+xd)/Math.log(2);
long y = (long) (yd/e/e+.5);
switch(tag) {

case 0: return y;

case 1: return y/one;

case 2: return yXone;

};

return -1;

public static long lt(long x) {

double xd (double)x*e*e;

double yd

Math.log(1+xd)/Math.log(2);
long y = (long) (yd/e/e+.5);

return y;

83

class 3 {
public static void main(String[] args) {
int N=13;

long c,x,y,x1,y1,t,t1,t2,err=0;

long one (long) (Math.pow(2,N)+.5);

long two one*2;
double xd,yd;

double e=Math.pow(2,-N);

for (x=0; x<one; x++) {
x1 = misc.lt1(x,1);
t2 = misc.expconv(one-x1)/2;
for (y=0; y<one; y++) {

// misc.toBinary(x,N);

84

// misc.toBinary(y,N);

y1 misc.lt1(y,1);

t1 y1-x1;
if(t1>=0) c=misc.expconv(t1)+one/2-t2;
else c=misc.expconv(one+t1)/2-t2;
y1 = misc.1t1(x,0)+misc.lt2(c,0);
t = misc.lt(x*one+y);

// misc.toBinary(y1,N); misc.toBinary(t,N);
t1 = yl-t;
if (Math.abs(t1)>err) err = Math abs(t1);

// System.out.println(tl + " " + err);

}

System.out.println("Maxerror=" + err);

References

[1] H.Y. Lo and Y. Aoki, ”Generation of a precise binary logarithm with
difference grouping programmable logic array,” IEEE Trans. on

Computers, C-34(8) pp.681-691, 1985

[2] I. Koren, Computer Arithmetic Algorithms, Prentice Hall, 1993.

 

[3] NC. Kingsbury, ”Digital ﬁltering using logarithmic arithmetic”,

Electron. Lett., Vol 7, pp. 56-58, 1971

[4] W.F. Wong and E. Goto, “Fast evaluation of the elementary functions
in single precision”, IEEE Transactions on Computers, Vol 44, No. 3,

pp. 453-457, 1995

[5] S.-C. Huang and LG. Chen, ”The chip design of A 32-b Logarithmic
Number System,” Proc. IEEE International Symp. on Circuits and

Systems, pp. 167-170, 1994.

[6] MC. Arnold, T.A. Bailey, J .R. Cowles and MD. Winkel, “Applying
features of IEEE 754 to sign/ logarithm arithmatic”, IEEE

Transactions on Computers, Vol 41, No. 8, pp. 1040 1049, 1992

85

[7]

[8]

[9]

[10]

[11]

[12]

86

F.-S. Lai and C.-F. E. Wu, “A hybrid number system processor with
geometric and complex arithmetic capabilities”, IEEE Transactions on

Computers, Vol 40, No. 8, pp. 952-962, 1991

D.K. Lostopoulos, “An algorithm for the computation of binary
logarithms”, IEEE Transactions on Computers, Vol 40, No. 11, pp.

1267-1270, 1991

I. Koren and O.Zinaty, “Evaluating elementary functions in a
numerical coprocessor based on rational approximations”, IEEE

Transactions on Computers, Vol. 39, No. 8, pp. 1030- 1037, 1990

D.M. Lewis, “An architecture for addition and subtraction of long
word length numbers in the logarithmic number system”, IEEE

Transactions on Computers, Vol 39, No. 11, pp. 1325-1336, 1990

T. Stouraitis and F.J. Taylor, “Floating-point to logarithmic encoder
error analysis”, IEEE Transactions on Computer, Vol 37, NO. 7, pp.

858-863, 1989.

T. Kurokawa, J. Payne and S. Lee, “Error analysis of recursive digital
ﬁlters implemented with logarithmic number systems”, IEEE 'Ilrans.

Accost., Speech, Signal processing, Vol ASSP-28, pp.706-715, 1980

87

[13] F.J. Taylor, “A hybrid ﬂoating-point logarithmic number system

processor”, IEEE Trans. Circuit System, Vol GAS-32, pp. 92-95, 1985

[14] F.J. Taylor, R. Gill, J. Joseph and J. Radke, “A 20-bit logarithmic
number system processor”, IEEE Transactions on Computer, Vol 37,

pp. 190-199, Feb. 1988

[15] T .-H. Pan, H.-S. Kay, Y. Chun and CL. Wey, “High-Radix SRT
division with speculation of quotient digits,” Proc. IEEE International
Conference on Computer Design: VLSI in Computers & Processors

(ICCD’95), Austin, TX, pp. 479-482, Oct. 1995

[16] CL. Wey and C.-P. Wang, “VLSI implementation of a fast radix-4
SRT division,” Proc. of 39th Midwest Symp. on Circuits and Systems,

Iowa, 65-68, August 1996

MICHIan STQTE UNIV. LIBRARIES
[I]ll[ll[l[llllllllllll[I[I]Ill][Illllllllll
31293017865969