7' V4" .
. I~Il \ h?‘
i: ‘1‘:

V..;' “
39:02“ .

.n, ‘

”ﬁn.”

.. u.
n

.-‘.‘
8‘

. '93:!

 

. 1
4w:
.5“

'1

m..:u

u;- = ~

‘ _ It V > v!
1“ 1’ . ‘ ' ' Jeff“ .‘34.
‘51,. V and . 3.
LI, ‘3» ,

 

.
1%.;
““5

p x

.. .
4.1. .‘L J!

, .1
,
11.2%: '

(I . v?.

 

'. ,'

... ,
£111
(1‘ ..
t

I «a.
I ‘ ~ 1.3: ‘
g "m, n
t‘i: Q! .9
1

 

 

..
.,.
MN...

 

..

1

“3'3 ‘
. ,‘r'ﬁgﬂz Jr
, . ,. , m
‘ . ’ . . ‘ A [yum‘iu' -\"\:. If
, ‘ . -- -. . _ .nH‘rIH‘
' ' ’ ' ‘, 1‘ ' L ‘6‘?” ‘ ' v t \ .U‘u:,‘,’;‘4f“v\¥‘;® ‘
“A "“' > ‘- ‘~ ".|1¢ w ‘a. ‘ «L'r', .
LR‘U‘ ‘ , . . . . a‘ . , {‘3
,5 “a“. 3 1 . I 32'3“??? I. 3“ J,
, ‘ .r‘ . ,, V
.3 ‘ {Pp _‘ 39:51:;

 

‘ ,. H
‘.\. 1““

 

THESIS

lmillililillimilllll

Date

01409 6527

This is to certify that the
thesis entitled

AN APPROACH TO BAR CODE IDENTIFICATION
ON A CUSTOM COMPUTING MACHINE

presented by
Sandeep Dhameja

has been accepted towards fulfillment
of the requirements for

Master's degree in Electrical Engineering

 

 

730-»

Major professor

9/4/er

 

0-7639

MS U is an Afﬁrmative Action/Equal Opportunity Institution

 

LIBRARY
MIchlgan State
Universlty

 

 

 

PLAOE II RETURN BOXtonmovothh Mouton yowrocord.
TO AVOID FINES mum on or More data duo.

    
 

DATE DUE DATE DUE DATE DUE

 

 

ES 0 5 mr
- e ‘

    
 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

l

 

MSU I. An Mun-tin Adlai/Equal Opportunlty Institulon
Wows-m

AN APPROACH TO BAR CODE IDENTIFICATION ON A CUSTOM
COMPUTING MACHINE

BY

Sandeep Dhameja

A THESIS

Submitted to
Michigan State University
in partial fulﬁllment of the requirements for the degree of

MASTER OF SCIENCE

Department of Electrical Engineering

1995

ABSTRACT

AN APPROACH TO BAR CODE IDENTIFICATION ON A CUSTOM
COMPUTING MACHINE

By

Sandeep Dhameja

This thesis focuses on a detailed study of the custom computing machine Splash 2 and
suggests an approach to identifying bar codes based on mapping an algorithm using the
multi-channel ﬁltering technique onto the reconﬁgurable architecture.

First the Splash 2 architecture is discussed in detail. The reconﬁgurable architecture of
Splash 2 is based on Xilinx ﬁeld programmable gate arrays (FPGAs) that behave as the
processing elements, allowing investigation of compute- and data—intensive algorithms.
The programming environment of Splash 2 is discussed in detail. The suggested approach
to identify bar codes is supplemented by VHDL code for mapping the design onto FPGAs.

Finally, the basic communication and computation algorithms are discussed along with the
VHDL simulation code and the partition, place and route results, which provide estimates
of spwd and complexity of the design. The experimental results obtained for the basic
algorithms show that the application of identiﬁcation of bar codes is suited for

implementation on a custom computing machine.

Copyright by
SANDEEP DHAMEJ A

1995

Acknowledgments

I would like to acknowledge the contributions of the following to my academic and
personal pursuits. Firstly, I would like to thank Dr. Diane T. Rover, advisor for my
research work at Michigan State University. I have been motivated and inspired by her to
pursue research of my own interests. This thesis could not have been organized so

carefully without Dr. Diane T. Rover’s meticulous edits.

I would also like to thank the other members of my Oral Defense Committee, for their
willing discussion of my efforts and participation in my degree program: Dr. P. David
Fisher and Dr. Michael A. Shanblatt. I would like to thank Duncan Buell, Jeff Arnold and
Brian Schott for their invaluable guidance and support at the Supercomputing Research
Center, Bowie, Maryland. I appreciate the assistance of Dr. Anil K. Jain for giving me the
opportunity to work at the Pattern Recognition and Image Processing laboratory. Several
other colleagues and friends have been patient to listen to my ideas and have provided

constructive comments at several stages of the research.

Part of the research was completed under a research contract from the Institute for
Defense Analyses, Alexandria, Virginia.

Also, thanks to my friend Rebecea Doma for her patient editing and valuable suggestions.
'Ib my parents for their enduring and endearing guidance; to my sister, for being a ﬁiend;
to dear Anita for caring about me and my work. Finally, I thank The Almighty for giving
me the strength and courage to complete this work.

iv

Table of Contents

 

 

 

 

 

 

 

 

 

 

 

List of Tables .................................................................................................................... viii
List of Figures .................................................................................................................. ix
1 Introduction 2
1.1 Thesis Motivation ........................................................................................ 2

1.2 Research Approach ...................................................................................... 3

1.3 Thesis Organization ..................................................................................... 3

2 Background S
2.0 Introduction to Bar Code Information Theory ............................................ .5

2.1 Bar Code Types ..... - - - .......................................... 6

2.1.1 Universal Product Code - - -6

2.1.2 Code 39 ......................................................................................... 9

2.2 Bar Code Scanning _______ - _ - _ -- ...11

2.3 Summary .................................................................................................... 16

3 Related Work 17
3.1 TIaditional Rapid Prototyping ................................................................... 17

3.1.1 Quicktum Systems, Inc - -_ _ _ _- - 18

3.1.2 T'InMANN .......... - - ................................... 19

3.1.3 The AnyBoard System ................................................................ 20

3.1.4 Summary .................................................................................... .20

4 The Splash 2 Programming Environment 21
4.0 Programming the Xilinx FPGAs ................................................................ 21

4.1 Splash 2 Simulator .................................................................................... .21

4.1.1 The Interface_board Entity ........................................................ .23

4.1.2 The Splash_boards Entity .......................................................... .23

4.1.3 The Splash2__board Entity.. -- ............................. .24

4.2 Simulator Files ........................................................................................... 24

4.3 The Xilinx Control Part .............................................................................. 28

 

 

 

 

 

 

4.4 Interface Board ........................................................................................... 29
4.4.1 XLeft Entity ................................................................................. 29

4.4.2 XRight Entity .............................................................................. 31

4.5 The Xilinx Processing Part ........................................................................ 32
4.6 Xilinx Control Part Entity .......................................................................... 41
4.7 Crossbar ..................................................................................................... 45
4.8 Memory ...................................................................................................... 50
4.9 Summary .................................................................................................... 51
Bar Code Identiﬁcation On Splash 2 - - - - ‘2
5.0 Bar Code Identiﬁcation--- -- - -------------- -- - .----52
5.1 Texture Analysis ........................................................................................ .52
5.2 Multi-channel Filtering .............................................................................. 53
5.3 Unsupervised Segmentation Method ........................................................ .56
5.3.1 Implementation Using Serial Code ............................................. 56

5.3.2 Unsupervised Segmentation. ....................................................... 57

5.3.3 Rotation-invariant Features ......................................................... 5 8

5.3.4 Supervised Segmentation Method .............................................. 63

5.4 A Suggested Approach to Bar Code Identiﬁcation on Splash 2 ................ 64
5.5 A Suggested Mapping of Simple Perceptron onto Splash 2 ...................... 70
5.6 Summary .................................................................................................... 72
Mapping an Application Onto Splash 2 -- 73
6.0 Design Implementation .............................................................................. 73
6.1 Run-time Environment. .............................................................................. 73
6.1.1 Splash 2 Execution - -- - -- -- - - --...74

6.1.1.1 Initializing the Splash 2 Environment ................................ 74

6.1.1.2 Preparing the Splash 2 Application .................................... 74

6.1.1.3 The Ungroup_all and Boundary_optimization Options ..... 75

6.2 General Compiling Methodology - - - -- - -------------- 76
6.2.1 Compiling \Vrthout Constraints .................................................. 76

6.2.2 Setting Design Constraints .......................................................... 76

6.3 VI-IDL_to_EDIF ............. - - - - .................... 77
6.4 EDIF_to_XNF Conversion ........................................................................ 77
6.4.1 Environment. ............................................................................... 77

vi

6.4.2 Execution .................................................................................... 79

 

 

 

 

 

 

6.5 Xilinx Placement and Routing ................................................................... 79

6.5.1 Environment ................................................................................ 79

6.5.2 Execution .................................................................................... 80

6.6 The Xilinx Synopsys Interface (XSI) ....................................................... 82

6.6.1 Compiling the Design ................................................................. 83

6.6.2 Interpreting the Results ............................................................... 84

6.7 The Execution Process- -- - ......... - ........ 85

6.8 T2 Debugger ............................................................................................. 85

6.9 T2 Debugger Commands ---------------------- ------- 87

6.10 Results ................... . .................................................................................... 88

6.10.1 PPR Summary Explanatory Notes--- ..... - -....93

6.10.2 Chip Pinout Description for the XBar Algorithm ....................... 96

6.11 Additional Results of Simulations ------ - =....98

6.12 Summary .................................................................................................. 102

7 Conclusions and Future Research 103
Appendix ......................................................................................................................... 106
Bibliography .................................... -- - - -- - 119

Vii

List of Tables

1 Speciﬁcation of the Universal Product Code .................................................. 8
2 Part of the Speciﬁcation of Code 39 ............................................................. ll
3 Chip Pinout Description ............................................................................... 96

viii

List of Figures

2.1 Derivation of the UPC Scan Code Words .......................................................... 7
2.2 Illustration of the Speciﬁcation of the UPC Symbol ......................................... 8
2.3 Illustration of the Code 39 Character .............................................................. 11

2.4 Illustration of the Invariance of the Edge-to-Edge Distance under Ink
Spread .............................................................................................................. 13

2.5 Encoding of the Binary String 100111000 by the Delta Code (top) and

Width Code Giottom) ......................................................................................... 15
2.6 Delta Code Module Representation ................................................................. 16
4.1 The Splash 2 System Architecture ................................................................... 22
4.2 Splash 2 Simulator ........................................................................................... 23
4.3 Interface_Board Entity Declaration........................... ...................................... 25
4.4 (a) Interface Library Declaration .......................................................................... 26
4.4 (b) Interface Library Declaration .......................................................................... 26
4.5 Splash2_Board Entity Declaration .................................................................. 27
4.6 Splash2_Board Structure Declaration ............................................................. 27
4.7 Xilinx_Control_Part Declaration .................................................................... 28
4.8 The MTest Architecture ................................................................................... 29
4.9 The MTest Declaration ................................................................................... 34
4.10(a) File "Parts.vhd" Processing Element Declaration .......................................... 37
4.10(b) File "Parts.vhd" Left_to_Right Architecture Declaration ............................... 38
4.10 (c)File "Parts.vhd" Left_to_XBar Architecture Declaration ............................... 39

 

 

4.11 File "Parts.vhd” XBar_to_Right Architecture Declaration ............................. 40
4.12 File "Parts.vhd" Control_Part Architecture Declaration ................................. 44
4.13 File ”Parts.vhd” Blank Architecture Declaration ............................................. 45
5.1 An Overview of the Bar Code Localization Algorithm ................................... 55
5.2 Bar Code Image ............................................................................................... 60
5.3 Filter Images Generated from the Bar Code Image as shown in Figure 5 .2 ..... 60
5 .4 Feature Images Generated from the Filtered Images as shown in Figure 5.3..61
5.5 Mean Feature Values for the Feature Images ................................................... 62
5.6 (a) Steps involved in Identifying Bar Code Images on Splash 2 ............................ 66
5.6 (b) Steps involved in Identifying Bar Code Images on Splash 2 ............................ 67
5.7 Table Lookup Architecture Declaration for Storing Data in the Local PE
RAMS - -- .......... - - ............................................. 68
5.8 Exponential Function Architecture Declaration .............................................. 69
5.9 Local RAM LUT for Weight Values ............................................................... 71
6.1 Mapping of an Application onto Splash 2 ....................................................... 83
6.2 Data Transfers from the Processing Element .................................................. 89
6.3 PE to Crossbar Declaration... -- ...... - ......... - - - .................... 90
6.4 Crossbar Processing Element XP_XBAR_SLAVE Architecture .................... 91
6.5 Results for the XP_XBAR_SLAVE Algorithm Obtained from PPR .............. 92
6.6 Left_to_XBar Architecture Declaration .......................................................... 99
6.7 Processing Element to Crossbar based on SIMD Tag Declaration ............... 100

Chapter 1

Introduction

Mind has transformed the world, and the world is repaying it with interest.
It has led man where he had no idea how to go.

Paul Valery

Field-Programmable Gate arrays (FPGAs) are a new type of user-programmable inte-
grated circuits providing a fast, fairly inexpensive solution to custom Very Large Scale
Integration (VLSI). An FPGA consists of an array of logic cells that can be interconnected
via programmable routing switches, where the general purpose routing switches allow the
user to conﬁgure FPGA based multiple level logic cells. FPGAS combine the features of
Mask Programmable Gate Arrays (MPGAs) and Programmable Logic Devices (PLDs).
The two-dimension array feature is adopted from the MPGAs while the user-programma-
bility is adopted ﬁ'om the PLDs. The research reported in this thesis is focused on a sug-
gested implementation of image analysis technique, Bar Code analysis, on an FPGA-

based computing machine.

Multilevel FPGAS were introduced in 1985 by Xilinx, Inc. [1]. Since then, multilevel
FPGAS have evolved considerably as new devices [5]. They have gained widespread

application, owing to their relatively low costs and quick production. As an implementa-

FPGAS offer some unique advantages over the alternate technologies (MPGAs, standard

cells, and full custom design) [2, 3, 4, 5, 6]:

- FPGAS provide a reduction in manufacturing cost of custom VLSI from tens of thou-
sands of dollars to approximately one hundred dollars.

- FPGAS reduce the turnaround time of manufacturing from months to several minutes.

These features can be attributed to the user-programmability of FPGAS. FPGA-based cus-
tom computing machines provide the designers with a hardware-software based design
platform, allowing multiple design iterations to be done both rapidly and cost-effectively.
However, user-programmability does have its limitations: the logic density and the speed
performance of an FPGA circuit are considerably lower than that of its counterparts [5].
Although developments over the last decade have resulted in a signiﬁcant improvement in
FPGAS, there are numerous opportunities for innovation before the best FPGA designs

will evolve.

1.1 Thesis Motivation

An FPGA-based custom computing machine allows the user to implement circuits by
interconnecting its logic cells through the user-programmable logic or routing switches.
The routing switches serve two distinct purposes: to connect the logic cells to the routing
wires, and to connect one routing wire to another. Regardless of their implementation,
routing switches consume a signiﬁcant chip area and thus they involve a tradeoff among
device ﬂexibility, logic density, and speed performance. A high ﬂexibilty yields a machine

that can be conﬁgured easily, but if the ﬂexibility is too high then there is an enormous

waste of logic area because of unused switches.

The primary focus of this thesis is the study of an implementation of bar code identiﬁca-
tion algorithm on a hardware-software co—designed, custom computing machine. The goal
of this study is to determine the implementation of the algorithm and further, to determine
the minimal ﬂexibility that is necessary to provide sufﬁcient interconnection to satisfy the
requirements of real circuits. This research is part of a project sponsored by the Supercom-
puting Research Center, Bowie, Maryland [7-10] that examines many aspects of imple-
mentation of Image Analysis algorithms on an FPGA-based custom computing machine

Splash 2 [39].

1.2 Research Approach

A digital document image usually consists of a mixture of text, graphics, and half-tone
patterns. These regions of varied textures [13, 15] should be separable so as to allow sub-
sequent processing of the region of interest : the bar code in this case. Several algorithms
have been proposed for segmenting images in the literature [14-17]. The segmentation
method applied to the bar codes as part of our research does not assume any a priori
knowledge of the layout or the font styles of the document, and can be applied to input
gray level scanned images. The segmentation method is based on the multi-channel gabor
ﬁltering technique which has been successfully used for texture segmentation [15, 16].
The areas containing the regions of interest are captured using gabor ﬁlters. The ﬁlters are

applied directly to the input scanned images requiring no additional thresholding.

1.3 Thesis Organization

This thesis is organized as follows:

0 Chapter 2 provides background information: a discussion of the deﬁnitions of bar codes
and their design aspects, including the UPC Bar Code and Code 39.

0 Chapter 3 presents an overview of the background related research that has led to the
development of custom computing machines.

0 Chapter 4 describes the Splash 2 system.The system architecture and the component
building blocks are discussed.

- Chapter 5 describes the Splash 2 programming environment in terms of the simulator
and the ﬁles necessary to emulate designs based on the applications of interest. The sug-
gested approach and the VHDL codes for this approach are also discussed.

0 Chapter 6 presents the bar code localization algorithm in addition to an overview of the
Gabor ﬁltering technique applied to texture segmentation. In additon, Chapter 6 pro-
vides the details of several algorithms related to the application. These algorithms pro-
vide an excellent groundwork essential for the implementation of the bar code
localization algorithm and their mapping onto the Splash 2 system. The preliminary
results of these algorithms are presented in this chapter.

° Chapter 7 summarizes the thesis by providing conclusions and recommendations for

future research.

Chapter 2

Bar Code Information Theory

The eye sees only what the mind is prepared to comprehend

Robertson Davies

2.0 Introduction to Bar Code Information Theory

Bar codes were introduced almost 20 years ago. They are now being widely used in super-
markets, department stores, factory ﬂoors, the health industry, insurance industry, and

more.

Early codes, in particular UPC (Universal Product Code), evolved from careful studies.
These studies were done under the technological constraints of 20 years ago. The continu-
ing improvements in computing hardware have resulted in a feasible use of powerful pro-
cessors that would have been prohibitively expensive to use in the past. In recent years,
this process has been applied to enhance the density of the information that is represented.
This has created a portable data ﬁle, implemented using convenient bar code symbols as
opposed to the license plate ﬁle [31]. The difference in the two types of implementations
can be illustrated using the following example. The bar code of the supermarket item con-
sists of 11 digits which represents the product identifying number but provides no infor-
mation about the description. The price look-up must be accessed in a database keyed to

the number in the bar code. This provides an alternative to a longer bar code with the rele-

vant information, such as the price, name of the product, manufacturer, weight, inventory
data, expiration date, etc. Thus providing a portable data ﬁle of the product wherein the
relevant information can be retrieved without accessing the database. Pricing information,
although useful and necessary for the retail stores, is not very useful information for the
receiving and distribution centers. Thus, some inefﬁciencies are introduced into them

deliberately to facilitate their robust reading for a variety of bar codes.

2.1 Bar Code Types

2.1.1 Universal Product Code (UPC)

UPC is the symbology used in American supermarkets for more than 15 years. Symbology,
as used in the bar code industry, represents the bar code scheme, while the symbol repre-
sents the bar code itself. The code that has been described in this section is the version A,
where each symbol encodes 12 digits. However, all versions of the UPC bar code use the
same basic structure for the code words. Each UPC code word consists of two bars and
two spaces with a total width of seven modules as shown in Figure 2.1. The seven mod-
ules (a bit 0 or 1 as shown in the top line) have six boundaries labeled as 1...6 as shown in
Figure 2.1. A partition of the modules into a set of two bars and two spaces requires three
internal boundaries that can be placed in each of the six locations. The number of possible
code words can be computed as follows: Two bars and two spaces require three dividers in
one of the six positions (between modules) as shown in the Figure 2.2. Therefore, the total
number of partitions will be the same as the number of ways to choose three objects out of

six, or 20. Thus 40 code words can be constructed depending on whether the ﬁrst element

is a baror a space. However, code words that are mirror images of each other cannot be
included owing to the need for bi-directional scanning.This results in 20 possible code
words. While symbols should be readable by scanning in either direction, the direction of
scanning is also important. UPC achieves this by assigning two code words for each digit:
those used on the right part of the symbol start with a bar and those used on the left part

start with a space.

 

 

 

 

 

 

Figure 2.1. Derivation of the UPC Scan Code Words.

The two versions can be distinguished easily because the ones starting with the bar have
an even number of bar modules (even parity) and those starting with a space have an odd
number of bar modules (odd parity). Table 1 shows the arrangement of the UPC code

speciﬁcation.

Table 1: Speciﬁcation of the Universal Product Code.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Left half Right half Width
(odd) (even) pattern
0 0001101 1110010 3, 2,1,1
1 0011001 1100110 2, 2, 2, 1
2 0010011 1101100 2,1,2,2
3 0111101 1000010 1,4,1,1
4 0100011 1011100 1,1, 3, 2
5 0110001 1001110 1, 2, 3, 1
6 0101111 1010000 1,1,1,4
7 0111011 1000100 1,3,1,2
8 0110111 1001000 1,2, 1,3
9 0001011 1110100 3,1,1, 2
Left guard bar (101) Center guard bar (01010) Right guard bar(101)

 

 

 

 

 

 

Left ﬁve characters Right ﬁve characters
of code of code
I
l
i

45 67890

Figure 2.2. Illustration of the Speciﬁcation of the UPC Symbol.
(The diagram is not drawn to scale)

The symbol shown in Figure 2.2 contains the following group of code words:
0 a left guard pattern 101
0 six digits of odd parity:
- one digit denoting the industry type (such as a 0 for grocery, 3 for pharmaceutical, etc.)
- ﬁve digits labeled as the Leﬁ ﬁve characters, denotethe manufacturer code
0 a center guard pattern 01010 consisting of ﬁve modules
0 six digits of even parity:
- ﬁve digits labeled as the Right ﬁve characters, denote the item code
- one check digit

0 a right guard pattern 101

2.1.2, Code 39 (three of nine)

Code 39 is a width code with three wide elements out of a total of nine: ﬁve bars and four
spaces between them. It has been the standard for the Department of Defense since 1980
[31]. The total number of code words that it can generate is the number of ways three
items can be chosen out of nine. It has a code for the 10 digits, the 26 letters, and eight
special symbols (hyphen, period, space, asterisk (*), $, /, +, and%) so that a total of only
44 code words are used. The asterisk is used only as the ﬁrst and the last code word of a

symbol.

Table 1 shows some of the codes. A 1 means a wide element and a 0 means a narrow ele-
ment. If an width of an element or bit is altered then an edge shift (the most common read—

ing error) occurs. This shift, will change both, the bar and the space patterns. For example,

10

10001,0100 rrright become 11001,0000. The latter does not correspond to any Code 39
pattern. The patterns for both the bars and the spaces have been chosen in such a way that
changing a single bit in either of them results in an illegal word. Spaces have only an odd
number of wide elements and the bars have only an even number of elements. This allows
for the detection of the single errors. For this reason code 39 is called self-locking for this

1' 638011.

The number of bar patterns with two wide elements is 10 (the number of ways to choose
two out of ﬁve), and there are four space patterns with one wide element. This yields 40
code words. Four additional code words are obtained by using the four patterns with three

wide spaces and only narrow bars.

In contrast to UPC, Code 39 has no speciﬁcations for the arrangement of code words on
the symbol except that the asterisk must be the ﬁrst and the last character and can appear
only there. The special start-and-stop character, the asterisk makes it possible to determine

the direction of scanning.

As shown in Table 2, Code 39 contains code words that are rrrirror images of each other,
such as the pairs (P,*) and (K, U). Therefore, the direction cannot be decided locally but
must be determined from the start-and-stop code word. If the ﬁrst code word is decoded as
P, then we conclude that we are looking at the last code word of a symbol. The major
drawback of Code 39 is that it does not allow local detection of the scan direction and can-

not be decoded, even partially, without at least one of the start-and—stop code words.

11

Table 2: Part of the Speciﬁcation of Code 39 [31].

 

 

 

 

 

 

 

 

 

 

Characters Bars Spaces
1 10001 0100
2 01001 0100
A 01001 0010
K 10001 0001
P 01100 0001
U 10001 1000
Z 01100 1000
* 00110 1000
$ 00000 1110

 

 

 

 

10001 0100
Figure 2.3. Illustration of the Code 39 Character.

2.2 Bar Code Scanning

Bar codes encode information along one dimension with intervals of alternating black and
white color. The arrangement of bars in an alternate fashion helps to diffuse reﬂectivity.
The intervals are stored as rectangles whose vertical height carries no information but
facilitates the scanning process. The term bars denotes the rectangles having the fore-

ground (darker) color, while the term spaces denotes the intervals having the background

12

color between the bars. In general, the bars absorb light and the spaces reﬂect light, which
is detected by the sensor. On soda cans, the bars have a metallic color that reﬂects more
light than the background. However, the reﬂection from the bars is specular and directed
away from the sensor. The spaces have a dull color with the diffuse reﬂection, part of
which is always captured by the sensor. Similarly, on transparent bottles the bars are not

colored; so light goes through them rather than reﬂecting to the sensor.

Laser scanners in the supermarkets use essentially orthogonal patterns. The height of each
half of the symbol is expected to exceed its width so that one of the two beams is guaran-
teed to scan at least the symbol half. (It is a property of a block with height greater than
width that an x pattern is guaranteed to read it by at least one line of the x in any orienta-
tion of the symbol). The symbol can be assembled afterwards because each part is identi-
ﬁed by the parity pattern. UPC allows “scanning by halves” because the left and the right

parts of the code are inherently identiﬁable from the local properties of each code word.

One major source of distortions of bar code is the uniform ink spread, which causes bars
to be comparatively wider than spaces. For this reason bar code readers measure not the
width of the edge but the edge-to-edge distances (called t-distances) of successive bars as

shown in Figure 2.4.

13

I I l I I I |
Printing without
I I ink spread
I— 4—I
Figure 2.4. Illustration of the Invariance of the Edge-to-Edge
Distance under Ink Spread.

Printing with ink
spread

 

There are two fundamentals ways of encoding binary information in such a one-dimen-
h
sional medium. In one we subdivide the available material into modules. Each of the mod-

ules is assigned a ‘1’ or ‘0’ with the 0’s corresponding to the spaces. A single space or bar

may contain many modules. Such schemes are called delta codes in communication the-

ory.

In another scheme of representation, bars or spaces alternate per bit, where an element is
wide if the bit is 1 and narrow if the bit is 0. These codes are referred to as width codes.

The common name used in the barcode industry is binary codes since only two widths are

used.

14

The Figure 2.5 shows the two encodings of the binary string 100111000. In both cases, the
minimum printed width (i.e., the width of a module or element) is the same. The delta
code requires nine such widths (the number of bits), while the width code requires thirteen

such widths if a wide element is twice as wide as a narrow element.

Scanning speed is extremely and unpredictably variable in hand—held wands and laser
scanners. This problem imposes the constraint that printed codes be self-clocking, which
means that both the number of modules and the number of bars and spaces per code word

must be ﬁxed.

The delta codes must have a ﬁxed number of bars and spaces. A code with n modules and
k pairs of bars and spaces is called an (n, k) code. Mdth codes are required to have not
only a ﬁxed number of bars and spaces but also a ﬁxed number of wide elements to
achieve a ﬁxed width. Among the popular bar codes, the UPC is a (7,2) delta code with
seven modules, two bars and two spaces). Code 39 is a width code with nine elements

(ﬁve bars and four spaces), three of which are wide.

15

I - Module boundary

.I/

|1|0|0|1|1l1|0l0|

1001 1 1000

Figure 2.5. Encoding of the Binary String 100111000 by a Delta Code (top) and
Width Code (bottom).

 

The self-clocking constraint for the codes is more stringent than that for optical and mag-
netic recording, which share many of the features of bar codes. In such systems, informa-

tion is encoded under the run-length-limited (RLL) constraint. This constraint is speciﬁed

by a pair of constraints (d, k), where 0 _<. d < k , which means that consecutive 1’s are
separated by at least d, but no more than k, 0’s. The constant d is used to control the inter-
symbol interference effects. The constant k is used to provide self-clocking through a
phase-locked loop (PLL) with windowing. This weaker constraint is possible because
optical and magnetic recording systems are closed systems where the distance between the
scanner and the symbol as well as the size of the symbol are ﬁxed and known. Neither is

known for bar codes, which are classiﬁed as open systems.

16

Besides self-clocking, a means to detect the direction of the scan is necessary. Practical
requirements may demand that bar codes be scanned in either direction. (Many scanning
devices, including those used in retailing, have repetitive scans in alternative directions.)
There are two possible of solutions to this problem. One solution is to include a unique
start/stop code word on each symbol. The other is to use only one of two code words if
they are mirror images of each other. For example, three modules of the delta code with
only one bar and one space yields the combinations shown in Figrne 2.6. In addition, the
Figure 2.6 shows that the last two code words are mirror images of the ﬁrst two and cannot
be distinguished in case the scanning direction is not known. This results in taking only
half of all possible code words. UPC uses only code words whose mirror image is not in

the code, while the Code 39 uses a unique start/stop code word.

II II_II_II -_II III

Figure 2.6. Delta Code Module Representation.

 

 

 

 

 

 

 

 

 

2.3 Summary

This chapter has provided an overview of bar code theory and the most commonly used
codes. As new applications have deve10ped, there has been a demand for new types of
codes. The UPC bar code versionA and the code 39 have been described in detail in this
chapter. The UPC bar code, version A, has been used as an example to identify bar code

images.

Chapter 3

Related Work

Intuition becomes increasingly valuable in the new information society precisely

because there is so much data.
John Naisbitt

3.1 'Itaditional Rapid Prototyping

Over the past decade, prototyping of application speciﬁc designs has gained signiﬁcant
importance owing to several reasons. Complex projects cannot start with a complete set of
ﬁnal speciﬁcations at the beginning of the product’s life cycle. Furthermore, the product
development cycle cannot proceed until the current version reaches completion and is

evaluated.

Hardware prototyping has been greatly inﬂuenced by software tools in recent years. Com-
puter-aided design (CAD) tools have emerged, allowing the designer to integrate testabil-
ity into designs ensuring proper operation of the ﬁnal product. For example, the Test
Engineer’s Assistant (TEA) software tool is one such prototyping tool which enhances the
designer’s ability to design a robust system prototype by the use of test vectors [38]. Thus,
the integration of hardware and software, also termed as hardware-software codesign, is
aimed towards reducing the ﬁnal system developmental costs while ensuring its reliability.
TEA analyzes a design and indicates aspects that tend to inhibit the testability of the prod-
uct. Several alternate approaches are suggested to provide for greater ﬂexibility in the test

and fault isolation.

l7

18

3.1.1 Quickturn Systems, Inc.

While Application Speciﬁc Integrated Circuits (ASICs) have become an important tech-
nique of design implementation, one of their shortcomings has been the inability to test
the designs at actual speeds before the product fabrication. Quickturn Systems, Inc. has
developed a hardware-software codesign tool capable of performing the functions of the
ﬁnal desired product through the use of FPGAS [29]. Although similar designs can be
implemented using other conventional logic devices (PALS and PLDs), designs have
become so complex that such an implementation grows into an entirely separate design
project. Often, such a task takes longer to complete than the actual production of the IC
itself. Thus FPGAS provide a logical solution to the prototype design with the advantages
of allowing for high density logic designs along with rapid reconﬁgurability for testing

and modiﬁcations of the design.

Quickturn systerrrs ease the implementation of prototypes by automating the FPGA proto-
typing procedure. Normally, the designer must translate the design into a format accept-
able to the FPGAS and then build the board supporting the design. Often the design must
be partitioned into several FPGAS implying that proper commmunication amongst the
FPGAS exists. Quickturn systems accept ﬁles from the ASIC suppliers called netlist ﬁles,
and translates the descriptions into a FPGA compatible description. A process termed as
Emulation Synthesis automatically connects the FPGAS [29]. Once the design is converted
into an FPGA format, it is downloaded onto a generalized bank of FPGAS, thus complet-

ing the prototyping phase.

19

3.1.2 TInMANN

TInMANN, a massively-parallel, stochastic architecture for the Kohonen Neural Net-
works. Previously, uniprocessor systems were used for neural network simulation pur-
poses. In order to exploit the massive parallelism in the neural network algorithms,
specialized hardware, using a high -speed arithmetic logic units (ALU) [39] or multiple
digital signal processing (DSP) chips connected through communication busses [39], was
implemented. These approaches could not realize the full beneﬁts of the parallelism or did

not approach the minimal requirements of the hardware implementation.

Modern VLSI technology overcomes these drawbacks by providing a highly connected,
dense logic on a single chip. However, large and complex VLSI designs require to simu-
late, create a test plan, and fabricate a chip in a short time duration. This is possible by
using a rapid prototyping system: ﬁeld programmable gate arrays (FPGAS) provide the

solution.

The FPGAS allow the user to incorporate theoretical advances while the designs remain on
the cutting edge of VLSI technology. With these concepts and ideas a custom computing
machine using Xilinx FPGAS, TInMaNN was conceived [38]. An IBM-PC host was used
to interface and conﬁgure the array board as well as provide the software design tools to
implement the neural networks on FPGAS. Along with FPGAS Static Random Access
Memories (SRAMs) were provided to allow for the storage of weight values. The FPGAS
cormnunicate amongst themselves through local busses which provide communication

between adjacent chips, or global busses allowing the signals to be broadcast globally to

20

the other array chips.

While such a system of FPGAS cannot operate as fast as an ASIC, it does provide rapid
realization of new theoretical knowledge at very low costs. A customized VLSI chip

(ASIC) can be easily produced once the FPGA prototype has been evaluated [28].

3.1.3 The AnyBoard System

The Anyboard array card is based on Xilinx FPGAS and has evolved as a sucessor to the
TInMANN sytem [38]. The AnyBoard consists of a ﬁeld of programmable gate arrays
(FPGAS), that can be modiﬁed by software. The FPGAS have both local busses between
FPGAS and access to a global bus for interfacing the external hardware. Each FPGA has a
local memory that can be mapped to the AnyBoard. A pre-programmed FPGA and several
other buffer chips control the PC bus interface for the AnyBoard. The Anyboard array
board resides in the IBM-PC host connected to a local network. Test vectors generated by

the IBM—PC act as inputs to the board for diagnostics.

3.1.4 Summary

This chapter provides an overview of several custom computing machines that have been

developed as reconﬁgurable architectures to implement and verify designs.

Chapter 4

The Splash 2 Programming Environment

There is but one art, to omit

Robert Louis Stephenson

4.0 Programming the Xilinx FPGAS

The ﬁeld programmable gate arrays (FPGAS) of the Splash 2 system, shown as His
X0...Xl6 in Figure 4.1, are programmed in the VHSIC Hardware Description Language
(VHDL). VHDL [34] is a hardware description language and a simulation language. In

addition, VHDL behaves as an input language for logic synthesis.

4.1 Splash 2 Simulator

The Splash 2 simulator [2, 4] is a collection of VHDL components that, together with the
user applieation code, models the Splash 2 system behavior. The simulator structure as
shown in Figure 4.2 is based on the Splash 2 architecture structure. The top level entity is
the Splash_System, which contains the components for the Sparcstation Host, the
Interface_Board, and the collection of Splash2_Boards. The Splash_System entity

deﬁnes all of the top level signals, such as the host SBus and the Splash 2 SIMD Bus.

21

22

Splash Boards

  
    
 
 

X1X2X3X4X5X6

   

Crossbar

 
 
   
 
 

X16 X15 X14 X13 X12 X11 X10

    

Interface Board

   

X1X2X3X4X5X6

  
  

     

XL

 
 
 

Crossbar

  
 

X16 X15 X14 X13 X12 X11

 

.0.

1""
I
§ u

”.9.

 

 

) ;

.___. X1 X3 X4 X5 X6 X7 X8
I I I I I I l I
_. xo Crossbar

I I L I I I I
xrs x15 X14 x13 x12 X11 x10 xs

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Figure 4.1. The Splash 2 System Architecture.

23

 

Sparcstation Host
Interface

 

 

 

 

Interface Board

 

 

 

 

Splash 2 Array
Board

 

 

 

Figure 4.2. Splash 2 Simulator.

4.1.1 The Interface_Board Entity

The interface_board entity contains components for the input and output FIFOs and the
XLeft and XRight ICs, and is responsible for generating the system clock. The FIFO com-

ponents read and write ASCII data ﬁles.

4.1.2 The Splash_Boards Entity

The Splash_boards entity contains one or more component instantiations for the Splash 2

array boards. This entity declares the set of signals used to interconnect the Xilinx compo-

24

nents of each of the array boards and also the previous/next interboard connections. The
slot ID for each board is declared at this level and is then passed on to each of the array

boards.

4.1.3 The Splash2_Board Entity

The Splash2_board entity contains component instantiations for the programmable Xilinx
components, their associated RAM memories, and the crossbar switch. This entity also
performs a variety of control functions, including selecting the source of the left input to
X1, generating the memory write enable pulse, and checking the timing restrictions on

memory accesses (e. g. write after a read constraint).

4.2 Simulator Files

The system ﬁle, system. vhd contains the architecture and the entity for the top level
Splash_System component. The Splash 2 simulator requires that this ﬁle be present in the
current working directory, however, there is nothing within the ﬁle that the user needs to

modify.

The Splash 2 simulator (see Figure 4.2) reads a conﬁguration ﬁle by the ﬁle, conﬁg.vhd.
This section describes the components of the Splash 2 simulator conﬁguration. This con-
ﬁguration ﬁle must identfy the libraries of the Splash 2_Board which contains the default

architectures of the Splash 2 array board parts, and the Interface, which contains the

25

XLeft and XRight architectures.

The ﬁrst part of the ﬁle conﬁgures the interface board, in which the component identiﬁer
within the Splash_System is IFACE, the entity name is Interface_Board and the archi-

tecture name is Structure as shown in Figure 4.3.

for [F ace: Interface_board
use entity Interface: Interface_board( Structure )

Figure 4.3. Interface_Board Entity Declaration.

The generic parameters of the Interface_Board entity are:
Input_Filel: A data string representing the name of the ﬁle to be read by the input FIFO
1. The default value string is a null string.
Input_File2: A data string representing the name of the ﬁle to be read by the input FIFO
2. The default value is a null string.
,13_Filel: A data string representing the name of the ﬁle to be read by the external input.
The default value is a null string.
Output_Filel: A data string representing the name of the ﬁle to be written by the output
FIFO 1. The default value is a null string.
Output File 2: A data string representing the name of the ﬁle to be written by the output
FIFO 2. The default value is a null string.
File_Type: Format of the output ﬁles, i.e., either BIN/ HEX. The default value is BIN.

Clock_Freq: The frequency of the system clock. The default value is 20 MHz.

26

The structural components of the interface board includes the XLeft and XRight FPGA
parts. The entity names are also XLeft and XRight; the default architectures of both parts

are named Valid and are found in the Interface library declaration in Figure 4.4 (a).

for Structure
for all: XLeft
use entity Interface.XLeft( Valid);
end for;

for all: XRight
use entity Interface.XRight( Valid ),'
end for;
end for;

Figure 4.4 (a). Interface Library Declaration.

The component identiﬁer for the Splash 2 array board component within the

Splash2_System is named as SPLASH, the entity name is Splash2_Boards and the archi-

tecture name is Structure as shown in Figure 4.4 (b).

for Splash: Splash2_Boards
use entity S2b0ard. Splash2_Boards( Structure )

Figure 4.4 (b). Interface Library Declaration.

The only generic parameter of the Splash2_Board is Number_of_Boards. This parameter

deﬁnes the total number of Splash 2 boards in the system. The default value is 1.

27

The array of Splash boards is generated with the identiﬁer SBOARDS. Within the gener-
ate statement, the Splash2_Board component is instantiated as BD. The architecture of
the Splash2_Board as shown in Figure 4.5, may be a user deﬁned behavioral model from

the current working library:

for Sboards
for Ed: Splash2_Board
use entity work. Splash2_Board(Behavior);
end for;

Figure 4.5. Splash2_Board Entity Declaration.

or the default structural model which instantiates individual FPGA components (architec-

ture Structure in library SZBoard) as shown in Figure 4.6:

for Sboards
for Ed: Splash2_Board
use entity SZBoard. Splash2_Board( Structure );

Figure 4.6. Splash2_Board Structure Declaration.

The Structure architecture of Splash2_Board instantiates the following components:

Splash_Crossbar a model of the central crossbar
ControLPart the FPGA control part, X0
Processing_Part the FPGA processing parts, X1 through X16

Memory_Part the external memories associated with X0 - X16

28

The Splash_Crossbar component has a single generic parameter:
Conﬁg_File: A data string specifying the name of the ﬁle containing up to eight crossbar
conﬁgurations. The default value is a null string, indicating that the crossbar

is disabled.

The Memory_Part component has a single generic parameter:
Load_File: A string specifying the name of the ﬁle containing directives for pre-loading
the memory contents. The default value is a null string, indicating that the ini-

tial value of the memory contents are unknown.

4.3 The Xilinx Control Part

The Xilinx_Control_Part component (X0) is conﬁgured as shown in Figure 4.7.

for all: Control _part
use entity Board. Control_Part( Blank);
end for;

Figure 4.7. Xilinx_Control_Part Declaration.

The Xilinx_Procsssing_Part components and their associated memories are instantiated

with a VHDL generate statement with the identiﬁer XPARTS.

In the following example all the 16 processing elements are conﬁgured using the “MTest”

architecture such that:

29

for Xparts(1 to 16)
for all: Xilinx_Processing_part
use entity work.X ilinx_Pr0cessing_Part( M Test);
end for;
for all: Memory_Part
use entity SZBoard.Memory_Part( Dynamic)
generic map (Load_File => “test.mem”);
end for;

Figure 4.8. The MTest Architecture.

4.4 Interface Board

This section describes the interface board and its programmable components.

4.4.1 XLeft Entity

A detailed summary of each of the ports of the XLeft entity is given below.

XLeft_FIFO_IN : The 36-bit bi-directional DataPath from the input FIFOs.

XLeft_FIFO_Sel : A bi-directional signal that selects either of the two input FIFOs to
read from. When low (logic ‘0’) FIFO #0 is selected, when high
(logic’ 1’) FIFO #1 is selected.

XLeft_FIFO_EnabIe_L : A bi-directional signal that is an active low signal which
enables either FIFO#1 or FIFO#2 (depending on the value
of XLeft_FIFO_Sel) to drive the XLeft_FIFO_IN data bus.

XLeft_FIFO3_OE : An active low signal (bi-directional) which enables the external

input FIFO to drive the XLeft_FIFO_lN data bus.
XLeft_FIFO3_OE and XLeft_FIFO_Enable_L must not be

asserted simultaneously.

3O

XLeft_FIFOl_RE : A bi-directional signal is the read signal of the input FIFO #1. When
asserted (logic ‘1’) and the FIFO has valid data (XLeft_FIFOl_EF
is ‘0’), data are clocked out of FIFO #2. This signal is an output
from XLeft.

XLeft_FIF02_RE : A bi-directional signal is the read enable signal of the input FIFO #2.
When asserted (logic ‘1’) and the FIFO has valid data
(XLeft_FIFO2_EF is ‘0’), data are clocked out of FIFO #2. This
signal is an output from XLeft.

XLeft__FIF03_RE : A bi-directional signal is the read enable signal to the external input.
When asserted (logic ‘1’) and the external input has valid data
(XLeft_FIFO3_EF is ‘0’), data are clocked out of FIFO #3. The
signal is an output from XLeft.

XLeft_FIFOl_EF : A bi-directional signal is the empty ﬂag from the FIFO #1. When low
(logic ‘0’), it indicates the FIFO has valid data available to be
clocked. This signal is an input to XLeft.

XLeft_FIF02_EF : A bi-directional signal is the empty ﬂag from the FIFO #2. When low
(logic ‘0’), it indicates the FIFO has valid data available to be
clocked. This signal is an input to XLeft.

XLeft_FIF03_EF : A bi-directional signal is the empty ﬂag from the FIFO #3. When low
(logic ‘0’), it indicates the FIFO has valid data available to be
clocked. This signal is an input to XLeft.

XLeft_BUF_Clk : An input bit signal that represents the free running version of the sys-

tem clock and can be used to control the state machine which starts

31

and stops the system clock.

XLeft_Enable_Clk : An out bit signal which may be used to start and stop the system
clock. When XLeft_Enable_Clk is high (logic ‘1’), the system
clock is enabled; when low (logic ‘0’), the system clock is dis-
abled.

XLeft_SYS_CLK: An in bit that is a wire-ORed global OR result signal from the Splash

array board.

XLeft_GOR_Result : An input bit that is a wire-ORed global OR result signal from the

Splash array board.
XLeft_To_XRight : A bi-directional datapath that is a 36-bit wide to chip XRight.
XLeft_SIMD : A bi-directional datapath that is a 36—bit wide output to the global SIMD
Bus.

XLeft_Int : A bi-directional signal that is an interrupt to the host.

4.4.2 XRight Entity

A detailed summary of the XRight Entity is given below.

XRight_FIFO_IN : A bi-directional datapath that is 36-bit wide data to the output FIFOs.

XRight_FIFOl_WE : A bi-directional data path that is a write enable output to the out-
put FIFO #1. When high (logic ‘1’) and the XRight FIFO#1 is
low, data are clocked from XRight_FIFO_IN into FIFO #1.

XRight_FIF02_WE : A bi-directional data path that is a write enable output to the out-

put FIFO #2. When high (logic ‘1’) and the XRight FIFO#2 is

32

low, data are clocked from XRight_FIFOJN into FIFO #2.
XRight_FIFOl_FF : A ﬂag input from the output FIFO #1. When high (logic ‘1’) and the
XRight FIFO] is low, data are clocked from XRight_FIFO_IN
into FIFO #1.
XRight_FIF02_FF : A ﬂag input from the output FIFO #2. When high (logic ‘1’) and the
XRight FIF02 is low, data are clocked from XRight_FIFOJN into
FIFO #2.
XRight_SYS_Clk : The system clock for the splash 2 system.
XRight_To_XLeft : A bi-directional datapath is a 36-bit wide bi-directional data path to
the chip XLeft.
XRight_R : A 36-bit wide bi-directional data path to the chip XLeft.
XRight_Readir : An output bit that controls the direction of the RBus. When the signal is
low, the data are driven from the last Splash 2 array board onto the
RBus. When the signal is high, data on the XRight_R will be driven
onto the RBus.

XRight_Int: A bi-directional signal is an interrupt to the host.

4.5 The Xilinx Processing Part

Xilinx_Processing_Part is the name of the basic processing element of the Splash 2 sys-
tem, Xilinx processing elements X1 through X16 as shown in Figure 4.1. Two generic
parameters are provided to allow the user to customize a processing element according to
its location in the system. During simulation the Splash 2 simulator automatically assigns

the values to these generics, but when computing for synthesis the user is responsible for

33

specifying the range of values.

The Xilinx_Processing_Part generics are:

BD_ID : An integer used to specify the board position within the system (0 to 12). The ini-
tial value of this is speciﬁed to 0.

PE_ID : An integer specifying the processing element number (1 to 16) within the board.

The initial value of this is speciﬁed to 0.

A detailed summary of each of the ports of the Xilinx_Processing_Part is given below.

XP_Left : A 36-bit wide bi-directional data path to the left (lower numbered) neighbor
chip. XP_Left of chip X1 is either connected to the SIMD Bus if it’s on board
number 1, or to the previous board if it is on board 2 or greater. XP_Left and
XP_Right together form the links in the linear data path.

XP_Right : A 36—bit wide bi-directional datapath to the right (higher numbered process-
ing element) neighbor chip. XP_Right of the processing element X16 is
connected to the R-Bus if the board number matches the value of the Size
bus. In case there is another Splash 2 board as part of the conﬁguration,
XP_Right of the processing element X16 is connected to XI of the next
board.

XP_XBar : A 36-bit wide bi-directional datapath between the Xilinx PE and the central
crossbar.

XP_XBar_Enable_L : An output Bit_Vector (4 downto 0), is a collection of the output

enables of the ﬁve crossbar ports. When the enable bit is low

(Logic ‘0’), the crossbar drives the corresponding byt e of the

34

data path. When the bit is high (Logic ‘1’), the crossbar may
receive data on the corresponding byte. The correspondence

between the enable bits and the data path is as follows:

XP_XBar_Enable_L(0) XP_XBar (7 downto 0)
XP_XBar_Enable_L(1) XP_XBar (15 downto 7)
XP_XBar_Enable_L( 2 ) XP_XBar (23 downto I6)
XP_XBar_EnabIe__L( 3 ) XP_XBar (31 downto 24)
XP_XBar_Enable_L( 4 ) XP_XBar (35 downto 32)

Figure 4.9. The MTest Declaration.

XP_Clk : The global Splash 2 system clock’s duty cycle is not guaranteed, so all the
application programs should only be implemented on the rising edge.
XP_Int : An input bit that is an interrupt output signal. The interrupt signals from all the
seventeen Xilinx PEs are combined with a 17-bit mask value to form the board
interrupt signal to the Interface Board.
XP_Memory_Addrsss : A bi-directional 18-bit address to the external memory.
XP_Memory_Data : A bi-directional data bus to and from the external memory.
XP_Memory_Read_L : A bi-directional bit is an external memory read signal. It is
A asserted (active low) by the Xilinx PE synchronous with the
address to initiate a memory read operation. Valid data are
available on thr clock edge following the assertion ‘of
XP_Memory_Read_L.
XP_Memory_Write_L : A bi-directional bit is the external memory write signal. It is
asserted (active low) by the Xilinx PE synchronous with the

data and address to perform a memory write operation.

35

XP_Memory_Disable : An input bit is a system level input to the Xilinx PE, which when
asserted, causes the Xilinx chip to tristate all of its 110 pads.
This port should not be used by the application programs
XP_Broadcast : An input bit that is a common input signal to all the Xilinx Processing
Parts. It is driven by the control PE X0.
XP_Reset : An input bit that is a system signal to all of the Xilinx PE which, when
asserted, causes the Xilinx chip to reset the state of all the internal ﬂip ﬂops.
This port should not be used by the application programs.

XP_HSO, XP_HSl : The bi-directional signals that are signals of two handshake regis-
ters. Signal XP_HSl is a common input to all 17 Xilinx PEs,
while XP_HSO is an independent bi-directional signal to each
chip.

XP_GOR_Result : A bi-directional signal connected to chip X0 conventioanally used to

perform a global OR among all of the Xilinx PEs.

XP_GOR_Valid : A bi-directional signal connected to the control PE X0. It is conven-

tionally used to perform a “global OR” among all of the Xilinx PEs.

XP_LED : An output bit that is an active high. An output which controls the correspond-

ing LED on the array board.

There are a number of pre-deﬁned architectures of the Xilinx_Processing_Part available in
the Splash 2 library, including a null design for the applications that use less than all 16
PBS. The set of the pre-deﬁned architectures are as follows:

Blank : This architecture is completely passive, and should be inserted for Xilinx PEs not

36

being used in the application.

Left_to_Right : This architecture moves the 35 bits of data from the left side (XP_Left) to
the right side (XP_Right) with a two cycle pipeline delay. All the other
ports are passive.

Left_to_XBar : This architecture moves the 36-bit data from the left side (XP_Left) to
the crossbar (XP_XBar) with a two cycle pipeline delay. All the other
ports are passive.

XBar_to_Right : This architecture moves the 36-bit data ﬁorn the crossbar (XP_XBar) to

the right side (XP_Right) with a two cycle pipeline delay. All the other

ports are passive.

37

The architectures for the above mentioned routines and the processing elements X0...X16

are shown in Figures 4.10 (a), (b), and (c).

--Architecture Blank is a completely passive Xilinx part

architecture Blank of X ilinx_Processing_Part is

begin
XP_Left <= TriState (XP_Left);
XP_Right <= TriState (XP_Right);
XP_Xbar <= TriState (XP_Xbar);
XP_Memory_Address < = TriState (XP_Memory_Address);
XP_Memory_Data <= TriState (XP_Memory_Data);
XP_Memory_Read < = '0';
XP_Memory_Write < = '0';
XP_HSI <= 'Z';
XP_HSZ <= ‘Z';
XP_GOR_Result <= 'Z';
XP_GOR_Valid <= 'Z';
XP_Int <= '0’;
XP_Xbar_Enable_L <= "11111";

end Blank;

Figure 4.10 (a). File “Parts.vhd” Processing Element Declaration.

38

—- Architecture Left_to_Right feeds all 36 bits of data
-- from the Left port: to the Right port.Data are latched
-- in the Left port on the first cycle, and latched inthe
-- Right port the next cycle, for a total of 2 cycles

-- delay.

architecture Left_to_Right of X ilinx_Pr0cessing_Part is
signal PassThru: Bit_Vect0r(DATAPATH_WIDTH-I downto 0);
signal Left: Bit__Vect0r( DATAPATH_WIDTH-1 downto 0);
signal Right: Bit_Vect0r(DATAPATH_WIDTH-I downto 0);
begin
Pad_Input (XP_Left, Left);
Pad__0utput (XP_Right, Right);
process
begin
wait until XP_Clk'Event and XP_Clk = '1';
Right <= Passlhru after 2 ns;
PassThru <= Left after 2 ns;

end process;

XP_Xbar <= TriState (XP_Xbar);
XP_Memory_Address < = TriState (XP_Memory_Address);
XP_Memory_Data < =TriState (XP_Memory_Data);
XP_Memory_Read <= ’0';

XP_Memory_Write < = '0’;

XP_HSI <= 'Z';

XP_HSZ <= 'Z';

XP_GOR_Result < = 'Z';

XP_GOR__Valid < = 'Z';

XP_Int <= '0';

XP_Xbar_Enable_L <= "11111";

end Left_to_Right;

Figure 4.10 (b). File “Parts.vhd” Left_to_Right Architecture Declaration.

39

—-— Architecture Left_to_XBar feeds all 36 bits of data
-— from the Left port to the Crossbar port.Data are
-- latched in the Left port on the first cycle,and
-- latched in the Crossbar port the next cycle, for a
-- total of 2 cycles delay.

architecture Left_to_XBar of X ilinx_Processing_Part is
signal PassThru: Bit_VectorfDATAPATH_WIDTH-1 downto 0);
signal Left: Bit_VectodDATAPATH_WIDTH-I downto 0);
signal Right: Bit__VectodDATAPATI-LWIDTH-1 downto 0);
begin
Pad_Input (XP_Left, Left);
Pad_Output (XP_XBar; Right);
process
begin
wait until XP_Clk'Event and XP_Clk = '1 ';
Right <= PassThru after 2 ns;
Passlhru <= Left after 2 ns;

end process;

XP_Right <= TriState (XP_Right);
XP_Memory_Address < = TriState(XP_Memory_Address);
XP_Memory_Data < = TriState (XP_Memory_Data);
XP_Memory_Read <= '0';

XP_Memory_Write < = '0';

XP_HSI < = 'Z';

XP_HSZ <= 'Z';

XP_GOR_Result < = 'Z';

XP_GOR_Valid <= 'Z';

XP_Int < = '0';

XP_Xbar_Enable_L <= "11111";

end Left_to_XBar;

Figure 4.10 (c). File “Parts.vhd” Left_to_XBar Architecture Declaration.

4O

architecture XBar_to_Right of Processing_Part is
signal PassThru: Bit_Vector(DATAPATH_WIDTH-I downto 0);
signal Left: Bit_Vector(DATAPATH_WIDTH-1 downto 0);
signal Right: Bit_Vector(DATAPATH_WIDTY-I—I downto 0);
begin
Pad_Input (XP_XBar, Left);
Pad_Output (XP_Right, Right);
process
begin
wait until XP_Clk'Event and XP_Clk = ’1 ';
Right <= PassThru after 2 ns;
PassThru <= Left after 2 ns;

end process;

XP_Left <= TriState (XP_Left);
XP_Memory_Address <= TriState (XP_Memory_Address);
XP_Memory_Data <= TriState (XP_Memory_Data);
XP_Memory_Read < = '0';

XP_Memory_Write < = '0';

XP_HSI <= 'Z';

XP_HS2 <= 'Z';

XP_GOR_Result <= 'Z';

XP_GOR_Valid <= 'Z';

XP_Int <= '0';

XP_Xbar_Enable_L < = "00000";

end XBar_to_Right;

Figure 4.11. File “Parts.vhd” XBar_to_Right Architecture Declaration.

41

Architecture XBar_to_Right as shown in Figure 4.10, feeds all 36 bits of data from the
Crossbar port to the Right port. Data are latched in the Crossbar port on the ﬁrst cycle,

and latched in the Right port the next cycle, for a total of 2 cycles delay.

4.6 Xilinx Control Part Entity

The Xilinx_Control_Part entity also supports the same two generics to specify the physi-

cal location, although PE_ID is always bound to O.

BD_ID : An integer specifying the board position within the system (0 to 12). The initial
value of the signal is set to O.
PE_ID : An integer specifying the computing element number (always 0) within the

board.

The detailed summary of each of the ports are as follows:

XO_SHV[D : A bi-directional datapath that is the input from the common SIMD bus.
XO_XB_Data : A bi-directional datapath that is the output to the crossbar. This path is

shared with the XP_XBar data path of the PE X16.
X0_Memory_Address : A bi-directional memaddress that is the 18-bit address signal to
the external memory.
X0_Memory_Data : A bi-directional datapath that is the l6-bit bi-directional data bus to
and from the external memory.
X0_Memory_Read_L : A bi—directional bit that is the external memory read signal. It is

asserted (active low) by the Xilinx PE synchronous with the

42

address to initiate the memory read operation. Valid data are
available on the clock edge following the assertion of
X0_Memory_Read_L.

XO_Memory_Write_L : A bi—directional bit that is the external memory write signal. It
is asserted (active low) by the Xilinx PE synchronous with the
address and the data to perform a memory write operation.

X0_GOR_Result_In : A bi-directional vector that is the array of XP_GOR_Result sig-

nals from each of the Xilinx_Processing_Parts on the Splash 2
board.

X0_GOR_Valid_In : A vector that is the bi—directional array of XP_GOR_Valid signals

from each of the Xilinx_Processing_Parts on the Splash 2 board.

X0_GOR_Result : An output bit that is the output from X0 to the wire-ORed

GOR_Result signal of the backplane.

X0_GOR_Valid : An output bit that is the output from X0 to the wire-ORed GOR_Result

signal of the backplane.

X0_Clk : An input bit that is the global Splash 2 system clock.

X0_XBar_Set : An output Bit_Vector (0 to 2) that is the three bit crossbar conﬁguration

selector. When X0 drives the value of “000”, the crossbar selects the
pre-loaded conﬁguration 0, “001” selects the conﬁguration 1 and “002”
selects conﬁguration 2 and so on.

X0_X16_Disable : An output bit that when asserted, causes Xilinx PE X16 to be isolated

from the crossbar and allows X0 to send or receive data from the

crossbar.

43

XO_XBar__Send : An output bit that when asserted causes the Xilinx PE X16 to be iso-
lated from the crossbar and allows X0 to send or receive data from
the crossbar.

X0_Int : An output bit that is an interrupt output signal. The interrupt signals from all the
17 Xilinx PEs are combined with the 17th bit mask value to form the board
interrupt signal to the interface board.

X0_Broadcast_ln : An input bit that is the broadcast signal input from the Interface

Board.

X0_Broadcast_0ut : An output bit that is the common broadcast signal to each of the 16
Xilinx PEs.

X0_Reset : An input bit that is a system level input to the Xilinx PEs, which when
asserted, causes the Xilinx ICs to reset the state of all internal ﬂip ﬁops.This
port should not be used by the application programs.

X0_HSO, X0_HSl : The signals of the two handshake registers are bi-directional signals.
Signal X0_HSl is a common input to all 17 Xilinx PEs. X0_HSO
is an independent bi-directional signal to each chip.

X0_LED : an out Bit that is an active high output which controls the X0 LED on the back

of the board.

There is a pre-deﬁned architecture for the Xilinx_Control_Part in the Splash 2 library.

Blank : Completely passive architecture that should be inserted whenever the control X0

is not used by an application.

The VHDL code for the Control_Part in the “parts.vhd” is shown in Figure 4.12 as fol-

lows:

library SPLASHZ;
use SPLASH2.TYPES.all;
use SPLASH2.SPLASH2.all;

use SPLASHZ. C 0MPONENT S.all;

entity Xilinx_Control_Part is
Part (X0_SIMD)
X 0_XB_Dataata
X0_Memory_Address
X0_Memory_Data
X0_Memory_Read
X0_Memory_ Write
X0_Memory_Disable
X0_GOR_Result_In
X0_GOR_Valid_In
X0_GOR_Result
X0_GOR_Val id
X0_Clk
X0_XBar_Set
X0_XI 6_Disable
X0_Int
X0_Broadcast_ln
X0_Broadcast_0ut
X0_Reset
X0_HSI, X0_I-IS2
),.

end Xilinx_Control_Part;

: bi-directional datapath;
: bi-directional datapath;
: bi-directional memaddr;
: bi-directional memdata;
: bi-directional bit;

: bi-directional bit;

: in bit;

: bi-directional_vector;

: bi-directional _vector;

: out bit;

.- out bit;

: in bit;

: out bit_vector(0 to 2);

.- out bit;

: out bit;

: in bit;

: out bit;

: in bit;

: bi-directional bit

Figure 4.12. File “Parts.vhd” Control_Part Architecture Declaration.

Architecture Blank is a completely passive control part and is shown in Figure 4.13.

45

architecture Blank of Xilinx_Control_Part is
begin

X0_SIMD
X0_XB_Dataata
X0_Memory_Address
X0_Memory_Data
X0_Memory_Read
X0_Memory_ Write
X0_GOR_Result_in
X0_GOR_Valid_in
X0_GOR__Result

<= TriState(X0_SIMD);

< = TriState(X0_XB_Dataata);
< = TriState(X0_Memory_Address);
< = TriState(X0_Memory_Data);
<= ’0';

<= '0';

< = TriS tate( X 0_GOR_Result_In );
< = TriS tate( X0_GOR_ Val id_ln );
<= '0';

X0_GOR_Valid <= '0';
X0_XBar_Set <= "000";
X0_X16_Disable < = '0';
X0_Int <= '0';
X0_Broadcast_0ut <= '0';
X0_HSI <= 'Z';
X0_HSZ <= ‘Z';
end Blank;

Figure 4.13. File “Parts.vhd” Blank Architecture Declaration.

4.7 Crossbar

The crossbar connects a 36—bit bidirectional port of each of the 16 Xilinx processing part
(X1 - X16). The control IC, X0, shares the crossbar port of X16. The crossbar can store
upto eight conﬁgurations, with X0 selecting between them. The simulator model for the

crossbar reads the control ﬁle which describes up to eight crossbar settings.

The Xilinx_Processing_Part entity contains a 5-bit port to control the active-low output
enables of its corresponding crossbar port. The control port is declared as :

XP_XBar_Enable_L : out Bit_Vector (4 downto 0);

46

The least signiﬁcant bit (XP_XBar_Enable_L(0)) controls the least signiﬁcant byte of the
crossbar (XP_XBar (7 downto 0)) while the most signiﬁcant (bit 4) controls the most sig-
niﬁcant byte (XP_XBar (35 downto 32)). Thus, since the crossbar output enables are
active-low, a ‘1’ on an enable signal implies the corresponding crossbar byte is receiving
from the Xilinx IC. Similarily, a ‘0’ implies the crossbar is sending to Xilinx chip. The
user must be careful not to allow both the crossbar and the Xilinx chip to drive these sig-

nals simultaneously. The default state of XP_XBar_Enable_L therefore should be ‘1’.

The Xilinx_Control_Part entity contains two ports, X0_X16_Disable and
X0_XBar_Send, which control the shared crossbar connection. When X0_X16_Disable
is high (logic ‘1’), the output enables associated with X16 are disabled, independent of the
state of X16’s XP_XBar_Enable_L port, and the direction of the crossbar is controlled by
the X0_XBar_Send: a logic ‘1’ causes X0 to drive into the crossbar, a logic ‘0’ enables the
crossbar to drive to X0. When X0_X16_Disab1e is low, X0 is isolated from the crossbar.
However, X0 cannot selectively control individual bytes of the crossbar, when
X0_X16_Disable is asserted, X0 must drive or receive all 36 bits. The default architecture

for X0 (Blank) sets both the outputs to ‘0’, isolating X0 from the crossbar.

The crossbar model in the simulator is passed a generic parameter that contains the name
of the ﬁle, which consists of up to eight settings. The ﬁrst line of the setting consists of the
key word “conﬁguration” followed by the integer value 0 to 7 (both inclusive). The
remaining lines of the ﬁle are of the form:

output-port-number input-port-speciﬁer

47

where the “output-port-number” ranges from 1 to 16 and corresponds to the Xilinx PE
number. The “input-port-speciﬁer” is either a single integer (0 to 16), or 5 integers (0 to 16
each). If it is a single integer in the range (1 to 16), it speciﬁes the source port of all 36 bits
of the output. A value of zero is used to indicate port “output-port-number”. This value is

the input to the crossbar switch.

If the “input-port-speciﬁer” consists of S integers, each of the integers speciﬁes the source
for one byte of the output, from the most signiﬁcant to the least signiﬁcant bit. If an output
port number is missing from the conﬁguration, it is assumed to be set to zero. Also, in
order to disable the crossbar port, both the “input-port-speciﬁer” and the

XP_XBar_Enable_L signals must be set to ‘0’ and ‘1’ respectively.

Since the crossbar conﬁgurations are deﬁned from 0 through 7 (both inclusive), they can
be deﬁned as follows :

dd 50 $1 32 s3 54
where dd represents the destination PE identity, 50 represents the source (processing ele-
ment) for the bits (0...7), 31 represents the source for the bits (8...15), 52 represents the
source for the bits (16.23), 83 represents the source for the bits (24...31), and s4 repre-

sents the source for the bits (32.35) respectively.

48

For example conﬁguration 0 may be deﬁned as:

16 16 l6 16 16
16 l6 l6 16 16
16 16 16 16 16
16 l6 16 16 16
l6 l6 l6 l6 l6
16 16 16 16 16
16 16 l6 l6 16
16 16 16 16 l6
l6 l6 l6 16 16
10 16 16 16 16 16
ll 16 16 16 16 16
12 16 16 16 l6 16
10 16 l6 16 16 16
13 l6 l6 16 16 16
14 16 16 l6 l6 16
15 l6 16 16 16 16
16 16 16 l6 l6 l6.

\OOO\)O\UIJ>U)Nv—‘

In this example, the X16 represents the X0 data since X0 and X16 both share a common
port on the crossbar. Thus 1 16 16 16 l6 16 represents “PE X1 gets data from PE X16
(X0)”. Also whenever the ConﬁgXBar ﬁle is invoked by the T2 debugger, multiple con-
ﬁguration ﬁles, as shown, program the crossbar with the appropriate conﬁguration ﬁles

(maximum of eight conﬁgurations).

49

Conﬁguration 1 may be deﬁned as :

l 1 l 1 1 1 PE X1 gets no data from the data bus
2 l6 l6 l6 16 16 PEXZ getsdatafromPEXO

3 3 3 3 3 3 PE X3 gets no data from the data bus
4 16 16 l6 l6 l6 PEX4getsdatafrom PEXO

5 5 5 5 5 5 PE X5 gets no data from the data bus
6 16 16 16 16 16 PEX6getsdatafrom PEXO

7 7 7 7 7 7 PE X7 gets no data from the data bus
8 16 16 l6 l6 l6 PEX8getsdatafrom PEXO

9 9 9 9 9 9 PE X9 gets no data from the data bus
10 16 16 16 l6 l6 PEXIOgetsdatafromPEXO

11 11 11 ll 11 ll PEXIIgetsnodatafromthedatabus
12 16 l6 16 16 16 PEXIZ getsdatafrom PEXO

13 13 13 13 13 13 PEX13getsnodatafrom thedata bus
14 l6 16 16 16 16 PEXI4getsdatafrom PEXO

15 15 15 15 15 15 PEXI5getsnodatafrom thedata bus
16 16 l6 l6 16 16 PEX16getsdatafromPEXO.

In this example, 1 l l 1 1 1 represents “PE X1 gets no data from the data bus” however,

2 16 16 16 16 16 represents “PE X2 gets data from PE X0". Thus only the even number of
PBS receive data from PE X0. Also, whenever the ConﬁgXBar ﬁle is invoked by the T2
debugger, multiple conﬁguration ﬁles, as shown above, program the crossbar with the
appropriate conﬁguration ﬁles (maximum of eight conﬁgurations). The different conﬁgu-
rations are selectable from the control PE X0. A port, X0_XBAR_SET, tells the crossbar
which of the conﬁgurations is to be used. In case multiple conﬁgurations (greater than six-

teen) are to be selected, multiple ConﬁgXBar calls can be made.

In the following example, the crossbar settings (0 and 1) are deﬁned. In the ﬁrst setting,
the odd number of the Xilinx PEs drive the crossbar, and the even numbered Xilinx PEs
receive the data from the crossbar. In the second setting, only PE X2 receives, but it is

receiving only one byte from each of the PEs X9, X7, X5, X3, and XI. The crossbar ports

50

of PBS X3 and X16 are explicitly disabled.

The ﬁle containing the conﬁguration of the crossbar is passed to the crossbar model in the
Splash 2 conﬁguration ﬁle “conﬁg.vhd” using the following lines of code to the Splash 2
structure:

for Xbar : Splash_Crossbar

use entity SZBoard.Splash_Crossbar (Behavior)
generic map (Conﬁg_File => “test.xbar”);

end for;.

4.8 Memory

The memory model used by Splash 2 simulator allows the user to specify a set of initial
contents for the on-board memory from an ASCII input ﬁle. The name of the ﬁle is passed
in as a generic parameter to the entity Memory_Part. The default value (a null string)
implies that no initialization of memory is performed. During simulation, a read operation
from an uninitialized and unwritten memory location will lead to an error. Thus, in order
to specify the initialization of the memory, the ﬁle name must be passed on to the appro-

priate Memory_Part through the Load_File generic in the “conﬁgvhd” ﬁle.

The format of this initialization ﬁle is fairly straightforward. For each set of the contiguous
blocks of data, the base address of the block is given followed by one or more data values.
The base address speciﬁcation is the line containing the address followed by the decimal,
unsigned integer. The data values are speciﬁed beginning on a new line, each line may

specify any number of values. The data values are also interpreted as unsigned decimal

51

values. For example, to load the ﬁrst eight locations of the memory with consecutive inte-
gers, the following ﬁle may be speciﬁed:
address 0
1 2 3 4 5
6 7 8 9 10.
Similarily, multiple blocks of the data may be speciﬁed within a single ﬁle separated by an
address speciﬁcation. For example :
--First block of data
address 0
1 2 3 4 5
7 8 9 10 11.
«Second block of data
address 255
2 3 5 7
ll 12 13 17
«End of ﬁle.
Alternately, in case the entire memory has to be initialized to zero with a single line in the

ﬁle consisting of the key word “clear”:

clear
--End of ﬁle.

4.9 Summary

This chapter provides a description of the programming environment of Splash 2 and the
ﬁles that are required to emulate designs on ﬁeld-programmable gate arrays. The ﬁles
described allow conﬁguration of the crossbar and the processing elements based on the

application.

Chapter 5

Bar Code Identiﬁcation On Splash 2

If a conceptual description is to be made, the machinery for making it ought to show
itself in language. If a distinction cannot be made in language, it can not be made concep-
tually.

N. R Hansen

5.0 Bar Code Identiﬁcation

As per the earlier discussion on bar codes and their ever increasing applications, it is
highly desirable to design a bar code recognition system which can automatically scan the
bar code and decode the relevant information from it. In such a system, it is necessary to
ﬁrst obtain an estimate of the location of the bar code. Locating or localizing a bar code
from an image can be a highly complex task, as the image may consist of text and graphics

in addition to the bar code.

5.1 Texture Analysis

A bar code represents a homogeneous region in an image that can be distinguished in
terms of texture. Conventional segmentation techniques cannot be applied directly to a
textured image as the gray scale levels of individual pixels do not provide adequate infor-

mation. Contextual information describing the spatial relationship between the gray levels

52

53

of the pixels is necessary.

5.2 Multi-channel Filtering

Jain and Farrokhnia [11] have developed a texture segmentation system based upon a
multi-channel ﬁltering technique using gabor ﬁlters. This system has been applied to doc-
ument image processing applications such as text/graphics separation and address block
identiﬁcation on envelope and magazine covers [14]. The algorithm used for the texture
segmentation based on gabor ﬁlters is shown in the Figure 5.1. This algorithm can be
decomposed into ﬁve modules: (i) ﬁltering, (ii) nonlinear transformation, (iii) feature

computation, (iv) clustering, and (v) classiﬁcation.

The ﬁltered images are obtained by applying a bank of gabor ﬁlters with different fre-
quency and orientations to the input image. The ﬁltering technique in the spatial domain

can be successfully applied on the image using Splash 2.

0 By using a k x k mask, g(m,n) for 1 S m, n S k , each ﬁltered output pixel is computed

usrngbi’j= 2 2 gm.n.ai-m-i-n
n=l,km=l,k

° The nonlinear transformation, being a point operation can be performed using a look-up
table technique. This is a processing module that can reside on the PBS of Splash 2.

Thus, based on the nonlinear function, f, each transformation pixel is deﬁned as

ci’j = f(bi’j).

54

0 The feature images deﬁne the “local energy” measure in a small window around each
image pixel in the response image. Computation of the feature image is a neighborhood
operation and is efﬁciently mapped onto Splash 2.

0 The square—error clustering algorithm is performed on a set of pixels which are randomly
sampled.

0 The ﬁnal segmentation is implemented by applying the minimum distance classiﬁer to
every pixel using the cluster center information generated by the square error clustering

algorithm.

 

Input Image

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

l l | l I Gabor Filter Masks
l l H}
l l I ' * Filtered Images
lit e
l l l M—H Feature Images
1 (local energy)

 

l l Classiﬁcation

 

Fig. 5.1 An Overview of the Bar Code Localization Algorithm.

56
5.3 Unsupervised Segmentation Method

5.3.1 Implementation Using Serial Code

The pixels in the bar code region display rapid tonal variations along one direction. Thus,
high radial frequencies are useful to identify these variations. In case of the unsupervised

method, the pixels of the image are not classiﬁed into classes. The two highest radial fre-

quencies 32./i and 64/2 c.p.i. (cycles/image) for a 256 x 256 image and four orientations
(0, 45, 90, and 135 degrees) are of interest. The eight ﬁltered images as shown in Figure
5.3. The 256 x 256 image is scanned at a resolution of 100 dpi. The input image is passed
through each of the eight ﬁlters to generate eight images. The ﬁltered images further gen-
erate feature images. Figure 5.5 gives the mean values of the eight gabor ﬁlter features
shown in Figure 5.4. There are three different lines representing three different texture
regions present in the input image. The regions are bar code (solid line), text (dashed line),

and picture/half tone (dotted line) regions.

The difference in the responses to the ﬁlters of the bar code and the non-bar code regions

is apparent in Figure 5.4. When 9 = 90° (ﬁlters: 3 and 7), the feature corresponding to

the bar code has the lowest response, while when 0 = 0° (ﬁlter nos: 1 and 5), it has the
highest response. The difference between the mean value of the adjacent ﬁlters for the bar
code region is larger than the one for other regions. This characteristic of the bar code dis-

tinguishes it from other non-bar code regions in the input image.

57

The unsupervised segmentation algorithm consists of the following steps:

0 Input the bar code image through a bank of gabor ﬁlters to obtain eight features for
every pixel. Normalize these features to zero mean and unit standard deviation in order
to avoid some features dominating the others.

° Cluster the normalized pattern vectors using a squared-error clustering algorithm [11].
In order to reduce the computation overhead, the algorithm is applied to 6% of the ran-
domly chosen patterns to obtain the centroid of each cluster.

0 Assign all the patterns (pixels) into k clusters using the minimum Euclidean distance
classiﬁer; k is predeﬁned.

' Identify the bar code region in the segmented image using the following heuristic: per-
form the connected component analysis, and remove all “small” or “large” connected
components. Another heuristic, based on the aspect ratio, can be used if more than one
connected component survives the size heuristic.

0 Finally, an identifying or bounding rectangle is computed for the identiﬁed bar code

region.

5.3.2 Unsupervised Segmentation

Since we speciﬁcally need to localize the bar code from the background, a priori knowl-
edge can be used to design a two—class supervised classiﬁer. This not only results in a bet-
ter classiﬁcation accuracy, but requires less computation time. However, the eight gabor
ﬁlter features used in the unsupervised segmentation method are not inherently rotation-

invariant. Although this is not a problem for the unsupervised segmentation, using these

58

features in the supervised method does limit the applicability of this method to bar code
images with a speciﬁc orientation rotation. Thus, it becomes important to ﬁnd features

with rotation-invariance property.

5.3.3 Rotation-invariant Features

An important characteristic of bar code is that the ﬁlter with its orientation perpendicular
to the bar code always has the highest response (see Figures 5.3), and the ﬁlter with orien-
tation along the bar code lines always has the lowest response (see Figure 5.3). The ﬁlters
with orientations in the middle of the above two ﬁlters (45, and 135 degrees) have almost
the same response (see Figure 5.3). Suppose that we have gabor ﬁlter for every orienta-
tion. After taking the difference between pairs of orthogonal features, one of the differ-
ences should give the highest mean value for the bar code region, while another one
should give the lowest mean value. These two differences are rotation-invariant. In order
to obtain more accurate results, more ﬁlters with additional orientation selection property
are needed. However, a large number of ﬁlters will increase the computation time. Thus, a

trade-off between the number of ﬁlters and computation time.

For each image, the new feature images at eight different orientations are generated. The

orientations are 0°, 225°, 45°, 675°, 90°, 112.5°, 135°, 157.5 °, respectively. The algo-
rithm for generating the rotation-invariant features for each radial frequency has the basic
three steps:

0 Generate eight gabor features, f(0), corresponding to the eight orientations.

' Compute the difference between the pairs of orthogonal features:

59

Af0=|f(0i)-f(9i+90°)|,i=1,2,3,4

where 91 = 0°,92 = 22.5°,93 = 45°, 94 = 675°.

0 For each i, i = 1, ..., 4 , compute the mean value over a window centered at each pixel,

then select v9. which has the maximum value:

1(x0, yo) = Z (Afep (x+x0, y +y0 ) —Afe_ 59 (x, y) ),where W is a window
(x,))) E W I 11:4

of 50 x 50 pixels (less than the size of the bar code). Using A f 9* + 45° for i = l, 2 and
A f 9* _ 45., for i = 3, 4. Then, select 0k , which has the maximum value of v among the

values {v9 , ..., ve } . The new rotation-invariant features are:
1 4

F1 = A f 9* and F 2 = A f e+45°' Using two highest radial frequencies, a total of four

rotation invariant features are available for the supervised segmentation method of analy-

sis.

.ll lllllllllllllll.

Figure 5.2. Bar Code Image.

hill!!!)

   
    

1

3I

til. 1.... u .s: \ : “If“ T?” ‘
(i)

(ii) (iii) (iv)

(v) 9 (vi) (vii) (viii)

Figure 5.3. Filter Images Generated from the Bar Code Image as shown in Figure 5.2.

61

$.7m-m"-B

 

(V) (Vi) (vii) (viii)

Figure 5.4. Feature Images Generated from the Filtered Images as shown in Figure 5.3.

62

 

 

 

 

 

9
Q 1
Q)
2 <0 -
(U
>
C
3 v
2
N '1
o I T r I
2 4 6 8
FilterNo.

Figure 5.5. Mean Feature Values for the Feature Images. (Each line represents a different
texture region for the input image.)

63

5.3.4 Supervised Segmentation Method

The supervised segmentation method uses the two highest radial frequencies with eight
orientations. Each of the frequencies has four features calculated by taking the difference
of pairs of orthogonal features. Applying window averaging operation on these different
images enables selection of the difference image with the maximum mean value and its
corresponding difference image with the minimum value. Four features, two for each

radial frequency, are obtained.

A one-layer feedforward neural network (4 input units, 1 output unit) performs the simple
two-class classiﬁcation (bar code versus non-bar code). The training patterns for the neu-
ral network are picked randomly from a set of rotated versions of the same bar code
image. The number of training pixels for each image is 3000. The network is trained to
perform 2000 iterations of the weight vectors. After the training is completed, the same

network is capable of classifying other bar code images.

Automatic bar code localization based on multi-channel gabor ﬁltering technique is
applied to the bar codes successfully. Both supervised and unsupervised methods are

robust to variations in the orientation, scale, and shift of bar codes.

The ﬁltering and the feature extraction operations in the algorithm account for most of the
required computations. However, regardless of the number of ﬁlters used, these algo-

rithms can be implemented to perform operations in parallel. The use of non-linear trans-

64

formation following the linear ﬁltering operations has been suggested as one way to
account for the inherent nonlinear nature of the biological visual systems [12]. The local-
ized ﬁltering operation using gabor ﬁlters followed by the “squashing” nonlinear transfor-

mation can be interpreted as a multi-scale blob detector [1 l].

The limitation of the texture segmentation algorithm is that it lacks a criterion for choos-

ing the value of Cl in the nonlinear transformation. The value of or = 1/ 4 is used for the
bar code localization. Also, the algorithm assumes that different channels are independent
from each other. However, there is a psychophysical and physiological evidence indicat-
ing inhibitory interactions between different spatial frequency channels [11]. Some of the
research incorporates such interactions in their experiments [11]. Allowing inhibitory
interactions among the channels is shown to have the capability of reducing the effective

dimensionality of the feature space [12].

5.4 A Suggested Approach to Bar Code Identiﬁcation On Splash 2

The previous section describes the serial implementation of bar code analysis. Extending

the concept to ﬁeld programmable devices, the suggested approach to bar code analysis on

Splash 2 can be sub-divided into the following basic steps:

- Step 1: Deﬁne lockup tables in the local RAMs of multiple PEs. Load the Gabor ﬁlter
mask data as the entries to the tables (see Figure 5.6 (a) Step 1).

0 Step 2: Apply convolution to the image data using the ﬁlter masks data stored in the
lookup tables to obtain a set of ﬁltered images. The convolution of the bar code image

can be performed in the PEs whose local RAMs store the mask data (see Figure 5.6 (a)

65

Step 2).

' Step 3: Apply a non-linearity to the set of ﬁltered images to obtain the feature images of
the bar code images. This can be achieved by a table lookup performing convolution
of the ﬁlter image data with a hyperbolic tangent function (see Figure 5.6 (b) Step 3).
The trigonometric function data for the convolution resides in the local RAMs as
lookup table entries. The resulting image data corresponds to the feature data for each
of the ﬁltered bar code images. The hyperbolic tangent function can be expressed in
terms of an exponential function (see Figure 5 .8).

0 Step 4: Classify the image feature data using a simple perceptron implementation in
VHDL (see Figure 5.6 (b) Step 4). Altemately, the feature data can be returned back to

the Sun Sparcstation to perform the classiﬁcation of bar code data.

Each of the above mentioned steps have been illustrated in Figures 5.6 (a) and (b) and

described using VHDL code in Figures 5.7, 5.8, and 5.9.

RAM

66

RAM

 

 

 

Gabor Mask
LUT

 

 

Gabor Mask
LUT

RAM

 

 

 

 

 

,l

3

PE

3

Gabor Mask
LUT

L

 

l

H . H .

Step 1: Load Gabor mask data into the local RAMs of the PEs

RAM

RAM

 

 

 

Gabor Mask
LUT

 

 

 

Step 2: Load bar code data into the PBS and perform convolution with Gabor masks

i

Gabor Mask
LUT

RAM

 

 

 

 

l

H FE

i

Crossbar

Gabor Mask
LUT

 

- l

H mi

l
I

Figure 5.6 (a). Steps Involved in Indentifying Bar Code Images on Splash 2.

 

 

67

 

 

 

RAM RAM RAM
tanh () tanh () tanh ()
LUT LUT LUT

 

 

 

 

 

 

 

 

 

+1 H m H . l

I Crossbar

 

Step 3: Perform convolution of the bar code ﬁlter images with tanh () LUT stored
in RAM to obtain feature images

RAM

 

Weights
LUT

PE
Simple Perceptron

Step 4: Perform classiﬁcation of the feature images using a neural network classiﬁer

 

 

 

 

Figure 5.6 (b). Steps Involved in Identifying Bar Code Images on Splash 2.

68

The lookup tables illustrated in Figures 5.6 (a) and (b) can be implemented efﬁciently as

shown in Figure 5.7.

subtype Byte_Int is Integer range -128 to 127;

subtype Index_1 is Integer range 0 to 12;

subtype Index_2 is Integer range 0 to 12;

type 7Table is array (Index_1, Index_2 ) of Byte_Int;

( VHDL arrays are assumed to be allocated in row-major,
last-index varying fastest, order.)

--Assuming that Index;2 is ascending
type Tfable is array (Index_1, Index_2 ’Low to Index_2 ’High/Z ) of Integer;

- -Rou tine to access the array
ﬁtnction Tablelookup (TableﬂTable; I_I : Index_1; I_2: Index_2) return Byte_Int
ts
variable Temp: Integer := TI‘able (I_I, I_2/2);
variable Bias: Integer := I_2 mod 2;
begin
if Bias = 0 then
- - Need lower half of temp:
return Temp mod 256;
else
- -Need upper half of temp;
return Ramp/256;
end if
end Table Lookup;.

Figure 5.7. Table Lookup Architecture Declaration for Storing Data in the Local
PE RAMs.

69

The VHDL code for the exponential function is shown in Figure 5 .8.

function EXP (X: real) return real is
--returns e**X; where e =MATH_E
-- This function computes the exponential using the following

-- series:exp (x) = 1+ x + x**2/2! + x**3/3! + ...; x > 0
constant MATH_E; real := 2.71828_18284_59045_23536;
constant Eps : real := 0. 000001; -- precision criteria
variable Reciprocal: boolean := x < 0.0; -- check the sign of the argument
variable Xlocal : real := abs (x); -- use positive value
variable Oldval : real; -- variables for series evaluation

variable Num : real;
variable Count: integer;
variable Denom : real;
variable Newval : real;

begin
-— compute value for special cases
If X = 0.0 then
return 1.0;
else
if X = 1.0 then return MATH_E;
end if;
-- compute value for general cases
Oldval := 1.0;
Num := Xlocal;
Count := I;
Denom := 1.0;
Newval := Oldval + Nam/Denom;
while ( abs(NEWVal - Oldval) > eps ) loop
Oldval := Newval;
Num := Num * Xlocal;
Count := Count + I;
Denom := Denom * (real ( Count) );
Newval := Oldval + Num/Denom;
end loop;
if Reciprocal then
Newval := 1.0/Newval;
end if;
return Newval;
end EXP;.

Figure 5.8. Exponential Function Architecture Declaration.

70

5.5 A Suggested Mapping of Simple Perceptron onto Splash 2

Lately, artiﬁcial neural networks have been extensively simulated on conventional com-
puters. They are capable of providing solutions to complicated problems. However, highly
parallel architectures like Splash 2 can be optimized to accommodate these neural models.
The simple perceptron neural network model has been used for pattern recognition and
robotics [15]. The architecture suggested for implementation on Splash 2 is designed to
exploit the concurrency of the neural networks [41]. The processing elements on Splash 2
are capable of performing multiplication (as a table lookup), addition, and function evalu-
ation, while the local RAMs are capable of storing I/O data. The host supplies the data and

receives trained data results from Splash 2.

The three major entities required are the controller conﬁguration, the neuron conﬁgura-

tion, and its associated RAM based LUT for storing the weight values (see Figure 5 .9).

In order to perform a simple classiﬁcation, the Splash 2 is conﬁgured as a SIMD machine.
The PEs are conﬁgured to allow a single layer of the algorithm to run on Splash 2. The
PEs local RAMs contain the initial weights and thresholds. The contents of the local

RAMs are duplicated to allow broadcast to the left and right processing elements.

Finally, the neural network based classiﬁcation has the PEs perform the sum of products
using the broadcast values and their corresponding local weights. Once all the input pat-

terns of the bar codes are presented, the ﬁnal sum of products plus the threshold is com-

71

puted and the sigmoid function is performed on all the PBS. The number of physical
neurons may be limited by the number of FPGAS available on the Splash 2 array board.

Sum of products operation takes place in parallel as the required weight values are avail-

able locally.

entity Local_ram is

port (XP_Data: inout Datapath;
XP_Address: in Address;
XP_Clk : in Bit;
XP_Control : in Bit;

),'

end Local_ram;

architecture Behav of Local_rarn is
type Ram_array is array (1 to memsize) of ﬂoat_num;
signal Count: integer;
begin
process
ﬁle In_datajile: TEXT IN “./data_ﬁle_1 ”,-
ﬁle 0ut_data_ﬁle: TEXT OUT “./data_ﬁle_2
variable In_line, 0ut_line: line;
variable Weight: ﬂoat__nurn;
variable V_address: addr;
variable L_rarn: Ram_array;

begin
wait on control, address, clock;
if control = load then «load data to memory

while not (endﬁle (In_dataﬂe» loop

readline (In_datajile, In_line);

read (In_line, V_address);

read (In_line, Weight);

L_ram (intval(V_address)) := Wright;
end loop;

Figure 5.9. local RAM LUT for Weight Values.

72

control = read then -- read from memory
data <= L_ram ( intval( address) );
else -- write to memory

L_ram ( v_intval( address ) ) : = data;
write (0ut_line, Weight);
write (0ut_line, V_address);
writeline (0ut_data_ﬁle, 0ut_line);
endif;
end process;
end behav;

Figure 5.9 (continued).

5.6 Summary

This chapter describes the approach to bar code identiﬁcation on Splash 2 based on the
multi-channel ﬁltering technique of image analysis. The VHDL synthesis code for this
approach suggests that the technique can be extended to implement bar code identiﬁcation

on Splash 2.

Chapter 6

Mapping the Application onto Splash 2

A hard beginning maketh a good ending.

John Heywood

6.0 Design Implementation

This chapter provides a detailed study of the programming environment of Splash 2 and
the steps involced in mapping and application onto PFs. Using the Synopsys Design
'lbols, the VHDL code is combined into one interface ﬁle, the EDIF ﬁle. Next the EDIF
ﬁle is converted to a Xilinx Netlist Format (XNF) ﬁle. The circuit is partitioned into LCA
blocks with consideration given to the user-supplied constraints. The fourth phase consists
of placing and routing these partitioned LCA blocks. The run—time environment for the

design implementation is described in the next section.

6.1 Run-time Environment

The run-time environment of Splash 2 is composed of the Splash 2 environment variables,
the C runtime libraries, and the symbolic debugger T2. This environment provides the user

interface for executing and debugging Splash 2 applications.

73

74
6.1.1 Splash 2 Execution

6.1.1.1 Initializing the Splash 2 Environment

The environment for Splash 2 is established by sourcing the setup script, namely using
the setenv and set path commands. The environment variables deﬁning the default bit-
stream ﬁle for the Control element, the default bitstream ﬁle for all the other processing

elements, and the binary symbol table may be set optionally.

6.1.1.2 Preparing the Splash 2 Application

An application that is implemented on Splash 2 uses the scripts from the deﬁned directory
in the following sequence is:

vhdl_to_edifﬁlename.vhd

edif_to_xnfﬁlename.edif

xnf_to_bit ﬁlename.xnf
The script vhdl_to_edif command invokes the shell script of the Synopsys synthesis tool.
The user deﬁned VHDL application program goes through the three processes: reading the
source code (“analyzing” and “elaborating” in VHDL terminology), optimizing the source

code (or “synthesizing” the logic), and writing the source synthesis to an output ﬁle (in the

EDIF 2 0 0 netlist format [1]).

The FPGA architecture imposes a different kind of constraint on synthesis tools in com-
parison to ASICs. The XC4000 devices consist of an array of CLBs. Each CLB has func-

tion generators (i. e., lookup tables) and registers. By default, the Design Compiler and

75

FPGA Compiler do not ﬂatten out the design, and thus designs should be ﬂattened to elim-
inate all of the existing logic structure. F lattening is recommended when the number of
CLBs needed to implement a boolean function seems too high or there are too many logic
levels. However, highly regular structured designs (such as adders and ALUs) that have

been designed with an explicit structure should not be ﬂattened.

Flattening is especially useful for FPGA CLB structures. The FPGA Compiler has a built
in optimizer for boolean logic. In order to use the algorithm effectively, the structure must
be decomposed so that the boolean logic can be mapped into the CLB function generators.

The portions of the design that are not going to change are recommended to be ﬂattened.

Any changes made using an optimization tool to a module can change the entire module.
Thus the old module will no longer serve as a guide to the changed modules partitioning,

placement and routing.

6.1.1.3 The Ungroup_all and Boundary_optimization Options

The CLBs contain both boolean logic implementation in Function Generators and regis-
ters. When a hierarchial design which uses a DesignWare module is compiled, the logic is
NOT optimized across the hierarchial boundary. This means that some of the combinato-
rial logic will end up with unused ﬂip-ﬂops while the CLBs that implement the registers
will have unused function generators. Also the boolean logic from one hierarchy to

another will not be optimized to reduce the CLB area or logic levels.

76

The choice of hierarchial boundaries impact the quality (i. e., area. or speed) of the synthe-
sized design. By using the Design Compiler, designs can be optimized while preserving
the hierarchial boundaries. However these optimizers help to improve designs only par-
tially. Complete optimization of the design is possible across parts of the design hierarchy

that are ﬂattened in the Design Compiler.

A quick check to ﬁnd out if the design has been partitioned suboptimally is to ﬁrst opti-
mize the design with hierarchy and then compare the results with those obtained by opti-

mizing the design without hierarchy.

6.2 General Compiling Methodology

6.2.1 Compiling Without Constraints

The wireload model of the application may not represent the application accurately. Thus
it becomes difﬁcult to set timing constraints in Design Compiler or FPGA Compiler that
correlate to the actual end results (after the PPR tools are executed). Thus in order to set

realistic constraints, the design should be compiled without any constraints.

6.2.2 Setting Design Constraints

Using the max _period, max_delay or max_area commands the timing of the design or the

area constraints are set up. Thus the typical design will include the following commands:

max _period 50.0 CLK

77

set _flatten true -eﬁort medium
compile -map-eﬁort me -ungroup_all.

6.3 VHDL_to_EDIF

The command syntax for vhdl_to_edif command is speciﬁed using the parameters top
level design entity, the speciﬁed output ﬁle, bounds for the board, bounds for the process-
ing elements. The speciﬁed output ﬁle is created with a “.edify” extension, and the synthe-

sis tools are invoked separately for each unique pair of the two bounds.

6.4 EDIF_to_XNF Conversion

The EDIF netlist produced by the Synopsys synthesis tools is translated into XNF by the
program edif_to_xnf. In the process of translating the edif ﬁle to the netlist format, the
VHDL port names get assigned to physical pins, IOB logic gets synthesized along with
the global clock buffers, hard macros are identiﬁed and ﬂip ﬂop enable inputs are
exploited, and the STARTUP block is inserted to connect the global tristate and the global

reset signals.

6.4.1 Environment

The design environment for the mapping of the applications includes the following ﬁles:

78

edif_to_xnf :
The command loads three ﬁles (subroutines) from the library. Each of these subroutines is

in a Common Lisp ﬁle.

xilinx.lisp :

The ﬁle contains the cell name and pin name translations and the pin directions for the set
of Xilinx XC4000 logic primitives. The table is represented as a Lisp association list
(alist). Each entry in the alist is the list that represents the primitive component. The ﬁrst
element of an entry is the name of a primitive in the Synopsys technology library; the sec-
ond element is the name of the corresponding Xilinx primitive; and the rest of the entry is
a list of the pin identiﬁers. Each pin identiﬁer is a list with three parameters: the ﬁrst ele-
ment is the pin generated by the Synopsys tools; the second element is the corresponding

Xilinx pin name; and the third element is the direction of the pin.

xhard.lisp :

The ﬁle contains a list of the Xilinx 4000 hard macros.

xpins.lisp :

The ﬁle contains tables of design port name to physical pin name translations for each of
the Splash 2 computing elements. These tables are represented by the Lisp association lists
(alists). The CAR of each element is a port name as it appears in the EDIF netlist; the CDR
is a list of the physical pin number or number, optionally followed by a set of properties

use to deﬁne the global signals.

79

6.4.2 Execution

edif_to_xnf may be invoked from the Unix command line using the command with the
parameters output (.xnf) ﬁle, input (.edif) ﬁle. The.edif extension on the input ﬁle is
optional, the XNF output is written to the standard output. In addition, the Xilinx part
name (4010), is speciﬁed as used in the Splash 2 system, the library directory in which all

the related ﬁles are deﬁned, the start-up block deﬁnition option.

6.5 Xilinx Placement and Routing

The FPGA vendor, Xilinx Inc., has provided the software for the placement and routing,
ppr; makebits, the software that converts the placed and routed netlist into the binary
FPGA conﬁguration format; and xact, the graphical tool for manual editing of the FPGA

conﬁguration.

6.5.1 Environment

The environment for the Xilinx tools is established by same setup script as that used by the
Synopsys software. This script allows for the deﬁnition of the several environmental vari-
ables that identify the locations of the executable and library ﬁles and also modiﬁes the

user’s search path to include the Xilinx executable directory.

80

The script in addition binds the environment variable XACI‘ to point to the directory con-
taining the latest release of the Xilinx software, invokes a Xilinx shell script and deter-

mines the amount of physical memory available for the tool.

6.5.2 Execution

A single shell script, xnf_to_bit, is available to facilitate the conversion from Xilinx
Netlist Format (XNF) to Xilinx bitstream format (.bit). The script is invoked using this
command with the parameters timing optimization, PPR/LCA option for makebits, and
XNF basename design ﬁle. If PPR is executed, xnf_to_bit makes the bit stream by invok-

ing makebits, and symbol_to_symbol to produce the bitstream and symbol tables.

The Xilinx PPR, may be invoked from the command line using the parameters deﬁning
the base name of the ﬁle containing parameter/value pairs. PPR reads the ﬁle design.xnf
and if it is deﬁned in the current directory, the constraint ﬁle, and produces the following

set of the ﬁles:
ppr.log : A transcript ﬁle of the output screen that is produced by PPR.
designJca : The ﬁle that deﬁnes the placed and routed LCA ﬁle generated by PPR.

designxpt : A report ﬁle that includes design statistics, pin locations, and the information

related to optimization.

81

In case the timing parameter is speciﬁed as ignored, the timing optimization phase of the

routing is ignored, and this helps producing inefﬁcient results quickly.

The program xdelay performs the static timing analysis of a placed and routed LCA ﬁle.
Xdelay produces a detailed list of all the critical path delays from one clock edge to the
other, from the pin to the clock, and clock to the pin. Xdelay is invoked by deﬁning the
parameters base name of the ﬁle produced by PPR, redirect option for the output report to

the ﬁle, short format timing option, or long format timing option.

The program makebits is used to create conﬁguration bitstreams from the placed and
routed LCA ﬁles produced by PPR. Makebits may be invoked from the command line by
deﬁning the parameters for the base name of the LCA ﬁle produced by PPR, the design bit
ﬁle, enable internal logic used by the readback option, tie unused internal interconnect so
as to reduce noise and the power consumption, logical pin map ﬁle to produce a T2 sym-

bol table.

The output generated by PPR discusses the design implementation of the broadcast of the
pattern found in the memory to the crossbar switch by a processing element based on the
handshake signal.The PPR Report ﬁle contains important information in the following cat-

egories.

0 Partition, Place and Route Summary

0 Chip Pinout Description

82

0 Critical Nets
0 Feedthrough Split Nets

0 Deletion Traceback

The Partition, Place and Route Summary includes the number of occupied CLBs that cor-
responds roughly to the total area number provided as part of the Synopsys report. An

example of a PPR report ﬁle is shown in Figure 6.5.

6.6 The Xilinx Synopsys Interface (XSI)

The XSI design toolkit allows Xilinx FPGA designs to be implemented using either the
Synopsys FPGA Compiler or Design Compiler software. Synopsys HDL synthesis soft-
ware creates and optimizes circuit designs from hardware description languages such as
VHDL. The FPGA Compiler uses libraries supplied by Xilinx, and VHDL that you create
and produces Xilinx, and VHDL ﬁles that you create and produces Xilinx Netlist Format
(XNF) ﬁles for the XC4000 family. The FPGA Compiler is automatically invoked if the
FPGA Compiler-speciﬁc libraries are speciﬁed as the target and link libraries. The Design
Compiler uses libraries supplied by Xilinx for the XC4000 families to produce EDIF ﬁles.

These EDIF ﬁles are translated to XNF ﬁles which are used to conﬁgure FPGAS.

The design ﬂow of VHDL-to—XC4000 as shown in Figure 6.1 comprises of the following
steps:

0 design analysis of the VHDL ﬁle using the Design Analyzer
0 conversion of .vhd ﬁle to a .sxnf ﬁle

83

- conversion of the .sxnf ﬁle to a .xnf ﬁle
- conversion of the .xnf ﬁle to ppr ﬁle

0 conversion of the ppr ﬁle to a .lca ﬁle
0 conversion of the .lca ﬁle to a .bit ﬁle to download to the LCAs on FPGAS.

 

.vhd ﬁle

l

Design Analer

l

.sxnf ﬁle

1

.xnf ﬁle

1

ppr ﬁle

I

.lca ﬁle

I

makebits ﬁle—h XChecker

 

 

 

 

 

 

 

 

 

,___> _report.out

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Figure 6.1. Mapping of an Application onto Splash 2.

6.6.1 Compiling the Design

The optimization process is part of the Compile command. Optimization is guided by the

84

series of constraints speciﬁed. Mapping of the technology transforms the Boolean logic

network to a set of interconnected gates that are selected form the technology library.

6.6.2 Interpreting the Results

The timing results speciﬁed by Synopsys are in nanoseconds and are estimates determined
prior to the placement and routing. Accurate timing is not possible until after PPR is run
on the design. Synopsys estimates timing delays to include the wire-load delays in addi-
tion to the gate delays. The results obtained using PPR are better than the pre-placement

and routing delays estimated by Synopsys.

The area reported by Synopsys is for three parts: combinatorial area, non-combinatorial
area, and total area. The area reported is in terms of the number of Xilinx Combinatorial
Logic Blocks (CLBs) used. The CLB consists of four input function generators and two
ﬂip-ﬂops. The ﬂip-ﬂops and the 2-,3-, and 4-input Boolean functions are weighted for area

(assigned a value) at 0.5 CLBs. The 5-input primitives are weighted for area at 1 CLB.

Register intensive designs require the number of CLBs to be equal to the non-combinato-
rial area reported. If the design is highly combinatorial, then the number of CLBs required
is roughly equal to the combinatorial area reported. The CLBs required by the design can

be accurately computed only by the PPR software.

85

6.7 The Execution Process

Each processing element of each array board is associated with a bitstream ﬁle to be exe-
cuted and a symbol table for the evaluation of the associated symbols. If previously pro-
duced merged raw load ﬁles for the application exist then the assignments will be
automatically made when the speciﬁed raw ﬁles are loaded. In case new raw load ﬁles are
deﬁned then default bitstream ﬁles will be assigned for the PEs not associated by the user
based on the environment variables. Also, in case the user has not provided a bitstream
assignment or a default for a PE, then the default system ﬁles are used. The bitstream ﬁles
are merged into a raw load ﬁle via a “comerturn” operation. Once the “rawﬁle” is saved to
disk, an additional ﬁle that records the association of the PE—bitstream-symbol table ﬁles.

Each array board may load a unique raw load ﬁle.

Thus an array board that is being used for an application must be conﬁgured prior to exe-
cution. The sequence of operations related to the process are namely, download the raw

load arrays, download the crossbar conﬁguration ﬁles if used.

6.8 T2 Debugger

The Splash 2 debugger performs a large number of tasks including initialization, associa-

tion and merger of the bitstream data and conﬁguring of the Splash 2 array boards.

The initialization of Splash 2 includes opening, resetting, and setting the information lev-

els of the Splash 2 array boards. Association of the Splash 2 includes assigning the bit-

86

stream to the processing elements, reading and writing the Splash 2 merged bitstream to a
ﬁle. Conﬁguring the Splash 2 includes downloading the merged bitstream, downloading
the crossbar conﬁguration to the Splash 2 array, and clearing the PE local memory by ini-

tializing it with a zero ﬁll.

The interface board registers are also accessible using the T2 Debugger. The registers are
addressed using an identiﬁer and an appropriate read/write command for the related
“class” of the register. Typically, the read and write commands may be speciﬁed as fol-
lows,

RD identiﬁer

WR identiﬁer, value.
A typical T2 Debugger session includes T2 initialization (start-up), Splash 2 unit initial-
ization and execution of the Splash 2 array system. Execution of the Splash 2 array sys-
tem involves stepping through the program, debugging the program and ﬁnally shutting

the Splash 2 system down.

87

6.9 T2 Debugger Commands

The T2 debugger command consists of the syntax, command unit-list, board-list, pe-list,

arg-list.

The unit-list and the board-listt specify the Splash 2 array unit and the board address of the
corresponding Splash 2 array system, respectively. The pe-listt speciﬁes the processing
element related to the Splash 2 array system. In order to execute an application using T2,
the process involves creation of the related T2 customizing ﬁle, creation of the application

start—up ﬁle, and initialization of the application using the “source” ﬁles.
The T2 Debugger supports commands categorized as miscellaneous functions, readback
functions, message functions, ﬁle and data stream functions, including:

Readback unit-list, board-list

The status of the Splash 2 boards are readback using this command in order to determine
the current state of the hardware. This command allows the symbol table readback for

each of the PEs on the speciﬁed boards.

Ascii_to_Bin ﬁle

The binary data ﬁle is converted to an ascii ﬁle.

Clear_Memory unit-list, board-list, pe-list

The memory contents are cleared and the addressed memory is initialized to contain zeros.

88

Conﬂgure_Array unit-list, board-list

The selected Splash 2 array board is conﬁgured using the downloaded bitstream data.

Conﬁgure_CrossBar unit-list, board-list

The crossbar of the selected Splash 2 array is conﬁgured based on the conﬁguration ﬁle.

Fill_Memory unit-list, board-list, pe-list

The memory associated with the speciﬁed PEs are ﬁlled with the values speciﬁed.

List_PE unit -list

List the current symbol table associations for all the PEs speciﬁed in the unit list.

6.10 Results

Based on the suggested approach discussed in chapter 5, several communication and com-
putation algorithms are discussed in this section. These algorithms are essential for the

implementation of the bar code identiﬁcation using the multi-channel ﬁltering technique.

The PE reads the crossbar conﬁguration data and stores the value in the local RAMs. The
data are passed onto the right neighbor for additional computation. This algorithm allows
debugging via the RBus from X16 (see Figure 4.1). The data tranfer paths and VHDL
code for the crossbar algorithm are shown in Figures 6.2, 6.3, 6.4 and 6.5. This algorithm
provides an estimate of the execution speed for lookup table based calculations. In addi-
tion, this algorithm can be used for transferring data among PEs via the crossbar conﬁgu-

ration during the lookup table calculation for classiﬁcation of image pixels using a

89

perceptron as described in Chapter 5.

 

S‘BUS Read RD

 

 

 

 

 

   
 

 

 

 

 

 

 

 

 

—’ 256 x16 RD
. Memory WR
S-Bus Wnte rWR
S B 32 ' '16
- us
Address -/ 2', D...
S-Bus 32 I
Data 'l‘
J? 4
Processor
Inhibit ’
Processin .
To Left 36 Element g To l-hght
[ 3.
36
'Ib Crossbar

Figure 6.2 Data Transfers from the Processing Element.
(Dotted lines indicate the data transfer paths.)

90

type States is (Init, One, Two, Three);

signal State
signal Data
signal Address
signal Xbar_hold
signal Xbar_in

: STATES := INIT}

: Bit_Vector (15 downto 0);
: Bit_Vector (I5 downto 0);
: Bit_Vector (35 downto 0);
: Bit_Vector (35 downto 0);

begin

XP_Left <= Tristate (XP_Left);
XP_Right <= Tristate (XP_Right);
XP_Int <= ‘0’,
XP_GOR_Result <= ‘0’;
XP_GOR_Valid <= ‘0’;

process

variable Counter: Integer

begin

WAIT until XP_Clk’event AND XP_Clk = ‘1 ’
Pad_Output (XP_Mem_D, Data);
Pad_Output (XP_Mem_A, Address);
XP_Mem_RD_L < = ’1 ’,'
XP_Mem_WR_L <= ’0 ’;
case State is
When Init =>
Xbar_hold <= Xbar_in;
Address (I downto 0) <= “”;00
Adress (1 7 downto 2) <= itobv (counter; 16);
if (Xbar_in /= “000000000000000000000000000000000000”) then
Counter := Counter + I;
if (XP_HSI = ‘0’) then

State <= One;
else
Counter := 0;
end if;
end if;

Figure 6.3. PE to Crossbar Declaration.

91

when One =>
Address (1 downto 0) <= “00”;
Data (15 downto 0) <= Xbar_hold (15 downto 0);
State <= Two;

when Two =>
Address (I downto 0) <= “01”;
Data (15 downto 0) <= Xbar_hold (31 downto 16);

State <= Three;
When Three =>
Address (1 downto 0) <= “10”;
Data (15 downto 0) <= “ 000000000000” & Xbar_hold (35 downto 32);
State <= Init;

end case;

——connect the crossbar registers
Pad_Input (XP_Xbar; Xbar_In);

--enab1e crossbar output if HSO is low

if (XP_HSO = ‘1 ' then

XP_Xbar_En_L <= “11111 ”;
XP_LED <= ‘1 ’;

else

XP_Xbar_En_L <= “00000”;

XP_LED <= ‘0 ’;

end if;

end process;

end Xbar_to_mem;

Figure 6.4. Crossbar Processing Element XP_XBAR_SLAVE Architecture.

92

Input XNF Design Statistics (refer Note 1)

Number of Logic Symbols: 189
Number of Flip Flops: 167
Number of 3-State Buffers: 0
Number of IO Pads: 87
Number of Hard Macros: 0
Number of Nets: 465
Number of Pins: 1448
Equivalent “Gate Array” Gates: 1344 (refer Note 2)
- From Logic: 1344
- From Random Access Memories: 0

- From Read Only Memories: 0

Partitioned Design Utilization Using 4010 (refer Note 3)

Occupied CLBs (refer Note 4): Used=79 Max:400
Packed CLBs (refer Note 4): Used=45 Max:400
Package Pins (refer Note 5): Used=85 Max:160
FG Function Generators: Used=91 Max:800
H Function Generators: Used=28 Max:400
Flip Flops (refer Note 5): Used=l65 Max=1120
Memory Write Controls: Used=0 Max:400
3-State Buffers: Used=0 Max:960
3-State Buffer Output Lines: Used=0 Max=80
Address Decoders: Used=0 Max=32

Routing Summary

Number of unrouted connections: 0

Number of pips used: 1563

Number of local lines used: 505

Number of double lines used: 289

Number of long lines used: 172

Number of global lines used: 21

Number of decoder lines used: 0

Util=l9%
Util=1 1 %
Util=5 3%
Util=1 1 %
Util=7%
Util= 14%
Util=0%
Util=0%
Util=0%
Util=0%

Figure 6.5. Results for the XP_XBAR_SLAVE Obtained from PPR.

93

6.10.1 PPR Summary Explanatory Notes

This section contains the explanatory notes for the entries and values which are presented
on the previous page (see Figure 6.5).
Note 1 :

The ﬁrst section gives statistics about the design which served as input to

PPR. The data ﬁle used to compute these statistics is PPR’s internal net list

ﬁle which is derived from (and closely related to) the input XNF ﬁle.
There is one case where the internal ﬁle may be very different from the input XNF ﬁle: if
the XNF ﬁle contains references to other XNF ﬁles, PPR automatically merges the lower
level net list into the top level net list. The internal net list ﬁle therefore represents a FLAT

version of the design.

Note 2 :
The value listed for the number of “gate array” gates is an estimate which
is provided as a reference. It is based on the symbols (AND, OR, FDRD,
etc.) in the internal net list ﬁle. The value is computed by adding the equiv-
alent gate counts of each symbol. The symbol gate counts are derived from

gate array data book [1].

Since the number of “gate arrays” is based on the internal net list ﬁle. This includes the
contents of any hard macros which are used in the design. As shown in Figure 6.5, there

are no hard macros used for the implementation.

94

Note 3 :
The second section gives statistics related to the design after it has been
partitioned. The utilization of the LCA by the design is dependent on the

LCA resource of interest.

Note 4 :
The CLB utilization is the most common and general utilization statistic
speciﬁed. The “Occupied CLB” utilization is obtained by counting the
CLBs that are not utilized (empty CLBs). The value includes the
feedthroughs that are inserted into the design by the router as being part of
the design.The “packed CLB” utilization is computed by counting the
number of function generators and ﬂip ﬂops in the design, dividing each by

two, and then taking the maximum of the two results.

The packed CLB value represents the LCA utilization for the case in which the design has
been packed into as small an area as possible. PPR does not pack the design CLBs tightly
but instead spreads them out over all available CLBs. This may however result in the con-
dition in which a CLB may not be fully utilized. Further packing of the CLBs is possible

by manual partition, place and route of subsections of the design.

The packed CLB utilization is a measure of how much more logic can be included in the

design and can be ﬁtted on an LCA.

95

Note 5 :

Logic trimming that occurs before the partitioning process results in the
removal of pads if they connect to a loadless or sourceless net. Flip Flops
are removed if one of the following conditions are valid:
(1) The output pin (Q) has no load connected.
(2) The data pin has no load connected.
(3) The data pin is attached to GND.
(4) The data pin is attached to Vcc.
(5) The clk pin is connected to no source, or it is attached tchc or GND.
(6) The reset pin is attached to Vcc.
(7) The set pin is attached to Vcc.
Flip Flops may also be deleted if they are part of the sequential logic loop which feeds

itself and no other symbols.

Based on the results obtained from PPR only 11% of the CLBs are packed for the imple-
mentation of the XBar algorithm. By including hard macros into the algorithm, it will be
possible to pack more CLBs and also free the total number of pins utilized for the design.
Thus a PE (XC4010), can accommodate additional logic conﬁgurations to perform data

handling operations.

The wireload model results obtained from PPR indicate that the algorithms execute at
speeds up to 12 MHz using soft macros. A hard macro for the building blocks, namely, the

integer counter and ripple-carry adder; would improve the execution speeds signiﬁcantly.

6.10.2 Chip Pinout Description for the XBar Algorithm

This section describes the location of the design’s pins in terms of the package pins. The

96

list is sorted by the Xilinx PPR based on the location of the package pin.

Table 3: Chip Pinout Description.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Rage Pin Name Pamge Pin Name

A 16 XP_XBAR_EN_L_4_PAD F1 XP_MEM_A_5_PAD
A17 XP_XBAR_EN_L_3_PAD F2 XP_MEM_A_6_PAD
B1 XP_XBAR_EN_L_2_PAD F3 XP_MEM_A_7_PAD
132 XP_HSl_PAD F16 XP_XBAR_29_PAD
B 16 XP_HSO_PAD F17 XP_XBAR_28_PAD
B 18 XP_XBAR_EN_L_1_PAD F18 XP_XBAR_27_PAD
C1 XP_MEM_WR_L_PAD Gl XP_MEM_A_8_PAD
c2 XP_MEM_RD_L_PAD 62 XP_MEM_A_9_PAD
c 17 XP_XBAR_EN_L_0_PAD G 17 XP_XBAR_26_PAD
C18 XP_XBAR_35_PAD 618 XP_XBAR_25_PAD
D1 XP_MEM_A_0_PAD H1 XP_MEM_A_10_PAD
D2 XP_MEM_A_1_PAD H2 XP_MEM_A_11_PAD
D17 XP_XBAR_34_PAD H3 XP_MEM_A_12_PAD
D18 XP_XBAR_33_PAD H 16 XP_XBAR_24_PAD
E1 XP_MEM_A_2_PAD H 17 XP_XBAR_23_PAD
132 XP_MEM_A_3_PAD H18 XP_XBAR_22_PAD
E3 ' XP_MEM_A_4_PAD 11 XP_MEM_A_13__PAD
E16 XP_MEM_A_32_PAD 12 XP_MEM__A_14_PAD
1317 XP_MEM_A_31_PAD J3 XP_MEM_A_15_PAD
E18 XP_MEM_A_30_PAD J 16 XP_XBAR_21_PAD

 

 

 

 

97

Table 3: (continued)

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Pacpliﬁge Pin Name Pagiige Pin Name

J 17 XP_XB AR_20_PAD N 18 XP_XBAR

J 18 XP_XBAR_19_PAD P l XP_MEM_D_8_PAD
Kl XP_MEM_A_16_PAD P2 XP_MEM_D_9_PAD
K2 XP_MEM_A_I 7_PAD P3 XP_MEM_D_10_PAD
K3 XP_LED_PAD P16 XP_XBAR_7_PAD
K16 XP_XBAR_18__PAD P17 XP_XBAR_6_PAD
K17 XP_XBAR_17_PAD P18 XP_XBAR_5_PAD
K18 XP_XBAR_16_PAD R l XP_MEM_D_] l__PAD
Ll XP_MEM_D_0_PAD R2 XP_MEM_D_12__PAD
L2 XP_MEM_D_1_PAD R17 XP_XBAR_4_PAD
L3 XP_MEM_D_2_PAD R18 XP_XBAR_3_PAD
L16 XP_XBAR_15_PAD Tl XP_MEM_D_13_PAD
L17 XP_XBAR_14_PAD T2 XP_MEM_D_14_PAD
L18 . XP_)GBAR_13_PAD T3 XP_MEM_D_15_PAD
Ml XP_MEM_D_3_PAD T4 XP_MEM_DIS_PAD
M2 XP_MEM_D_4_PAD T16 XP_GOR_VALID_PAD
M17 XP_XBAR_12_PAD T17 XP_XBAR_2_PAD
M18 XP_XB AR_l I_PAD T18 XP_XBAR_1_PAD

N1 XP_MEM_D_5_PAD U1 XP_CLK_PAD

N2 XP_MEM_D_6_PAD U 16 XP_GOR_RES_PAD
N3 XP_MEM_D_7_PAD U18 XP_XBAR_0_PAD
N16 XP_XBAR_10_PAD V2 XP_INT_PAD

 

N17

 

XP_XB AR_9_PAD

 

 

 

98

6.11 Additional Results of Simulations

The following are basic algorithms for data handling, namely:

° Selecting the crossbar conﬁguration based upon the SIMD tag as shown in Figure 6.7.

° Feeding bits of data from the left port of Splash 2 to the Crossbar Port (left_to_Xbar).
Data are latched into the left port on the ﬁrst clock cycle. In the next clock cycle the data

are latched into the crossbar port. The architecture declaration is shown in Figure 6.6.

These algorithms are an essential part of the suggested approach discussed in Chapter 5
since crossbar conﬁgurations and the data paths are changed based on the data obtained

from the PBS for computation of intermediate steps (see Figures 5.1, 5.6 (a) and 5.6(b)).

architecture Left_to_XBar of Processing_Element is
signal PassThru: Bit_ Vector (Datapath_width -I downto 0);
signal Leﬁ: Bit_ Vector (Datapath_ width -1 downto 0);
signal Right: Bit_ lector (Datapath_ width -1 downto 0);

signal Madr, Nadr - .Bit_ Vector (MAR_ RANGE -I downto 0);
signal Mdata_out : Bit_ Vector (MEM_WIDTH— -1 downto 0);
signal Mdata_in : Bit_ Vector (MEM_WID7H -I downto 0);
signal erite : Bit;

signal Mem_Data : Bit;

begin

Mem_Access: process
begin
WAIT until XP_Clk ’Event and XP_Clk = ‘1
Pad_Output (XP_Mem_A, Madr);
Pad_Input (XP_Mem_D, Mdata_out, Mdata_in, Mem_Data);
Pad_Output (XP_Mem_WR_L, erite);
erite < = ‘1
Mem_Data < = ‘0’;
if (leﬁ (35) = ‘1'):hen
erite < = ‘0’;
Mem_Data < = ‘1 ’;
Madr < = Nadr;
Nadr <= Nadr + I;
Mdata_out < = Leﬁ (15 downto 0);
end if
end process;
process
begin
WAIT until XP_CLK ’Event and XP_Clk =
Pad_Output (XP_XBar; Leﬁ);
Pad_Input (XP_Left. Leﬂ);

end process;

XP_Right < = Tristate (XP_Right);
XP_Mem_RD_L < = ‘1 ’;

XP_HSO < = ‘2 ’,°

XP_HSI < = ‘Z
XP_GOR_ResuIt < = ‘Z ’;

XP_Int < = ‘0 ’;
XP_ﬁar_EN_L < = “1111 ”;

end Leﬁ_to_XBar;

Figure 6.6. Left_to_XBar Architecture Declaration.

100

type States is (Init, One, TWO, Three);

signal State
signal Data
signal Address
signal xbar_hold
signal Xbar_in

begin

XP_Leﬂ
XP_Right
XP_Jnt
XP_GOR_ResuIt
XP_GOR_ lblid

process

: States := Init;

: Bit_ vector (15 downto 0);
: Bit_ Wetor (15 downto 0);
: Bit_ lecmr (35 downto 0);
: Bit_ “ram (35 downto 0);

< = Tristate (XP_Leﬁ);
< = Innate (XP_Right);

<= ‘0’;
<= (0”,
<= «0:;

variable Counter : Integer

begin

case State is

when Init = >

WAIT until XP_Clk ’event AND XP_Clk = ‘1 ’
Pad_Output (XP_Mem_D, Data);
Pad_Output (XP_Mem_A, Address);
XP_Mem_RD_L < = ’1
XP_Mem_WR_L <= ’0’;

Xbar_hold < = main;
Address (I downto 0) < = “00 ”,'
Adress (I 7 downto 2) < = itobv (counter; 16);

If (Xbar_in /= “W ”) THEN
Counter := Counter + I;
if (XP_HSI = ‘0’) then

state < = One;
else
Counter := 0;
end if
end if-

Figure 6.7. Processing Element to Crossbar based on SIMD Tag Declaration.

when One =>
Address (1 downto 0)
Data (15 downto 0)
State

when Two =>
Address (I downto 0)
Data (15 downto 0)
State

when Three =>

Address (1 downto 0) < =
Data (15 downto 0) < =
State < =
end Case;
X0_XB_Data <=
X0_Mem_A <=
X0_Mem_D <=
X0_Mem_RD_L <=
X0_Mem_WR_L <=
X0_GOR_Result_In <=
X0_GOR_Valid_In <=
X0_GOR_Result <=
X0_GOR_Valid <=
X0_Int <=

X0_Broadcast_0ut

end X 0_XBAR_C ONT R OL;

101

< = “00 n;
<= xbar_hold (15 downto 0);
<= Two;

< = “01 n;
< = xbar_hold (31 downto 16);
<= Three;

u 10 n;
“ 0000000000
Init;

” & xbar_hold (35 downto 32 );

Tristate (X0_SIMD);
Tristate (X0_XB_Data);
Tristate (X0_Mem_A);

«I r;

:1 r;
“ZZZZZZZZZZZZZZZZ ”;
“ZZZZZZZZZZZZZZZZ ”;
‘0 r;

:0 9,,

(0 r;

40 9,,

Figure 6.7. (continued)

102

6.12 Summary

This chapter provides a detailed explanation as to how an application can be mapped onto
the reconﬁgurable architecture, Splash 2. In addition results of basic communication and

computation algorithms are provided.

Chapter 7

Conclusions and Future Research

Every man with an idea has at least two or three followers

Brooks Atkinson

Bar code identiﬁcation has been an active research topic in computer vision for almost two
decades and has been successfully implemented on application speciﬁc hardware. How—
ever, identiﬁcation of the bar codes ﬁom the image that includes graphics and text can be
time consuming and complex task. The multi-channel ﬁltering technique applied to identi-
ﬁcation of bar code images is a more general approach in comparison to the other
approaches and is applicable to a diverse class of images. This approach is general owing
to the fact that the technique relies on frequency (size) and the orientation, two basic

attributes of any image.

In this thesis, the major contributions are:

1. A detailed study of the architecture of a custom computing machine Splash 2.

2. A detailed study of the programming environment of Splash 2 and the steps involved in
programming an application on Splash 2.

3. A suggested approach to mapping the multi-channel ﬁltering technique for bar code
identiﬁcation onto Splash 2 PBS.

4. A detailed investigation of some of the fundamental communication and computation
algorithms that are essential for implementation of the suggested approach, including

performance results and partitioning results obtained on Splash 2 architecture.

103

104

One limitation of the suggested approach is the inability to perform all the steps involved
in multi—channel ﬁltering on Splash 2. Data from the feature images can be classiﬁed more
effeciently (at a faster rate) when intermediate results are computed on the host Sparcsta-

tion.

The multi-channel ﬁltering technique can be applied to textured images only [15]. Uni-
form (almost) gray level regions cannot be discriminated using this technique. Images, as
they occur in the real world, contain both textured regions as well as regions with nearly

uniform gray levels.

In Chapter 5, an approach to implementing bar code identiﬁcation is recommended. This
approach does not suggest routing intermediate data, between steps, to the host Sparcsta-
tion for computation. Computation of intermediate data on the host is suggested to obtain

a comparative study in terms of execution speed.

Veriﬁcation of the correctness of the Splash 2 implementation can be performed by com-
paring it with a sequential implementation on a Sparcstation. Timing results will be gener-

ated as part of completing the suggested implementation.

Some bar code image boundaries are enhanced to a greater extent in comparison to others
by the process of adding magnitude responses. It is suggested that an adaptive threshold-
ing technique be investigated instead of the current universal or global thresholding tech-

nique applied during multi-channel ﬁltering.

105

The implementation of the dense ﬁeld programmable gate arrays, replacing the the exist-
ing Xilinx XC4010 series, would enhance Splash 2 performance. In addition, hardware
components like rapid programmable interconnects should be investigated and incorpo-
rated into the next version of Splash 2 to improve both the connectivity among PBS and

reduce the capacitative delays that affect the rate of data transfer from and to the PEs.

106

Appendix

Xilinx’s FPGAS, also known as Logic Cell Arrays (LCAs) [39], are high-density user-pro-
grammable gate arrays. The major conﬁgurable elements of the FPGAS are: conﬁgurable
logic blocks (CLBs), input/output blocks (IOBs), and interconnections.The CLBs provide
the functional elements to construct the logic based on the user’s application.The IOBs
provide the interface between the package pins and the internal signal lines. The program-
mable interconnects provide a routing path to connect the inputs and outputs of the CLBs
and IOBs into desired conﬁgurations. Internal static memory cells are programmed based

on the application. These cells determine the logic functions and interconnections imple-

mented on the LCAs.

Conﬁgurable Logic Blocks

The conﬁgurable logic block of the Xilinx 4000 family contains three combinatorial func-
tion generators, two ﬂip-ﬂops, and their associated control logic. The powerful and ﬂexi-

ble CLBs are surrounded by a dense set of routing resources as shown in Figure A.1.

The function generators and ﬂip-ﬂops are accessed by 13 CLB inputs and 4 CLB outputs.
These inputs and outputs can also connect to programmable interconnect resources out-
side the CLB. Four independent inputs are provided to each of the two function generators
(Fl-F4 and Gl-G4). The function generators F and G, are capable of implementing any

Boolean logic function with 4 inputs. The function generators are implemented as look-up

107

tables, allowing for the generation of any arbitrary function. The propagation delay
through the function generator is independent of the function being implemented. A third
function generator, labeled H’ as shown in Figure A.l, can implement any Boolean func-
tion with three inputs: F’ and G’ and a third input (H1) from outside the block. Signals
from the function generators can exit the CLB at two outputs: F or G. The output F is
either F’ or H’ and the output G is either G’ or H’. Thus the CLB can be used to imple-
ment either two independent functions of upto 4 variables, any single function of 5 vari-
ables, or some single functions upto 9 variables. Thus, implementing a large number of
functions in a single block, reduces the total block utilization and the length of the signal

path. These parameters contribute towards increased logic density and speed.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

T i T

Dill 80! EC ]
.._ 1 L T
.._ —l~ J_.
0t-1 ‘ >

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

up?) >

 

 

 

 

 

 

 

 

 

 

Figure A. 1. Simpliﬁed Block Diagram of XC 4000 Conﬁgurable Logic Block.

108

The two storage elements in the CLB are edge-triggered D-type ﬂip-ﬂops with common
clock (K) and clock enable (EC) inputs. A third common input (SIR) can be programmed
as either an asynchronous set or reset signal independently for each of the two registers.
The S/R input can also be disabled for either ﬂip-ﬂop. A separate global set/reset line (not
shown in Figure A.1), sets or clears each register during power-up, reconﬁguration, or
when a dedicated RESET net is driven active. (The RESET net can be connected to any
package pin, providing a global reset input). Either polarity of the clock signal can be
selected, allowing each ﬂip-ﬂop to be rising —edge or falling -edge triggered. The source of
a ﬂip-ﬂop’s data input also is programmable: the functions F’, G’, and H’, and the Direct
In (DIN) block input are the options. The ﬂip-ﬂops’ outputs can exit the block via the Q1

and Q2 CLB outputs.

In addition, each CLB contains dedicated arithmetic logic for the fast generation of carry
and borrow signals, greatly increasing the efﬁciency and performance of applications such
as comparators and ALUs. This logic is used in conjunction with the F’ and/or G’ function
generators to implement high-performance arithmetic functions (see Figure A2). The
CLB’s two carry-in inputs (Cin) and a carry-out (Cout) are propagated on dedicated rout-
ing lines that connect neighboring CLBs. The carry-out of one CLB connects to the carry-
in of the adjacent CLBs above and below it in that column. At the top and bottom rows of
the chip, the carry-out signal propagates to the neighboring CLB to the right. Thus, chains
signals can wind up and down columns of CLBs in a serpentine manner to efﬁciently

implement very wide adders or similar functions.

 

109

Multiplexers in the CLB allow its four control inputs, labeled C1 through C4 in Figure
A.l, to be mapped one-to-one into the CLBs four internal control signals (H1, DIN, S/R,

and EC) in any arbitrary manner.

The ﬂexibility and symmetry of the CLB architecture facilitates the placement and routing
of a given application. Since the function generators and registers have independent inputs
and outputs, each can be treated as a separate entity during placement to achieve high
packing densities. Inputs, outputs, and the functional themselves can freely swap positions

within a CLB during placement and routing operations.

CLBs and On-Chip Memory

In addition to providing unprecedented speed and performance, the XC4000 family LCAs
are the ﬁrst programmable logic devices to include on-chip static memory resources, fur-

ther increasing system integration levels.

An optional mode for each CLB allows the memory look-up tables in the F’ and G’ func-
tion generators to be used as either a 16 x 2 or 32 x 1 bit array of read/write memory cells
( see Figure A.3). The F1-F4 and Gl-G4 inputs to the function generators act as address
lines in this conﬁguration, selecting a particular memory cell in the function generator’s
look-up table. The functionality of the CLBs control signals change in this conﬁguration;
the H1, DIN, and SIR lines become the D1, D0 and Write Enable (WE) inputs, respec-

tively. When the 32 x 1 conﬁguration is selected, Fl-F4, G1-G4, and D1 act as address bits

 

110

and D0 is the data input during writes. In the 16 x 2 orientation, F1-F4 and G1-G4 are the
address bits, and D0 and D1 are data inputs during write operations. The contents of the
memory cell(s) being addressed are available at the F’ and G’ function generator outputs,

and can exit the CLB through its F and G outputs.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

c1 (2 a
l l l l
[we or no a: l
l L. I
-_D
.._. .. or". a
03—1 0'
mm m ' ,
a.
Let
I
u I! can
:: RIG?!" 1— m..— __w -I

 

 

 

Figure A.3. CLB’s Function Generators being used as Read/Write Memory Cells.

111

Conﬁguring the CLB’s function generators as read/write memory does not affect the func-
tionality of the other portions of the CLB, with the exception of redeﬁnition of the control
signals. The H’ function generator can be used to implement Boolean functions of F’, G’,

and D1, and the D ﬂip-ﬂops can latch the F’, G’, H’, or D0 signals.

Groups of CLBs can be used together to form a memory array of the desired length and
width. This capability is useful for a large range of applications, including FIFO buffers

and large state machines.

Input/Output Block
User-conﬁgurable I/O Blocks (IOBs) provide the interface between external package pins
and the internal logic (see Figure A.4). Each IOB controls one package pin and can be

deﬁned for input, output, or bidirectional signals.

Two paths, labeled 11 and 12, bring input signals into the array. Inputs are routed to an
input register that can be programmed as either an edge-triggered ﬂip-ﬂop or a level-sensi-
tive transparent latch. Optionally, the data input to the register can be delayed to compen-
sate for the delay on a clock signal that ﬁrst passes through a global buffer before arriving
at the IOB, thereby eliminating any hold time requirement on the data at the external pin.
The 11 and 12 signals that exit the block can each carry either the direct or registered input

signal.

Output signals can pass directly to the pin or be stored in an edge-triggered ﬂip-ﬂop.

112

 

 

 

 

 

 

 

 

 

in

137
t
lt_.§§
_ "ulcer"

t
ﬂ-
3:
1f

 

 

 

 

 

 

 

 

 

 

 

--------------——-———---——

Figure A.4. Block Diagram of the XC 4000 Input/Output Block.

 

Figure A.5. Typical CLB Connections to Adjacent Single-length Lines.

113

Optionally, an output enable signal can be used to place the output buffer in a high-imped-
ance state, allowing three-state outputs or bi-directional pins. The output (OUT) and out-
put enable (OE) signals can be inverted, and the slew rate of the output buffer can be
controlled to minimize power bus transients when switching non-critical signals. The out-
put buffers are capable of sinking upto 12 mA. Two adjacent output buffers can be

“wire_ANDed” externally to sink upto 24 mA of current.

There are a number of programmable options in the IOB. Programmable pull-up and pull-
down resistors are useful for tying unused pins to Vcc or ground to prevent unnecessary
power consumption. Separate clock signals are provided for the input and output registers.
These clocks can be inverted, allowing either falling-edge or rising-edge triggered ﬂip-
ﬂops. As with the CLB’s registers, a global set/reset signal can be used to set or clear the

input and output registers whenever the RESET net is active.

Embedded logic attached to the IOBs contains test structures compatible with IEEE Stan-
dard 1149.1 for Boundary scan testing, permitting easy chip and board-level testing of

LCA-based applications.

Programmable Interconnect

All internal connections are composed of metal segments with programmable switching
points provided to implement the desired routing. Routing is scaled to the size of the array,
i.e., the number of routing channels varies among the devices in the family, increasing

with the array size.

 

114

In previous generations of LCAs, the logic inputs were located on the top, left, and bottom
of the block, and outputs exited the block on the right, favoring left-to-right data ﬂow
through the device. For the third generation, XC 4000 family, the CLBs inputs and outputs
are distributed on all four sides of the block, providing additional routing ﬂexibility (see
Figure A.5). In general, the entire architecture is more symmetric and regular than the pre-
vious generations, and is more suited to well-established placement and routing algo-

rithms developed for conventional mask-programmed gate array design.

There are three main types of interconnects. characterized by the relative length of their
segments: single-length lines, double-length lines, and long lines. The routing scheme var-
ies with the size of the array and is designed to minimize the resistance and capacitance of

the average routing path, resulting in signiﬁcant performance improvements.

The single-length lines are a grid of horizontal and vertical lines that intersect at a “Switch
Matrix” between each block. Figure A.5 illustrates the single-length interconnect lines
surrounding one CLB in the array. Each Switch Matrix consists of programmable n-chan-
nel pass transistors used to establish the desired connections between the single-length
lines; for example, a signal entering on the right side of the Switch Matrix can be routed to
a single-length line on the top, left, or bottom sides, or, any combination thereof if multiple
branches are required (see Figure A.6). Single-length lines are normally used to conduct
signals within a localized area and to provide the branching for nets with fanout greater

than one.

115

 

Figure A.6. Single-length Line Switch Matrix Connections.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Jr. ﬁt .r. _
L‘ 3‘ qua}
CLB CLB
njxr 9%

if: ..r L Lt-
{J 3 ft;
>4 3 rJ—x
x

 

 

 

 

 

 

 

 

 

 

 

 

 

‘v-l L—
131 T13 u I?

. it:
Klmij

 

 

Figure A.7. Double-length lines in CLB Array

116

The number of possible connections through the switch matrix has been streamlined com-
pared to the previous generations of the LCA architectures, in order to reduce capacitative
loading to minimize routing delays and increase performance. However, a much richer set
of connections between the single-length lines and the CLB’s inputs and outputs more
than compensate for this reduction in switch matrix options, resulting in overall increased
“routability”. The function generators and control inputs to the CLB (Fl-F4, Gl-G4, and
C1-C4) can be driven from any adjacent single-length line segment (Figure A5). The
CLB’s clock (K) input can be driven from one-half of the adjacent single-length lines.
Each CLB output can drive several of the single-length lines, with connections to both the

horizontal and vertical planes.

As implied by their name, the double-length lines consist of a grid of metal segments
twice as long as the single-length lines. As shown in Figure A.7, a double-length line runs
past two CLBs before entering a switch matrix. These lines are grouped in pairs, with their
switch matrices staggered so that each member of the pair has a switch matrix nearby
alternating CLBs in that row or column of the array. As with single-length lines, all the
CLB inputs except K can be driven from any adjacent double-length line, and each CLB
output can drive nearby double-length lines in both the vertical and horizontal planes.
Double-length lines provide the most-effective implementation of intermediate length,

point-to-point interconnections.

Long lines are a grid of metal interconnect segments that run the entire length or width of

the array as shown in Figure A.8. Additional long lines in the vertical plane can be driven

117

only by special global buffers designed to distribute clocks and other high fanout control
signals throughout the array with minimal skew. Long lines are intended for high fan-out,
time-critical signal nets. Each vertical long line has a programmable “splitter switch” at its
center, allowing the long line to be used as two independent routing channels that each run
half the height of the array. CLB inputs can be driven from a subset of the adjacent long
lines; CLB outputs are routed to the long lines via three-state buffers or the single-length

interconnect lines.

Programmable interconnect points allow signals to pass between the long lines and some
single-length lines at their intersections. However, there are no routing corrections

between the double-length lines and the other types of interconnect.

Three-state Buffers

Associated with each CLB in the array are a pair of three-state buffers that can be used to
drive signals onto the nearest horizontal long lines above or below the block. The three-
state buffer’s input can be driven from nearby single-length lines; the buffer’s enable can
come from nearby vertical single-length lines or long lines. A three-state buffer with simi-
lar connectivity is located near each I/O block along the right and left edges of the array.
These buffers can be used to implement multiplexed or bi-directional busses on the hori-
zontal long lines. Programmable pull-up resistors attached to these long lines facilitate

their use for wide wired-AND logic functions

Special long lines running along the perimeter of the array can be used to “wire-AND”

118

signals input through nearby 1085, for applications such as the fast decoding of wide

address busses entering the array.

 

Figure A.8. Long Line Routing Resources with Typical CLB Connections.

[1]
[21
[3]
[41
[5]

[6]
[7]

[8]

[9]

[10]

[11]

[12]

119

Bibliography

The Programmable Gate Array Data Book, Xilinx Inc., 1992

XAC T Development System Reference Guide, Xilinx Inc., January 1993.

Xilinx Synopsys Interface User Guide, Xilinx Inc., July 1993.

Xilinx Synopsys Interface, Preliminary notes, Xilinx Inc., December 1993.

Freeman, Ross, “User Programmable Gate Arrays”, IEEE Spectrum, pp. 32-35,
December 1988.

RPM Emulation System, Quickturn Systems, Mountain View, CA 1990.

Walters, S., “Prototyping ASICs in reprogrammable hardware meets system require-
ments”, Computer Design, pp. 76, May 1990.

Gokhale, M., Holmes, W., Kopser, A., Lucas, 8., and Minnich, R., Sweely,D.,‘ and
Lopresti, D., Building and Using a Highly Programmable Logic Array, IEEE Com-
puter, 24, pp. 81 - 89, January 1991.

Arnold, J. M., Buell, D. A., and Davis, E. G., Splash 2, Proc. 4th Annual ACM Symp.
on Parallel Algorithms and Architectures, pp. 316 -322, 1992

Arnold, J. M., The Splash 2 Programming Environment, Proc. of IEEE Workshop on
FPGAs as Custom Computing Machines, April 1993.

Arnold, J. M., and McGarry, Margaret A., Splash 2 Programmer’s Manual, SRC
Technical Report SRC-TR-93-107, September 1993.

Gokhale, M., Kopser, A., Lucas, 8., and Minnich, R., The Logic Descriptor Genera-

tor, SRC Technical Report SRC-TR-90-Oll, 1990.

[13]

[14]

[151

[16]

[17]

[18]

[19]

[20]

[211

[22]

120

Erdogan, S. S., VHDL Modeling and Simulation of the Back-Propagation Algorithm
and its mapping to the RM, IEEE 1993 Custom Integrated Circuits Conference,
1993.

Pavlidis, Theo, Swartz, J ., et. al., “Fundamentals of Bar Code Information Theory”,
IEEE Compter, pp. 74-86, 1990.

Farrokhnia, F., Multi-channel Filtering Techniques For Texture Segmentation and
Surface Quality Inspection, Ph. D. Thesis, Dept.of Electrical Engg., Michigan State
University, 1990.

Wong, K. Y., Casey, R. G., Wahl, F. M., Document analysis system, IBM J.
Research and Development, 6:642 - 656, November 1982.

Jain, A. K., Bhattacharjee, S. K., and Chen, Yao, Text Segmentation Using Gabor
Filters for Automatic Document Processing, Machine Vision and Applications,
5:169 - 184, 1992.

Jain, A. K., Chen, Yao, Bar Code Localization using texture analysis, Proc. 2nd Int.
Conf on Document Analysis and Recognition, pages 41 - 44, Ibaraki, Japan, October
1993.

Gabor, D., Theory of communication, J. Inst. Elect. Engr, 93:429 - 457, 1946.
Pryor, D., Thistle, M., and Shirazi, N., Text Searching on Splash 2, Proc. of IEEE
Workshop on FPGAs as Custom Computing Machines, April 1993.

Gupta, R. K., and Micheli, Giovanni De, Hardware - Software Cosynthesis for Digi-
tal System, IEEE Design and Test of Computers, pp. 29 - 41, September 1993.
Berthet, C., Rampon, J., and Sponga, L., Synthesis of VHDL Arrays on RAM Cells,

IEEE Transactions, pp. 726 - 731.

121

[23] Lis, Joseph S., Gajki, Daniel D., Synthesis from VHDL, IEEE Transactions, pp. 378 -
381, November.

[24] Wang, Kuochen, and Kuo, Sy-Yen, Computer - Aided Modeling and Evaluation of
Reconﬁgurable VLSI Processor Arrays with VHDL, IEEE Transactions on Com-
puter-Aided Design, no. 2, pp. 185- 197, February 1992.

[25] Nagaswamy, Vijay, Berry, Neerav, and Dangelo, Carlos, Speciﬁcation, Planning, and
Synthesis in a VHDL Design Environment, IEEE Design and Test of Computers, pp.
58 - 68, June 1992.

[26] Brown, Stephen, Dean, Routing Algorithms and Architectures for F held-Programma-
ble Gate Arrays, Ph.D. dissertation, University Of Toronto, Canada, January 1992.

[27] Brown, Stephen, Rose, J, Vranesic, 2., G., A Detailed Router for Field-Programmable
Gate Arrays”, Proc. IEEE International Conference on Computer Aided Design, pp.
382-385, November 1990.

[28] Francis, R., J., Rose, J ., and Vranesic, Z., “Chortle: A Technology Mapping Program
for Lookup Table-Based Field-Programmable Gate Arrays”, Proc. 27th Design
Automation Conference, pp. 613-619, June 1990.

[29] Rose, J., 8., Francis, R., J., Chow, P., and Lewis, D., “The Effect of Logic Block
Complexity on Area of Programmable Gate Arrays”, Proc. 1989 Custom Integrated
Circuits Conference, pp. 5.3.1-5.3.5, May 1989.

[30] Rose, J ., 8., Francis, R., J., and Chow, P., “Architecture of Programmable Gate
Arrays: The Effect of Logic Block Functionality on Area Efﬁciency”, IEEE Journal
of Solid State Circuits, Vol. 25, No. 5, pp. 1217-1225, October 1990.

[31] Singh, 8., Rose, J ., Lewis, D., Chung, K., and Chow, P., “Optimization of Field Pro-

[32]

[33]

[34]

[35]

[36]

[37]

[38]

122

grammable Gate Array Logic Block Architecture for Speed”, Custom Integrated
Circuits Conference, 1991.

Perry, Douglas, VHDL, McGraw Hill, Inc., 1991.

Harr, E. Harr, Alec G. Stanculescu, Applications of VHDL to Circuit Design, Kluwer
Academic Publishers, 1991.

Mazor, Stanley, and Patricia Langstraat, VHDL, Synopsys, Inc., Kluwer Academic
Publishers, 1992.

Navabi, Zainalabedin, VHDL - Analysis & Modeling of Digital Systems, McGraw
Hill, Inc., 1993.

J ayaraman, Bhasker, A VHDL Primer, Prentice Hall Series in Innovative Technol-
ogy,1992.

Kung, S. Y., Digital Neural Networks, Prentice Hall, 1993.

Dhameja, S. and Rover, D. T., Mapping Compute- and Data-Intensive Algorithms
onto a FPGA Based Custom Computing Machine, Canadian Microelectronics Cor-

poration Conference on Field Programmable Devices, June 1994.

HICHIGQN STATE U

1 [11111111] 1111111

rrrrr

II
3014