This is to certify that the thesis entitled SHAPE-BASED IMAGE RETRIEVAL presented by Aditya Vailaya has been accepted towards fulfillment of the requirements for the M.S. degree in Computer Science.

Major professor
Date

SHAPE-BASED IMAGE RETRIEVAL

By

Aditya Vailaya

A THESIS

Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of

MASTER OF SCIENCE

Computer Science

1996

ABSTRACT

SHAPE-BASED IMAGE RETRIEVAL

By Aditya Vailaya

Retrieval efficiency and accuracy are two important issues in designing a content-based database retrieval system. We propose a new image database retrieval method based on shape information. This system achieves both the desired efficiency and accuracy using a two-stage hierarchy: in the first stage, simple and easily computable statistical shape features are used to quickly browse through the database and generate a moderate number of plausible retrievals; in the second stage, the outputs from the first stage are screened using a deformable template matching process to discard spurious matches. We have tested the algorithm using hand drawn queries on a trademark database containing 1,100 images. Each retrieval takes a reasonable amount of computation time (~ 45 seconds). The topmost image retrieved by the system agrees with that obtained by human subjects, but there are significant differences between the top 10 images retrieved by our system and those provided by human subjects. This demonstrates the need to develop shape features that better capture human perceptual similarity of shapes. An improved heuristic has been suggested for more accurate retrievals. The system matches filled-in query images against filled-in images from the database, thus using only the gross details in the image. Experiments have shown that matching on the filled-in database extracts more images that have similar content. The proposed scheme has also been tested on a database of master-marks of Norwegian silversmiths containing 171 images. The experiments show the effectiveness of the integration of shape features for matching faded, rotated, and scaled master-marks extracted from the bottoms of silver tankards. We believe that it is a challenge to develop an automatic retrieval algorithm which models human perception well. However, considering the time and effort required for humans to perform such a retrieval task, an automatic retrieval algorithm can provide a powerful tool to facilitate the retrieval task for human beings.

To My Family

ACKNOWLEDGMENTS

I would like to acknowledge all the people who have assisted me throughout my studies at Michigan State University. I am extremely grateful to my advisor, Professor Anil Jain, for his continuous guidance and encouragement, and for the valuable time that he spent on this project with me. I owe my great interest in research to him.
I am extremely grateful to Dr. Ming-Yee Chiu and the technical staff at Siemens Corporate Research, Princeton, for their guidance and encouragement during my stay there in the summer of 1995, and for the use of their database in this research project. I would also like to acknowledge Dr. Britt Kroepelien for providing a strong basis for the project on authentication of Norwegian silver tankards. It was with her patience and determination that we were able to start this project in such a short time. I would like to specially acknowledge Chitra Dorai for being my mentor in the initial period of my stay here and for patiently helping me adjust to the new environment. I would like to dedicate this thesis to my family, who have constantly encouraged me to achieve new heights. Without their love, patience, understanding, and support, I would not have finished this thesis. I would like to thank Yu Zhong for her assistance in using deformable template matching. I would also like to acknowledge the help and support I have received from members of the PRIP lab and from my friends at Michigan State University.

TABLE OF CONTENTS

LIST OF TABLES .................... ix
LIST OF FIGURES .................... x
1 Introduction .................... 1
1.1 Image Database .................... 3
1.2 Information on Trademarks .................... 6
1.2.1 Registration of Trademarks .................... 7
1.2.2 Search for Conflicting Marks .................... 7
1.3 Problem Definition .................... 14
1.4 Motivation .................... 16
1.4.1 Database .................... 16
1.4.2 Similarity .................... 17
1.5 Thesis Outline .................... 18
2 Literature Review .................... 22
2.1 Methods using a Single Cue .................... 23
2.1.1 Color .................... 23
2.1.2 Shape .................... 24
2.1.3 Texture .................... 26
2.2 Using Multiple Cues .................... 27
2.2.1 QBIC .................... 28
2.2.2 Photobook .................... 30
2.2.3 STAR .................... 32
2.2.4 Incorporating Learning .................... 33
2.3 Discussion .................... 34
3 An Integrated System for Efficient Retrieval .................... 36
3.1 Image Attributes .................... 37
3.2 Fast Pruning Stage .................... 38
3.2.1 Edge Directions .................... 38
3.2.2 Invariant Moments .................... 42
3.2.3 Other Shape Features .................... 43
3.2.4 Integration of Image Attributes .................... 44
3.3 Deformable Template Matching .................... 47
3.3.1 Representation of the Prototype Template .................... 48
3.3.2 Deformation Transformation .................... 48
3.3.3 Dissimilarity Measure .................... 49
3.4 Summary .................... 51
4 Experimental Results .................... 52
4.1 Results on Trademark Images .................... 53
4.1.1 Experiments on Accuracy .................... 53
4.1.2 Experiments on the Proposed System .................... 57
4.1.3 Retrieval by Humans .................... 65
4.1.4 Retrieval using Additional Heuristics .................... 67
4.2 A Case Study in Norwegian Silver Authentication .................... 91
4.2.1 Background .................... 93
4.2.2 Database Search for Master-marks and City-marks .................... 95
4.2.3 Preprocessing .................... 97
4.2.4 Experimental Results .................... 98
4.3 Discussion .................... 99
4.3.1 Trademark Images .................... 100
4.3.2 Norwegian Silver .................... 101
5 Conclusions .................... 109
5.1 Summary .................... 109
5.2 Future Research .................... 111
APPENDICES .................... 112
A A Representative Subset of Trademark Images .................... 112

LIST OF TABLES

4.1 Edge directions-based retrieval results for the 1,100 database images.
4.2 Invariant moments-based retrieval results for the 1,100 database images.
4.3 Integrated shape-based retrieval results for the 1,100 database images; n refers to the position of the correct retrieval, R: rotated query, S: scaled query, N: query with a noisy image; the last column indicates the percentage of time the query image was not retrieved in the top 20 matches.
4.4 Dissimilarity values for the five query images when the deformable template matching is applied to the top 10 retrieved images from the fast pruning stage.
4.5 Recall and precision rates for three queries (shown in Figures 4.14, 4.16, and 4.18) based on the image filling; n1 refers to the number of images retrieved in the top 20 positions that are similar to the query, n2 refers to the number of images in the database that are similar to the query. The recall rate is (n1/n2), the precision rate is (n1/20).

LIST OF FIGURES

1.1 Model of a multimedia system.
1.2 Traditional image retrieval model.
1.3 Trademarks considered for opposition by the USPTO; registration was accepted by the PTO. The trademarks have been enlarged to show the details.
1.4 Trademarks chosen for opposition by the USPTO; registration was refused by the PTO. The trademarks have been enlarged to show the details.
1.5 Trademarks chosen for opposition by the USPTO; registration was accepted by the PTO. The trademarks have been enlarged to show the details.
1.6 Trademarks chosen for opposition by the USPTO; registration was refused by the PTO. The trademarks have been enlarged to show the details.
1.7 Trademarks chosen for opposition by the USPTO; registration was refused by the PTO. The trademarks have been enlarged to show the details.
1.8 Example of a trademark with multiple design codes.
1.9 Proposed image retrieval model.
1.10 Two perceptually similar images of a bull head.
1.11 Two perceptually similar images of a bear.
1.12 Another image of a bear.
1.13 Images of wheat and corn that vary in their appearance.
3.1 Example of shape representation using edge directions; (a)-(c) show 3 database images, (d)-(f) corresponding edge images, and (g)-(i) corresponding edge direction histograms; D_e(g, h) = 0.065, D_e(g, i) = 0.69, D_e(h, i) = 0.70.
3.2 Example of shape representation using invariant moments; (a)-(c) show 3 database images; D_m(a, b) = 0.033, D_m(a, c) = 0.85, D_m(b, c) = 0.85.
3.3 Constructing the query template for deformable template matching. (a) A hand drawn query trademark, and (b) the query template obtained by calculating the edge map of the image in (a).
4.1 Database images not retrieved correctly in the presence of rotation.
4.2 Examples of hand drawn query trademarks.
4.3 Database pruning results for the hand drawn bear sketch shown in Fig. 4.2. The top 20 retrievals are given in increasing order of dissimilarity.
4.4 Database pruning results for the hand drawn bull sketch shown in Fig. 4.2. The top 20 retrievals are given in increasing order of dissimilarity.
4.5 Database pruning results for the hand drawn boomerang sketch shown in Fig. 4.2. The top 20 retrievals are given in increasing order of dissimilarity.
4.6 Database pruning results for the hand drawn deer sketch shown in Fig. 4.2. The top 20 retrievals are given in increasing order of dissimilarity.
4.7 Database pruning results for the hand drawn kangaroo sketch shown in Fig. 4.2. The top 20 retrievals are given in increasing order of dissimilarity.
4.8 Deformable template matching; (a) initial position of the bull template overlaid on the edge map of a bull logo using the generalized Hough transform, (b) final match for the bull logo.
4.9 Deformable template matching; (a) initial position of the boomerang template overlaid on the edge map of a boomerang logo using the generalized Hough transform, (b) final match for the boomerang logo.
4.10 Deformable template matching; final match for a bull template and a boomerang image.
4.11 Nine topmost trademark images retrieved by the human subjects for the bull template. The number below each retrieved image is the number of human respondents (out of 5) who placed the image in the top 10 retrievals.
4.12 Five images of a bull present in the database.
4.13 Filled-in images for the five bull images shown in Figure 4.12.
4.14 Database pruning results for a bull image (query image corresponds to the topmost retrieval). The top 20 retrievals are given in increasing order of dissimilarity.
4.15 Database pruning results for a bull image (query image corresponds to the topmost retrieval). The top 20 retrievals are given in increasing order of dissimilarity. Results are based on unfilled images.
4.16 Database pruning results for a bear image (query image corresponds to the topmost retrieval). The top 20 retrievals are given in increasing order of dissimilarity.
4.17 Database pruning results for a bear image (query image corresponds to the topmost retrieval). The top 20 retrievals are given in increasing order of dissimilarity. Results are based on unfilled images.
4.18 Database pruning results for a kangaroo image (query image corresponds to the topmost retrieval). The top 20 retrievals are given in increasing order of dissimilarity.
4.19 Database pruning results for a kangaroo image (query image corresponds to the topmost retrieval). The top 20 retrievals are given in increasing order of dissimilarity. Results are based on unfilled images.
4.20 Database pruning results for an image consisting of a set of vertical bars (query image corresponds to the topmost retrieval). The top 20 retrievals are given in increasing order of dissimilarity.
4.21 Database pruning results for a bee image (query image corresponds to the topmost retrieval). The top 20 retrievals are given in increasing order of dissimilarity.
4.22 Database pruning results for a Sun image (query image corresponds to the topmost retrieval). The top 20 retrievals are given in increasing order of dissimilarity.
4.23 Database pruning results for a word mark (query image corresponds to the topmost retrieval). The top 20 retrievals are given in increasing order of dissimilarity.
4.24 Database pruning results for a lion image (query image corresponds to the topmost retrieval). The top 20 retrievals are given in increasing order of dissimilarity.
4.25 Two images containing a square border; information contained in the holes (white) is neglected when filling in the database.
4.26 Database pruning results for an image with a square border (query image corresponds to the topmost retrieval). The top 20 retrievals are given in increasing order of dissimilarity.
4.27 Database pruning results for an image with a circular border (query image corresponds to the topmost retrieval). The top 20 retrievals are given in increasing order of dissimilarity.
4.28 Segmentation of an image into object entities; (a) a trademark image; (b) segmented objects.
4.29 Segmentation of an image into object entities; (a) a trademark image; (b) segmented objects.
4.30 Segmentation of an image into object entities; (a) a trademark image; (b) segmented objects.
4.31 A block diagram of master identification based on master- and city-marks.
4.32 A representative set of master-marks in the database.
4.33 A representative set of real master-marks extracted from the tankards.
4.34 Preprocessing an input master-mark; (a) greyscale master-mark; (b) thresholded (binary) master-mark; (c) binary master-mark after noise cleaning.
4.35 Retrieval of master seals; (a) input master-mark; (b) top 10 retrievals based on the integrated match.
4.36 Retrieval of master seals; (a) input master-mark; (b) top 20 retrievals based on the integrated match.
4.37 Retrieval of master seals; (a) input master-mark; (b) top 10 retrievals based on the integrated match.
4.38 Retrieval of master seals; (a) input master-mark; (b) top 20 retrievals based on the integrated match.
4.39 Retrieval of master seals; (a) input master-mark; (b) top 20 retrievals based on the integrated match.
4.40 Retrieval of master seals; (a) input master-mark; (b) top 20 retrievals based on the integrated match.
4.41 Retrieval of master seals; (a) input master-mark; (b) top 20 retrievals based on the integrated match.
A.1 Examples of trademarks present in the database.

Chapter 1

Introduction

Information is inherently multimodal. Humans can efficiently and effectively process information simultaneously in multiple dimensions. These multiple media, which aid effective communication, can be characterized as speech, audio, image, video, and textual data.
Advances in computing and networking are generating a significant amount of interest in multimedia services and applications [1, 2, 3, 4]. Powerful processors, high-speed networking, high-capacity storage devices, improvements in compression algorithms, and advances in the processing of audio, speech, image, and video signals are making multimedia systems technically and economically feasible.

Multimedia systems suggest a wide variety of potential applications, such as interactive entertainment, video news distribution, video rental services, and digital multimedia libraries. These services aim to provide the user with on-demand multimedia services. Any system providing these services will have to address the issues of representation, indexing, retrieval, and manipulation of the multimedia data.

[Figure 1.1: Model of a multimedia system.]

Figure 1.1 represents a generic block diagram of a multimedia database. The multimedia objects are archived in a database. A feature extraction module extracts various features from the database and represents the objects in terms of these features. The user interface provides the user with the capabilities to query, extract features, and insert multimedia objects.

Image and video are an integral part of multimedia data. There are a number of applications where images need to be automatically retrieved by their content. This necessitates powerful image processing and understanding tools. Various applications in digital libraries and image databases have been described in the literature. In the following section, we provide a brief introduction to image databases.

1.1 Image Database

Digital images are a convenient medium for describing and storing the spatial, temporal, spectral, and physical components of information contained in a variety of domains (e.g., aerial/satellite images in remote sensing, medical images in telemedicine, fingerprints in forensics, museum collections in art history, and registration of trademarks and logos). These databases typically consist of thousands of images, taking up gigabytes of memory space. While advances in image compression algorithms have alleviated the storage requirement to some extent, the large volume of these images makes it difficult for a user to browse through the entire database. Therefore, an efficient and automatic procedure is required for indexing and retrieving images from databases.

Traditionally, textual features such as filenames, captions, and keywords have been used to annotate and retrieve images. But there are several problems with these methods. First of all, human intervention is required to describe and tag the contents of the images in terms of a selected set of captions and keywords. In most images there are several objects that could be referenced, each having its own set of attributes. Further, we need to express the spatial relationships among the various objects in an image to understand its content. As the size of an image database grows, the use of keywords becomes not only complex but also inadequate to represent the image content. The keywords are inherently subjective and not unique.
Often, the preselected keywords in a given application are context dependent and do not allow for any unanticipated search. If the image database is to be shared globally, then linguistic barriers will make the use of keywords ineffective. Another problem with this approach is the inadequacy of uniform textual descriptions of such attributes as color, shape, texture, layout, and sketch. As an example, we can consider a picture of a group of people. Various objects associated with the picture, which may later be used for querying, could be the individual people in the picture, the clothes they are wearing, their gender, the color of their clothes, the background in the image, the activity of the people in the image (standing, sitting, laughing, etc.), the spatial relationships between the various people, and so on. A textual representation of all these attributes may become very complex. Later, while querying the database, a user can think of a number of other textual attributes to describe this picture, such as "show me the picture with X holding a glass of wine", which had not been tagged with the images in the database. There are thus many complex attributes that are typically needed to represent an image. Humans surely do not store textual descriptions for various images, but have a general notion of what an image contains. The goal of research in content-based addressing is to make steps toward extracting features that aid in representing and retrieving pictorial data.

It is generally agreed that image retrieval based on image content is more desirable than text-based retrieval in a number of applications. As a result, there is a need to automatically extract primitive visual features from the images and to retrieve images on the basis of these features. Humans use color, shape, and texture to understand and recall the contents of an image. Therefore, it is natural to use features based on these attributes for image retrieval. Most of the work in image database retrieval has concentrated on identifying appropriate models for image features such as color, shape, or texture.

Figure 1.2 shows a block diagram of the traditional image retrieval model. The input images are preprocessed to extract the features, which are then stored along with the images in the database. When a query image is presented, it is similarly preprocessed to extract its features, which are then matched with the feature vectors present in the database. A ranked set of images with high matching scores is presented at the output. Retrieval based on a single image attribute might lack sufficient discriminatory information and might not be able to accommodate large scale and orientation changes. Our goal is to build an image database retrieval system which is insensitive to large variations in image scale, rotation, and translation. In other words, even if the query image differs from its stored representation in the database in its orientation, position, or size, the image retrieval system should be able to correctly match the query image with its prototype in the database. Recent studies [5, 6, 7, 8] have combined various image features for efficient and effective querying by image content.

[Figure 1.2: Traditional image retrieval model.]
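In its simplest form, the matching step of Figure 1.2 is a nearest-neighbor search in feature space. The following sketch (not part of the original thesis; the feature extractor is left abstract and all names are illustrative) shows this ranking step in Python:

```python
import numpy as np

def retrieve(query_features, database_features, top_k=20):
    """Rank database images by distance to the query in feature space.

    query_features: 1-D feature vector of the query image.
    database_features: 2-D array, one row of features per database image.
    Returns the indices of the top_k closest images, most similar first.
    """
    # Euclidean distance between the query and every stored feature vector
    distances = np.linalg.norm(database_features - query_features, axis=1)
    # A smaller distance means a higher matching score; sort ascending
    return np.argsort(distances)[:top_k]
```

Whatever features are chosen, retrieval quality then hinges on how well distances in this feature space track perceptual similarity, which is the central difficulty discussed below.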
1.2 Information on Trademarks

Trademarks represent a gamut of pictorial data. There are over a million registered trademarks in the U.S. alone, and they represent a large number of goods and products sold by different manufacturers, service providers, and other organizations. Most trademarks are an abstract representation of a concept in the world, like an abstract drawing of an animal or a natural object (Sun, Moon, etc.). It is extremely challenging and instructive to study and address the issue of image database retrieval on this huge source of pictorial data.

A trademark is either a word, phrase, symbol or design, or a combination of words, phrases, symbols or designs, which identifies and distinguishes the source of goods or services of one party from those of others. A service mark is the same as a trademark except that it identifies and distinguishes the source of a service rather than a product [9]. Thus, while a trademark appears on the product or its packaging, the service mark appears in advertising for the services.

1.2.1 Registration of Trademarks

Trademarks in the U.S. are registered with the USPTO (U.S. Patent and Trademark Office). An applicant may apply for federal registration in three principal ways.

• An applicant can file a use application if the applicant has already commenced the use of the mark in products or services. This application is filed on the basis of the use of the mark.
• If the applicant has not yet used the mark but intends to use it in the future, the applicant can file an intent-to-use application on the basis of a bona fide intention to use the mark.
• The third option allows international applicants to apply for a mark that has already been applied for or registered in another country. Thus, an applicant from outside the United States may file within the United States.

The application must be filed in the name of the owner of the mark. The owner is usually an individual, a corporation, or a partnership. The application may be submitted and executed by the owner, or the owner may be represented by an attorney. The nature of the goods and services provided under a mark is controlled by the owner of the mark, in whose name the application is filed.

1.2.2 Search for Conflicting Marks

Before a mark is registered with the USPTO, an examining attorney conducts a search for conflicting marks. Usually, it is not necessary for an applicant to conduct a search for conflicting marks prior to filing an application. The application fee covers processing and search costs, and is not refunded in case a conflict is found and the mark cannot be registered. To determine whether there is a conflict between two marks, the PTO determines whether there would be a likelihood of confusion, i.e., whether relevant consumers would be likely to associate the goods or services of one party with those of the other party as a result of the use of the marks at issue by both parties [10]. The principal factors to be considered in reaching this decision are the similarity of the marks and the commercial relationship between the goods and services identified by the marks [10]. In order for a conflict to occur, it is not necessary that the marks be very similar when subjected to side-by-side comparison; the issue is whether the marks are sufficiently similar to produce a likelihood of confusion as to the source of the goods or services. Thus, marks representing similar meanings and used to sell similar goods or services may cause a conflict to the general public.
While evaluating the similarities between marks, emphasis must be placed on the recollection of the average purchaser, who normally retains a general rather than a specific impression of trademarks. In the case of design marks (trademarks consisting of symbols and designs), the issue of similarity is primarily based on visual similarity. Again, consideration must be given to the fact that the purchaser's recollection of the mark is of a general and hazy nature. Figures 1.3, 1.4, 1.5, 1.6, and 1.7 show a few trademarks that were considered for opposition based on their visual similarity.

Figure 1.3 shows the example where Red Carpet Corp. applied for a stylized house design (a) for managing the real estate properties of others. Opposition was put up by Johnstown American Enterprises, Inc., which holds a registered trademark of a stylized house design (b) for real estate brokerage services. The PTO accepted the registration.

[Figure 1.3: Trademarks considered for opposition by the USPTO; registration was accepted by the PTO. The trademarks have been enlarged to show the details.]

Figure 1.4 presents the example where United Services Distributors, Inc. was refused registration of a design mark with a silhouette of two profiles facing right within a teardrop background (a) for services in the field of health and beauty aids, against a registered mark with a silhouette of two profiles facing left within an oval background (b) used for skin creams. Figure 1.5 presents the case between Ocean Spray Cranberries, Inc. and Ocean Garden Products, Inc. over their respective design marks. The PTO accepted the registration of the marks. Figure 1.6 presents the example where Steury Corp. was refused registration of its design mark (a) for boats and camper trailers as against a registered mark (b) for boats and campers. Figure 1.7 shows the case where Matsushita Electric Industrial Co. was refused registration of its design mark of a triangular arrow within a square border (a) for various items of electrical and electronic equipment, as against a registered mark of Sanders Associates, Inc. with a triangular arrow design (b) for various items of electrical and electronic components and equipment. For further details on these and other cases, please refer to the trademark manual of examining procedures [10].

[Figure 1.4: Trademarks chosen for opposition by the USPTO; registration was refused by the PTO. The trademarks have been enlarged to show the details.]

[Figure 1.5: Trademarks chosen for opposition by the USPTO; registration was accepted by the PTO. The trademarks have been enlarged to show the details.]

[Figure 1.6: Trademarks chosen for opposition by the USPTO; registration was refused by the PTO. The trademarks have been enlarged to show the details.]

[Figure 1.7: Trademarks chosen for opposition by the USPTO; registration was refused by the PTO. The trademarks have been enlarged to show the details.]

In general, applicants want to conduct searches prior to submitting an application in order to save the application fee (~ $700) in case a conflicting mark does exist. Unless the PTO is acting on an application, it does not conduct searches for the public to determine if a conflicting mark is registered or pending. Applicants may conduct searches of their own in a number of ways.

• The applicant can conduct a search in the PTO public search library, located on the second floor of the South Tower Building, 2900 Crystal Drive, Arlington, Virginia 22202.
• An alternate method is to visit a Patent and Trademark Depository Library; one is located in every state. These libraries have CD-ROMs containing the trademark database of registered and pending marks.
• The search can also be conducted through a private trademark search company or an attorney who deals with trademark law.

The USPTO organizes the existing trademarks into categories for more efficient retrieval. The marks are initially grouped with respect to the goods or services they sell. For example, chemicals are assigned the code 001, vehicles are assigned the code 012, and services like advertising and business are assigned the code 035. The TRADEMARKSCAN Design Code Manual has been developed by Thomson and Thomson to assist an applicant in finding trademarks that contain, or consist of, specific designs. A search mechanism based on the design codes associated with the trademarks is incorporated in the CD-ROMs at the depository libraries in each state. Trademarks are organized in a three-layered hierarchical structure. The highest level consists of 30 main categories (including celestial bodies, human beings, animals, foodstuffs, geometric figures and solids, etc.). Each category is further divided into divisions (the second level of the hierarchy), which are in turn divided into sections. Every trademark is assigned as many six-digit design codes as apply, with two digits corresponding to each of the three levels. For example, the category celestial bodies is divided into divisions such as stars and comets (01.01), constellations and starry sky (01.03), sun (01.05), etc. Similarly, the division sun is divided into sections such as sun rising or setting (01.05.01), sun with rays or halo (01.05.02), sun representing a human face (01.05.03), etc. It can be seen that a trademark can be assigned more than one design code. For example, Figure 1.8 shows a trademark with a sun on a filled square background. It will thus have two design codes, corresponding to the filled square and to the sun with rays.

[Figure 1.8: Example of a trademark with multiple design codes.]

The CD-ROM based search uses these design codes to search the database of current and pending trademarks and retrieve other marks with a design code similar to that of the query. The current search strategy is based on manually assigning as many design codes as possible (referring to the design code manual) and then conducting the search based on the design codes at the Patent and Trademark Depository Libraries. In general, tens of thousands of marks are retrieved for a single query due to the presence of multiple design codes per trademark. An additional, content-based scheme is required to rank order these marks according to their visual similarity to the query trademark. This thesis addresses the problem of efficient shape-based retrieval from large databases.

1.3 Problem Definition

In this thesis, we address the problem of efficient and accurate retrieval from a large image database based on image content. Since we desire a system that has both high speed and high accuracy of retrievals, we propose a two-tiered hierarchical image retrieval system. Figure 1.9 shows a block diagram of our proposed image retrieval scheme. The first stage computes simple image features to prune the database to a reduced set of plausible matches. As simple shape features are used in screening the image database, this first stage can be performed very fast.
The small set of plausible candidates generated by the fast screening stage is then presented to a detailed matcher in the second stage. This stage applies comprehensive feature matching techniques to eliminate false matches. The proposed hierarchical content-based image retrieval algorithm is applied to a trademark image database with 1,100 images (see Appendix A for representative images belonging to this database). In the fast pruning stage, simple and easily calculated shape features, including edge direction histograms and invariant moments, are used to quickly browse the database for a small set of plausible matches. The retrieved images are then screened using a more elaborate, but costly, deformable template matching process to remove false retrievals.

[Figure 1.9: Proposed image retrieval model.]

The contributions of this work are as follows:

• We have explored how to sensibly combine some easily computable statistical shape features to give fast and accurate retrievals.
• The two-stage hierarchical retrieval process is able to improve the retrieval results presented to the user.
• The retrieval process is reasonably fast.
• Images resembling the queries are correctly retrieved even if they differ in their orientation, position, or size.

1.4 Motivation

Although content-based image retrieval is extremely desirable in many applications, it is a very difficult problem. The ease with which humans capture image content has not been understood well enough to automate it. Problems arise in segmenting the images into regions corresponding to individual objects, in extracting features that capture the perceptual and semantic meanings of the images, and in matching a database image and a query image based on the extracted features. In this thesis, we conduct experiments on a large database of trademark images, addressing some of these issues in this application.

1.4.1 Database

The image database used in this study was created by scanning a large number of trademarks from several books [11, 12, 13]. About 400 images were scanned at MSU while 700 were scanned at Siemens Corporate Research, Princeton. The database consists of these 1,100 images. A design mark is registered in binary form at the USPTO. Therefore, we have converted our scanned gray level images to binary images. The 400 trademarks gathered at MSU were scanned using a Hewlett Packard Scanjet flatbed scanner at a resolution of roughly 75 dpi. These images are approximately 200 x 200 pixels in size. The images scanned at Siemens Corporate Research are 500 x 500 pixels in size. The trademarks in our database were selected so that many of them have similar perceptual meaning, to make the retrieval problem more challenging. These trademarks encompass a wide variety of objects; some are based on letters of the English alphabet, while others represent the Sun, Earth, humans, eyes, animals, etc. A representative subset of the trademark images in our database is shown in Appendix A.

1.4.2 Similarity

Whether two trademarks are similar enough to cause a (legal) infringement is decided by an attorney. The goal of the proposed retrieval system is to present to the user a subset of database images that are visually similar to the query.
The aim is that all the registered trademarks that may cause a conflict with the query image be present in the retrieved images. We can assume that a design code-based search has already been implemented to aid applicants in conducting searches for a conflict. A major drawback of the search based on design codes is that it produces too many marks that have the same design code, regardless of the visual appearance of these marks. Thus, there is a need for content-based (shape-based) retrieval to narrow down the search space. A major difficulty with content-based retrieval is that images that are perceptually similar need not be exactly similar in appearance. In fact, two images that might produce a conflict need not have a high pixel-based correlation value. As an example, Figure 1.10 shows two images of a bull head. Although these two images convey a similar perceptual meaning, they differ considerably in their correlation value: while one image is a filled head, the other is just an outline with some interior details. Figure 1.11 shows another pair of perceptually similar marks that differ in their image correlation. While both marks show a bear holding similar objects (a bottle in the first case, and a glass of wine and a bottle in the second), they are markedly different in appearance, for example, in the direction in which the head is pointing, in the bottles being held, and in the manner in which the bottles are being held. We (humans) can identify these images as containing a bear, but it is extremely difficult for a computer vision system to identify such objects automatically. As another example, Figure 1.12 shows an image of a bear that is made up of a number of parts rather than a single connected component. Though humans can use Gestalt principles and perhaps top-down processing to identify the presence of a bear in this figure, it is not possible to do so with an automatic system, given the current state-of-the-art in computer vision. As a final example, Figure 1.13 presents several trademark images containing corn and wheat that are perceptually similar but markedly different in their appearance. While some of these images are made up of a single component, others are made up of multiple components. Since a content-based technique cannot by itself retrieve the perceptually similar images, we believe that it should be augmented by traditional text-based approaches to facilitate the search and retrieval tasks.

1.5 Thesis Outline

The outline of the thesis is as follows. Chapter 2 briefly reviews relevant literature on content-based image database retrieval methods. We describe the proposed hierarchical retrieval system in Chapter 3. Experimental results on a digital trademark library and a library of master-marks of craftsmen of old Norwegian silverware are presented in Chapter 4. Chapter 5 presents the conclusions and some ideas for future research.

[Figure 1.10: Two perceptually similar images of a bull head.]

[Figure 1.11: Two perceptually similar images of a bear.]

[Figure 1.12: Another image of a bear.]

[Figure 1.13: Images of wheat and corn that vary in their appearance.]

Chapter 2

Literature Review

Much of the past research in image retrieval has concentrated on the feature extraction stage. For each database image, a feature vector which describes various visual cues, such as shape, texture, or color, is computed.
Given a query image, its feature vector is calculated, and those images which are most similar to the query based on an appropriate distance measure in the feature space are retrieved. Traditional image matching methods based on image distance or correlation of pixel values are in general too expensive and not meaningful for such an application.

The methods proposed in the literature can be broadly classified into two categories on the basis of the approach used for extracting the image attributes [14]. The spatial information preserving methods derive features that preserve the spatial information in the image; it is possible to reconstruct the image on the basis of these feature sets. Representative techniques include polygonal approximation of the object of interest, physics-based modeling, and principal component analysis (PCA). The non-spatial information preserving methods extract statistical features that are used to discriminate among objects of interest. These include heuristics based on edge angle histograms and other ad hoc shape measures. In the following sections, we briefly review the color-, shape-, and texture-based image retrieval methods.

2.1 Methods using a Single Cue

Traditional image retrieval systems use a single cue such as shape, texture, or color to represent the image, and retrieval is based on the features that represent the chosen cue. Although color seems to be a highly reliable attribute for image retrieval, situations where color information is not present in the images require the use of shape and/or texture attributes for image retrieval. Retrieval based on a single image attribute might lack sufficient discriminatory information and might not be able to accommodate large scale and orientation changes. For example, color-based approaches cannot distinguish between a red apple and a red Ferrari; additional shape information can very easily distinguish the two.

2.1.1 Color

We (humans) can effectively use color for object segmentation and recognition. Due to the ready availability of color information in digital images in the form of the three RGB color components, various systems have been built for color-based segmentation and retrieval.

Examples of recent work with color include color indexing using histogram intersection [15, 16], which considers the intersection of color histograms, and retrieval using distance and reference color methods [17], which employ metric distance-based comparison of color histograms. These schemes do not preserve the spatial information in the image, whereas the PCA on color features used in [6] maintains the spatial adjacencies.

Retrieval on the basis of color histograms has been shown to outperform retrieval on the basis of shape, both in terms of efficiency and robustness. The 3-D histogram intersection technique [15] uses 16 x 16 x 16 bins, and the resulting matching is relatively fast. The reference color method [17] improves the performance further by using fewer reference color bins. Color histograms are generally invariant to translation and rotation of the images, and normalizing the histograms leads to scale invariance. However, color histograms do not incorporate the spatial adjacency of pixels in the image and may lead to inaccuracies in the retrieval.

2.1.2 Shape

Although color can be an effective means of querying, color alone as a retrieval cue cannot be effective for querying large image databases. Applications with grayscale or binary images have to use other cues such as shape and texture for retrieval.
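As a concrete point of reference for the color indexing schemes cited in Section 2.1.1, the histogram intersection measure of Swain and Ballard [15] is simple enough to state exactly. The following sketch is ours, not from the thesis; numpy and the use of flattened raw-count histograms (e.g., the 16 x 16 x 16 = 4096 RGB bins mentioned above) are assumptions:

```python
import numpy as np

def histogram_intersection(query_hist, model_hist):
    """Swain & Ballard color indexing match score in [0, 1].

    query_hist, model_hist: flattened color histograms of raw bin counts.
    A score of 1 means every color of the model is fully present in the query.
    """
    # Sum of bin-wise minima, normalized by the model histogram size
    return np.minimum(query_hist, model_hist).sum() / model_hist.sum()
```

Its weaknesses, no spatial information and no answer for grayscale or binary marks, are precisely what motivate the shape features discussed next.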
Although humans can effectively use color to differentiate among natural objects, many artificial (man-made) objects cannot be distinguished on the basis of color alone. Moreover, humans can easily distinguish among objects even when presented with binary or grayscale images.

Various schemes have been proposed in the literature for shape-based retrieval. These include polygonal approximation of the shape [16]; shape matching using relaxation techniques [18], which finds acceptable combinations of matches between pairs of angles on two shapes; image representation on the basis of strings [19, 20], which represents the shape of objects as strings and uses string matching techniques for retrieval; comparing images using the Hausdorff distance [21], which measures the extent to which each point of the stored database image lies near some point of the query and vice versa; point matching techniques [22], which extract a set of distinctive local features from the query and the model and then match the resulting point patterns; image registration by matching relational structures [23], which uses relational structures to represent images; shape matching based on chord distributions [24], which uses the chord length distribution for image matching; image representation using codons [25], which represents object shape using continuous curve segments separated by concave cusps; matching objects using Fourier descriptors [26]; and object matching using invariant moments [27].

The above schemes introduce various shape-based features for image retrieval. These techniques rely on a single concise feature to describe the shape. A major limitation of using a single shape model in image database retrieval is that it might not be possible to extract the corresponding features in a given application domain. Moreover, shape-based representation schemes are generally not invariant to large variations in image size, position, and orientation. In order to incorporate invariance to rigid motions (rotation and translation), these methods need to be applied over all possible rotations and translations, thereby reducing the speed of the retrievals. When we consider large image databases, this speed reduction can become considerable. We therefore need to identify shape features which are either invariant to rigid transformations or which can be efficiently computed over a number of possible rigid transformations.

2.1.3 Texture

Many machine vision and image processing algorithms neglect the presence of texture in an image. Variations of intensities that form certain repeated patterns are called visual texture [28]. These patterns can be the result of physical properties of the object surface (roughness, peakedness), or the result of reflectance differences such as the color on a surface. Humans can very easily recognize a texture upon seeing it, yet it is very difficult to define. Texture analysis is an important and useful area of study in computer vision. Most natural surfaces exhibit texture, and it may be useful to extract texture features for querying.

Texture models developed in the literature can be divided into the following four classes [28]. Statistical methods define texture in terms of the spatial distribution of grey values. These include the use of co-occurrence matrices [29] and autocorrelation features (extracting the repetitive nature of the placement of texture elements).
Geometric methods characterize texture as being composed of "texture elements" or primitives. These include Voronoi tessellation features [30] and structural methods that extract the texture primitives [31]. Model based methods assume an underlying image model to describe and synthesize texture. These include the use of random field models [32] and fractals [33]. Signal processing methods use frequency analysis of the image to classify texture. These schemes include the use of spatial domain [34] and Fourier domain [35] filtering, and the use of Gabor filters and wavelet models [36].

The above schemes define various simple texture analysis methods. In the absence of color and shape information, texture can act as a vital cue for image classification. Scenes containing pictures of wood, grass, etc. can be easily classified based on texture rather than shape or color. The numerous texture definitions that have been attempted in the literature demonstrate the difficulty in defining texture. Although humans can easily identify texture, a concise definition has evaded past research.

Retrieval based on a single image attribute lacks sufficient discriminatory information and might not be able to accommodate large scale and orientation changes. As we consider large databases, we need to extract multiple features for querying the database. Recently, a number of studies have been carried out [5, 6, 7, 8, 37] which combine various features for efficient and effective querying by image content.

2.2 Using Multiple Cues

Recently, attempts have been made to develop general purpose image retrieval systems based on multiple features that describe the image content. These systems attempt to combine shape, color, and texture cues for more accurate retrieval.

2.2.1 QBIC

The QBIC system has developed methods to query large on-line image databases using image content as the basis of the query. In other words, the system allows users to search through databases consisting of very large numbers of images using sketches, layout or structural descriptions, texture, color, and sample images. QBIC techniques serve as database filters and reduce the search complexity for the user. These techniques limit the content-based features to those parameters that can be easily extracted, such as color distribution, texture, global shape of an image, and layout. The system offers a user a virtually unlimited set of unanticipated queries, thus allowing for general purpose applications rather than catering to a particular application. Color- and texture-based queries are allowed for both images and objects, whereas shape-based queries are allowed only for individual objects, and layout-based queries are allowed only for the entire image. QBIC uses the following image features for content-based retrieval.

• Color: QBIC uses a k-element color histogram for matching based on color. The entire color space is quantized into k levels, such that these k cells correspond to super-cells in the color space. The similarity between a pair of color histograms, X and Y, can be computed in a number of ways. The following method, proposed in [38], is used for matching the color histograms:

$$d^2_{\text{hist}}(X, Y) = (X - Y)^t A (X - Y) = \sum_{i=1}^{k} \sum_{j=1}^{k} a_{ij} (x_i - y_i)(x_j - y_j),$$

where the superscript $t$ denotes vector transposition, and the matrix $A$ has entries $a_{ij}$ which describe the similarity between color $i$ and color $j$. Euclidean distance is a special case of this distance measure, obtained when the matrix $A$ is the identity matrix.
• Texture: QBIC uses modified versions of three texture features, namely, coarseness, contrast, and directionality. Coarseness measures the texture scale, contrast measures the gray level variance, and directionality describes whether an image has a preferred direction. The texture distance between two images is computed as a weighted Euclidean distance in the three-dimensional texture space, where the weights are the inverses of the variances along the three dimensions. The distance between image $i$ and image $j$ is thus defined as

$$d^2_{ij} = \frac{(O_i - O_j)^2}{\sigma_O^2} + \frac{(C_i - C_j)^2}{\sigma_C^2} + \frac{(D_i - D_j)^2}{\sigma_D^2},$$

where $O$, $C$, and $D$ are the coarseness, contrast, and directionality measures, respectively.

• Shape: QBIC primarily uses statistical features to represent the shape of an object. These include a combination of area, circularity, eccentricity, major axis orientation, and algebraic moment invariants. These features are computed from binary images representing individual objects. The distance between two images is computed similarly to the texture distance, using a weighted Euclidean distance with the inverses of the feature variances used for normalization. QBIC allows the user to use any subset of these shape features in a query.

• Sketch: A reduced resolution edge map is used to represent the sketch of an object. The similarity measure is computed based on the spatial correlation of the subimages.

Although the QBIC system allows the use of multiple cues such as color, shape, texture, and sketch for querying, users have to be well trained before they can use the system effectively. Due to the presence of a large variety of images (natural scenes, man-made objects, etc.), the user needs to choose the features that are most expressive in identifying the class of the query object in order to retrieve stored images from that class. For example, while querying for scenes of buildings, it is more meaningful to query on the basis of texture than on the basis of color, but while trying to identify images of plants and grass, it may be more meaningful to query on the basis of green color and texture. Since the user has to specify a combination of features to use for querying, the retrieval results critically depend on the ability of the user to identify expressive features for the query.

2.2.2 Photobook

Photobook is a set of interactive tools for browsing and searching image sequences. The features used for querying can be based on both text annotations and image content. The key idea behind the system is semantics-preserving image compression, which reduces images to a small set of perceptually significant coefficients. These features describe the shape and texture of the images in the database. Photobook uses multiple image features for querying general purpose image databases. The user is given the choice to select features based on appearance, shape, and texture to browse through large databases. These features can be used in any combination with textual features to improve the efficiency and accuracy of the retrievals. The following schemes are used to extract the semantics-preserving features.

• Appearance Specific: Appearance Photobook measures the effective similarity in visual appearance within an object class. The similarity measure is based on image correlation. Since images are typically large (typically 512 x 512 pixels), a pixel by pixel correlation is very expensive. Moreover, this requires storing the large images themselves.
An efficient way to extract semantics-preserving features is to apply principal component analysis to the gray scale images. The images are represented in terms of a small number of dominant eigenvectors which preserve a large fraction of the gray level variance present in the input image. The projections of the images onto these orthogonal eigenvectors represent the image features. Usually, a low dimensional (as compared to the original number of pixels) feature vector (about 20 features) is very effective in capturing the image content. Euclidean distance based on the low-dimensional image features is used as an approximation to the image correlation. Such a scheme makes no assumptions about the image content and uses the training samples (many objects belonging to a class) to extract the principal components.

• Shape Specific: Shape Photobook compares the shapes of two objects by measuring the deformations that relate them. Physics-based modeling is used to define the transformations of real world objects. Due to changes in viewing direction and in the distance of the object from the sensor, and due to the presence of physical deformities (bent, stretched objects), appearance-based matching produces ineffective results. Under these circumstances, the transformations between the objects can be modeled to extract a similarity value between them. The shape model uses a virtual material to fill up the space between various shape-based image features. These features include edges, corners, and high-curvature points. A finite element method is used to compute this interconnectedness of the shape. Eigenvectors of the corresponding stiffness matrix are used as the semantics-preserving features.

• Texture Specific: Texture Photobook is applied to databases of texture patches, where both appearance- and shape-based features prove ineffective. The semantics-preserving texture features are represented based on the Wold decomposition for regular stationary stochastic processes in 2-D images [39]. Photobook also provides many other models for texture, such as color histogram differences, color histogram energy and entropy, color histogram invariant features, eigenvectors of RGB covariance, multiscale simultaneous auto-regressive models, and tree-structured wavelet transforms [40].

2.2.3 STAR

STAR (System for Trademark Archival and Retrieval) [41, 42] uses a combination of color and shape features for retrieval purposes. The color of an image is represented in terms of the R, G, and B color components, whereas the shape is represented in terms of a combination of outline-based features (sketch of the images) and region-based features (objects in an image). The features used to describe both the color and the shape in an image are non-information preserving, or ambiguous, in nature. Though these features cannot be used to reconstruct the image, they are useful as approximate indicators of shape.

• Color: Color in an image is represented as a histogram of reference colors. These reference colors are selected manually, and the color space is divided into bins corresponding to each reference color. A histogram intersection based on 27 reference colors is used to query on the basis of color.

• Shape: Multiple features are used to query the database on the basis of shape. These include the use of invariant moments and Fourier descriptors for shape-based matches. In order to compute the Fourier descriptors, the system assumes a segmentation of the objects of interest.
An integrated retrieval based on the two features is used to query on the basis of shape.

2.2.4 Incorporating Learning

Although image retrieval systems based on multiple cues outperform systems based on a single cue, they have various limitations. The features used for image retrieval are hard-coded into the system and are usually application specific. For different applications, these features and the associated weights have to be specifically tuned for efficient performance. A typical user of these systems does not have a basic knowledge of feature extraction and is thus unable to use the system effectively. A general purpose image database system should therefore be able to automatically decide on the image features for retrieval purposes [43]. A recent project [44] describes an interactive learning system using a society of models. Instead of requiring universal similarity measures or manual selection of relevant features, this approach provides a learning algorithm for selecting and combining groupings of the data, where these groupings are generated by highly specialized and context-dependent features. The selection process is guided by a rich user interaction where the user provides both positive and negative retrieval examples. A greedy strategy is used to select a combination of existing groupings from the set of all possible groupings. Modified groupings are generated based on the user interactions and, over a period of time, these replace the initial groupings that have a very low weight. Thus, the system improves its performance over time through user interaction and feedback.

2.3 Discussion

While multiple cue-based schemes prove more effective than single cue-based retrieval systems, content-based retrieval still faces many problems and challenges. The multiple cue-based schemes generally employ cues such as color, shape, and texture for the retrievals. In the case of trademark images, color does not play a useful role in distinguishing between various marks. The design marks are registered in binary image form with the USPTO. When searching for a conflict, the USPTO bases its decision on the shape information present in the binary images. Thus, cues like color and texture are not applicable for query purposes. We therefore need to investigate multiple shape models for the retrieval of trademark images. In the next chapter, we describe our proposed hierarchical retrieval scheme in detail.

Chapter 3

An Integrated System for Efficient Retrieval

Retrieval speed and accuracy are two main issues in designing image databases. System accuracy can be defined in terms of precision and recall rates. The precision rate is the percentage of retrieved images similar to the query among the total number of retrieved images. The recall rate is the percentage of retrieved images similar to the query among the total number of images in the database that are similar to the query. It can be easily seen that both precision and recall rates are a function of the total number of retrieved images. In order to have a high accuracy, the system needs to have both a high precision and a high recall rate. Although simple image features can be extracted easily, they lack sufficient expressiveness and discriminatory information to determine if two images have similar content. Thus, there exists a trade-off between speed and accuracy.
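The two rates can be made concrete with a small sketch (illustrative only; the image ids and the relevance set are hypothetical):

    def precision_recall(retrieved, relevant):
        """Precision and recall for one query, given the set of retrieved
        image ids and the set of ids actually similar to the query."""
        retrieved, relevant = set(retrieved), set(relevant)
        hits = len(retrieved & relevant)
        precision = hits / len(retrieved) if retrieved else 0.0
        recall = hits / len(relevant) if relevant else 0.0
        return precision, recall

    # Hypothetical example: 10 images returned, 4 of the 5 similar ones among them.
    p, r = precision_recall(range(10), [2, 5, 7, 9, 42])   # p = 0.4, r = 0.8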
In order to build a system with both high speed and high accuracy, we use a hierarchical two-level feature extraction and matching structure for image retrieval (Fig. 1.9). Our system uses multiple shape features for the initial pruning stage. Retrievals based on these features are integrated [8] for better accuracy and a higher system recall rate. The second stage uses deformable template matching to eliminate the false retrievals present at the output of the first stage, thereby improving the precision rate of the system.

3.1 Image Attributes

In order to retrieve images, we must be able to efficiently compare two images to determine if they have similar content. An efficient matching scheme further depends upon the discriminatory information contained in the extracted features. Let {F(x, y); x, y = 1, 2, ..., N} be a two-dimensional image pixel array. For color images, F(x, y) denotes the color value at pixel (x, y). Assuming that the color information is represented in terms of the three primary colors (Red, Green, and Blue), the image function can be written as F(x, y) = {F_R(x, y), F_G(x, y), F_B(x, y)}. For black and white images, F(x, y) denotes the gray scale intensity value at pixel (x, y). Let f represent a mapping from the image space onto the n-dimensional feature space, x = {x_1, x_2, ..., x_n}, i.e., f: F → x, where n is the number of features used to represent the image. The difference between two images, F_1 and F_2, can be expressed as the distance, D, between the respective feature vectors, x_1 and x_2. The problem of retrieval can then be posed as follows:

Given a query image P, retrieve an image M from the image database, S, such that D(f(P), f(M)) ≤ D(f(P), f(F)), for all F ∈ S, F ≠ M.

3.2 Fast Pruning Stage

It is desirable to have an image retrieval system which is insensitive to large variations in image scale, rotation, and translation. Hence, the pruning stage must not only be fast but should also extract invariant features for matching. We use the following features to represent the shape of an image.

- Edge Angles: A histogram of the edge directions [8] is used to describe global shape information.

- Invariant Moments: The global image shape is also described in terms of seven invariant moments [27].

3.2.1 Edge Directions

A histogram of the edge directions is used to represent the shape attribute. The edge information contained in the database images is generated in the preprocessing stage using the Canny edge operator [45]. Euclidean distance is used to compute the dissimilarity value between two histograms. There are advantages as well as limitations in representing an image by its edge directions.

- The edge directions capture the general shape information.

- The histogram of the edge directions is invariant to translations in the image; the positions of the objects in the image have no effect on the edge directions. This may also turn out to be a limitation, as two totally different images may yield similar edge direction histograms.

- The use of edge directions is not inherently scale invariant. Two images identical in every respect except their size will yield different numbers of edge points, and hence different histograms. In order to have invariance to scale, we normalize the histograms with respect to the number of edge points in the image. A drawback of normalized histograms is the inability to match parts of images: if an image Q is a part of another image I, then the (unnormalized) histogram of Q is contained within the histogram of I.
Normalized histograms do not satisfy this property.

- A histogram of the edge directions is also not invariant to rotation. A shift of the histogram bins during matching partially takes image rotations into account: a rotation of the image shifts each of the edge directions by the amount of rotation. But, due to quantization effects in binning the edge directions, the effect of rotation is more than a simple shift in the bins; rotation also affects the bin memberships. For example, if a quantization of 45° (bin size = 45°) is used, then two edge points with directions of 30° and 40° will fall into the same bin. But, if the same image is rotated by 10°, these two points will fall into adjacent bins. We note that for a rotation of 45°, there is no such effect. To reduce this effect of rotation, we smooth the histograms. A histogram can be treated as a 1-D discrete signal, and smoothing can then be defined as

  I_s[i] = ( Σ_{j=i−k}^{i+k} I[j] ) / (2k + 1),   (3.1)

  where I is the original histogram, I_s is the smoothed histogram, and the parameter k determines the degree of smoothing. In our experiments we have used k = 1.

- The matching results depend on the bin size. By choosing a larger bin size, the matching speed is increased, but this also reduces the accuracy in the case of an arbitrary rotation of the image: a rotation not only causes a shift in the histogram bins but also affects the bin membership. Use of a very small bin size reduces the matching speed (since the number of bins increases). A small bin size also requires that the edge directions be computed to a very high degree of accuracy.

Fig. 3.1 shows three database images ((a) and (b) are similar to each other but (c) is different from (a) and (b)) and their respective edge angle histograms. The value of D_s denotes the shape-based dissimilarity value between two images. Note that D_s(g, h) < D_s(g, i) and D_s(g, h) < D_s(h, i), where Figs. 3.1(g)-(i) represent the shape histograms for the three database images.

Figure 3.1: Example of shape representation using edge directions; (a)-(c) show 3 database images, (d)-(f) the corresponding edge images, and (g)-(i) the corresponding edge direction histograms; D_s(g, h) = 0.065, D_s(g, i) = 0.69, D_s(h, i) = 0.70.

3.2.2 Invariant Moments

We also represent the shape of an image in terms of 7 invariant moments [27]. These features are invariant under rotation, scale, translation, and reflection of images, and have been widely used in a number of applications due to their invariance properties. For a 2-D image, f(x, y), the central moment of order (p + q) is given by

  μ_pq = Σ_x Σ_y (x − x̄)^p (y − ȳ)^q f(x, y).   (3.2)

Moment invariants based on the 2nd- and 3rd-order moments are given as follows:

  M1 = μ20 + μ02,
  M2 = (μ20 − μ02)² + 4μ11²,
  M3 = (μ30 − 3μ12)² + (3μ21 − μ03)²,
  M4 = (μ30 + μ12)² + (μ21 + μ03)²,
  M5 = (μ30 − 3μ12)(μ30 + μ12)[(μ30 + μ12)² − 3(μ21 + μ03)²]
       + (3μ21 − μ03)(μ21 + μ03)[3(μ30 + μ12)² − (μ21 + μ03)²],
  M6 = (μ20 − μ02)[(μ30 + μ12)² − (μ21 + μ03)²] + 4μ11(μ30 + μ12)(μ21 + μ03),
  M7 = (3μ21 − μ03)(μ30 + μ12)[(μ30 + μ12)² − 3(μ21 + μ03)²]
       − (μ30 − 3μ12)(μ21 + μ03)[3(μ30 + μ12)² − (μ21 + μ03)²].

M1 through M6 are invariant under rotation and reflection. M7 is invariant only in its absolute magnitude under a reflection. Scale invariance is achieved through the following transform:

  M'1 = M1/n,  M'2 = M2/r⁴,  M'3 = M3/r⁶,  M'4 = M4/r⁶,
  M'5 = M5/r¹²,  M'6 = M6/r⁸,  M'7 = M7/r¹²,

where n is the number of object points and r is the radius of gyration of the object: r = (μ20 + μ02)^(1/2).
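The central moments of Eq. (3.2) and the first two invariants can be sketched as follows for a binary image; the remaining five invariants follow the same pattern from the formulas above. This is an illustrative sketch, not the system's implementation.

    import numpy as np

    def invariant_moments(img):
        """Central moments (Eq. 3.2) and the first two moment invariants
        for a binary image, with the scale normalization described above."""
        ys, xs = np.nonzero(img)                 # coordinates of object points
        n = xs.size
        xbar, ybar = xs.mean(), ys.mean()
        def mu(p, q):                            # central moment of order p+q
            return np.sum((xs - xbar) ** p * (ys - ybar) ** q)
        m1 = mu(2, 0) + mu(0, 2)
        m2 = (mu(2, 0) - mu(0, 2)) ** 2 + 4 * mu(1, 1) ** 2
        r = np.sqrt(mu(2, 0) + mu(0, 2))         # radius of gyration, r = (mu20 + mu02)^(1/2)
        return m1 / n, m2 / r ** 4               # M'1 = M1/n, M'2 = M2/r^4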
Fig. 3.2 shows 3 database images ((a) and (b) are similar to each other but (c) is different from (a) and (b)) and their respective distances based on the invariant moments. D_m represents the moments-based dissimilarity value of the images. Note that D_m(a, b) < D_m(a, c) and D_m(a, b) < D_m(b, c).

Figure 3.2: Example of shape representation using invariant moments; (a)-(c) show 3 database images; D_m(a, b) = 0.033, D_m(a, c) = 0.85, D_m(b, c) = 0.85.

3.2.3 Other Shape Features

We have also investigated several other simple shape features reported in the literature. We attempted to use features based on the turning edge angles [46] on the object boundary, under the assumption that the object has only one closed contour. The object boundary is uniformly sampled and approximated by a polygon. The turning angle at each node represents the shape feature. When two shapes are compared, the correlation between the two turning angle signatures is calculated as the matching score. This measure is translation, scale, and rotation invariant. However, it requires the object to have one closed boundary and cannot be applied to the complex images which are present in our database. Design marks are usually stylistic and artistic, and the turning angles of subsampled image points are not sufficient to capture the fine details in the image. We also tried to use the Principal Component Analysis (PCA) method to extract the shape features, which has been used to organize and index a large image database [47]. However, this approach also does not work for our trademark image database. The PCA method merely computes the image distance in a reduced space (compared to the N²-dimensional feature space, where the image is of size N x N), and it does not effectively capture the shape characteristics of our binary images. We feel that image correlation is not an efficient way to capture the perceptual similarity between images. Perceptually similar images, as shown in Figures 1.10, 1.11, and 1.13, cannot be classified as belonging to the same class with a simple image correlation method.

3.2.4 Integration of Image Attributes

Use of a single image attribute for retrieval may lack sufficient discriminatory information and might not be able to support large variations in image orientation and scale. In order to increase the accuracy of the retrievals, we need to integrate the results obtained from the queries based on the individual shape features. The edge direction-based matching takes into consideration the boundary of the objects; it is thus a boundary-based matching scheme. Invariant moments are defined over the object region; this is thus a region-based matching scheme. By integrating the two schemes, we try to retrieve those images that resemble the query image either in the boundary or in its entirety. The output of a query on the basis of either the edge directions or the invariant moments is a ranked set (based on the dissimilarity value) of database images. An integrated rank of a retrieved image can be computed from either the ranks of the retrieved image in the individual queries (query on the basis of edge directions, or query on the basis of invariant moments) or the actual dissimilarity values of the retrieved image in the individual queries. We first integrated the results of the shape-based queries based on the ranks of the retrieved images. This integrated scheme performed worse than each of the individual feature-based queries.
Using the individual ranks may not be very effective since it does not use the dissimilarity value between the query and the retrieved image. The rank of the retrieved images may give a false sense of similarity when the actual dissimilarity value is very high. Hence, we need an integration scheme that takes into account the actual dissimilarity values between the images. We have therefore integrated the results of the shape-based retrievals by combining the associated dissimilarity values. Let Q be a query image and I be a database image. Let S_e be the dissimilarity index between Q and I on the basis of edge directions and S_m be the dissimilarity index between Q and I on the basis of invariant moments. We define an integrated dissimilarity index S_t between Q and I as

  S_t = (w_e S_e + w_m S_m) / (w_e + w_m),   (3.3)

where w_e and w_m are the weights assigned to the edge direction-based similarity and the invariant moment-based similarity, respectively. A set of the top ten images on the basis of the total dissimilarity index S_t is presented as the result of the query. We have used equal weights (w_e = w_m = 1) in our experiments. Another way of determining the weights is on the basis of the accuracy of the individual feature-based queries. In general, we found that the edge direction-based queries are more accurate than the moment-based queries, and this fact can be used to assign a higher weight to the edge direction-based similarity values.

One of the difficulties involved in integrating the two shape-based retrievals is the difference in the ranges of values produced by the two schemes. In order to have an efficient and robust integration scheme, we normalize the dissimilarity values of the two shape-based schemes to lie within the same range of [0, 1]. We normalize the retrievals of each query using the following equation:

  D̂_s(i, j) = (D_s(i, j) − dist_min) / (dist_max − dist_min),   (3.4)

where D_s(i, j) is the dissimilarity value between the i-th query and the j-th database image, and dist_min and dist_max are the distances of the most similar and the most dissimilar image in the database to the query image, respectively.
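A minimal sketch of this normalize-and-combine step (Eqs. (3.3) and (3.4)), assuming the per-feature distances of the query to every database image have already been computed; the function and parameter names are illustrative only:

    import numpy as np

    def integrate(edge_dists, moment_dists, w_e=1.0, w_m=1.0, top=10):
        """Normalize each feature's distances to [0, 1] (Eq. 3.4), combine
        them with weights (Eq. 3.3), and return indices of the top matches."""
        def normalize(d):
            d = np.asarray(d, float)
            rng = d.max() - d.min()
            return (d - d.min()) / rng if rng > 0 else np.zeros_like(d)
        s_t = (w_e * normalize(edge_dists) + w_m * normalize(moment_dists)) / (w_e + w_m)
        return np.argsort(s_t)[:top]          # ranked set of database images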
3.3 Deformable Template Matching

Both the edge direction histogram and the small number of invariant moments used in the previous section are necessary but not sufficient measures for shape matching. In other words, two dramatically different shapes can have very similar edge direction histograms and invariant moment features. It is observed that using the above features, similar shapes are likely to be among the top retrievals; however, the top retrievals also contain some trademarks that seem to be perceptually very different. To further refine the retrievals and guarantee that only visually similar shapes are reported, we use a more elaborate matching technique based on deformable templates [48, 49]. During the refined matching stage, the edge map of the query trademark is compared to the edge maps of the top N retrieved trademark images (referred to as test trademarks). The edge map of the query trademark is used as a prototype template; this template is deformed towards the edge map of the test trademark. The goodness of the match is given by an energy function which describes the amount of deformation and how well the deformed template fits the test edge map. In the matching scheme defined in Jain et al. [48], the deformation model consists of (i) a prototype template which describes the representative shape of a class of objects, and (ii) a set of parametric transformations which deform the template. The query image is used as the prototype, which is deformed to match the test trademarks. A fitness measure is then calculated to quantify the quality of the match.

3.3.1 Representation of the Prototype Template

The prototype template consists of a set of points on the object contour, which is not necessarily closed, and can consist of several connected components. In particular, the edge map of the query trademark image is used (Fig. 3.3(b)). Since the images in the database are binary, the edge map is obtained as the set of foreground pixels which have at least one neighboring background pixel.

Figure 3.3: Constructing the query template for deformable template matching. (a) A hand drawn query trademark, and (b) the query template obtained by calculating the edge map of the image in (a).

3.3.2 Deformation Transformation

The prototype template describes only one of the possible (though the most likely) instances of the shape of interest. Therefore, it has to be deformed to match similar trademarks. A deformation of the template is performed by introducing a displacement field in the 2D template image. Without loss of generality, it is assumed that the template is drawn on a unit square S = [0, 1]². The points in the square are mapped by the function (x, y) ↦ (x, y) + (D^x(x, y), D^y(x, y)), where the displacement functions D^x(x, y) and D^y(x, y) are continuous and satisfy the following boundary conditions: D^x(0, y) ≡ D^x(1, y) ≡ D^y(x, 0) ≡ D^y(x, 1) ≡ 0. The space of such displacement functions is spanned by the following orthogonal bases [50]:

  e^x_mn(x, y) = (2 sin(πnx) cos(πmy), 0),
  e^y_mn(x, y) = (0, 2 cos(πmx) sin(πny)),   (3.5)

where m, n = 1, 2, .... Specifically, the displacement function is chosen as follows:

  D(x, y) = (D^x(x, y), D^y(x, y)) = Σ_{m=1}^{M} Σ_{n=1}^{N} (ξ^x_mn e^x_mn + ξ^y_mn e^y_mn) / λ_mn,   (3.6)

where λ_mn = απ²(n² + m²), m, n = 1, 2, ..., are the normalizing constants. The parameters ξ = {(ξ^x_mn, ξ^y_mn), m, n = 1, 2, ...}, which are the projections of the displacement function onto the orthogonal basis, uniquely define the displacement field, and hence the deformation.
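Evaluating the displacement field of Eqs. (3.5) and (3.6) at a set of template points can be sketched as follows. This is an illustration under the stated basis, not the authors' code; the truncation limits M and N and the constant α are inputs, and the loop is written for clarity rather than speed.

    import numpy as np

    def displacement(xi_x, xi_y, x, y, alpha=1.0):
        """Displacement field of Eqs. (3.5)-(3.6): xi_x and xi_y are M x N
        arrays of deformation parameters; x and y hold template point
        coordinates in the unit square."""
        M, N = xi_x.shape
        dx = np.zeros_like(x, dtype=float)
        dy = np.zeros_like(y, dtype=float)
        for m in range(1, M + 1):
            for n in range(1, N + 1):
                lam = alpha * np.pi ** 2 * (n ** 2 + m ** 2)   # normalizing constant
                dx += xi_x[m - 1, n - 1] * 2 * np.sin(np.pi * n * x) * np.cos(np.pi * m * y) / lam
                dy += xi_y[m - 1, n - 1] * 2 * np.cos(np.pi * m * x) * np.sin(np.pi * n * y) / lam
        return dx, dy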
3.3.3 Dissimilarity Measure

The discrepancy between a trademark edge map and the query template is described by a dissimilarity measure. This measure consists of two terms: the first term takes into account the amount of deformation (the larger the deformation, the more the deformed template deviates from the prototype); the second term considers the fidelity of the deformed prototype in terms of the discrepancy between the deformed template and the edge map of the test trademark. These two factors account for the difference between the query and test trademarks. Formally, the dissimilarity measure between a trademark Y and a query template T is defined as

  𝓔(T, Y) = min_{s,θ,ξ,d} { γ Σ_{m=1}^{M} Σ_{n=1}^{N} ((ξ^x_mn)² + (ξ^y_mn)²) + E(T_(s,θ,ξ,d), Y) },   (3.7)

where T_(s,θ,ξ,d) denotes the deformed template T with scale s, orientation θ, deformation ξ, and position d; the first term, which is the sum of squares of the deformation parameters, measures the deviation of the deformed template from the prototype template; the second term is the energy function that relates the deformed template T_(s,θ,ξ,d) to the edges in the test trademark image Y:

  E(T_(s,θ,ξ,d), Y) = (1/n_T) Σ_(x,y) (1 + Φ(x, y) |cos β(x, y)|),   (3.8)

where Φ(x, y) = −exp{−ρ(δ_x² + δ_y²)^(1/2)} is defined in terms of the displacements (δ_x, δ_y) of a pixel (x, y) to its nearest edge pixel, β(x, y) is the angle between the tangent of the nearest edge and the tangent direction of the template at (x, y), ρ is a smoothing factor which controls the degree of smoothness of the potential field, the summation is over all the pixels on the deformed template, n_T is the number of pixels on the template, and the constant 1 is added so that the potentials are positive and take values between 0 and 1. Intuitively, the first term in Eq. (3.7) favors small deformations, and the second term requires that the deformed template be in the proximity of, and aligned with, the edge directions of the test image. The parameter γ provides a relative weighting of the two penalty measures; a larger value of γ implies a lower variance of the deformation parameters and, as a result, a more rigid template. The dissimilarity is always nonnegative, and it is zero if and only if the query template matches the edge map of the test trademark image exactly.

Evaluating 𝓔 in Eq. (3.7) involves minimizing an energy function with respect to the pose and deformation parameters s, θ, ξ, and d. The optimization is approximated by finding an initial guess for the pose parameters s, θ, and d using the generalized Hough transform, and then performing a gradient descent search in the parameter space. The deformable template-based matching scheme has been applied to several objects in different scenes. The work presents a systematic approach to object localization and retrieval using deformable templates, and the scheme has been tested on a number of cluttered images as well as images with objects present under different scales and orientations. For a detailed analysis of the results of this work, interested readers are referred to [48].

3.4 Summary

We have presented our framework for a hierarchical system for image retrieval. The retrieval consists of a two-stage process: a fast pruning stage for the efficient selection of a small subset of plausible similar images, and a comprehensive matching strategy based on deformable template analysis to discard false retrievals.

Chapter 4

Experimental Results

We have applied the hierarchical shape-based retrieval algorithm to a trademark image database and a database of master-marks on Norwegian silver tankards.

- Trademark Database: In the case of trademark images, our goal is to present the user with a subset of images that are most similar to the query. We conduct experiments with rotated, scaled, and noisy versions of database images, and with hand drawn sketches of trademarks present in the database. The system integrates the retrievals based on the edge directions and invariant moments to present to the second stage a small subset of images ranked by their similarity to the query image. The second stage applies deformable templates to the pruned set of images to further refine the output.

- Norwegian Silver Authentication: In the case of master-marks, the system uses the integrated shape-based match (edge directions and invariant moments) on a database of stored master-marks. The query images are master-marks captured from the bottom of the silver tankards. These marks are very noisy, and sometimes the marks are faded due to the wear and tear of the tankard.
The master-mark images are processed to improve their quality, and the user is presented with a subset of database images with a high similarity value to the query image. For the retrieval results to be considered good, we expect the actual master-mark to be retrieved, given the noisy query. Once the master has been identified, the input tankard can be matched in detail with the known tankards from the same master for authentication.

4.1 Results on Trademark Images

In this section, we present experimental results on the accuracy of the pruning stage and the need for the integration of multiple shape features. We then present the retrieval results of the entire system. Finally, we discuss experiments with human subjects and possible heuristics to improve the retrieval results.

4.1.1 Experiments on Accuracy

In order to measure the accuracy of the retrievals of the pruning stage, we conducted the following three experiments.

- Rotated (R): Every image in the database was rotated arbitrarily and then presented as the query image.

- Scaled (S): Every image in the database was scaled and presented as the query image.

- Noisy (N): A uniform i.i.d. additive noise model was used to change either 5% of the pixel values (regions, for the moments) or 5% of the edge orientations (addition of noise to the edges, in the case of edge directions).

Tables 4.1 and 4.2 present the results of the retrievals using the shape features based on the edge direction histograms and the invariant moments. We notice that the individual shape features are not very effective in retrieving rotated images. The performance of these shape-based features improves for scaled and noisy images. The results of the integrated query are presented in Table 4.3. We present the results with the integration weights w_e = w_m = 1. Another possibility is to choose the weights on the basis of the accuracy of the individual feature-based queries. For retrieving scaled and noisy images, the topmost retrieved image is the correct result for each presented query. In the case of rotated images, the query matches the correct image within the top 20 positions in all but 17 of the 1,100 cases. Figure 4.1 shows the 17 database images that were not retrieved in our rotation experiments. Most of these 17 images are line drawings and present very few image points for a robust calculation of the invariant moments. Moreover, the edge directions also cannot be computed accurately for thin line drawings and small isolated blobs. We feel that both the edge directions and the invariant moments are not very robust shape measures for line drawings. Integrating the two schemes reduces the number of false retrievals, as it is highly unlikely that a pair of perceptually different images is assigned a high similarity value by both schemes.

  Query Nature   n=1 (%)   n<=2 (%)   n<=5 (%)   n<=20 (%)   Not Retrieved (%)
  R                73        79         86         94              6
  S               100       100        100        100              0
  N                92        95         96        100              0

Table 4.1: Edge direction-based retrieval results for the 1,100 database images.

  Query Nature   n=1 (%)   n<=2 (%)   n<=5 (%)   n<=20 (%)   Not Retrieved (%)
  R                44        55         67         88             12
  S               100       100        100        100              0
  N                88        95         97        100              0

Table 4.2: Invariant moment-based retrieval results for the 1,100 database images.
  Query Nature   n=1 (%)   n<=2 (%)   n<=5 (%)   n<=20 (%)   Not Retrieved (%)
  R                85        91         95         98.5            1.5
  S               100       100        100        100              0
  N                96        99        100        100              0

Table 4.3: Integrated shape-based retrieval results for the 1,100 database images; n refers to the position of the correct retrieval; R: rotated query, S: scaled query, N: query with a noisy image. The last column indicates the percentage of queries for which the correct image was not retrieved in the top 20 matches.

Figure 4.1: Database images not retrieved correctly in the presence of rotation.

4.1.2 Experiments on the Proposed System

The various stages involved in the image retrieval are described below.

- Preprocessing: The edge direction and invariant moment features are precalculated and stored for all the images in the database. As a result, two feature vectors are associated with each database image.

- Query image: A query consists of a hand drawn image of a shape. It can be disconnected, or may contain holes. Fig. 4.2 shows five hand drawn query trademarks used in the experiments.

- Fast pruning: The query image is compared to the database images based on the edge direction histogram and invariant moment shape features using the integrated dissimilarity index S_t (Eq. (3.3)). Figures 4.3, 4.4, 4.5, 4.6, and 4.7 show the top 20 retrieved images in the order of increasing dissimilarity for each of the five hand drawn query images. Note that the correct database image has the smallest dissimilarity value for the queries involving the bear, bull, boomerang, and deer, and the second smallest dissimilarity value in the case of the kangaroo query. Although the query image is linearly compared to all the images in the database, the pruning process is still reasonably fast. For each query, this stage takes about 45 seconds on a Sun Sparc 20 workstation (for a database containing 1,100 images).

- Comprehensive matching: Under the assumption that all plausible candidates for a query image are contained in the top 10 retrievals of the fast pruning stage, we apply the deformable matching scheme on these candidates only, to further refine the results. The initial pose parameters of the deformable template (position, scale, and orientation) are estimated using the generalized Hough transform. Figures 4.8 and 4.9 illustrate the initial and final configurations of the deformable template match for the bull and boomerang trademarks.

Figure 4.2: Examples of hand drawn query trademarks.

Figure 4.3: Database pruning results for the hand drawn bear sketch shown in Fig. 4.2. The top 20 retrievals are given in the increasing order of dissimilarity.

Figure 4.4: Database pruning results for the hand drawn bull sketch shown in Fig. 4.2. The top 20 retrievals are given in the increasing order of dissimilarity.

Figure 4.5: Database pruning results for the hand drawn boomerang sketch shown in Fig. 4.2. The top 20 retrievals are given in the increasing order of dissimilarity.

Figure 4.6: Database pruning results for the hand drawn deer sketch shown in Fig. 4.2. The top 20 retrievals are given in the increasing order of dissimilarity.

Figure 4.7: Database pruning results for the hand drawn kangaroo sketch shown in Fig. 4.2. The top 20 retrievals are given in the increasing order of dissimilarity.
Figure 4.8: Deformable template matching; (a) initial position of the bull template overlaid on the edge map of a bull logo using the generalized Hough transform, (b) final match for the bull logo.

Table 4.4 presents the dissimilarity measures of the five hand drawn logos (Fig. 4.2) to the top 10 images retrieved by the pruning stage. In four out of the five queries, the simple integrated shape dissimilarity index ranks the correct logo in the first place, and in one case the correct logo is ranked in the second place. The dissimilarity score using the deformable matching ranks the desired images (underlined) in the first place for all five queries. An incorrect match should result in a large dissimilarity measure (Fig. 4.10), so that the matching hypothesis is rejected. It typically takes 5-8 seconds to calculate the initial configuration using the generalized Hough transform. The iterative deformable matching process takes about 6 seconds on a Sun Sparc 20 workstation.

Figure 4.9: Deformable template matching; (a) initial position of the boomerang template overlaid on the edge map of a boomerang logo using the generalized Hough transform, (b) final match for the boomerang logo.

Table 4.4: Dissimilarity values for the five query images when the deformable template matching is applied to the top 10 retrieved images from the fast pruning stage.

4.1.3 Retrieval by Humans

It is instructive to compare the automated retrieval results with those obtained by human subjects. For this purpose, we asked five human subjects to retrieve images from the same database using the five queries. Fig. 4.11 shows the top nine retrievals for the bull query by the human subjects. To find the top ten retrievals for the five query images, each subject took between 1 and 2 hours. In comparing Figs. 4.4 and 4.11, we make the following observations.

Figure 4.10: Deformable template matching; final match for a bull template and a boomerang image.

The topmost retrieval for the bull query is the same for the system and the human respondents. In fact, the topmost retrievals for all five queries are consistent between the human respondents and the proposed algorithm. This is expected, since the hand drawn queries closely resemble one of the database images. As there is a substantial amount of diversity in the image database, only two to three (out of the ten) top retrievals are common between the outputs of the algorithm and the human respondents for each query. We note that for all the queries, the retrievals obtained by the five respondents are somewhat consistent, for the following reasons: (i) human subjects can easily decide the foreground greyscale of the object, no matter whether it is 0 or 255, and (ii) human subjects tend to abstract the query image for conceptual information. As an example, for the bull query, most human respondents retrieved all the trademark images in the database which contain a bull head, even though the shape of the bull in the retrieved images is quite different from the query shape.
Although we (humans) can extract other images with similar semantic content (like a bull head in this example), we lack the ability to match objects at different orientations. The proposed system retrieves images invariant to orientation, and hence picks up images that resemble a triangle, though at arbitrary orientations. These observations explain the differences between the retrievals by the human subjects and the proposed algorithm. We believe it is a challenge to develop an automatic retrieval algorithm which models human perception well. However, considering the time and effort it takes humans to perform such a retrieval task, an automatic retrieval algorithm can provide a powerful tool to facilitate the retrieval task for human beings.

4.1.4 Retrieval using Additional Heuristics

The performance of the proposed system does not approach that of humans. Simple shape features do not capture the semantic content of an image, and a concise theory of how to automatically extract the semantics of an image from its content has not yet been developed. At best, with the current state-of-the-art knowledge, we can propose certain heuristics that improve the retrieval accuracy of the system.

Image Filling

Matching of trademarks is based more on the general global shape of the marks than on fine details. Referring to Figs. 4.4 and 4.11, we see that when querying with the filled bull image, the system tends to retrieve images that are filled, rather than the bull images made up of line drawings.

Figure 4.11: Nine topmost retrieved trademark images by the human subjects for the bull template. The number below each retrieved image is the number of human respondents (out of 5) who placed the image in the top 10 retrievals.

This suggests that the object outline itself can be used for extracting the semantic content of the images. It may thus seem reasonable to extract the outlines of the objects within the images and compute the edge direction- and moment-based features on the outlines. We tried this heuristic but found that the moments and the edge directions did not yield good results when applied to the few outline points only. Thus, instead of using just the outline, we fill in the objects and base our features on the filled-in images (a code sketch of this step follows below). Figures 4.12 and 4.13 present five bull images in the database and their corresponding filled-in images.

Figure 4.12: Five images of a bull present in the database.

Figure 4.13: Filled-in images for the five bull images shown in Figure 4.12.

We have conducted experiments to match filled-in query images against the filled-in images from the database. The aim of these experiments was to study the effect of using only the gross details in the images (filling in leads to a more general representation of the images in the database) on the efficiency and accuracy of the fast pruning stage. Figures 4.14, 4.16, 4.18, 4.20, 4.21, 4.22, 4.23, and 4.24 show the retrieval results for eight query images (a bull image, a bear image, a kangaroo image, a set of bars, a bee image, a Sun image, a mark consisting of text (a word mark), and a lion image). It can be seen that in each case, the retrieval based on the filled-in database extracts more images that have similar semantic content. In the case of the query on a bull image (Figure 4.14), the system extracts four images of a bull head in addition to the query one. These are retrieved in the second, third, sixth, and fourteenth places. We note that the bull head retrieved in the third place was not retrieved by the human subjects. Similarly, we see that with the bear and kangaroo queries (Figures 4.16 and 4.18), we retrieve two more images of a bear and a kangaroo, respectively. As a basis of comparison, Figures 4.15, 4.17, and 4.19 show the retrieval results when the bull, bear, and kangaroo images from the database were retrieved without filling the holes in the images.
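The filling step itself can be sketched with standard morphological tools; the thesis does not specify an implementation, so the use of SciPy's binary_fill_holes here is only one plausible realization:

    import numpy as np
    from scipy.ndimage import binary_fill_holes

    def fill_objects(binary_img):
        """Fill the interior of every object in a binary trademark image,
        keeping only its gross (outline-level) shape."""
        return binary_fill_holes(binary_img.astype(bool))

    # Hypothetical example: a hollow square becomes a solid square.
    img = np.zeros((8, 8), bool)
    img[1:7, 1:7] = True
    img[2:6, 2:6] = False          # the hole
    filled = fill_objects(img)     # the hole is removed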
Filling in the holes in an image improves the efficiency of a match and makes it more robust, since the finer details are neglected (in the case of edge directions, holes contribute to the edge direction histograms; in the case of invariant moments, the moments are evaluated over a larger number of points, improving their robustness). With the proposed filling in, we can now match images which are either just outlines or are regions. A major drawback of the scheme, though, is the disregard for the information contained in the holes. Many times, certain marks contain useful information in their holes. Figure 4.25 shows two database images where the holes carry useful information.

Figure 4.14: Database pruning results for a bull image (the query image corresponds to the topmost retrieval). The top 20 retrievals are given in the increasing order of dissimilarity.

Figure 4.15: Database pruning results for a bull image (the query image corresponds to the topmost retrieval). The top 20 retrievals are given in the increasing order of dissimilarity. Results are based on unfilled images.

Figure 4.16: Database pruning results for a bear image (the query image corresponds to the topmost retrieval). The top 20 retrievals are given in the increasing order of dissimilarity.

Figure 4.17: Database pruning results for a bear image (the query image corresponds to the topmost retrieval). The top 20 retrievals are given in the increasing order of dissimilarity. Results are based on unfilled images.

Figure 4.18: Database pruning results for a kangaroo image (the query image corresponds to the topmost retrieval). The top 20 retrievals are given in the increasing order of dissimilarity.

Figure 4.19: Database pruning results for a kangaroo image (the query image corresponds to the topmost retrieval). The top 20 retrievals are given in the increasing order of dissimilarity. Results are based on unfilled images.

Figure 4.20: Database pruning results for an image consisting of a set of vertical bars (the query image corresponds to the topmost retrieval). The top 20 retrievals are given in the increasing order of dissimilarity.

Figure 4.21: Database pruning results for a bee image (the query image corresponds to the topmost retrieval). The top 20 retrievals are given in the increasing order of dissimilarity.

Figure 4.22: Database pruning results for a Sun image (the query image corresponds to the topmost retrieval). The top 20 retrievals are given in the increasing order of dissimilarity.

Figure 4.31: A block diagram of master identification based on master- and city-marks.

The extracted feature vectors are stored along with the images in the database. When a master- or city-mark image of an unknown tankard is presented to the system, it has to be preprocessed in the same way, to extract features similar to the features which are stored (as feature vectors) in the database. The query image is then matched with the feature vectors present in the database. There is an integration module, where the system integrates the retrieved results based on the master- and city-marks. The output is a ranked set of retrieved images with high matching scores. If one of the retrieved images is similar to the master-mark on the unknown tankard, the user accepts the automated search and the identification of the goldsmith comes up on the screen. If none is acceptable, then a new search starts.
We use the integration of the edge direction histograms and the invariant moments as the basis of the retrievals (see Chapter 3 and Section 4.1). A mark can sometimes be worn down and almost invisible; some image processing techniques are needed to enhance and restore such images.

4.2.3 Preprocessing

Experiments were conducted on a master-mark database of 171 marks. These marks were scanned from the books [54, 53] using an HP ScanJet flatbed scanner. Figure 4.32 shows 10 of the 171 master-marks stored in the database.

Figure 4.32: A representative set of master-marks in the database.

Real master-marks present at the bottom of several silver tankards were digitized using the Kontron ProgRes 3012 camera and presented as query images. Unlike the database images, the master-marks captured from the tankards are in color and rather noisy. In addition, due to the age of the tankards, many of these marks are faded. Figure 4.33 shows a set of real master-marks extracted from the bottom of various tankards.

Figure 4.33: A representative set of real master-marks extracted from the tankards.

We require a preprocessing stage to binarize the master-marks shown in Figure 4.33 and to clean the noise in them. The color images are first converted into greyscale images. The greyscale images are then converted to binary images using a global thresholding scheme. A second stage of preprocessing is applied to the binary image for noise cleaning. Noise in the binary images is usually in the form of speckles. These speckles are filtered out using a connected component analysis, where all components of size less than a threshold are removed. The filtering allows for the removal of both white and black components. Figure 4.34 shows an input greyscale master-mark with the corresponding thresholded binary and cleaned images.

Figure 4.34: Preprocessing an input master-mark; (a) greyscale master-mark; (b) thresholded (binary) master-mark; (c) binary master-mark after noise cleaning.
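A minimal sketch of this two-stage preprocessing (global threshold, then removal of small black and white components); the threshold and the minimum component size are illustrative, as the thesis does not report the values used:

    import numpy as np
    from scipy import ndimage

    def preprocess_mark(gray, threshold=128, min_size=20):
        """Binarize a greyscale master-mark with a global threshold, then
        remove speckles: small connected components, both black and white."""
        binary = gray < threshold          # assume a dark mark on a light background
        for _ in range(2):                 # pass 1: foreground speckles; pass 2: holes
            labels, n = ndimage.label(binary)
            sizes = ndimage.sum(binary, labels, index=range(1, n + 1))
            small_labels = np.nonzero(sizes < min_size)[0] + 1
            binary[np.isin(labels, small_labels)] = False
            binary = ~binary               # swap colors for the second pass
        return binary                      # inverted twice, so original polarity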
4.2.4 Experimental Results

The cleaned and binarized master-marks were presented as input queries to the retrieval system. The system extracts the shape features for the query images and matches these with those stored in the database. A set of the top 10 matches based on the integrated dissimilarity index is presented as the output. Figures 4.35, 4.36, 4.37, 4.38, 4.39, 4.40, and 4.41 present the retrieval results for seven master-marks. The correct mark is retrieved at the 1st, 2nd, 8th, 8th, 4th, 15th, and 1st places, respectively, for the seven query marks. These results demonstrate the robustness of the matching scheme in the presence of noise, loss of data, and variations in scale and orientation.

Figure 4.35: Retrieval of master seals; (a) Input master-mark; (b) Top 10 retrievals based on the integrated match.

4.3 Discussion

Our initial experiments demonstrate the effectiveness of integrating simple shape features for pruning large databases to a smaller set. The proposed pruning strategy is robust under noisy images (hand drawn trademark images and faded master-marks), arbitrarily oriented images (master-mark images), and scaled images (multiple trademarks in the database with different scales, as well as the scale difference between the master-marks in the database and the real marks extracted from the bottom of the tankards).

4.3.1 Trademark Images

An efficient shape-based retrieval algorithm has been developed to retrieve trademark images. Efficiency and accuracy of the retrievals are achieved by designing a two-stage hierarchical retrieval system: (i) a simple statistical feature-based process quickly browses through a database for a moderate number of plausible retrievals; and (ii) a deformable matching process screens the candidate set for the best matches. Preliminary results on a trademark image database show that this is a promising technique for content-based image database retrieval. The technique is robust under rotated, scaled, and noisy versions of the database images. We note that image retrieval based on user provided information, such as hand drawn sketches, is a challenging problem in multimedia applications. A human's perception of shape can be rather subjective and, as a result, the representation of the desired object tends to vary a lot. Our goal is to extract images which have a semantic content similar to that of the query image. Figure 4.11 shows that semantically similar images may actually be visually very different from each other. In order to retrieve these images in the fast pruning stage, we need to somehow extract semantic meaning from the images. We presented a heuristic that extracts the outlines of the objects in an image and fills up the interior, as a basis for generalizing the shape of the objects in the image. This scheme improves the retrieval results for the trademark images. Another way to extract the semantic content is through a semi-automatic scheme (manual intervention is needed in preprocessing) using textual descriptions of the trademark images; text-based search can then be incorporated prior to the pruning stage. A more automatic and objective approach to preserving the semantic content is to extract image components that represent different entities in the trademark images, and to query on the basis of these component entities. We have used a simple heuristic that can extract individual object entities from an image. At present, the segmentation scheme has not been completely implemented and, in its current state, many noisy components are also represented as objects. The system does not have the capability of matching occluded objects. We feel that trademark images represent a very wide gamut of images, and incorporating partial matches may make the system highly unstable.

4.3.2 Norwegian Silver

We have been encouraged by the results of our preliminary experiments to recognize and match master-marks. However, our ultimate goal is to build a system which dates and identifies the craftsman even if the master-mark is not present. In addition, we plan to develop a system which is usable by non-experts by incorporating the expertise of the connoisseur into the system. In this way, we can preserve the expertise for future generations. Since our system stores digital images, it can also be used to match the input query with objects which reside in distant museums, provided that scanned images of these objects are available. The level of object detail captured with high-resolution digital cameras also allows a detailed examination of an object without handling it. Furthermore, the application of our system is not limited to Norwegian silver. The comparison of artifacts by shape (and color) features is a common element in the identification and authentication of artistic objects. Our system will be suitable for the identification of other art objects, such as ancient pottery, woodwork, or china, which have significant identifiable features.
Figure 4.36: Retrieval of master seals; (a) Input master-mark; (b) Top 20 retrievals based on the integrated match.

Figure 4.37: Retrieval of master seals; (a) Input master-mark; (b) Top 10 retrievals based on the integrated match.

Figure 4.38: Retrieval of master seals; (a) Input master-mark; (b) Top 20 retrievals based on the integrated match.

Figure 4.39: Retrieval of master seals; (a) Input master-mark; (b) Top 20 retrievals based on the integrated match.

Figure 4.40: Retrieval of master seals; (a) Input master-mark; (b) Top 20 retrievals based on the integrated match.

Figure 4.41: Retrieval of master seals; (a) Input master-mark; (b) Top 20 retrievals based on the integrated match.

Chapter 5

Conclusions

We have addressed the problem of content-based retrieval from large image databases. Traditional text-based image database retrieval systems have many limitations, and content-based search may be required to augment the text-based search. Any image database retrieval system must have both a high speed and a high accuracy of retrievals. Since image databases may consist of thousands of images, matching between images should be based on simple features that can be extracted efficiently and that can capture the contents of an image. This chapter summarizes the results of the research reported in this thesis. Several directions for future research are also outlined.

5.1 Summary

We have developed a hierarchical two-level feature extraction and matching methodology for image database retrieval for a trademark database. The system takes as input a query mark that is either a scanned drawing or a hand drawn sketch. Using a two-level hierarchy, the system uses simple and easily computable shape features to prune the large dataset to a small subset of plausible matches. The query is then matched against the pruned set of database images to eliminate false retrievals. A final subset of similar matches is then presented to the user. For an efficient and accurate retrieval, we require that the pruning stage be not only fast, but also that it not eliminate any images that are similar to the query. In the case of trademark images, similarity is based on semantic information, i.e., two images that are perceptually similar must be included in the pruning stage. These requirements mean that the pruning stage must be general in nature rather than specific, in terms of its capability to handle images in different orientations, of different sizes, and at different positions. The proposed pruning stage integrates the results of edge direction histograms and invariant moments to select a plausible set of similar images. The moment features are invariant to rigid transformations, whereas the edge direction histograms can be normalized and shifted to make them invariant to rigid transformations. The retrieved images from the fast pruning stage are then screened using a more elaborate, but costly, deformable template matching process to remove false retrievals. We have tested the performance of the entire matching system on a database of 1,100 trademark images and 171 master-marks. We have explored how to sensibly combine some easily computable statistical shape features to yield fast and accurate retrievals.
We have demonstrated how a two-stage hierarchical retrieval system can improve the results presented to the user. We have shown how images can be queried on the basis of hand drawn sketches, and demonstrated that the system is robust under noisy, oriented, and scaled data (identification of faded, noisy, and possibly scaled and oriented master-marks). We have developed a heuristic to extract the outlines of objects in an image and to query on the basis of filled images, improving the recall and precision rates at the pruning stage. We have also proposed a strategy to query on the basis of individual objects rather than the entire image, and have implemented a simple segmentation scheme for extracting object entities from an image.

5.2 Future Research

There are a number of research issues which need to be addressed in the future. The system can be extended to match on the basis of segmented objects rather than the entire image; a more robust and automatic segmentation algorithm would be required for this. A further step could be in the direction of allowing partial matching of objects based on local features. Currently, the features we use in the fast pruning stage are global features: the edge direction histograms and invariant moments are global statistical features and do not retain local information. Local information can be very useful in eliminating many false matches. The system currently has no learning capabilities; it cannot improve its performance with time. We feel that for any system to perform robustly under various conditions, one needs to incorporate learning capabilities into the system. A future extension may be in the direction of allowing the system to learn through positive and negative examples of every query.

APPENDICES

Appendix A

A Representative Subset of Trademark Images

Figure A.1: Examples of trademarks present in the database.

The remaining pages of Appendix A show further examples of trademarks present in the database.

Bibliography

[1] Stephen W. Smoliar and HongJiang Zhang. Content-based video indexing and retrieval. IEEE Multimedia, Summer:62-72, 1994.

[2] Borko Furht. Multimedia systems: An overview. IEEE Multimedia, Spring:47-59, 1994.

[3] J. K. Wu, A. D. Narasimhalu, B. M. Mehtre, C. P. Lam, and Y. J. Gao. CORE: A content-based retrieval engine for multimedia information systems.
Multimedia Systems, Springer-Verlag, 3:25-41, 1995.

[4] William I. Grosky. Multimedia information systems. IEEE Multimedia, Spring:12-23, 1994.

[5] C. Faloutsos, R. Barber, M. Flickner, J. Hafner, W. Niblack, D. Petkovic, and W. Equitz. Efficient and effective querying by image content. Journal of Intelligent Information Systems, 3:231-262, 1994.

[6] A. Pentland, R. W. Picard, and S. Sclaroff. Photobook: Content-based manipulation of image databases. In SPIE Vol. 2185: Storage and Retrieval for Image and Video Databases II, pages 34-47, February 1994.

[7] C. P. Lam, J. K. Wu, and B. Mehtre. STAR - a system for trademark archival and retrieval. In Proceedings of the 2nd Asian Conference on Computer Vision, volume III, pages 214-217, Singapore, December 1995.

[8] A. K. Jain and A. Vailaya. Image retrieval using color and shape. In Proceedings of the 2nd Asian Conference on Computer Vision, volume II, pages 529-533, Singapore, December 1995.

[9] USPTO. Basic Facts About Registering a Trademark. US Patent and Trademark Office, http://www.uspto.gov/web/trad_reg_info/basicfacts.html, 1995.

[10] USPTO. Trademark Manual of Examining Procedures. US Patent and Trademark Office, http://www.uspto.gov/web/info/ftp.html, 1995.

[11] T. Igarashi, editor. World Trademarks and Logotypes. Graphic-sha, Tokyo, 1983.

[12] T. Igarashi, editor. World Trademarks and Logotypes II: A Collection of International Symbols and their Applications. Graphic-sha, Tokyo, 1987.

[13] Collection of Trademarks and Logotypes in Japan. Graphic-sha, Tokyo, 1973.

[14] Theo Pavlidis. Algorithms for shape analysis of contours and waveforms. In Proceedings of the Fourth International Joint Conference on Pattern Recognition, pages 70-86, Kyoto, Japan, November 1978.

[15] M. J. Swain and D. H. Ballard. Color indexing. International Journal of Computer Vision, 7(1):11-32, 1991.

[16] R. Schettini. Multicolored object recognition and location. Pattern Recognition Letters, 15:1089-1097, November 1994.

[17] B. M. Mehtre, M. S. Kankanhalli, A. D. Narasimhalu, and G. C. Man. Color matching for image retrieval. Pattern Recognition Letters, 16:325-331, March 1995.

[18] Larry S. Davis. Shape matching using relaxation techniques. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-1(1):60-72, January 1979.

[19] G. Cortelazzo, G. A. Mian, G. Vezzi, and P. Zamperoni. Trademark shapes description by string-matching techniques. Pattern Recognition, 27(8):1005-1018, 1994.

[20] P. W. Huang and Y. R. Jean. Using 2D C+-strings as spatial knowledge representation for image database systems. Pattern Recognition, 27(9):1249-1257, 1994.

[21] D. P. Huttenlocher, G. A. Klanderman, and W. J. Rucklidge. Comparing images using the Hausdorff distance. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(9):850-863, September 1993.

[22] D. J. Kahl, A. Rosenfeld, and A. Danker. Some experiments in point pattern matching. IEEE Transactions on Systems, Man, and Cybernetics, 10(2):105-116, February 1980.

[23] J. K. Cheng and T. S. Huang. Image registration by matching relational structures. Pattern Recognition, 17(1):149-159, 1984.

[24] S. P. Smith and A. K. Jain. Chord distribution for shape matching. Computer Graphics and Image Processing, 20:259-271, 1982.

[25] W. Richards and D. D. Hoffman. Codon representation on closed 2D shapes. Computer Vision, Graphics, and Image Processing, 31:265-281, 1985.

[26] C. T. Zahn and R. Z. Roskies.
Fourier descriptors for plane closed curves. IEEE Transactions on Computer, C-21(1):269—281, 1972. S. A. Dudani, K. J. Breeding, and R. B. McGhee. Aircraft identification by moment invariants. IEEE Transactions on Computers, C-26(1):39-45, January 1977. M. Tuceryan and A. K. Jain. Texture analysis. In C. H. Chen, L. F. Pau, and P. S. P. Wang, editors, Handbook of Pattern Recognition and Computer Vision, pages 235—276. World Scientific Publishing Company, 1993. R. M. Haralick. Statistical and structural approaches to texture. In Proc. IEEE 67, pages 786—804, 1979. M. Tuceryan and A. K. Jain. Texture segmentation using Voronoi polygons. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12:211—216, 1990. D. Blostein and N. Ahuja. Shape from texture: Integrating texture-element extraction and surface estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(12):1233—1251, December 1989. [32] [33] [34] [35] [36] [37] [38] [39] 126 J. Besag. Spatial interaction and the statistical analysis of lattice systems (with discussion). Journal of Royal Statistical Society Ser.B, 36:192—236, 1974. A. Pentland. Fractal-based description of natural scenes. IEEE Trans. on Pattern Anal. Mach. Intell., 9:661—674, 1984. J. Malik and P. Perona. Preattentive texture discrimination with early vision mechanisms. Opt. Soc. Am. Series, A 7:923—932, 1990. J. M. Coggins and A. K. Jain. A spatial filtering approach to texture ananlysis. Pattern Recognition Letters, 3:195—203, 1985. A. K. Jain and F. Farrokhnia. Unsupervised texture segmentation using Gabor filters. Pattern Recognition, 24:1167—1186, 1991. B. Holt and L. Hartwick. Visual image retrieval for applications in art and art history. In SPIE Vol. 2185: Storage and Retrieval for Image and Video Databases II, pages 70—81, February 1994. Mikihiro Ioka. A method of defining the similarity of images on the basis of color information. Technical Report RT —0030, IBM Tokyo Research Lab, 1989. J. Francos. Orthogonal decomposition of 2-d random fields and their applications for 2-d spectral estimation. In Signal Processing and its Applications, pages 287— 327, North-Holland, 1993. [40] [41] [42] [43] [44] [45] 127 R. W. Picard and T. P. Minka. Vision texture for annotation. Technical Report TR-302, MIT, 1994. In Multimedia Systems: Special Issue on Content-based Retrieval 3:3-14,1995. B. M. Mehtre and M. S. Kankanhalli. Content-based image retrieval using a com- posite color-shape approach. Technical Report T R95-190-0, National University of Singapore, August 1995. B. M. Mehtre and M. S. Kankanhalli. Shape measures for content based image retrieval: A comparison. Technical Report TR95-195—0, National University of Singapore, September 1995. J. Weng. SHOSLIF: A framework for sensor-based learning for high-dimensional complex systems. In IEEE Workshop on Architectures for Semantic Modeling and situation analysis in Large Complex Systems, August 27-29 1995. Invited Paper. T. P. Minka and R. W. Picard. Interactive learning using a ‘society of models’. Technical Report TR-349, MIT, 1995. J. Canny. A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-8(6):679—698, November 1986. [46] W. Niblack and J. Yin. A pseudo-Distance measure for 2D shapes based on turning angle. Proc 2nd IEEE Int. Conf. on Image Processing {ICIP}, III:352— 355, 1995. 128 [47] Y. Cui, D. Swets, and J. Weng. Learning-based hand sign recognition using [48] [49] [50] [51] [52] SHOSLIF—M. Proc. 5th Int. Conf. 
on Computer Vision (ICCV), pages 631—636, 1995. AK. Jain, Y. Zhong, and S. Lakshmanan. Object matching using deformable templates. IEEE Trans. Pattern Anal. and Machine Intell., 18:267—279, March 1996. A. Vailaya, Yu. Zhong, and A. K. Jain. A Hierarchical System for Efficient Image Retrieval. Vienna, August 1996. Accepted: 13th International Conference on Pattern Recognition. Y. Amit, U. Grenander, and M. Piccioni. Structural image restoration through deformable template. J. American Statistical Association, 86(414):376—387, June 1991. Britt Kroepelien. Fra Stil til Algoritme. PhD thesis, University of Bergen, 1995. B. Kroepelien, A. Vailaya, and A. K. Jain. A Case Study in Norwegian Silver Authentication. Vienna, August 1996. Accepted: 13th International Conference on Pattern Recognition. [53] Jorunn Fossberg. Norske Soelvstempler. Universitetsforlaget, Oslo, Norway, 1994. [54] Thw. Krohn-Hansen and Robert Kloster. Bergens Gullsmedkunst fra Laugsti- den. Gullsmedlauget i Bergen 0g Vestlandske Kunstinddustrimuseum, Bergen, Norway, 1957.