This is to certify that the thesis entitled OBJECT SHAPE AND STRUCTURE FOR IMAGE MATCHING AND RETRIEVAL presented by NAVEED SARFRAZ KHAN KHATTAK has been accepted towards fulfillment of the requirements for the Master of Science degree in Computer Science.

OBJECT SHAPE AND STRUCTURE FOR IMAGE MATCHING AND RETRIEVAL

By

Naveed Sarfraz Khan Khattak

A THESIS

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

MASTER OF SCIENCE

Department of Computer Science

2002

ABSTRACT

OBJECT SHAPE AND STRUCTURE FOR IMAGE MATCHING AND RETRIEVAL

By NAVEED SARFRAZ KHAN KHATTAK

This thesis reports results on the use of shape features for content-based image retrieval. Edges, lines, corners, and straight ribbons are extracted from a query image. These features and their attributes, along with various relations, are used to match representations of other images in a database via a graph-matching algorithm. Images can be matched to other images or to hand-drawn sketches. Experimental results on images with man-made structures and on aerial images are promising. In addition to content-based retrieval, the matching method is also applicable to object recognition and localization, and the algorithm is also suitable for estimating scene changes in a continuous movie. Presently, the algorithm is implemented in MATLAB. The current program must be translated to C/C++ in order to run much faster, and it should be tested on a much larger set of images.

Copyright by NAVEED SARFRAZ KHAN KHATTAK 2002

ACKNOWLEDGMENTS

I would like to express my gratitude to all those who made it possible for me to complete this thesis. I want to thank the National University of Science and Technology for their financial support and for encouraging me to complete my thesis. I would also like to express my special thanks to my supervisor Dr. George Stockman for his advice and guidance during this study, and to my committee members Dr. Sakti Pramanik and Dr. Jiajuo Qi. I would like to give my special thanks to my parents, wife and children, whose patient love enabled me to complete my thesis. I am also thankful to all those who helped me complete my thesis, especially Irafan Asalam and Khurram Waheed.

TABLE OF CONTENTS

List of Tables ......................................................................... ix
List of Figures ........................................................................ x
1. INTRODUCTION ........................................................................................... 1
2. LITERATURE REVIEW ................................................................................ 6
2.1 Current CBIR Techniques ............................................................................ 6
2.1.1 Color Based Retrieval ........................................................................................
6 2.1.2 Texture Based Retrieval .................................................................................... 7 2.1.3 Shape Based Retrieval ....................................................................................... 7 2.1.4 Retrieval by other Types of Primitive Feature ................................................ 8 2.2 Practical Applications of CBIR---- - _ - - 11 3. METHODOLOGY ........................................................................................ 12 3.1 Data Representation---- - - ------ -- -- - - 12 3.2 Image Matching Process ----------------------------- l4 4. FEATURE EXTRACTION MODULE ........................................................... 16 4.1 Edge Detection- - - - - - ---------- - 16 4.1.1 Canny Edge Detector ....................................................................................... 18 4.1.2 Experiment and Results with the Canny Edge Detector .............................. 19 4.2 Line Detection ..... - - - 25 4.2.1 Line Joining ...................................................................................................... 34 4.2.2 Line Detection Results ..................................................................................... 35 4.3 Corner Detection -- 38 4.3.1 Corner Detection Observations ....................................................................... 40 4.3.2 Corner Detection Results ................................................................................. 42 4.4 Ribbon Detection 43 4.4.1 Ribbon Detection .............................................................................................. 44 4.4.2 Type of Ribbons. ............................................................................................... 44 4.4.3 Ribbon Detection Results ................................................................................. 45 5. IMAGE REPRESENTATIONS .................................................................... 49 5.1 Basic Definitions 50 5.1.1 Ad jacency Matrix ............................................................................................. 51 5.2 Line Structure Representation.-- -- 54 5.2.1 Line Slope. ......................................................................................................... 55 5.2.2 Edge Gradient. .................................................................................................. 55 5.2.3 Distance between Line Segments. ................................................................... 57 5.2.4 Relative Line Orientation. ............................................................................... 57 5.2.5 Behavior of Line Descriptors and Relationships ........................................... 58 5.3 Corner Structure Representation. - 59 5.3.1 Angle of a Corner. ............................................................................................ 60 vi 5.3.3 Corner’s Orientation. . ..................................................................................... 61 5.3.4 Comer’s Precincts. .......................................................................................... 62 5.3.5 Angle between Corners .................................................................................... 64 5.3.6 Behavior of Corner Descriptors and Relationships ...................................... 65 5.4 Ribbon Structure Representation. 65 5.4.1 Width of a Ribbon. ........................................................................................... 67 5.4.2 Types of a Ribbon. . 
.......................................................................................... 67 5.4.3 Relative Distance between Ribbons. ............................................................... 68 5.4.4 Ribbon’s Orientation. ...................................................................................... 69 5.4.5 Behavior of Ribbon Descriptors and Relationships ...................................... 70 6. GRAPH MATCHING ALGORITHM ............................................................. 71 6.1 Graph Matching 71 6.2 Proposed Graph Matching Algorithm. 72 6.3 Line-Graph Matching. 73 6.3.1 Matching Algorithm ........................................................................................ 73 6.3.2 Matching Results for Line Segments .............................................................. 78 6.4 Corner-Graph Matching. 84 6.4.1 Matching Algorithm ........................................................................................ 84 6.4.2 Matching Results for Corners ......................................................................... 87 6.5 Ribbon-Graph Matching. 92 vii 6.5.1 Matching Algorithm ........................................................................................ 92 6.5.2 Matching Results for Ribbons ........................................................................ 95 6.6 Combined Results of Graph Matching Algorithm 100 7. DISCUSSION AND FUTURE WORK ........................................................ 114 7.1 Analysis - 114 7.2 Conclusion 117 7.3 Future Work - - -- 118 APPENDEX-A .................................................................................... 119 APPENDEX-B .................................................................................... 126 BIBLIOGRAPHY ................................................................................ 130 viii LIST OF TABLES Table 3-1 Features and their Parameters - - - - - 15 Table 4-1 Confidence Rating for corner detection of hand-drawn images ................ 43 Table 4-2 Confidence Rating for ribbon detection 46 Table 5-1 Line descriptors - - 57 Table 5-2 Behaviour of Line Descriptors and Relationships 59 Table 5-3 Decision table for arm’s orientation of a corner - -- 63 Table 5-4 Type of bounds corner is making ------------ -- ------ 63 Table 5-5 Behavior of Corner Descriptors and Relationships 65 Table 5-6 Behavior of Ribbon Descriptors and Relationships 70 Table 6-1 Different threshold used in Line Structure Graph Algorithms ................. 78 Table 6-2 Different threshold used in Corner Structure Graph Algorithms ............ 87 Table 6-3 Different thresholds used in Ribbon Structure Graph Algorithms ........... 92 Table 7-1 Results for exact matching - 115 Table 7-2 Results for similar matching - - - 116 Table 7-3 Graph sizes (number of vertices) for different data-structures and images 116 Table 7-4 Run-time for image retrieval 117 ix LIST OF FIGURES “Images in this thesis are presented in color” Figure 1-1 Query image of Small Building and results of our retrieval system. ......... 4 Figure 3-1 Overall Methodology for Image Comparison 14 Figure 4-1 Process for feature extraction - 16 Figure 4-2 Edge image of different shapes with default parameter 0:1.0 ................ 19 Figure 4-3 Edge image of different shapes with $20 20 Figure 4-4 (a) Test Image with Gaussian noise of 0.01. (b)Edge image with o=l.0.. 21 Figure 4-5 (a) Test Image with Gaussian noise of 0.02. (b) Edge image with 0:1.0. 21 Figure 4-6 (a) Test Image with Gaussian noise of 0.05. 
(b) Edge image with 0:1.0. 22 Figure 4-7 Edge image with (:30 22 Figure 4-8 (a) Thick W with original scanner noise. (b) Edge image of Thick W with $1.0 - 23 Figure 4-9 (a) Taj Mahal with original scanner noise. (b) Edge image of Taj Mahal with 0 =1.0 23 Figure 4-10 (a) Small Building with original scanner noise. (b) Edge image with o=l.0 24 Figure 4-11 The parameters (1 and 0 used in the equation 4.2 of a straight line ....... 27 Figure 4-12 Accumulator Array showing the peaks 29 Figure 4-13 Test image -- 29 Figure 4-14 Accumulator Array showing the merged peaks 32 Figure 4-15 (a) line detection before applying algorithm 4.1 and (b) line detection after applying algorithm 4.1 33 Figure 4-16 (a) Vertical overlapping edge (b) Diagonal overlapping edge ................ 34 Figure 4-17 (a) and (b) Successful line detection 36 Figure 4-18 (a) and (b) Incomplete line detection - -- -- - -- 37 Figure 4-19 (a) Lincoln Memorial and (b) Big Taj, line length = 5 37 Figure 4-20 (a) Narobi 1 and (b) Small Taj, line length = 5 - 38 Figure 4-21 Test image with all the corners detected 40 Figure 4-22 Test Pattern with one corner missing (lower left corner of the lower middle rectangle) - 41 Figure 4-23 Corner detection results for (3) Lincoln 25 corners and (b) Taj Mahal 60 corners 41 Figure 4-24 (a) is Type-1 ribbon and (b) is Type-2 ribbon 45 Figure 4-25 Thin W ribbon detection“ -- - 46 Figure 4-26 (a), (b) and (c) Shows the detected ribbons 47 Figure 4-27 Narobi 2 showing some successful ribbon detection - 48 Figure 4-28 Showing some successful ribbons of Narobi 2 48 Figure 5-1 Pictorial representation of a graph G(V,E). 52 Figure 5-2 Ad jacency Matrix of Figure 5-1 53 Figure 5-3 Ad jacency Matrix of Figure 5-1 53 Figure 5-4 Pictorial representation of line structure. 54 Figure 5-5 (a) Test image, (b) Lines extracted from (a) 56 xi Figure 5-6 Line Orientation Figure 5-7 Pictorial Representation of Corner Structure Figure 5-8 Corners detected by the corner detection algorithm Figure 5-9 Corner Orientation. - Figure 5-10 Example of comer bounds Figure 5-11 Pictorial Representation of Ribbon Structure Figure 5-12 Ribbon Width - - - -- Figure 5-13 Distance between ribbon A and ribbon B Figure 5-14 Ribbon Orientation- - - -- - -- - -------- Figure 6-1 Query image Drawn Taj and retrieved images Figure 6-2 Query image Taj Mahal and retrieved images Figure 6-3 Query image Small Building and retrieved images,” Figure 6-4 Query image Spartan Village Apartments and retrieved images ............ Figure 6-5 Query image drawn Taj and retrieved images Figure 6-6 Query image Taj Mahal and retrieved images -- - Figure 6-7 Query image Small Building and retrieved images Figure 6-8 Query image Spartan Village Apartments and retrieved images ............ Figure 6-9 Query image drawn Taj and retrieved images Figure 6-10 Query image Taj Mahal and retrieved images Figure 6-11 Query image Small Building and retrieved images xii 58 59 -61 62 66 67 68 -69 80 81 82 -88 89 96 97 98 Figure 6-12 Query image Spartan Village Apartments and retrieved images .......... 99 Figure 6-13 Query image drawn Taj and retrieved images 101 Figure 6-14 Query image Taj Mahal and retrieved images 102 Figure 6-15 Query image Small Building and retrieved images - 103 Figure 6-16 Query image Spartan Village Apartments and retrieved images ........ 104 xiii 1. Introduction An important problem in the field of computer vision is the automatic recognition of objects in two-dimensional images for the classification of scenes. 
Another problem is that of matching images based upon similarities between object features. Because of the recent increase in the wide range of applications depending upon the resolution of such problems, such as multimedia database searches and augmented reality, the solutions to these problems should have a high degree of flexibility and automation. Matching images in a multimedia database offers an example of such a problem. The database may contain hand-drawn images, images taken from cameras and photographs containing objects with varying geometric properties, colors, and textures, as well as a wide range in image resolution and size. The extraction and subsequent classification of object features must be general enough to deal with the varied contents of the database and must ensure that reliable matching takes place. At the same time, the system should strive for automation so that user interaction is kept to a minimum. The shape of a single object and the various spatial constraints among multiple objects in an image are examples of object classification. Shape and spatial constraints are important data in many applications, ranging from complex space exploration and satellite information management to medical research and entertainment. Image retrieval can be categorized into exact match searching and similarity based searching. For either type of retrieval, the dynamic aspects of image content require expensive computations and sophisticated methodologies in the areas of image processing and database systems. In order to overcome these problems, several schemes for data modeling and image representation have been proposed [1,2]. In general, each of these schemes builds a symbolic image for each given physical image, and symbolic images are then used in conjunction with index structures as proxies for image comparisons to reduce the search Space. One of the traditional indexing methods for image retrieval is text-based. Although text annotation is a practical technique, this task is labor intensive, language dependent, vocabulary controlled, human subjective in nature and also, cannot predict future use. In some cases, it is rather difficult to characterize certain important real world concepts, entities, and attributes by means of text only. The shape of a single object and the various Spatial constraints among multiple objects in an image are examples of such concepts for image comparison. Another indexing method is content—based similarity. Content-based image retrieval systems rely on similarity measures. The four major classes of similarity measures are based upon color, texture, shape, and object and relationship similarity [3]. Once a measure of similarity is determined, the corresponding actual images are retrieved from the database. Due to the lack of any unified framework for image representation, storage, and retrieval (see [4] for information on the emerging MPEG-7 standard), these symbolic representation schemes and retrieval techniques have greatly facilitated image management. The Query By Image Content system commonly known as the QBIC system by IBM is an example of recent work in content—based image retrieval that is based upon color and texture similarity [5]. QBIC is able to perform database queries in terms of color percentage matching, color histogram matching, color layout similarity and texture. While a system such as this will perform well when the most important image feature is color, it will suffer when object shape is the dominant feature. 
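As a concrete illustration of the color-histogram style of matching that systems such as QBIC rely on, the following MATLAB sketch computes a normalized histogram-intersection score between two images. It is a generic sketch under assumed inputs, not QBIC's actual implementation; the function and variable names are illustrative only.

function score = histogramMatch(Q, D, nBins)
% Illustrative sketch of color-histogram matching (normalized histogram
% intersection); Q is the query image, D a database image, nBins bins per channel.
    if nargin < 3, nBins = 8; end
    hq = colorHist(Q, nBins);
    hd = colorHist(D, nBins);
    score = sum(min(hq, hd)) / sum(hd);      % 1.0 means identical color distributions
end

function h = colorHist(I, nBins)
    I   = im2double(I);                              % RGB values in [0, 1]
    idx = min(floor(I * nBins) + 1, nBins);          % per-channel bin index
    lin = (idx(:,:,1) - 1) * nBins^2 ...
        + (idx(:,:,2) - 1) * nBins + idx(:,:,3);     % joint RGB bin, 1..nBins^3
    h   = histcounts(lin(:), 0.5:1:(nBins^3 + 0.5)); % pixel count per joint color bin
end

A query image can then be ranked against every stored image by this score alone, which is exactly the situation where color-only systems do well and shape-dominated scenes do not.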
The Veggie Vision system by IBM provides a good example of the performance that can be achieved when the four similarity measures are combined in one system [6]. This system uses color, texture, shape, and size histograms for the identification of produce. The success the Veggie Vision system has achieved, confirms the value for the integration of shape Similarity with that of color and texture. This thesis will concentrate on extracting Shape structures with the intent of developing better representations of scenes and better means of matching in the domain of scenes with significant man-made object content. With a view towards the recognition of man-made objects in scenes, the natural features to use as the basis of the matching algorithms are lines, corners, and ribbons. The retrieval process has been divided into three sub processes; the feature extraction process, data representation, and graph ' matching algorithm. Feature extraction and data representation are offline processes whereas graph matching is an online process. The Canny edge detector and Hough Transform are used for feature extraction. These features such as lines, comers and ribbons are then processed for extraction of relative attributes, for example distance, angle, and orientation, for matching purposes. All this information is then stored into a graph data structure separately for each image in MAT file format used by MATLAB for storing data. A detailed discussion on these processes may be found in Chapter 4 and 5. Our image retrieval system requires a query image for the image retrieval process. The query image is then processed for feature extraction and representation. The extracted features are then compared with the features of all the images already stored in 3 the database and the images are displayed in descending order basing on percentage of similarity. Figure 1-1 is one of the examples of image retrieval. In this example, a query image of a Small Building was submitted and the program successfully retrieved all the images of that Small Building taken from different views. ‘ . query img Figure 1-1 Query image of Small Building and results of our retrieval system. The core work of this thesis is presented in Chapter 3 to Chapter 6. Chapter 3 provides an overview of the image retrieval system and explains the logical decomposition of our system into three main modules: feature extraction module, features representation module and graph matching module. I will give a detailed description of the techniques used for edge, line, comer, and ribbon detection in Chapter 4, and discussing the results of these detectors applied to both hand-drawn images and manmade object. Chapter 5 and 6 are devoted to graph data structure development and the graph matching algorithm. Chapter 7 summarizes the strength of the matching algorithm, as well as its shortcoming by evaluating it in different worst-case scenarios. 2. Literature Review 2.1 Current CBIR Techniques Content Based Image Retrieval ‘CBIR’ works on a principle of retrieving images from a database by comparing features automatically extracted from the images themselves. The commonest features used for CBIR are color, texture or shape. A typical CBIR system allows users to create queries by submitting an example of the type of image being sought, though some offer alternatives such as selection from a palette or sketch input. 
The system then retrieves images from the database whose feature values match those of the query most closely and displays thumbnails of these images on the screen. Some of the more commonly used types of features used for image retrieval are described below. 2.1.1 Color Based Retrieval Color is one of the most widely used features in content-based image retrieval systems. The color feature is relatively robust to background complications, image resolution and orientation. Color histogram is most commonly used to extract color features. Several other methods for color-based image retrieval systems have been described in the literature, but most are variations on the same basic idea. Each image added to the collection is analyzed to compute a color histogram, which Shows the proportion of pixels of each color within the image. The color histogram for each image is then stored in the database. At search time, the user can either specify the desired proportion of each color (75% olive green and 25% red, for example), or submit an example image from which a color histogram is calculated. Either way, the matching process then retrieves those images whose color histograms match those of the query most closely. The matching technique most commonly used, histogram intersection, was first developed by Swain and Ballard [7]. Variants of this technique are now used in a high proportion of current CBIR systems. Methods of improving on Swain and Ballard’s original technique include the use of region—based color querying [8]. 2.1.2 Texture Based Retrieval “A variety of techniques have been used for measuring texture similarity; the best techniques rely on comparing values of second-order statistics calculated from query and stored images” [10]. Fundamentally, these calculate the relative brightness of selected pairs of pixels from each image. From these Tamura et al[9], calculated measures of image texture such as the degree of contrast, coarseness, directionality and regularity. Liu et al[lO] calculated periodicity, directionality and randomness. Texture queries can be made in a manner similar to color queries, by supplying an example query image. The system then retrieves images with texture measures most Similar in value to the query. Ma and Manjunath [11] introduced a new technique of texture thesaurus that retrieves images based on similarity. 2.1.3 Shape Based Retrieval Shape is an important feature in content-based image retrieval. However, shape is not a well-defined concept mathematically. Because of uncertainty in shape representations, the retrieval system may work well only in certain environments. Shapes can be represented either globally such as with aspect ratio, perimeters and moments, or locally such as with boundary segments. An alternative method for shape matching is elastic deformation of templates by Pentland et a1 [12]. Queries to shape retrieval systems are 7 made either by identifying an example image to act as the query, or as a user-drawn sketch [l3]. Shape matching of three-dimensional objects is a more challenging task - particularly where only a single 2-D view of the object in question is available. While no general solution to this problem is known, some useful inroads have been made into the problem of identifying at least some instances of a given object from different viewpoints. One approach has been to build up a set of 3-D models from the available 2- D image, and match them with other models in the database [14]. 
Another is to generate a series of alternative 2-D views of each database object, each of which is matched with the query image [15] 2.1.4 Retrieval by other Types of Primitive Feature One of the oldest established means .of accessing pictorial data is retrieval by its position within an image. Accessing data by spatial location is an essential aspect of geographical information systems and efficient methods to achieve this have been around for many years [16]. A similar technique was proposed by Gudivada and Raghavan [17] to allow users to search for images containing objects in defined spatial relationships with each other. Several other types of image feature have been proposed as a basis for CBIR. Most of these depend on complex transformations of pixel intensities. These techniques aim to extract features, which reflect some part of image similarity. The most well known technique of this type uses the wavelet transform to model an image at several different resolutions. Promising retrieval results have been reported by matching wavelet features computed from query and stored images [18]. 8 Retrieval by appearance is another method that gives interesting results. Two versions of this method have been developed, one for whole-image matching and one for matching selected parts of an image. The part-image technique involves filtering the image with Gaussian derivatives at multiple scales [l9], and then computing differential invariants; the whole-image technique uses distributions of local curvature and phase [20]. The advantage of all these techniques is that they can describe an image at varying levels of detail (useful in natural scenes where the objects of interest may appear in a variety of appearances), and avoid the need to segment the image into regions of interest before shape descriptors can be computed. One early system was designed to interpret and retrieve line drawings of objects within a narrow predefined domain, such as floor plans for domestic buildings. The system analysed object drawings, labelling each with a set of possible interpretations and their probabilities. These were then used to derive likely interpretations of the scene within which they appeared. GRIM_DBMS [21]. One system recently introduced by Qasim Iqbal and I. K Aggarawal [22], used higher level and lower level vision methodology to retrieve manmade objects. Higher level vision was used to extract features such as ‘L’ junctions, ‘U’ junctions and parallel groups. Lower-level analysis is performed globally by using Gabor filters to extract texture features. A manmade object region of interest extracted by using perceptual grouping is used as a frame for conducting lower-level analysis, but the method of image retrieval is weak. Instead of comparing color, texture and shape features, these features are weighted and used for comparison. Object recognition has also been an area of interest to the computer vision community for many years. Techniques are now being developed for recognizing and classifying objects with database retrieval in mind. David et a1 [23] have attracted considerable publicity for themselves by developing a technique for recognizing naked human beings within images, though their approach has been applied to a much wider range of objects, including horses and trees. Haering et al [24] also developed a method for identifying deciduous trees via their foliage. 
All such techniques are based on the idea of developing a model of each class of object to be recognized, identifying image regions which might contain examples of the object, and building up evidence to confirm or rule out the object's presence. Evidence will typically include both features of the candidate region itself (color, shape or texture) and contextual information such as its position and the type of background in the image. In contrast to these fully automatic methods is a family of techniques which allow systems to learn associations between semantic concepts and primitive features from user feedback. The earliest such system was FourEyes from Minka [25]. It encourages the user to annotate selected regions of an image, and then proceeds to apply similar semantic labels to areas with similar characteristics. The system is capable of improving its performance with user feedback. Another approach is the concept of the semantic visual template introduced by S. F. Chang et al. [26]. Here, the user is asked to identify a possible range of color, texture, shape or motion parameters to express his or her query, which is then refined using relevance feedback techniques. When the user is satisfied, the query is given a semantic label (such as "sunset") and stored in a query database for later use.

2.2 Practical Applications of CBIR

A wide range of possible applications for CBIR technology has been identified, e.g.:

- Crime prevention
- The military
- Intellectual property
- Architectural and engineering design
- Fashion and interior design
- Journalism and advertising
- Medical diagnosis
- Geographical information and remote sensing systems
- Cultural heritage
- Education and training
- Home entertainment
- Web searching
- Robotics
- Industries

3. Methodology

This chapter focuses on the development of a methodology for feature extraction using higher-level image analysis. Manmade objects like buildings generally have sharp edges and straight boundaries. The prominent characteristics of a building are the apparent regularity and the relationships of its component features. An image containing large manmade objects such as buildings, bridges and roads will exhibit a large number of significant edges, junctions and parallel lines, compared to an image with predominantly non-manmade objects (such as a scene of vegetation). These features are generated by components of the objects, such as doors, windows, pillars, and the boundaries of the buildings. These higher-level features exhibit apparent regularity and relationship, and are strong evidence of structure in a scene. The presence of these distinguishing features in an image can be utilized for image comparison and subsequently for image retrieval.

3.1 Data Representation

To compare different images, the following higher-level features are extracted from each image:

- Straight lines
- Ribbons (parallel lines)
- Corners

Manmade objects are generally rigid; therefore, these representations adequately define their shape descriptions. The extraction process is hierarchical. The first stage is extraction of edges from the image, followed by extraction of straight lines, which tend to be small fragments that are grouped to form longer lines. Lines are then used to find corners and ribbons. A set of lines terminating at a common point is called a corner. The corner is an important relation. According to the proximity rule of perceptual grouping, the human visual system easily groups corners.
In fact, it has been suggested that a major function of eye movement is to determine corners and edges [27]. The straight-line segments are also used to extract parallel lines. A large number of manmade objects contain parallel structures such as pillars, beams, doors and windows. Human vision can rapidly identify parallel lines [28]. A parallel relation is a non-accidental relationship that can be used to infer relationships. These parallel lines are grouped together to find parallel groups, which are then used to extract significant parallel groups [29]. The rationale for the grouping relations described above is the following:

- Spatially close primitive structures are likely to be related and to reflect meaningful structures.
- Accidental image relations of natural objects may cause some primitive structures. For example, line segments extracted from a cluster of tree leaves may accidentally form a ribbon or corner. Such ribbons or corners tend to be randomly and sparsely distributed and are unlikely to form meaningful structures.
- Manmade objects usually consist of regular structures.

3.2 Image Matching Process

Before explaining each step in detail, I would like to give an overall view of my image retrieval methodology. I have divided the matching process into three main parts, namely:

- Feature Extraction Module
- Features Representation Module, and
- Graph Matching Module

[Figure 3-1: Image A and Image B each pass through the Feature Extraction Module and the Features Representation Module; the Graph Matching Module compares the two representations and produces the output result.]
Figure 3-1 Overall Methodology for Image Comparison

Figure 3-1 shows the overall methodology for the image retrieval system. In Figure 3-1 the feature extraction module is actually a combination of a number of other processes (i.e. the Canny edge detector, the Hough transform, a line-fit routine that joins small line segments into longer lines, reordering of the lines, the corner detector and the ribbon detector). These processes are responsible for extracting higher-level features such as lines, corners and ribbons from the given gray-level images. After extraction of the higher-level features, the feature representation process collects several other parameters of these features (Table 3-1) before storing them into graph data structures. Once these parameters are calculated, they are all stored into the respective graph data structures. The graph-matching algorithm then compares the query image with a stored image from the database.

Features   Parameters
Lines      Slope of Line, Edge Gradient, Line Orientation, Distance between Lines
Corners    Angle of a Corner, Corner Orientation, Corner Precincts, Distance between Corners, Angle between Corners
Ribbons    Ribbon Width, Slope of Line, Edge Gradient, Type of Ribbon, Ribbon Orientation, Distance between Ribbons

Table 3-1 Features and their Parameters
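As a concrete illustration of how the entries of Table 3-1 might be held per image before being written to a MAT file, the following MATLAB sketch uses one struct per image with a field per feature type. The field names and values are illustrative assumptions, not the actual data structures used in the implementation; those are described in Chapter 5.

% Illustrative sketch (not the thesis code): one record per image, saved to a
% MAT file, holding the features and relational attributes listed in Table 3-1.
img.name = 'small_building.pgm';             % hypothetical file name

img.lines(1).endpoints   = [12 30; 12 110];  % [r1 c1; r2 c2]
img.lines(1).slope       = 0;                % slope of the fitted line
img.lines(1).gradient    = 90;               % average edge gradient (degrees)

img.corners(1).location  = [12 30];          % sub-pixel intersection point
img.corners(1).angle     = 88;               % angle between the two arms
img.corners(1).lines     = [1 4];            % indices of the lines forming it

img.ribbons(1).lines     = [1 2];            % the two parallel lines
img.ribbons(1).width     = 9;                % distance between the lines
img.ribbons(1).type      = 1;                % Type-1 or Type-2 (Section 4.4.2)

% Pairwise relations (distance, relative orientation, ...) can be kept in
% adjacency-style matrices, one matrix per relation:
img.lineDistance    = zeros(numel(img.lines));
img.lineOrientation = zeros(numel(img.lines));

save('small_building_graph.mat', 'img');     % offline step; matching later reads these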
4. Feature Extraction Module

Feature extraction is one of the most important steps in our image retrieval system, as all other steps depend on this process. The process is divided into sub-processes so that each step can be analysed critically. Figure 4-1 illustrates the processing chain that each image ultimately passes through for feature extraction.

[Figure 4-1: edge detection, followed by line detection and a line fit and line join stage, feeding the corner detector and the ribbon detector, whose outputs go to the features representation process.]
Figure 4-1 Process for feature extraction

Because of this processing sequence and the number of parameters at each stage, one must be conscious of a natural dependency. Namely, no matter how robust the corner and ribbon detectors are, they depend upon the line detector, which in turn depends upon the edge detector. Thus, the number of parameters that influence the last two detectors may be larger than desired. With this in mind, the edge and line detectors must exhibit acceptable performance over a wide range of images, balanced with the need for parameter optimisation and automation.

4.1 Edge Detection

The early stages of vision processing identify features in images that are relevant to estimating the structure and properties of objects in a scene. Edges are one such feature. Edges are significant local changes in the intensity image and are important features for analyzing images. Edges typically occur on the boundary between two different regions in the image. Edge detection is frequently the first step in recovering information from images, and due to its importance it continues to be an active research area. At its simplest, an edge is a sharp discontinuity in a gray-level profile. However, the situation is complicated by the presence of noise and by image resolution. An edge is specified by its magnitude and its direction. A number of linked edge points may be better approximated by a linear segment called an edgel [30]. There are many types of edges; they may be classified into three classes: step edge, roof edge and linear edge. A good edge operator must have the following properties [30]; it must

- operate locally,
- be efficient,
- be sensitive to the orientation and magnitude of an edge,
- work in the presence of noise, and
- be insensitive to the threshold value.

A further condition imposed by Canny [Canny 86] is that the operator must not have multiple responses to a single edge. To detect these different types of edges, many edge operators have been suggested in the computer vision literature, e.g. Sobel [Prewitt 70], Roberts [Roberts 65], Kirsch [Kirsch 71], and Marr and Hildreth [Marr 80]. These edge operators possess some of the properties listed above, but not all, and each tends to be good for one type of edge and not for the others. Canny in 1986 developed an edge operator which extracts not only step edges but also ridge and roof edges. Chen et al. [31] have evaluated many of these algorithms for edge detection on brain images obtained from various sources such as computed tomography (CT), magnetic resonance imaging (MRI), and positron emission tomography (PET). Their results indicate that none of the edge detectors mentioned are universally applicable. Since the first step in our feature extraction sequence is to extract edges from the grey-scale image, the use of the Canny edge detector for this step is justified, as the Canny operator gives a consistent response with a single smoothing parameter. The Canny edge detector used here is the Canny algorithm described by Stockman and Shapiro [3].

4.1.1 Canny Edge Detector

The original gray-scale image I[i, j] is smoothed with a Gaussian filter G[i, j, σ], where σ is the spread of the Gaussian that controls the degree of smoothing.
S[i, j] = G[i, j, σ] * I[i, j]                (4.1)

The gradient magnitude M[i, j] and direction θ[i, j] are then computed from the smoothed image S[i, j]. The magnitude image is used to provide votes for possible edge pixels. If the magnitude is higher than a given low threshold, then the pixel is classified as a possible edge. Next, the gradient direction is used in a step called non-maximal suppression. The purpose of this step is to thin a possible edge by suppressing any pixel response that is not higher than the two neighbouring pixels on either side of it along the direction of the gradient. Finally, the magnitude of each pixel in the image of possible edges is compared against a given high threshold. If a pixel passes the test, it is classified as an edge pixel, and the magnitudes of its neighbours are recursively tested against the low threshold.

4.1.2 Experiment and Results with the Canny Edge Detector

Edge detection is the first stage of the segmentation chain, and the spread parameter σ is perhaps the most important of all the parameters to be encountered. For example, Figure 4-2 shows the edge image of different shapes with the default parameter σ = 1.0, and Figure 4-3 shows the edge image with σ = 2.0.

Figure 4-2 Edge image of different shapes with default parameter σ = 1.0
Figure 4-3 Edge image of different shapes with σ = 2.0

In Figure 4-3 corners are somewhat distorted when spread σ = 2.0 was used. The smoothing parameter σ may also have to be adjusted in the presence of noise. Figure 4-4 to Figure 4-7 show the results of adding Gaussian noise to the test image. Notice that adjusting σ yields the proper edges, and these edges are still as 'clean' as those detected from the noise-free image using the same σ. The image with medium noise does not suffer from using a small σ because the edges are still 'clean'. The image with high noise will undoubtedly suffer in later stages of detection, due to the jagged edges and noise artifacts. One should not be alarmed at such results, since the amount of noise that has been added is much greater than what is expected in practice. A more reasonable amount of noise can be seen in the original Thick W, Taj Mahal and Small Building images (see Figure 4-8 to Figure 4-10). This noise was added during the scanning process, and it will be shown that it does not affect the overall performance of the detection system.

Figure 4-4 (a) Test image with Gaussian noise of 0.01. (b) Edge image with σ = 1.0
Figure 4-5 (a) Test image with Gaussian noise of 0.02. (b) Edge image with σ = 1.0
Figure 4-6 (a) Test image with Gaussian noise of 0.05. (b) Edge image with σ = 1.0
Figure 4-7 Edge image with σ = 3.0
Figure 4-8 (a) Thick W with original scanner noise. (b) Edge image of Thick W with σ = 1.0
Figure 4-9 (a) Taj Mahal with original scanner noise. (b) Edge image of Taj Mahal with σ = 1.0
Figure 4-10 (a) Small Building with original scanner noise. (b) Edge image with σ = 1.0
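The smoothing and gradient stages of Section 4.1.1 are easy to reproduce. The following MATLAB sketch is my illustration, not the thesis code: it computes S, M and θ as in Equation 4.1 and then lets the Image Processing Toolbox edge function perform the non-maximal suppression and hysteresis thresholding, so the effect of σ seen in Figures 4-2, 4-3 and 4-7 can be checked directly. The file name is a placeholder.

% Sketch only: reproduce the effect of the spread parameter sigma.
I = im2double(imread('test_shapes.png'));    % hypothetical test image
if size(I,3) == 3, I = rgb2gray(I); end

sigma = 1.0;                                 % try 1.0, 2.0, 3.0 as in Figs 4-2, 4-3, 4-7
G = fspecial('gaussian', 2*ceil(3*sigma)+1, sigma);
S = imfilter(I, G, 'replicate');             % Equation 4.1: S = G * I

[Sx, Sy] = gradient(S);                      % gradient of the smoothed image
M     = hypot(Sx, Sy);                       % gradient magnitude M[i,j]
theta = atan2(Sy, Sx);                       % gradient direction theta[i,j]

% Non-maximal suppression and hysteresis thresholding as described above;
% here the toolbox Canny detector is used for those two stages.
BW = edge(I, 'canny', [], sigma);
imshow(BW);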
4.2 Line Detection

The Hough Transform (HT) is used in computer vision and pattern recognition for detecting geometric shapes in digital images that can be defined by parametric equations. It is a mapping from the image plane to the parameter space, and essentially consists of a voting process followed by a peak detection stage. The advantages of the transform are its robustness to noise and to discontinuities in the image, while the disadvantages are its demand for a large amount of computing time and storage (although with present advances in computer memory and computing power this is less of a problem). The time and space requirements depend on the number of parameters required to describe the pattern, the resolution of the accumulator array, and the image resolution. A straight line is the simplest of all geometric patterns and can be described with only two parameters (see Equation 4.2). Different aspects of the HT have been investigated and reported in the literature [3,30,32,34]. The compute-bound nature of the HT has inspired the development of efficient algorithms and implementations of the transform. Performance of the transform with respect to accuracy of detection has been discussed in the literature. A formal mathematical definition of the transform, its properties and relationships appear in [3,30,33,34]. Commonly used parameterizations of a straight line are the slope-intercept and the normal forms. When the HT is used for the detection of straight lines in an image, only the parameters of the infinite line are obtained. It does not provide any information regarding the length, position and endpoints of the line segment in the image plane. However, in computer vision applications, the length and exact position of a line segment, in addition to its normal parameters, are required for locating the line segment. The parameters, the length, and the coordinates of the end points of a line segment constitute the complete line segment description. Techniques to find the complete line segment description using the Hough Transform have not been thoroughly studied, although a few algorithms are available in the literature [33]. The available methods are based on the projection of the line on either the x or y axis in the image plane. I also used a least-squares error fit to estimate the line parameters more precisely.

In order to detect and report on the line detection process, one must first decide on a proper definition of a line. There are certain unavoidable problems that occur when detecting lines in an image. These problems stem from the fact that, even though a digital line is not the same as its mathematical counterpart, the mathematical definition is used for detection purposes. In order to take advantage of both concepts of a line, or more properly a line segment, I define a line to be a set of pixels that fit a mathematical line model. For the purposes of detection, the model used for a line is

d = x cos(θ) + y sin(θ)                (4.2)

where d is the distance from the line to the origin and θ is the angle between the vector perpendicular to the line and the x-axis (see Figure 4-11).

Figure 4-11 The parameters d and θ used in the equation 4.2 of a straight line

The line detection algorithm employed is a variation of the Hough Transform with the O'Gorman and Clowes method for extracting straight lines. Refer to Chapter 10 of Stockman and Shapiro [3] for a detailed description of this method. The idea behind the Hough transform is to build evidence for the existence of lines using the model

d = c cos(θ) + r sin(θ)                (4.3)

where r and c are the respective row and column locations of a pixel, and d and θ are the length of the normal and the angle of the normal with respect to the positive x-axis. The angle θ was computed using a 5 x 5 Sobel gradient operator.
The distance from a potential line to the origin can thus be computed by

d = | c cos(θ) + r sin(θ) |, so that 0 ≤ d ≤ d'                (4.4)

where d' is half the length of the main diagonal of the image. The distance d and angle θ are then quantized for each feature point in the image, i.e. initially d is assigned to the nearest multiple of 3, and θ is assigned to the nearest multiple of 10. A 36 x d' accumulator array, V, is used to gather votes from pixels. For example, if a pixel has a quantized angle θ of 180° and a quantized distance d of 240, the value in V[18][80] is incremented. An array of point lists is also kept to store the list of pixels that voted for a particular line. Once the accumulator array has been filled, it is searched for the line with the greatest number of votes. Then, the set of pixels that voted for the line are processed to find neighbors whose angles are within ten degrees of the given angle. If such a neighbor is found, it is deleted from its point list, added to the current point list, and the corresponding value in the accumulator array is decremented. This process is referred to as the Merge stage of the Hough Transform. The process of choosing the greatest bin and merging is continued as long as the size of the bin in the accumulator array is greater than a chosen length parameter. This parameter dictates the length of the lines that can be detected.
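Before looking at a worked example, the bookkeeping just described can be summarized in a short MATLAB sketch. This is an illustration under assumed inputs, not the thesis implementation; in particular, the origin is placed at the image corner here, so d is bounded by the full diagonal rather than the half-diagonal used above.

% Illustrative sketch of the voting stage (not the thesis code). Assumed inputs:
% r, c         - column vectors with the row/column of each edge pixel
% thetaDeg     - gradient angle of each edge pixel, in degrees
% nRows, nCols - the image dimensions
dQuant     = 3;                          % d assigned to the nearest multiple of 3
thetaQuant = 10;                         % theta assigned to the nearest multiple of 10
dMax       = ceil(hypot(nRows, nCols));
nD         = ceil(dMax / dQuant);

V   = zeros(36, nD);                     % accumulator: 36 angle bins x nD distance bins
pts = cell(36, nD);                      % the pixels that voted for each bin

for k = 1:numel(r)
    th = mod(round(thetaDeg(k) / thetaQuant) * thetaQuant, 360);  % 0, 10, ..., 350
    d  = abs(c(k) * cosd(th) + r(k) * sind(th));                  % Equation 4.4
    iTheta = th / thetaQuant + 1;                                 % angle bin 1..36
    iD     = max(1, round(d / dQuant));                           % distance bin
    V(iTheta, iD) = V(iTheta, iD) + 1;
    pts{iTheta, iD}(end+1, :) = [r(k) c(k)];
end
% The merge stage then repeatedly takes the largest remaining bin, pulls in the
% voters of neighbouring bins whose angles lie within ten degrees, and stops
% when the largest bin falls below the chosen length parameter.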
[Figure 4-12: a portion of the accumulator array, with quantized θj along one axis and di along the other; the dark entries mark the significant peaks and the lighter, scattered entries are the votes of a diagonal line.]
Figure 4-12 Accumulator Array showing the peaks
Figure 4-13 Test image

Figure 4-12 is a portion of the accumulator array for Figure 4-13. The array size in Figure 4-12 is di x θj, where i = 1 to maximum_distance/3 and j = 1 to 36. The entries marked with a dark color show the peaks in the accumulator array, which are quite significant and can be detected easily; but if there is a diagonal line in the image plane with an irregular slope, it will be hard to detect. The votes marked with a light color actually represent a diagonal line, and we can observe from the accumulator array that its votes are spread from top to bottom over different peaks, making it very hard to detect the complete diagonal line.

Merge the peaks in accumulator array A
    A is the accumulator array; [x, y] = size of A
    for d = 1 to x
        for theta = 1 to y
            if A[d, theta] >= threshold
                [i, j] = Search_Highest_Peak(A, d, theta)
                k = i
                while A[i+1, j] > 0
                    Merge A[k, j] and A[i+1, j]    // also merge the point lists
                    A[i+1, j] = 0
                    i = i + 1
                end while
                i = k
                while A[i-1, j] > 0
                    Merge A[k, j] and A[i-1, j]    // also merge the point lists
                    A[i-1, j] = 0
                    i = i - 1
                end while
            end if
        end for
    end for

Search_Highest_Peak(A, d, theta)
    best = A[d, theta];  iBest = d;  j = theta
    i = d
    while A[i, j] > 0                  // scan downwards in the d direction
        if A[i, j] > best
            best = A[i, j];  iBest = i
        end if
        i = i + 1
    end while
    i = d
    while A[i, j] > 0                  // scan upwards in the d direction
        if A[i, j] > best
            best = A[i, j];  iBest = i
        end if
        i = i - 1
    end while
    return [iBest, j]

Algorithm 4.1 Algorithm to merge peaks in the accumulator array

The algorithm I designed, given in Algorithm 4.1, can detect diagonal lines; it is a modified version of the technique proposed by M. Atiquzzaman and M. W. Akhtar [34]. The algorithm searches for a peak in the accumulator array V(d, θ) from left to right and top to bottom by incrementing θj and di respectively. When it finds the first peak in the accumulator array that meets the threshold, it calls a routine to search for the highest peak in the d direction, searching both downwards and upwards, and returns the position of the highest peak. The calling program then merges all the adjacent peaks above and below into the highest peak, making it more prominent.

[Figure 4-14: the same portion of the accumulator array after merging; the scattered votes of the diagonal line have been collapsed into a few prominent peaks.]
Figure 4-14 Accumulator Array showing the merged peaks

I ran this algorithm on Figure 4-12, which is a portion of the accumulator array of Figure 4-13, setting the threshold to 4. The resultant accumulator array is shown in Figure 4-14. The algorithm scans from left to right and top to bottom. The first peak in Figure 4-12 is 4; since it equals the threshold, the second part of the algorithm is activated and searches for the highest peak in that column from top to bottom. When the algorithm finds the highest peak in the column, such as 292, it starts merging the column entries until it finds a zero. The peak in the first column then becomes 299. Similarly, the rest of the smaller peaks are merged into other larger peaks, which makes them quite prominent.

Figure 4-15 (a) Line detection before applying Algorithm 4.1 and (b) line detection after applying Algorithm 4.1

In Figure 4-15 (a) the diagonal lines are not completely detected. Diagonal lines 7 and 8 should have been merged together but were not; similarly, lines 12 and 13 should have been merged. Horizontal lines are completely detected, whereas diagonal lines are not, and the reason for the incomplete detection is the same: their peaks are spread over the accumulator array. Line segments 7 and 8 in Figure 4-15 (a) are merged into the single line segment 1 in Figure 4-15 (b); similarly, line segments 12 and 13 in Figure 4-15 (a) are merged into the single line segment 7 in Figure 4-15 (b).

4.2.1 Line Joining

It is well known that the HT may be confused in detecting lines when the image size is more than 250x250. To be consistent in line detection, images greater than 192x192 are processed in smaller frames of 128x128 with an overlap of 8 pixels. The overlap of 8 pixels is chosen so that information present at the intersection of the frames is not lost. To rejoin these smaller frames into one image, a precise joining routine is required. During the joining process one must keep certain things in mind: first, vertical or horizontal edges that lie in the overlap area, Figure 4-16 (a), must be detected only once, and diagonal lines, Figure 4-16 (b), which continue from frame 1 to frame 2, must be joined perfectly.

Figure 4-16 (a) Vertical overlapping edge (b) Diagonal overlapping edge
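The framing step itself is straightforward; the following MATLAB sketch cuts an image into 128x128 frames with the 8-pixel overlap described above. It is an illustration under my own assumptions rather than the thesis code; the frame offsets it records are what a joining routine would later use to map line endpoints back into full-image coordinates.

% Illustrative framing of a large image into 128x128 frames with 8-pixel overlap.
frameSize = 128;
overlap   = 8;
step      = frameSize - overlap;            % 120-pixel stride between frame origins

[nRows, nCols] = size(I);                   % I: the image to be processed framewise
frames = {};
for r0 = 1:step:max(1, nRows - overlap)
    for c0 = 1:step:max(1, nCols - overlap)
        r1 = min(r0 + frameSize - 1, nRows);
        c1 = min(c0 + frameSize - 1, nCols);
        frames{end+1} = struct('rows', [r0 r1], 'cols', [c0 c1], ...
                               'data', I(r0:r1, c0:c1));   % frame plus its offset
    end
end
% Lines detected inside frames{k}.data are shifted by the recorded offsets before
% the joining routine removes duplicates in the 8-pixel overlap bands.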
Not only does the framing problem require the line-joining routine, but real images such as Taj Mahal and Small Building may also require the least-squares error fit and line-joining routines to join any broken line segments detected by the Hough Transform. In practice, the line segments obtained from the Hough Transform may be fragmented and must be grouped together to obtain longer line segments. To accomplish this task, a set of closely bunched and similarly oriented straight-line segments must be found and joined together. I used least-squares error fitting to join those line segments which are close to each other. Before joining, the line segments are labeled in sequential order to reduce the computational time, as we then have to compare only neighboring lines and not all the lines in the image.

K(Li, Lj) < εL                (4.2)

K is a function that calculates the least-squares fit and joins the pair of lines [Li, Lj] if the least-squares error is less than the threshold εL, where i and j are line labels, i ≠ j and j ∈ [i - 5, i + 5].
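As an illustration of the K test above (again a sketch with an assumed segment representation, not the thesis code), two nearby segments can be joined by fitting a single least-squares line through their combined pixels and accepting the merge only when the residual error stays below the threshold.

function [ok, joined] = try_join(Li, Lj, epsL)
% Illustrative only. Li, Lj: structs with a field 'pts' (an N x 2 list of the
% [row col] pixels belonging to each segment); epsL: the threshold of Equation 4.2.
    P = [Li.pts; Lj.pts];
    r = P(:, 1);  c = P(:, 2);
    % Fit r = a*c + b, unless the combined segment is closer to vertical,
    % in which case fit c = a*r + b so that the residuals stay meaningful.
    if (max(c) - min(c)) >= (max(r) - min(r))
        coef = polyfit(c, r, 1);
        err  = mean((r - polyval(coef, c)).^2);   % mean squared residual
    else
        coef = polyfit(r, c, 1);
        err  = mean((c - polyval(coef, r)).^2);
    end
    ok = err < epsL;                              % K(Li, Lj) < eps_L
    if ok
        joined.pts = P;                           % the merged, longer segment
    else
        joined = Li;                              % keep the original segment
    end
end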
4.2.2 Line Detection Results

The results of the line detection procedure for all of the test images can be found in Appendix 3. The line detection for every image in this appendix uses the default parameter of 5 pixels for the line length; any line segment shorter than 5 pixels is taken as noise. In this section, I will discuss the results on specific images that exemplify both the positive and negative aspects of the line detector, and in particular examine how the line length parameter influences the number of lines that are correctly detected.

The first images that I will investigate are hand-drawn images and test patterns. Since these images are not as complicated as natural scenes, one should expect the line detector to perform with a high degree of accuracy. Figure 4-17 (a) and (b) show the results of line detection on two of the test images. I consider the line detection for these images to be highly accurate. Every line that one may expect to be detected has been detected. Figure 4-18 (a) and (b) show line detection results that are only partly successful. In Figure 4-18 (a), the lowermost and upper right horizontal lines are missed, and in Figure 4-18 (b), the lowermost horizontal line is split into two lines and the lines in the minarets are also not joined. The reason that these potential segments escaped detection is that their individual bins did not receive enough votes to warrant line fitting.

Figure 4-17 (a) and (b) Successful line detection
Figure 4-18 (a) and (b) Incomplete line detection

As with the test images, the set of scene images exhibits both successes and failures with respect to line detection. Some of the positive line extraction examples can be seen in Figure 4-19 (a) and (b). In particular, the vertical lines formed by the columns in Figure 4-19 (a) and the broken lines of Figure 4-19 (b) have been successfully identified.

Figure 4-19 (a) Lincoln Memorial and (b) Big Taj, line length = 5

The line extraction results from Figure 4-20 (a) and Figure 4-20 (b) show both successful and unsuccessful line detection. While many lines are correctly identified in Figure 4-20 (a), some of the most important lines, namely those formed by the crossbeams, are not detected. This can be attributed to the soft shadows present beneath the crossbeams. The dismal lack of line detection in Figure 4-20 (b) is due to the size of the image, only 150 x 150 pixels, and the distance of the structure from the viewer.

Figure 4-20 (a) Narobi 1 and (b) Small Taj, line length = 5

4.3 Corner Detection

A corner is defined to be the point of intersection of two distinct proximal line segments making the shape of an "L"; junctions of the "⊥" form are not considered here. The attributes of a corner consist of an angle and a location. Since lines are fit in a non-quantized sense, it is possible to define the location of a corner with sub-pixel accuracy. In order to detect corners, the line lists found in the line detection stage must be processed.

K(Li, Lj) < εc                (4.3)

K is a function that finds the corner between the pair of lines [Li, Lj] if its measure is less than the threshold εc, where i and j are line labels, i ≠ j and j ∈ [i - 10, i + 10]. The slope of a line Li with end points (x1, y1) and (x2, y2) is m = (y1 - y2) / (x1 - x2). First, the slope of each line Li is computed and then compared against the slope of Lj (where i ≠ j and j ∈ [i - 10, i + 10]). If the slopes are commensurate, the point of intersection is computed. Because the points of interest occur as the intersection of line segments rather than lines, the computed intersection point must be compared against the initial and terminal points of each line segment. If the point is within a certain distance, the corner threshold, of either the initial or the terminal point of both segments, or if the point lies on both segments, the pair is classified as a corner. The default distance for the corner threshold is ten pixels. Essentially, the corner threshold is the distance that each line segment is extended in both directions before testing for intersections. After detecting the corner point, the measure of the corner angle is computed with equation 4.4.

Angle(Li, Lj) = tan⁻¹((mj - mi) / (1 + mi·mj))                (4.4)

where mi is the slope of Li, mj is the slope of Lj, and i ≠ j.
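A compact MATLAB sketch of this intersection test follows. It is illustrative only: the line representation (two endpoints per segment) and helper names are assumptions, vertical lines are ignored, and the "point lies on both segments" alternative is omitted for brevity.

function [isCorner, P, angleDeg] = corner_test(Li, Lj, cornerThresh)
% Li, Lj: structs with fields p1, p2 (each a [x y] endpoint). cornerThresh: pixels.
    if nargin < 3, cornerThresh = 10; end        % default corner threshold
    m1 = slope(Li);  m2 = slope(Lj);
    isCorner = false;  P = [NaN NaN];  angleDeg = NaN;
    if abs(m1 - m2) < 1e-6, return; end          % parallel segments cannot intersect
    % Intersection of the two infinite lines y = m*x + b.
    b1 = Li.p1(2) - m1*Li.p1(1);
    b2 = Lj.p1(2) - m2*Lj.p1(1);
    x  = (b2 - b1) / (m1 - m2);
    y  = m1*x + b1;
    P  = [x y];
    % Accept only if P lies near an endpoint of BOTH segments (Section 4.3).
    nearI = min(norm(P - Li.p1), norm(P - Li.p2)) <= cornerThresh;
    nearJ = min(norm(P - Lj.p1), norm(P - Lj.p2)) <= cornerThresh;
    isCorner = nearI && nearJ;
    angleDeg = atand((m2 - m1) / (1 + m1*m2));   % Equation 4.4
end

function m = slope(L)
    m = (L.p2(2) - L.p1(2)) / (L.p2(1) - L.p1(1));
end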
4.3.1 Corner Detection Observations

Since corner detection depends upon both edge detection and line detection, three parameters affect its performance: the Gaussian spread, the line-length tolerance, and the corner threshold. Because there are three parameters, the reasons for incorrect corner detection may be difficult to pinpoint, and it may be difficult to find the best configuration of these parameters for a particular image.

Upon referring to Appendix 4, the reader can verify that, for the majority of the examples, both false positives and false negatives (non-detection of corners) can be attributed to faulty line detection. A few exceptions to this, as well as some of the more successful attempts at corner detection, can be seen in Figure 4-21 to Figure 4-23.

Figure 4-21: Test image with all the corners detected.

Figure 4-22: Test pattern with one corner missing (lower left corner of the lower middle rectangle).

Figure 4-23: Corner detection results for (a) Lincoln, 25 corners, and (b) Taj Mahal, 60 corners.

In Figure 4-21 all of the possible corners are detected, but in Figure 4-22 the corner at the lower left of the lower middle rectangle is missed, which is due to the corner threshold. In Figure 4-23 (a), the Lincoln Memorial scene displays very strong corner evidence in the upper left and upper right sides of the building. While this is encouraging, there are also corners that are incorrectly detected because the edges of the columns extend into the strong line at the bottom of the building. This can be directly attributed to the default corner threshold; however, if the threshold were lowered, the strong corners at the top of the monument might not be detected. The corner detection for the Figure 4-23 (b) Big Taj scene also displays mixed results. The points of intersection on the minarets and on the windows are promising, but not all of the window corners are successfully detected, and there are many false positives amongst the crowd of people.

4.3.2 Corner Detection Results

The general observations made in the previous section provide evidence of the strengths and weaknesses of the line-driven corner detector. In this section, the results of the corner detector are quantified so that an objective report of its accuracy can be made. The accuracy of the detector is determined by the confidence ratio

R = (C - F) / E    (4.6)

where C is the number of corners correctly detected, F is the number of false alarms, and E is the number of corners expected. Clearly, the number of corners that one "expects" to find in an image is not well defined. While some might define a corner to be a point of high curvature, others, including myself, consider a corner to be the potential point of intersection between two line segments. It is impossible to make an objective decision about expected corners in scene images; therefore, confidence ratings are reported only for test and hand-drawn images. The corner detection results for these images are given in Table 4-1, using the default values sigma = 1.0, line length = 5, and corner threshold = 10.

Table 4-1: Confidence rating for corner detection of hand-drawn images

  Image          Detected Corners   False Alarms   Expected Corners   Confidence Rating
  Parallelogram  16                 0              16                 100%
  Test Image     35                 4              32                 97%
  Thick W        9                  1              12                 67%
  Thin W         6                  0              6                  100%
  Line Taj       16                 0              25                 64%

4.4 Ribbon Detection

Stockman and Shapiro define a ribbon as "an elongated region that is approximately symmetrical about its major axis" [3]. This definition encompasses many varied objects, including picture frames, bottles, columns, and any 2-dimensional object that possesses symmetry. When the images of such objects are projected onto a plane, the symmetry is translated into curves with reflective symmetry about an axis. When an actual cylindrical object is projected onto a plane, its symmetry is translated into parallel lines.
The ribbon detection algorithm identifies only those ribbons that result from projected cylinders and rectangular prisms, namely straight ribbons. In this section, I describe the algorithm for straight ribbon detection and explore the advantages and potential difficulties of ribbon detection.

4.4.1 Ribbon Definition

For the purpose of identifying straight ribbons, a more practical definition of a ribbon is needed. To this end, I define a straight ribbon to be two lines whose edge gradients differ by 180° ± 10°, that have approximately the same length and approximately the same slope, and whose width is at most 1.5 times the line length. To be more precise, two lines L1 and L2 form a ribbon if and only if the following conditions are satisfied:

- 170° ≤ |θ1 - θ2| ≤ 190°, where θ1 and θ2 are the respective average edge gradients of L1 and L2.
- l(L_i) ≥ l(L_j) / 2 for 1 ≤ i, j ≤ 2, where l(L) is the length of line segment L.
- The projection of line L1 should overlap L2 and vice versa.

A detected ribbon inherits the attributes of each of its lines, as well as a width, a length, and one of two possible types. The length of the ribbon is defined to be the average of the lengths of the two lines. The width of the ribbon is the distance between the lines.

4.4.2 Types of Ribbons

Ribbons are divided into two types, Type-1 and Type-2. If the edge gradient θ1 at line L1 is greater than the edge gradient θ2 at line L2, the ribbon is Type-1; otherwise it is Type-2. Essentially, a Type-1 ribbon is one in which the gradient vectors of the lines point away from each other, and a Type-2 ribbon is one in which the gradient vectors point towards each other. Figure 4-24 gives examples of the two types.

Figure 4-24: (a) Type-1 ribbon and (b) Type-2 ribbon (L1 must be left of L2).
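A minimal MATLAB sketch of the straight-ribbon test above follows; it assumes each line is stored as a small structure with endpoints and an average edge gradient (the field and function names are illustrative, not the thesis code).

    function tf = is_ribbon(A, B, gradTol)
        % A, B : lines as structs with fields p1, p2 ([x y] endpoints) and grad
        %        (average edge gradient in degrees);  gradTol : tolerance (default 10).
        if nargin < 3, gradTol = 10; end
        dg = mod(abs(A.grad - B.grad), 360);
        if dg > 180, dg = 360 - dg; end
        oppositeGradients = abs(dg - 180) <= gradTol;       % gradients differ by 180 +/- tol
        lenA = norm(A.p2 - A.p1);  lenB = norm(B.p2 - B.p1);
        similarLength = min(lenA, lenB) >= 0.5 * max(lenA, lenB);
        u  = (A.p2 - A.p1) / lenA;                          % unit direction of line A
        tB = sort([(B.p1 - A.p1) * u.', (B.p2 - A.p1) * u.']);
        overlaps = (tB(2) > 0) && (tB(1) < lenA);            % projected spans overlap
        n  = [-u(2) u(1)];                                   % unit normal of line A
        width = abs((B.p1 - A.p1) * n.');                    % perpendicular separation
        narrowEnough = width <= 1.5 * max(lenA, lenB);
        tf = oppositeGradients && similarLength && overlaps && narrowEnough;
    end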
4.4.3 Ribbon Detection Results

The confidence rating defined for corner detection can also be used to assess the general accuracy of the ribbon detection algorithm. As with corner detection, the concept of an expected ribbon is not well defined, and the determination of such ribbons in images is fuzzy at best. For this reason, confidence ratings are reported only for the class of hand-drawn images and images that have some obvious ribbons (see Table 4-2).

Table 4-2: Confidence rating for ribbon detection

  Image          Detected Ribbons   False Alarms   Expected Ribbons   Confidence Rating
  Test Pattern   8                  0              8                  100%
  Thick V        2                  0              2                  100%
  Thick W        4                  0              4                  100%
  Thin W         4                  0              4                  100%
  Line Taj       9                  0              9                  100%
  Taj Mahal      24                 6              26                 69%
  Lincoln scene  44                 29             24                 62%
  Narobi 2       21                 5              26                 61%

Appendix 5 contains the results of ribbon detection for each of the test images. Images that yielded perfect ribbon detection are Test Pattern, Thick V, Thick W, Thin W, and Line Taj. In the Thin W image, one would expect the detection of four distinct ribbons, and all four have been detected successfully, as shown in Figure 4-25. Ribbon detection for the remaining hand-drawn images performs as one might expect: if the corresponding lines are detected, then a ribbon is detected. Non-detection occurs only when there is incomplete line evidence.

Figure 4-25: Thin W ribbon detection.

The results of applying the ribbon detector to the set of scene images show varying levels of success. While most of the obvious ribbons are detected, some surprising detections are made as well. Figure 4-26 displays some of the more promising detection results. In Figure 4-26 (a), Narobi 1, the strong Type-1 ribbon is clearly detected, as are the minarets in Figure 4-26 (b), Taj Mahal. The ribbons for all the minarets in Figure 4-26 (b) have been detected successfully, but the main door and some of the window ribbons are not detected. The most encouraging example is Figure 4-26 (c), the Lincoln scene, in which vertical ribbons of both types are detected.

Figure 4-26: Detected ribbons for (a) Narobi 1, (b) Taj Mahal, and (c) Lincoln scene.

Figure 4-27: Narobi 2, showing some successful ribbon detection.

The scene in Figure 4-27, Narobi 2, has strong ribbon evidence in the crossbeams, and one would hope that they would be detected, but this is not the case. The problem is that some of the lines in question were never detected. Other interesting ribbons of Narobi 2 are displayed in Figure 4-28.

Figure 4-28: (a), (b), and (c) Some successful ribbons of Narobi 2.

5. Image Representations

Many types of features, global, local, and relational, can be used for object recognition and matching. Global features are region based and can be obtained either by considering all points inside a region or only those points on its boundary; in either case, the purpose is to find the locations, intensity characteristics, color, and spatial relations of these points. Local features usually represent a small area of a region; examples are corners and boundary segments. Relational features are based on the relative positions of different entities, such as distances between features, relative angles, orientations, and precincts. These features are very important in object recognition because changes in lighting conditions, view angle, and noise do not greatly influence them. Line segments and orientation were used for object recognition by B. Huet et al. [35]. We will use relational features to define our data structures in order to get optimal and consistent performance from the graph-matching algorithm.

Graphs are important data structures for computer vision and are widely used to represent the neighborhood relations that exist in an image. Relational graph matching with model-based segmentation was used by Ozer et al. [36] to decide on human presence in an image as well as for posture recognition. A similarity-based aspect-graph approach was used by Christopher et al. [37] to recognize 3D objects.

Section 5.1 contains the basic definitions from graph theory that will be used in the rest of this chapter. Section 5.2 discusses the line structure representation, Section 5.3 the corner structure representation, and Section 5.4 the ribbon structure representation.

5.1 Basic Definitions

The definitions from graph theory contained in this section can be found in standard textbooks such as [38, 39, 40, 41].

Definition 5-1: A graph G = (V, E) consists of two sets: a finite set V of elements called vertices and a finite set E of elements called edges. Each edge creates a binary relation between a pair of vertices.

Definition 5-2: The vertices v_i and v_j associated with an edge e are called the end vertices of e; e is said to be incident to its end vertices. An edge is denoted e = (v_i, v_j).

Definition 5-3: Two edges are adjacent if they have a common end vertex.

Definition 5-4: Two vertices are adjacent if they are the end vertices of the same edge.
Definition 5-5: A path K in a graph is a finite alternating sequence of vertices and edges such that
- vertices v_{i-1} and v_i are the end vertices of the edge e_i, where 1 ≤ i ≤ k,
- all edges are distinct, and
- all vertices are distinct.

The vertices v_0 and v_k are called the end vertices of the path, and we refer to it as a v_0-v_k path. The number of edges in the path is called the length of the path.

Definition 5-6: A graph G is connected if there exists a path between every two vertices in G.

Definition 5-7: Consider a graph G = (V, E). G' = (V', E') is a sub-graph of G if V' and E' are, respectively, subsets of V and E such that an edge (v_i, v_j) is in E' only if v_i and v_j are in V'.

Definition 5-8: An n-vertex graph with n(n-1)/2 edges is a complete graph.

Definition 5-9: The adjacency matrix of an n-vertex graph G = (V, E) is an n x n matrix A in which each element is either zero or one. If G is an undirected graph and V = {1, 2, 3, ..., n}, then the elements of A are defined by A(i, j) = 1 if (i, j) ∈ E or (j, i) ∈ E, and A(i, j) = 0 otherwise.

5.1.1 Adjacency Matrix

Since I used MATLAB 6.12 for my implementation, it is important to highlight some MATLAB features. As described by The MathWorks [42], "MATLAB is a high-level matrix/array language with control flow statements, functions, data structures, input/output, and object-oriented programming features. It allows both programming in the small to rapidly create quick and dirty throw-away programs, and programming in the large to create complete large and complex application programs."

MATLAB does not support pointers [42]; therefore, it is hard to implement a graph data structure in the classic manner. MATLAB is, however, well suited to matrix manipulation, and a graph can also be represented in matrix form as an adjacency matrix. Therefore, I use the adjacency matrix as the graph data structure. The following example makes this clearer.

Figure 5-1: Pictorial representation of a graph G(V, E). Vertices are represented by black dots and edges by straight lines.

Figure 5-1 shows an example of a graph with nine vertices, v_1 to v_9; its adjacency matrix therefore has nine rows and nine columns. Two adjacency matrices for Figure 5-1 are shown in Figure 5-2 and Figure 5-3.

Figure 5-2: Adjacency matrix of Figure 5-1 (full, symmetric).

Figure 5-3: Adjacency matrix of Figure 5-1 (upper triangle only).

The problem with an adjacency matrix is that it takes n x n memory space, and if we want sub-pixel accuracy, for example for the distance between two lines or a corner point, then we have to declare a float type, which takes 8 bytes of memory per value. The total memory that the matrix occupies in that case is 8n², where n is the number of vertices. To reduce this we can, first, declare a non-float type such as unsigned character, which takes one byte per value and ranges from 0 to 255, and second, eliminate the lower triangle of Figure 5-2, since it is identical to the upper triangle. To reduce memory storage further, the adjacency matrix shown in Figure 5-3 can be converted into a sparse matrix [40, 42].
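A small MATLAB sketch of this representation follows (the nine-vertex edge list is illustrative, not the exact graph of Figure 5-1); it stores the adjacency matrix as uint8, keeps only the upper triangle, and converts the result to a sparse matrix.

    % Build the adjacency matrix of a 9-vertex undirected graph (edge list illustrative).
    n = 9;
    edges = [1 2; 1 4; 2 3; 2 4; 2 5; 3 5; 3 6; 4 7; 5 6; 5 7; 5 8; 6 9; 7 8; 8 9];
    A = zeros(n, 'uint8');                       % one byte per entry instead of eight
    for k = 1:size(edges, 1)
        A(edges(k,1), edges(k,2)) = 1;           % undirected graph: set both entries
        A(edges(k,2), edges(k,1)) = 1;
    end
    U = triu(A);                                 % upper triangle only (A is symmetric)
    S = sparse(double(U));                       % sparse storage keeps only the non-zeros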
5.2 Line Structure Representation

I have divided the line structure into two groups of properties: descriptors and relationships. Descriptors represent properties of a line itself, such as line slope and edge gradient; relationships represent properties between lines, such as the distance between line segments and relative line orientation. For each descriptor and relationship the spatial relationship is computed as described in the subsequent sections. A descriptor is a good candidate for a vertex property and a relationship is a good candidate for an edge property of the graph. A pictorial representation of the line structure is shown in Figure 5-4.

Figure 5-4: Pictorial representation of the line structure. Properties 'a', 'b', and 'c' belong to the vertices and represent line length, line slope, and edge gradient respectively; 'd' and 'e' are properties of the graph edges and represent the distance between two lines and the relative line orientation respectively.

5.2.1 Line Slope

Line slope is used as one of the attributes of a line segment.

5.2.2 Edge Gradient

The edge gradient is one of the primitives of computer vision, and it is used here as a property of lines. Since a graph-matching algorithm is used to match different lines and line structures, the edge gradient serves as one of the descriptors of a line. The edge gradient normally changes with lighting conditions; therefore, for consistent matching results a high threshold is used for comparison. The average edge gradient is computed with Equation 5.3, averaging the gradient over the line's pixels but excluding the three pixels nearest each end point:

L_iG = ( Σ_{n = S+3}^{E-3} A(n) ) / (LineLength - 6)    (5.3)

where L_iG is the average edge gradient, i is the line number, S and E are the start and end points of the line, n ranges over the (row, column) pixel coordinates of the line, and A is the gradient of the edge at a pixel.

The Sobel gradient operator normally distorts the gradient at the end points of an edge, and the degree of degradation depends on the size of the operator. I use a 5 x 5 Sobel operator, and to obtain a more accurate and consistent value the average gradient is computed between the end points as in Equation 5.3.
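A sketch of Equation 5.3 in MATLAB is given below (assumed inputs: a gradient-direction image and the ordered list of pixels along the line; the names are illustrative and the line is assumed to be longer than six pixels).

    function g = avg_edge_gradient(gradImg, linePix)
        % gradImg : gradient-direction image (degrees);  linePix : N x 2 [row col]
        % pixel coordinates along the line, ordered from the start point to the end point.
        N    = size(linePix, 1);
        keep = linePix(4:N-3, :);                           % skip 3 pixels at each end
        idx  = sub2ind(size(gradImg), keep(:,1), keep(:,2));
        g    = sum(gradImg(idx)) / (N - 6);                 % Equation 5.3
    end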
Figure 5-5 (a) shows a test image and Figure 5-5 (b) the line segments resulting from the Hough transform and the line-joining routine.

Figure 5-5: (a) Test image, (b) lines extracted from (a).

Table 5-1 lists the descriptors, line length, line slope, and edge gradient, for the lines of Figure 5-5 (b). The lengths of the horizontal lines are 30 or 32 pixels and their slope is 0°; the only attribute that differentiates among these horizontal lines is their edge gradient, which divides them into two distinct groups, one with edge gradient below 100° and one with edge gradient above 200°. Similarly, for the diagonal lines, line lengths range from 42 to 48 pixels and line slopes from 73° to 79°. The differences in line length and line slope are less than 10 pixels and 10° respectively, which is below the thresholds set for matching (thresholds for these descriptors are discussed in Chapter 6). Here, too, the edge gradient plays an important role by dividing these lines into groups: diagonal lines 1 and 7, lines 6 and 13, and lines 3, 10, 11, and 15 are grouped together; similarly, horizontal lines 2, 5, 8, and 12 form one group and lines 4, 9, 14, and 16 another.

Table 5-1: Line descriptors

  Line #   Length (pixels)   Slope (deg)   Edge Gradient (deg)   Remarks
  1        46                77            40                    Diagonal
  2        32                0             94                    Horizontal
  3        48                79            195                   Diagonal
  4        32                0             267                   Horizontal
  5        32                0             96                    Horizontal
  6        42                73            18                    Diagonal
  7        47                77            46                    Diagonal
  8        32                0             90                    Horizontal
  9        32                0             270                   Horizontal
  10       44                74            200                   Diagonal
  11       48                79            190                   Diagonal
  12       32                0             95                    Horizontal
  13       43                75            25                    Diagonal
  14       32                0             269                   Horizontal
  15       46                74            198                   Diagonal
  16       30                0             267                   Horizontal

5.2.3 Distance between Line Segments

The distance between lines is another parameter that refines the matching process. It is computed with Equation 5.4:

L_ijD = sqrt( (x_i - x_j)² + (y_i - y_j)² ),  i ≠ j, j ∈ [i - 10, i + 10]    (5.4)

where L_ijD is the line-distance adjacency matrix, i and j are line numbers, the distance is measured in pixels, and (x_i, y_i) and (x_j, y_j) are the midpoints of lines i and j respectively.

5.2.4 Relative Line Orientation

Line orientation means the relative position of one line with respect to another: 'left', 'right', 'up', or 'down'. Midpoints are used to relate the positions.

Figure 5-6: Line orientation.

In Figure 5-6, 360° is divided into four equal angles of 90° each: up is from 45° to 135°, left from 135° to 225°, down from 225° to 315°, and right from 315° to 45°. In Figure 5-6, line L2 is to the right of line L1.

L_ijO = K(L_i, L_j),  i ≠ j, j ∈ [i - 10, i + 10]    (5.5)

where L_ijO is the line-orientation adjacency matrix, i and j are line segment labels, and K is a function that returns the orientation of line i with respect to line j.

5.2.5 Behavior of Line Descriptors and Relationships

The predicted behavior of the line descriptors and relationships is shown in Table 5-2; the observed behavior is analyzed and reported in Chapter 7.

Table 5-2: Behavior of line descriptors and relationships

  Descriptor / Relationship        Rotation Inv.   Scale Inv.   Illumination Inv.   Occlusion Inv.
  Line Slope                       No              Yes          Yes                 Yes
  Edge Gradient Direction          No              Yes          No                  Yes
  Edge Gradient Magnitude          Yes             Yes          No                  Yes
  Distance between Line Segments   Yes             No           Yes                 No
  Relative Line Orientation        No              Yes          Yes                 Yes
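The quadrant test of Section 5.2.4, reused later for corners (Section 5.3.3) and ribbons (Section 5.4.4), can be sketched in MATLAB as follows; the midpoint inputs and the assumption that angles are measured counter-clockwise from the positive x-axis with y increasing upward are illustrative.

    function o = relative_orientation(mi, mj)
        % mi, mj : midpoints [x y] of lines i and j.
        a = mod(atan2d(mj(2) - mi(2), mj(1) - mi(1)), 360); % angle of j as seen from i
        if a >= 45 && a < 135
            o = 'up';
        elseif a >= 135 && a < 225
            o = 'left';
        elseif a >= 225 && a < 315
            o = 'down';
        else
            o = 'right';                                    % 315..360 and 0..45 degrees
        end
    end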
5.3 Corner Structure Representation

The geometric descriptor of the corner structure is the angle of the corner, and the relations that a corner makes with other corners are:

- distance between corners,
- corner orientation,
- corner precincts, and
- angle between corners.

The attributes of the corner structure are shown in Figure 5-7.

Figure 5-7: Pictorial representation of the corner structure. Property 'a' belongs to the vertices and represents the angle of a corner; 'b', 'c', 'd', and 'e' are properties of the graph edges and represent the distance between two corners, corner orientation, corner precincts, and the angle between corners respectively.

5.3.1 Angle of a Corner

The procedure to detect a corner and compute its angle was defined in Section 4.3. The results obtained from Equation 4.5 are stored in the matrix C_A.

5.3.2 Distance between Corners

The distance between corners is an important relationship that differentiates the corners of doors and windows from those of a building: as can be seen in Figure 5-8 (a) and (b), the corners of a door or window are relatively closer together than the corners of the building. We require that objects match under small changes in view and resolution. For instance, Figure 5-8 (a) and (b) show the same building with a change in view and scale but the same image resolution.

Figure 5-8: Corners detected by the corner detection algorithm.

To compute the distance between corners, Equation 5.6 is used:

C_ijD = K(C_i, C_j) < δ,  i ≠ j, j ∈ [i - 10, i + 10]    (5.6)

where K is a function that computes the Euclidean distance between corners C_i and C_j. If this distance is less than the threshold δ, it is stored in the adjacency matrix C_ijD.

5.3.3 Corner Orientation

As already discussed for line orientation in Section 5.2.4, corner orientation is also used to refine the matching results. Unlike the distance between corners, corner orientation is not often affected by view and resolution.

Figure 5-9: Corner orientation.

In Figure 5-9, 360° is divided into four equal angles of 90° each: up is from 45° to 135°, left from 135° to 225°, down from 225° to 315°, and right from 315° to 45°. In Figure 5-9, corner C2 is to the right of corner C1.

C_ijO = K(C_i, C_j),  i ≠ j, j ∈ [i - 10, i + 10]    (5.7)

where C_ijO is the corner-orientation adjacency matrix, i and j are corner labels, and K is a function that returns the orientation of corner C_i with respect to C_j.

5.3.4 Corner Precincts

One of the most important relationships among corner structures is the corner precinct. A precinct is the boundary of a corner formed by its arms (lines). In Figure 5-10 the precincts are shown with extended dotted lines. We assume that any two corners of a window or door will enclose each other within their boundaries; in Figure 5-10, corners C1 and C2 enclose each other in their bounds.

Figure 5-10: Example of corner bounds. C1, C2, and C3 are three corners; thick dark lines show the line segments and dotted lines show the bounds of the corners. C1 and C2 bound each other, whereas C3 bounds C2 but C2 does not bound C3.

Corner points A and B can be any of the corners shown in Figure 5-10, and A1 and A2 are the distant endpoints of the arms of corner A. Suppose A is C1 and B is C2. First we decide whether corner point B is inbound to corner point A. With corner point A as the origin, the dot product and cross product are taken between corner point B and arm A1, and likewise between corner point B and arm A2. If the signs of the two cross products are the same, corner point B is outside the bounds of A; if the signs are opposite, corner point B is inbound (see Table 5-3). In the same way we determine whether corner point A is inside the bounds of corner point B. With both results, Table 5-4 gives the type of bound. In this example the signs are opposite for C2; therefore corner C2 is inbound to corner C1, and similarly C1 is inbound to C2.

Table 5-3: Decision table for the arm orientation of a corner

  A1 -> B angle   A1 -> B sign   A2 -> B angle   A2 -> B sign   Decision
  θ1              +1             θ2              +1             Outside
  θ1              +1             θ2              -1             Inside
  θ1              -1             θ2              +1             Inside
  θ1              -1             θ2              -1             Outside

Table 5-4: Type of bound the corners make

  Corner A   Corner B   Type Code
  Outside    Outside    1
  Outside    Inside     2
  Inside     Outside    3
  Inside     Inside     4
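A minimal MATLAB sketch of the inbound test of Table 5-3 is given below (illustrative names; corner points and arm endpoints are [x y] row vectors).

    function inside = inbound(A, A1, A2, B)
        % A      : corner point [x y];   A1, A2 : far endpoints of the two arms of A;
        % B      : the other corner point.  Returns true if B is inbound to A.
        v1 = A1 - A;  v2 = A2 - A;  w = B - A;
        s1 = sign(v1(1)*w(2) - v1(2)*w(1));   % cross-product sign: arm A1 versus B
        s2 = sign(v2(1)*w(2) - v2(2)*w(1));   % cross-product sign: arm A2 versus B
        inside = (s1 ~= s2);                  % opposite signs -> B lies between the arms
    end

Calling the function twice, inbound(A, A1, A2, B) and inbound(B, B1, B2, A), gives the two results looked up in Table 5-4.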
5.3.5 Angle between Corners

The angle between two corners also plays an important role in matching corner points. In Figure 5-10, C2 makes an angle of 0° with C1, whereas C1 makes an angle of 180° with C2, and C1 and C2 relate with bound type 4. From this we can conclude that if the difference between the two angles is 180° and the bound type is 4, then there is strong evidence that these corners belong to a window or a door.

C_ijC = K(C_i, C_j),  i ≠ j, j ∈ [i - 10, i + 10]    (5.8)

where C_ijC is an adjacency matrix, i and j are corner labels, and K is a function that returns the angle between corners C_i and C_j; K uses the dot product to calculate the angle.

5.3.6 Behavior of Corner Descriptors and Relationships

The predicted behavior of the corner descriptors and relationships is shown in Table 5-5; the observed behavior is analyzed and reported in Chapter 7.

Table 5-5: Behavior of corner descriptors and relationships

  Descriptor / Relationship      Rotation Inv.   Scale Inv.   Illumination Inv.   Occlusion Inv.
  Angle of a Corner              Yes             Yes          Yes                 Yes
  Relative Corner Orientation    No              Yes          Yes                 Yes
  Distance between Corners       Yes             No           Yes                 Yes
  Corner Precincts               Yes             Yes          Yes                 Yes
  Angle between Corners          Yes             Yes          Yes                 Yes

5.4 Ribbon Structure Representation

The detection of ribbons is an important step towards the ultimate goal of image matching with the graph-matching algorithm. The ribbon structure required by the matching algorithm mainly uses descriptors of the line structure, because a ribbon is defined as a pair of parallel lines whose edge gradients differ by 180°, that are in the same relative position, and that overlap each other over at least part of their length. The descriptors used for the ribbon structure are:

- width of the ribbon,
- line slope,
- edge gradient, and
- type of ribbon.

Except for the type and width of the ribbon, these descriptors are the same as those explained in Section 5.2 and are not discussed again here. The relationships between ribbons are:

- distance between ribbons, and
- relative ribbon orientation.

Figure 5-11: Pictorial representation of the ribbon structure. Properties 'a', 'b', 'c', and 'd' belong to the vertices and represent the width of the ribbon, line slope, edge gradient, and type of ribbon respectively; 'e' and 'f' are properties of the graph edges and represent the distance between two ribbons and the relative ribbon orientation respectively.

5.4.1 Width of a Ribbon

Figure 5-12 shows two ribbons, A and B, with the area between their two lines shaded. In both ribbons the lines partially overlap each other and the difference in edge gradient is 180°. The ribbon width w is the shortest distance between the two parallel lines.

Figure 5-12: Ribbon width (w is the shortest distance between the two lines).

5.4.2 Types of a Ribbon

The types of a ribbon were described in Section 4.4.2.

5.4.3 Relative Distance between Ribbons

The relative distance between two ribbons is defined as the distance between the midpoints of the ribbons. In Figure 5-13, distance d is the shortest distance between the two ribbons. Equation 5.9 generates the adjacency matrix for the distance between ribbons:

R_ijD = K(R_i, R_j),  i ≠ j, j ∈ [i - 10, i + 10]    (5.9)

where R_ijD is an adjacency matrix, i and j are ribbon labels, and K is a function that returns the distance between ribbons R_i and R_j.

Figure 5-13: Distance between ribbon A and ribbon B.
5.4.4 Ribbon Orientation

Like corner and line orientation, the orientation of a ribbon plays a vital role in graph matching, as it helps to narrow the match. Ribbon orientation means the position of a ribbon relative to other adjacent ribbons, 'left', 'right', 'up', or 'down', taking the midpoint of each ribbon as the reference.

Figure 5-14: Ribbon orientation.

In Figure 5-14, 360° is divided into four equal angles of 90° each: up is from 45° to 135°, left from 135° to 225°, down from 225° to 315°, and right from 315° to 45°. In Figure 5-14, ribbon A is to the right of ribbon B.

R_ijO = K(R_i, R_j),  i ≠ j, j ∈ [i - 10, i + 10]    (5.10)

where R_ijO is the ribbon-orientation adjacency matrix, i and j are ribbon labels, and K is a function that returns the orientation of ribbon R_i with respect to R_j.

5.4.5 Behavior of Ribbon Descriptors and Relationships

The predicted behavior of the ribbon descriptors and relationships is shown in Table 5-6; the observed behavior is analyzed and reported in Chapter 7.

Table 5-6: Behavior of ribbon descriptors and relationships

  Descriptor / Relationship      Rotation Inv.   Scale Inv.   Illumination Inv.   Occlusion Inv.
  Width of a Ribbon              Yes             No           Yes                 Yes
  Line Slope                     No              Yes          Yes                 Yes
  Edge Gradient                  No              Yes          No                  Yes
  Type of Ribbon                 Yes             Yes          Yes                 Yes
  Distance between Ribbons      Yes             No           Yes                 Yes
  Relative Ribbon Orientation    No              Yes          Yes                 Yes

6. Graph Matching Algorithm

In most image retrieval algorithms, researchers have used only one image feature in isolation, such as lines or corners, with only two attributes. Lines with two attributes, line length and gradient, were used by Benoit Huet and Edwin R. Hancock [43] for a sensitivity analysis of the problem of recognizing line patterns from large structural libraries. For object-based image retrieval, Yi Tao and William I. Grosky [44] also used a single feature (corners) with two attributes, location and the color histogram at the center of mass. This thesis develops a graph-matching algorithm using three structures: (1) lines with four attributes, (2) corners with five attributes, and (3) ribbons with six attributes. For details of these features and their attributes see Chapter 5 and Table 3-1. Although graph matching is slower than other matching techniques, its accuracy is superior. In this chapter I show that, with such a large number of feature points, attributes, and relationships, graph matching is robust. The matching algorithm is divided into three routines, one for each feature type. This chapter is divided into four sections: the overall methodology of graph matching is discussed in Section 6.1, and in the remaining three sections each matching algorithm is discussed along with the results obtained.

6.1 Graph Matching

The term graph matching refers to the process of comparing two or more graphs with each other; in our case, a query graph is matched against a database graph. Before the matching algorithm is explained, some basic terms are introduced. If G1 is the graph for a database image and G2 is the graph for the query image (G1 = {V1, E1} and G2 = {V2, E2}), then we can match graph G1 with G2 and G2 with G1. Since an image is represented by three sets of structures (lines, corners, and ribbons), a separate graph G is defined for each structure; for details refer to Chapter 5.
6.2 Proposed Graph Matching Algorithm

The algorithm shown in Algorithm 6-1 is a general algorithm that provides the framework for the graph matching performed in the subsequent sections.

Algorithm 6-1: General graph matching algorithm

  Input:  Graph G1 and Graph G2
  Output: List of matching vertices

  For each vertex Vi of graph G2
      Find a matching vertex Vj of graph G1 (by attributes)
      If (match found)
          For all adjacent edges of G2 and G1
              Compare the vertices at the other ends
              If (match found)
                  Increment Vote
              End If
          End For
          Save vertices of G1, G2 and Votes
      End If
  End For

The algorithm takes two graph data structures, G1 and G2, as input, compares each vertex of G1 with every vertex of G2, and returns an array of matched vertices. The complexity of the algorithm is n x m, where n is the number of vertices of G1 and m is the number of vertices of G2. Using the same concept, the following sections develop particular graph-matching algorithms that match images using specific image structures.

6.3 Line-Graph Matching

The goal in this section is to incorporate the line structure defined in Chapter 5 into the matching algorithm. Before proceeding to the line matching experiments, it is important to understand how the matching algorithm works and the notation used in the subsequent paragraphs. If G = (V, E) is a line graph, then V = { L_iG, L_iS } are the descriptors of the vertices and E = { L_ijO, L_ijD } are the relationships that the vertices make with each other; L_iG and L_iS represent the edge gradient and line slope respectively, and L_ijO and L_ijD represent the relative line orientation and the distance between two line segments, where i and j index the vertices and i ≠ j.

6.3.1 Matching Algorithm

Algorithm 6-2 shows the line matching routine. It takes two line-structure graphs, G1 and G2, as input and returns the line-matching result. To compare vertices, Algorithm 6-2 calls the sub-routine match_vertex(.) given in Algorithm 6-4, which compares all descriptors of vertex Vi of graph G1 with those of vertex Vj of graph G2 and returns 1 if they match and 0 otherwise. Based on this result, the match_edge(.) sub-routine of Algorithm 6-3 is called; it compares the descriptors of the edges of Vi in graph G1 with those of the edges of Vj in graph G2. If an edge of Vi matches an edge of Vj, match_edge(.) calls match_vertex(.) to match the vertices at the other ends of the edges, and on a successful vertex match the variable 'vote' is incremented. The loop continues until all the edges of Vi have been compared with those of Vj, and the accumulated vote is returned. On return from match_edge(.), LineMatchingRoutine(.) stores lines Li and Lj in the 'match' array, provided the vote is greater than zero.

Algorithm 6-2: Line matching routine

  // G = (V, E) where V = { L_G, L_S } and E = { L_O, L_D }
  LineMatchingRoutine( G1, G2 )
  {
      n1 = size of graph G1,  n2 = size of graph G2
      vote = 0
      k = 1
      for i = 1 to n1
          for j = 1 to n2
              if ( match_vertex( Vi, Vj ) )          // match vertex Vi with Vj
                  vote = match_edge( Ei(i,:), Ej(j,:), i, j )
                  if vote > 0
                      match(k,:) = [ Li  Lj  vote ]  // store the matching result
                      k = k + 1
                  end if
              end if
          end for
      end for
      FinalMatch = line_sel_by_vote( match )
      return ( FinalMatch )
  }
Algorithm 6-3: Edge matching sub-routine

  // e1: edges of Vi of G1;  e2: edges of Vj of G2
  // I: index of vertex Vi;  J: index of vertex Vj
  match_edge( e1, e2, I, J )
  {
      vote = 0
      for i = I-5 to I+5                  // compares a maximum of 10 edges
          for j = J-5 to J+5              // compares a maximum of 10 edges
              if ( L_iO1 == L_jO2  &  | L_iD1 - L_jD2 | < δn )
                  if ( match_vertex( Vi, Vj ) )
                      vote = vote + 1
                  end if
              end if
          end for
      end for
      return ( vote )
  }

Algorithm 6-4: Vertex matching sub-routine

  match_vertex( v1, v2 )
  {
      if ( | L_1S - L_2S | < δ1  &  | L_1G - L_2G | < δ2 )   // compare the vertices
          return ( 1 )
      else
          return ( 0 )
      end if
  }

Algorithm 6-5: Line selection sub-routine

  line_sel_by_vote( LinesMatch )
  {
      sort LinesMatch by column 1                 // the lines of G1
      for each group of rows with the same line in column 1
          keep the row with the maximum vote and delete the others
      end for
      sort LinesMatch by column 2                 // the lines of G2
      for each group of rows with the same line in column 2
          keep the row with the maximum vote and delete the others
      end for
      return ( LinesMatch )
  }

After the comparison of all the lines of graphs G1 and G2 is complete, the sub-routine line_sel_by_vote(.) given in Algorithm 6-5 is called. This sub-routine examines the 'match' array; if any line has more than one match, it deletes all but the match with the maximum vote and returns the final line-matching results to the parent routine. The thresholds used in the algorithms are given in Table 6-1.

Table 6-1: Thresholds used in the line structure graph algorithms

  Algorithm       Threshold   Explanation                  Default Setting
  Algorithm 6-3   δn          Distance between two lines   10 pixels
  Algorithm 6-4   δ1          Slope of the line            10°
  Algorithm 6-4   δ2          Edge gradient of the line    30°
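For reference, the following is a condensed, runnable MATLAB sketch of Algorithms 6-2 to 6-4 combined. The struct layout, with G.V holding [slope gradient] rows and G.D and G.O holding the distance and orientation adjacency matrices, is an assumption made for this example, not the thesis data structure; the default thresholds follow Table 6-1.

    function match = line_graph_match(G1, G2, tSlope, tGrad, tDist)
        % G1, G2 : line graphs; G.V is n x 2 [slope gradient], G.D and G.O are n x n
        % distance and orientation adjacency matrices.
        if nargin < 3, tSlope = 10; end
        if nargin < 4, tGrad  = 30; end
        if nargin < 5, tDist  = 10; end
        match = zeros(0, 3);                                  % rows are [i j votes]
        n1 = size(G1.V, 1);  n2 = size(G2.V, 1);
        for i = 1:n1
            for j = 1:n2
                if ~vertex_ok(G1.V(i,:), G2.V(j,:), tSlope, tGrad), continue, end
                v = 0;
                for a = max(1, i-5):min(n1, i+5)              % neighbouring edges only
                    for b = max(1, j-5):min(n2, j+5)
                        if a == i || b == j, continue, end
                        if G1.O(i,a) == G2.O(j,b) && abs(G1.D(i,a) - G2.D(j,b)) < tDist ...
                                && vertex_ok(G1.V(a,:), G2.V(b,:), tSlope, tGrad)
                            v = v + 1;                        % one vote per matching edge
                        end
                    end
                end
                if v > 0
                    match(end+1, :) = [i j v];                %#ok<AGROW>
                end
            end
        end
    end

    function ok = vertex_ok(v1, v2, tSlope, tGrad)
        % Algorithm 6-4: two line vertices match when slope and gradient are close.
        ok = abs(v1(1) - v2(1)) < tSlope && abs(v1(2) - v2(2)) < tGrad;
    end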
6.3.2 Matching Results for Line Segments

In this section we investigate the robustness and quality of the graph-matching algorithm. Experiments were conducted on an image database consisting of 97 images of different resolutions; Gaussian noise was added to some of the images. The images and their resolutions are given in Appendix A. The images in the database are divided into six groups depending on their shapes and structural types. Four query images were taken from these groups and the graph-matching algorithm for lines was applied. The results are shown in Figure 6-1 to Figure 6-4: the first image is the query image and the remaining images are the results from the graph-matching algorithm, shown in descending order of percentage similarity from left to right and top to bottom.

In Figure 6-1, the hand-drawn Taj was given as the query image. The graph-matching algorithm for the line structure successfully retrieved all the images of the hand-drawn Taj, even those with Gaussian noise and occlusion; the Taj Mahal images were also retrieved, along with some other images. This was the most successful experiment, as seven out of eight hand-drawn Taj images were retrieved. In Figure 6-2, the Taj Mahal was given as the query; here the algorithm successfully retrieved four out of nine Taj Mahal images and, encouragingly, the retrieved images differ in scale, resolution, view, and noise level. Results from experiment 3 are shown in Figure 6-3: a Small Building was given as the query and three images were successfully retrieved from its group. In experiment 4 the query image, Spartan Village Apartments, was from the same group of small buildings; this time the algorithm retrieved six relevant images from the group.

Figure 6-1: Query image Drawn Taj and retrieved images.

Figure 6-2: Query image Taj Mahal and retrieved images.

Figure 6-3: Query image Small Building and retrieved images.

Figure 6-4: Query image Spartan Village Apartments and retrieved images.

6.4 Corner-Graph Matching

The goal in this section is to incorporate the corner structure defined in Chapter 5 into the matching algorithm and to study its effect. Before proceeding to the corner matching experiments, it is important to understand how the algorithm works and the notation used in the subsequent paragraphs. If G = (V, E) is a corner graph, then V = { C_iA } is the descriptor of a vertex and E = { C_ijD, C_ijO, C_ijP, C_ijC } are the relationships that the vertices make with each other; C_iA represents the corner angle, and C_ijD, C_ijO, C_ijP, and C_ijC represent the distance between corners, the corner orientation, the corner precinct, and the angle between corners, where i ≠ j.

6.4.1 Matching Algorithm

The graph-matching algorithm for corner structures is identical to that for the line structure except for the vertex and edge attributes used in the comparison; for a detailed explanation see Section 6.3.1. The thresholds used are given in Table 6-2.

Algorithm 6-6: Corner matching routine

  // G = (V, E) where V = { C_A } and E = { C_D, C_P, C_O, C_C }
  CornerMatchingRoutine( G1, G2 )
  {
      n1 = size of graph G1,  n2 = size of graph G2
      vote = 0
      k = 1
      for i = 1 to n1
          for j = 1 to n2
              if ( match_vertex( Vi, Vj ) )          // match vertex Vi with Vj
                  vote = match_edge( Ei(i,:), Ej(j,:), i, j )
                  if vote > 0
                      match(k,:) = [ Ci  Cj  vote ]  // store the matching result
                      k = k + 1
                  end if
              end if
          end for
      end for
      FinalMatch = corner_sel_by_vote( match )
      return ( FinalMatch )
  }

Algorithm 6-7: Vertex matching sub-routine

  match_vertex( v1, v2 )
  {
      if ( | C_1A - C_2A | < δn )                    // compare the corner angles
          return ( 1 )
      else
          return ( 0 )
      end if
  }

Algorithm 6-8: Edge matching sub-routine

  match_edge( e1, e2, I, J )
  {
      vote = 0
      for i = I-5 to I+5
          for j = J-5 to J+5
              if ( C_iP1 == C_jP2  &  C_iO1 == C_jO2  &
                   | C_iD1 - C_jD2 | < δ1  &  | C_iC1 - C_jC2 | < δ2 )
                  if ( match_vertex( Vi, Vj ) )
                      vote = vote + 1
                  end if
              end if
          end for
      end for
      return ( vote )
  }

Algorithm 6-9: Corner selection sub-routine

  corner_sel_by_vote( CornerMatch )
  {
      sort CornerMatch by column 1 and, for each corner of G1, keep only the row
      with the maximum vote; then sort by column 2 and do the same for the
      corners of G2; return CornerMatch
  }

Table 6-2: Thresholds used in the corner structure graph algorithms

  Algorithm       Threshold   Explanation                Default Setting
  Algorithm 6-7   δn          Corner angle               10°
  Algorithm 6-8   δ1          Distance between corners   10 pixels
  Algorithm 6-8   δ2          Angle between corners      10°

6.4.2 Matching Results for Corners

In this section we investigate the robustness and quality of the graph-matching algorithm for corners. We conducted four experiments on the same query images as in Section 6.3.2.
The results from the first experiment are shown in Figure 6-5. This was one of the best experiments and gave 100% recall: all the images from the group were successfully retrieved. In the second experiment, shown in Figure 6-6, the algorithm successfully retrieved six out of nine Taj Mahal images; the retrieved images differ in scale, resolution, and view, and some contain Gaussian noise. In experiments three and four the results are much better than those of experiments three and four for the line graph-matching algorithm; they are shown in Figure 6-7 and Figure 6-8 respectively.

Figure 6-5: Query image Drawn Taj and retrieved images.

Figure 6-6: Query image Taj Mahal and retrieved images.

Figure 6-7: Query image Small Building and retrieved images.

Figure 6-8: Query image Spartan Village Apartments and retrieved images.

6.5 Ribbon-Graph Matching

The goal in this section is to incorporate the ribbon structure defined in Chapter 5 into the matching algorithm and to study its effect. Before proceeding to the ribbon matching experiments, it is important to understand how the algorithm works and the notation used in the subsequent paragraphs. If G = (V, E) is a ribbon graph, then V = { R_iW, R_iG, R_iS, R_iT } are the descriptors of the vertices and E = { R_ijD, R_ijO } are the relationships that the vertices make with each other; R_iW, R_iG, R_iS, and R_iT represent the width of the ribbon, the edge gradient, the line slope, and the ribbon type respectively, and R_ijD and R_ijO represent the distance between ribbons and the ribbon orientation, where i ≠ j.

6.5.1 Matching Algorithm

The graph-matching algorithm for the ribbon structure is identical to the line and corner structure algorithms except for the vertex and edge attributes; for details see Section 6.3.1. The thresholds used in the algorithms are given in Table 6-3.
Table 6-3: Thresholds used in the ribbon structure graph algorithms

  Algorithm        Threshold   Explanation                Default Setting
  Algorithm 6-11   δ1          Distance between ribbons   20 pixels
  Algorithm 6-12   δ2          Width of a ribbon          10 pixels
  Algorithm 6-12   δ3          Slope of lines             10°
  Algorithm 6-12   δ4          Edge gradient of lines     30°

Algorithm 6-10: Ribbon matching routine

  // G = (V, E) where V = { R_W, R_G, R_S, R_T } and E = { R_D, R_O }
  RibbonMatchingRoutine( G1, G2 )
  {
      n1 = size of graph G1,  n2 = size of graph G2
      vote = 0
      k = 1
      for i = 1 to n1
          for j = 1 to n2
              if ( match_vertex( Vi, Vj ) )          // match vertex Vi with Vj
                  vote = match_edge( Ei(i,:), Ej(j,:), i, j )
                  if vote > 0
                      match(k,:) = [ Ri  Rj  vote ]  // store the matching result
                      k = k + 1
                  end if
              end if
          end for
      end for
      FinalMatch = ribbon_sel_by_vote( match )
      return ( FinalMatch )
  }

Algorithm 6-11: Edge matching sub-routine

  match_edge( e1, e2, I, J )
  {
      vote = 0
      for i = I-5 to I+5
          for j = J-5 to J+5
              if ( R_iO1 == R_jO2  &  | R_iD1 - R_jD2 | < δ1 )
                  if ( match_vertex( Vi, Vj ) )
                      vote = vote + 1
                  end if
              end if
          end for
      end for
      return ( vote )
  }

Algorithm 6-12: Vertex matching sub-routine

  match_vertex( v1, v2 )
  {
      if ( R_1T == R_2T  &  | R_1W - R_2W | < δ2  &
           | R_1S - R_2S | < δ3  &  | R_1G - R_2G | < δ4 )   // compare the vertices
          return ( 1 )
      else
          return ( 0 )
      end if
  }

Algorithm 6-13: Ribbon selection sub-routine

  ribbon_sel_by_vote( RibbonMatch )
  {
      sort RibbonMatch by column 1 and, for each ribbon of G1, keep only the row
      with the maximum vote; then sort by column 2 and do the same for the
      ribbons of G2; return RibbonMatch
  }

6.5.2 Matching Results for Ribbons

In this section we investigate the robustness and quality of the graph-matching algorithm for ribbons. We conducted four experiments on the same query images as in Sections 6.3.2 and 6.4.2. The results from the first experiment are shown in Figure 6-9. This was the best of the four experiments and gave 100% recall: all the images from the group were successfully retrieved. In the second experiment, shown in Figure 6-10, the algorithm successfully retrieved three out of nine Taj Mahal images; the retrieved images differ in scale, resolution, and view, and some contain Gaussian noise. In experiments three and four the results are also much better than those of experiments three and four for the line and corner matching; they are shown in Figure 6-11 and Figure 6-12 respectively. The overall results of the graph-matching algorithm for ribbons are much better than those for lines and corners.

Figure 6-9: Query image Drawn Taj and retrieved images.

Figure 6-10: Query image Taj Mahal and retrieved images.

Figure 6-11: Query image Small Building and retrieved images.

Figure 6-12: Query image Spartan Village Apartments and retrieved images.

6.6 Combined Results of the Graph Matching Algorithm

To justify the goal of employing a graph-matching algorithm to retrieve images from a database, consider Figure 6-13 to Figure 6-16. The results obtained from the line structure, corner structure, and ribbon structure are each weighted by 1/3 and then added to obtain the combined results.
The results indicate that by combining the individual results we can improve precision.

Figure 6-13: Query image Drawn Taj and retrieved images (combined results).

Figure 6-14: Query image Taj Mahal and retrieved images (combined results).

Figure 6-15: Query image Small Building and retrieved images (combined results).

Figure 6-16: Query image Spartan Village Apartments and retrieved images (combined results).

6.7 Results for Aerial Images

Fifteen aerial images were added to the image database. Three separate experiments were conducted, matching the line, corner, and ribbon structures respectively. The results are shown in Figure 6-17 to Figure 6-24. They show that the line and ribbon matches perform better than the corner match; all the relevant images were retrieved within 25% image similarity.

Figure 6-17: Query image 'img92', line matching results.
Figure 6-18: Query image 'img88', line matching results.
Figure 6-19: Query image 'img92', corner matching results.
Figure 6-20: Query image 'img88', corner matching results.
Figure 6-21: Query image 'img92', ribbon matching results.
Figure 6-22: Query image 'img88', ribbon matching results.
Figure 6-23: Query image 'img92', combined results.
Figure 6-24: Query image 'img88', combined results.
7. Discussion and Future Work

7.1 Analysis

The overall performance of the graph-matching algorithm is comparable to other CBIR matching algorithms. Although it was assumed that the performance of the proposed matching algorithm would depend solely on the feature detection process, meaning that feature extraction would be of vital importance, the results show that this is not the case. When different levels of noise were added to some of the images, the feature extraction results suffered, but the correct images were still retrieved. For instance, we added Gaussian noise of 1%, 2%, and 4% to the Drawn Taj and Taj Mahal images, and the graph-matching algorithm still retrieved them successfully.

To test the matching algorithm's tolerance to changes in perspective and view, we also added images with different perspectives to the database, such as the Spartan Village Apartments, Fifth Third Bank, and Small Building. The results are very encouraging: when one of the Spartan Village Apartments images is submitted as the query, the matching algorithm retrieves eleven out of thirteen relevant images, and for the Small Building query six out of seven relevant images are retrieved. We also tested the algorithm for intensity variations and successfully retrieved images of different brightness (see, for example, the Taj Mahal results). The algorithm was also tested for occlusion: we removed some of the minarets from the Drawn Taj and submitted the query, and the algorithm still retrieved the relevant images.

The results from all three structures used in the graph-matching algorithm are quite promising, and the combined results are even better. Although combinations of two features were not tried, it is reasonable to assume that their results would also be better than the individual results. Table 7-1 and Table 7-2 summarize the results using the combination of the three features.

Table 7-1 shows the results for exact matching. By exact matching we mean that the same building, with a different perspective or with added noise, is used as the query. The first row of the table shows the four similarity levels at which we evaluate image retrieval; the remaining five rows show the percentage of relevant images recalled at each of the four levels. For instance, for the Drawn Taj query, one of the seven relevant images was retrieved at the 100% matching level (14%), whereas all seven relevant matches were retrieved at the 75% matching level (100%). Table 7-1 shows that a similarity level of 25% may have to be used to achieve a high recall value.

Table 7-1: Results for exact matching

  Query Images                 100%   75%    50%    25%    Total Images
  Drawn Taj                    14%    100%   100%   100%   7
  Fifth Third Bank             25%    25%    25%    75%    -
  Small Building               -      12%    12%    88%    8
  Spartan Village Apartments   -      8%     8%     84%    13
  Taj Mahal                    -      25%    25%    50%    4

Table 7-2 shows the results for images that are similar in shape. For instance, the Drawn Taj is similar to the Taj Mahal: 54% of the relevant images were retrieved within 75% similarity and 100% within 25% similarity. Again, the 25% similarity level recalls a significant percentage of the target images. These results also show that precision will go down if exact matches are wanted.
Table 7-2: Results for similar matching

  Query Images                 100%   75%    50%    25%    Total Images
  Drawn Taj                    8%     54%    54%    100%   14
  Fifth Third Bank             11%    11%    11%    45%    9
  Small Building               -      6%     67%    84%    15
  Spartan Village Apartments   -      8%     4%     84%    13
  Taj Mahal                    -      12%    12%    25%    8

Like every algorithm, the graph-matching algorithm also has weak points. It is slow compared to some other types of matching, such as histogram matching. The complexity of the graph-matching algorithm is n x m, where n and m are the numbers of features of the test and query images respectively. Table 7-3 lists the sizes of some of the data structures; from these one can anticipate the relative compute time required for the corresponding graph matching. Table 7-4 gives the total time in seconds for comparing a query image with the 97 images of the database.

Table 7-3: Graph sizes (number of vertices) for different data structures and images

  Query Images                 Line Structure   Corner Structure   Ribbon Structure
  Drawn Taj                    21               17                 16
  Fifth Third Bank             187              76                 55
  Small Building               96               41                 60
  Spartan Village Apartments   128              47                 28
  Taj Mahal                    136              60                 24

Table 7-4: Run-time for image retrieval

  Query Images                 Time Taken for Matching (seconds)
  Drawn Taj                    165
  Fifth Third Bank             3140
  Small Building               1940
  Spartan Village Apartments   2100
  Taj Mahal                    1880

7.2 Conclusions

Content-Based Image Retrieval (CBIR) is an important problem for computer vision and computer science. This thesis presents a new geometric method for a CBIR system that internally uses graph matching for image identification and retrieval. The retrieval system is divided into three sub-processes: feature extraction, data representation, and the graph-matching algorithm. Feature extraction and data representation are pre-processing steps, while the graph matching is done online. Initially, features such as lines, corners, and ribbons are extracted and saved in a graph data structure. Subsequently, when a query is submitted to the graph-matching algorithm, it first undergoes feature extraction; the algorithm then uses the extracted query feature space along with the existing data-structure database for image matching and retrieval.

The graph-matching algorithm proposed in this thesis is robust to the presence of noise, occlusion, small changes in perspective, and changes in image brightness. The performance of the algorithm against such adverse effects has been measured via MATLAB simulations. Further, the proposed algorithm can detect abrupt changes in view and perspective; therefore, it can also be used for automatic view-change detection in continuous video. The current implementation is too slow to be practical; improvements are suggested below.

7.3 Future Work

Although all the desired goals of this work have been met, several new issues have been identified that can be addressed in future research. To improve the retrieval performance of this prototype image retrieval system, color histogram matching and texture matching need to be integrated into the implementation. Further, the existing algorithm could be enhanced for rotation invariance if needed by the user. The ability to match rotated images is critical for some aerial imagery applications; the current algorithm can handle only small rotations in perspective for ground structures such as buildings. The performance of the algorithm for video has yet to be evaluated.
Further, it is evident from the current results that the system can be used for object recognition and localization. It can be used for security purposes, where the finding and matching of exact buildings or other geometric features in a database of aerial imagery may be required. Of course, run-time performance must be improved; reprogramming in C and using hierarchical matching are two sure methods for significantly speeding up matching.

Appendix A

Thumbnails of the images in the test database (img1, img10, img11, ..., img96), each labeled with its pixel dimensions and JPEG format.

Appendix B

Line detection results (a), corner detection results (b) and ribbon detection results (c) for selected database images: Img 1, 12, 16, 29, 48, 52, 57, 60, 65 and 73.