This is to certify that the thesis entitled OBJECT SHAPE AND STRUCTURE FOR IMAGE MATCHING AND RETRIEVAL presented by NAVEED SARFRAZ KHAN KHATTAK has been accepted towards fulfillment of the requirements for the Master of Science degree in Computer Science.

OBJECT SHAPE AND STRUCTURE FOR IMAGE MATCHING AND RETRIEVAL

By

Naveed Sarfraz Khan Khattak

A THESIS

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

MASTER OF SCIENCE

Department of Computer Science

2002

ABSTRACT

OBJECT SHAPE AND STRUCTURE FOR IMAGE MATCHING AND RETRIEVAL

By NAVEED SARFRAZ KHAN KHATTAK

This thesis reports results on the use of shape features for content-based image retrieval. Edges, lines, corners, and straight ribbons are extracted from a query image. These features and their attributes, along with various relations, are used to match representations of other images in a database via a graph-matching algorithm. Images can be matched to other images or to hand-drawn sketches. Experimental results on images with man-made structures and on aerial images are promising. In addition to content-based retrieval, the matching method is also applicable to object recognition and localization, and the algorithm is also suitable for estimating scene changes in a continuous movie. Presently, the algorithm is implemented in MATLAB. The current program must be translated to C/C++ in order to run much faster, and it should be tested on a much larger set of images.

Copyright by NAVEED SARFRAZ KHAN KHATTAK 2002

ACKNOWLEDGMENTS

I would like to express my gratitude to all those who made it possible for me to complete this thesis. I want to thank the National University of Science and Technology for their financial support and for encouraging me to complete my thesis. I would also like to express my special thanks to my supervisor Dr. George Stockman for his advice and guidance during this study, and to my committee members Dr. Sakti Pramanik and Dr. Jiajuo Qi. I would like to give my special thanks to my parents, wife and children, whose patient love enabled me to complete my thesis. I am also thankful to all those who helped me complete my thesis, especially Irafan Asalam and Khurram Waheed.

TABLE OF CONTENTS

List of Tables ......................................................................... ix
List of Figures ........................................................................ x
1. INTRODUCTION ........................................................................................... 1
2. LITERATURE REVIEW ................................................................................ 6
2.1 Current CBIR Techniques ............................................................................ 6
2.1.1 Color Based Retrieval ........................................................................................
6 2.1.2 Texture Based Retrieval .................................................................................... 7 2.1.3 Shape Based Retrieval ....................................................................................... 7 2.1.4 Retrieval by other Types of Primitive Feature ................................................ 8 2.2 Practical Applications of CBIR---- - _ - - 11 3. METHODOLOGY ........................................................................................ 12 3.1 Data Representation---- - - ------ -- -- - - 12 3.2 Image Matching Process ----------------------------- l4 4. FEATURE EXTRACTION MODULE ........................................................... 16 4.1 Edge Detection- - - - - - ---------- - 16 4.1.1 Canny Edge Detector ....................................................................................... 18 4.1.2 Experiment and Results with the Canny Edge Detector .............................. 19 4.2 Line Detection ..... - - - 25 4.2.1 Line Joining ...................................................................................................... 34 4.2.2 Line Detection Results ..................................................................................... 35 4.3 Corner Detection -- 38 4.3.1 Corner Detection Observations ....................................................................... 40 4.3.2 Corner Detection Results ................................................................................. 42 4.4 Ribbon Detection 43 4.4.1 Ribbon Detection .............................................................................................. 44 4.4.2 Type of Ribbons. ............................................................................................... 44 4.4.3 Ribbon Detection Results ................................................................................. 45 5. IMAGE REPRESENTATIONS .................................................................... 49 5.1 Basic Definitions 50 5.1.1 Ad jacency Matrix ............................................................................................. 51 5.2 Line Structure Representation.-- -- 54 5.2.1 Line Slope. ......................................................................................................... 55 5.2.2 Edge Gradient. .................................................................................................. 55 5.2.3 Distance between Line Segments. ................................................................... 57 5.2.4 Relative Line Orientation. ............................................................................... 57 5.2.5 Behavior of Line Descriptors and Relationships ........................................... 58 5.3 Corner Structure Representation. - 59 5.3.1 Angle of a Corner. ............................................................................................ 60 vi 5.3.3 Corner’s Orientation. . ..................................................................................... 61 5.3.4 Comer’s Precincts. .......................................................................................... 62 5.3.5 Angle between Corners .................................................................................... 64 5.3.6 Behavior of Corner Descriptors and Relationships ...................................... 65 5.4 Ribbon Structure Representation. 65 5.4.1 Width of a Ribbon. ........................................................................................... 67 5.4.2 Types of a Ribbon. . 
.......................................................................................... 67 5.4.3 Relative Distance between Ribbons. ............................................................... 68 5.4.4 Ribbon’s Orientation. ...................................................................................... 69 5.4.5 Behavior of Ribbon Descriptors and Relationships ...................................... 70 6. GRAPH MATCHING ALGORITHM ............................................................. 71 6.1 Graph Matching 71 6.2 Proposed Graph Matching Algorithm. 72 6.3 Line-Graph Matching. 73 6.3.1 Matching Algorithm ........................................................................................ 73 6.3.2 Matching Results for Line Segments .............................................................. 78 6.4 Corner-Graph Matching. 84 6.4.1 Matching Algorithm ........................................................................................ 84 6.4.2 Matching Results for Corners ......................................................................... 87 6.5 Ribbon-Graph Matching. 92 vii 6.5.1 Matching Algorithm ........................................................................................ 92 6.5.2 Matching Results for Ribbons ........................................................................ 95 6.6 Combined Results of Graph Matching Algorithm 100 7. DISCUSSION AND FUTURE WORK ........................................................ 114 7.1 Analysis - 114 7.2 Conclusion 117 7.3 Future Work - - -- 118 APPENDEX-A .................................................................................... 119 APPENDEX-B .................................................................................... 126 BIBLIOGRAPHY ................................................................................ 130 viii LIST OF TABLES Table 3-1 Features and their Parameters - - - - - 15 Table 4-1 Confidence Rating for corner detection of hand-drawn images ................ 43 Table 4-2 Confidence Rating for ribbon detection 46 Table 5-1 Line descriptors - - 57 Table 5-2 Behaviour of Line Descriptors and Relationships 59 Table 5-3 Decision table for arm’s orientation of a corner - -- 63 Table 5-4 Type of bounds corner is making ------------ -- ------ 63 Table 5-5 Behavior of Corner Descriptors and Relationships 65 Table 5-6 Behavior of Ribbon Descriptors and Relationships 70 Table 6-1 Different threshold used in Line Structure Graph Algorithms ................. 78 Table 6-2 Different threshold used in Corner Structure Graph Algorithms ............ 87 Table 6-3 Different thresholds used in Ribbon Structure Graph Algorithms ........... 92 Table 7-1 Results for exact matching - 115 Table 7-2 Results for similar matching - - - 116 Table 7-3 Graph sizes (number of vertices) for different data-structures and images 116 Table 7-4 Run-time for image retrieval 117 ix LIST OF FIGURES “Images in this thesis are presented in color” Figure 1-1 Query image of Small Building and results of our retrieval system. ......... 4 Figure 3-1 Overall Methodology for Image Comparison 14 Figure 4-1 Process for feature extraction - 16 Figure 4-2 Edge image of different shapes with default parameter 0:1.0 ................ 19 Figure 4-3 Edge image of different shapes with $20 20 Figure 4-4 (a) Test Image with Gaussian noise of 0.01. (b)Edge image with o=l.0.. 21 Figure 4-5 (a) Test Image with Gaussian noise of 0.02. (b) Edge image with 0:1.0. 21 Figure 4-6 (a) Test Image with Gaussian noise of 0.05. 
(b) Edge image with 0:1.0. 22 Figure 4-7 Edge image with (:30 22 Figure 4-8 (a) Thick W with original scanner noise. (b) Edge image of Thick W with $1.0 - 23 Figure 4-9 (a) Taj Mahal with original scanner noise. (b) Edge image of Taj Mahal with 0 =1.0 23 Figure 4-10 (a) Small Building with original scanner noise. (b) Edge image with o=l.0 24 Figure 4-11 The parameters (1 and 0 used in the equation 4.2 of a straight line ....... 27 Figure 4-12 Accumulator Array showing the peaks 29 Figure 4-13 Test image -- 29 Figure 4-14 Accumulator Array showing the merged peaks 32 Figure 4-15 (a) line detection before applying algorithm 4.1 and (b) line detection after applying algorithm 4.1 33 Figure 4-16 (a) Vertical overlapping edge (b) Diagonal overlapping edge ................ 34 Figure 4-17 (a) and (b) Successful line detection 36 Figure 4-18 (a) and (b) Incomplete line detection - -- -- - -- 37 Figure 4-19 (a) Lincoln Memorial and (b) Big Taj, line length = 5 37 Figure 4-20 (a) Narobi 1 and (b) Small Taj, line length = 5 - 38 Figure 4-21 Test image with all the corners detected 40 Figure 4-22 Test Pattern with one corner missing (lower left corner of the lower middle rectangle) - 41 Figure 4-23 Corner detection results for (3) Lincoln 25 corners and (b) Taj Mahal 60 corners 41 Figure 4-24 (a) is Type-1 ribbon and (b) is Type-2 ribbon 45 Figure 4-25 Thin W ribbon detection“ -- - 46 Figure 4-26 (a), (b) and (c) Shows the detected ribbons 47 Figure 4-27 Narobi 2 showing some successful ribbon detection - 48 Figure 4-28 Showing some successful ribbons of Narobi 2 48 Figure 5-1 Pictorial representation of a graph G(V,E). 52 Figure 5-2 Ad jacency Matrix of Figure 5-1 53 Figure 5-3 Ad jacency Matrix of Figure 5-1 53 Figure 5-4 Pictorial representation of line structure. 54 Figure 5-5 (a) Test image, (b) Lines extracted from (a) 56 xi Figure 5-6 Line Orientation Figure 5-7 Pictorial Representation of Corner Structure Figure 5-8 Corners detected by the corner detection algorithm Figure 5-9 Corner Orientation. - Figure 5-10 Example of comer bounds Figure 5-11 Pictorial Representation of Ribbon Structure Figure 5-12 Ribbon Width - - - -- Figure 5-13 Distance between ribbon A and ribbon B Figure 5-14 Ribbon Orientation- - - -- - -- - -------- Figure 6-1 Query image Drawn Taj and retrieved images Figure 6-2 Query image Taj Mahal and retrieved images Figure 6-3 Query image Small Building and retrieved images,” Figure 6-4 Query image Spartan Village Apartments and retrieved images ............ Figure 6-5 Query image drawn Taj and retrieved images Figure 6-6 Query image Taj Mahal and retrieved images -- - Figure 6-7 Query image Small Building and retrieved images Figure 6-8 Query image Spartan Village Apartments and retrieved images ............ Figure 6-9 Query image drawn Taj and retrieved images Figure 6-10 Query image Taj Mahal and retrieved images Figure 6-11 Query image Small Building and retrieved images xii 58 59 -61 62 66 67 68 -69 80 81 82 -88 89 96 97 98 Figure 6-12 Query image Spartan Village Apartments and retrieved images .......... 99 Figure 6-13 Query image drawn Taj and retrieved images 101 Figure 6-14 Query image Taj Mahal and retrieved images 102 Figure 6-15 Query image Small Building and retrieved images - 103 Figure 6-16 Query image Spartan Village Apartments and retrieved images ........ 104 xiii 1. Introduction An important problem in the field of computer vision is the automatic recognition of objects in two-dimensional images for the classification of scenes. 
Another problem is that of matching images based upon similarities between object features. Because of the recent increase in the wide range of applications depending upon the resolution of such problems, such as multimedia database searches and augmented reality, the solutions to these problems should have a high degree of flexibility and automation. Matching images in a multimedia database offers an example of such a problem. The database may contain hand-drawn images, images taken from cameras and photographs containing objects with varying geometric properties, colors, and textures, as well as a wide range in image resolution and size. The extraction and subsequent classification of object features must be general enough to deal with the varied contents of the database and must ensure that reliable matching takes place. At the same time, the system should strive for automation so that user interaction is kept to a minimum. The shape of a single object and the various spatial constraints among multiple objects in an image are examples of object classification. Shape and spatial constraints are important data in many applications, ranging from complex space exploration and satellite information management to medical research and entertainment. Image retrieval can be categorized into exact match searching and similarity based searching. For either type of retrieval, the dynamic aspects of image content require expensive computations and sophisticated methodologies in the areas of image processing and database systems. In order to overcome these problems, several schemes for data modeling and image representation have been proposed [1,2]. In general, each of these schemes builds a symbolic image for each given physical image, and symbolic images are then used in conjunction with index structures as proxies for image comparisons to reduce the search Space. One of the traditional indexing methods for image retrieval is text-based. Although text annotation is a practical technique, this task is labor intensive, language dependent, vocabulary controlled, human subjective in nature and also, cannot predict future use. In some cases, it is rather difficult to characterize certain important real world concepts, entities, and attributes by means of text only. The shape of a single object and the various Spatial constraints among multiple objects in an image are examples of such concepts for image comparison. Another indexing method is content—based similarity. Content-based image retrieval systems rely on similarity measures. The four major classes of similarity measures are based upon color, texture, shape, and object and relationship similarity [3]. Once a measure of similarity is determined, the corresponding actual images are retrieved from the database. Due to the lack of any unified framework for image representation, storage, and retrieval (see [4] for information on the emerging MPEG-7 standard), these symbolic representation schemes and retrieval techniques have greatly facilitated image management. The Query By Image Content system commonly known as the QBIC system by IBM is an example of recent work in content—based image retrieval that is based upon color and texture similarity [5]. QBIC is able to perform database queries in terms of color percentage matching, color histogram matching, color layout similarity and texture. While a system such as this will perform well when the most important image feature is color, it will suffer when object shape is the dominant feature. 
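As a concrete illustration of the color-histogram style of matching that systems such as QBIC rely on, the following MATLAB sketch computes a normalized histogram-intersection score between two images. It is a generic sketch under assumed inputs, not QBIC's actual implementation; the function and variable names are illustrative only.

function score = histogramMatch(Q, D, nBins)
% Illustrative sketch of color-histogram matching (normalized histogram
% intersection); Q is the query image, D a database image, nBins bins per channel.
    if nargin < 3, nBins = 8; end
    hq = colorHist(Q, nBins);
    hd = colorHist(D, nBins);
    score = sum(min(hq, hd)) / sum(hd);      % 1.0 means identical color distributions
end

function h = colorHist(I, nBins)
    I   = im2double(I);                              % RGB values in [0, 1]
    idx = min(floor(I * nBins) + 1, nBins);          % per-channel bin index
    lin = (idx(:,:,1) - 1) * nBins^2 ...
        + (idx(:,:,2) - 1) * nBins + idx(:,:,3);     % joint RGB bin, 1..nBins^3
    h   = histcounts(lin(:), 0.5:1:(nBins^3 + 0.5)); % pixel count per joint color bin
end

A query image can then be ranked against every stored image by this score alone, which is exactly the situation where color-only systems do well and shape-dominated scenes do not.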
The Veggie Vision system by IBM provides a good example of the performance that can be achieved when the four similarity measures are combined in one system [6]. This system uses color, texture, shape, and size histograms for the identification of produce. The success the Veggie Vision system has achieved, confirms the value for the integration of shape Similarity with that of color and texture. This thesis will concentrate on extracting Shape structures with the intent of developing better representations of scenes and better means of matching in the domain of scenes with significant man-made object content. With a view towards the recognition of man-made objects in scenes, the natural features to use as the basis of the matching algorithms are lines, corners, and ribbons. The retrieval process has been divided into three sub processes; the feature extraction process, data representation, and graph ' matching algorithm. Feature extraction and data representation are offline processes whereas graph matching is an online process. The Canny edge detector and Hough Transform are used for feature extraction. These features such as lines, comers and ribbons are then processed for extraction of relative attributes, for example distance, angle, and orientation, for matching purposes. All this information is then stored into a graph data structure separately for each image in MAT file format used by MATLAB for storing data. A detailed discussion on these processes may be found in Chapter 4 and 5. Our image retrieval system requires a query image for the image retrieval process. The query image is then processed for feature extraction and representation. The extracted features are then compared with the features of all the images already stored in 3 the database and the images are displayed in descending order basing on percentage of similarity. Figure 1-1 is one of the examples of image retrieval. In this example, a query image of a Small Building was submitted and the program successfully retrieved all the images of that Small Building taken from different views. ‘ . query img Figure 1-1 Query image of Small Building and results of our retrieval system. The core work of this thesis is presented in Chapter 3 to Chapter 6. Chapter 3 provides an overview of the image retrieval system and explains the logical decomposition of our system into three main modules: feature extraction module, features representation module and graph matching module. I will give a detailed description of the techniques used for edge, line, comer, and ribbon detection in Chapter 4, and discussing the results of these detectors applied to both hand-drawn images and manmade object. Chapter 5 and 6 are devoted to graph data structure development and the graph matching algorithm. Chapter 7 summarizes the strength of the matching algorithm, as well as its shortcoming by evaluating it in different worst-case scenarios. 2. Literature Review 2.1 Current CBIR Techniques Content Based Image Retrieval ‘CBIR’ works on a principle of retrieving images from a database by comparing features automatically extracted from the images themselves. The commonest features used for CBIR are color, texture or shape. A typical CBIR system allows users to create queries by submitting an example of the type of image being sought, though some offer alternatives such as selection from a palette or sketch input. 
The system then retrieves images from the database whose feature values match those of the query most closely and displays thumbnails of these images on the screen. Some of the more commonly used types of features used for image retrieval are described below. 2.1.1 Color Based Retrieval Color is one of the most widely used features in content-based image retrieval systems. The color feature is relatively robust to background complications, image resolution and orientation. Color histogram is most commonly used to extract color features. Several other methods for color-based image retrieval systems have been described in the literature, but most are variations on the same basic idea. Each image added to the collection is analyzed to compute a color histogram, which Shows the proportion of pixels of each color within the image. The color histogram for each image is then stored in the database. At search time, the user can either specify the desired proportion of each color (75% olive green and 25% red, for example), or submit an example image from which a color histogram is calculated. Either way, the matching process then retrieves those images whose color histograms match those of the query most closely. The matching technique most commonly used, histogram intersection, was first developed by Swain and Ballard [7]. Variants of this technique are now used in a high proportion of current CBIR systems. Methods of improving on Swain and Ballard’s original technique include the use of region—based color querying [8]. 2.1.2 Texture Based Retrieval “A variety of techniques have been used for measuring texture similarity; the best techniques rely on comparing values of second-order statistics calculated from query and stored images” [10]. Fundamentally, these calculate the relative brightness of selected pairs of pixels from each image. From these Tamura et al[9], calculated measures of image texture such as the degree of contrast, coarseness, directionality and regularity. Liu et al[lO] calculated periodicity, directionality and randomness. Texture queries can be made in a manner similar to color queries, by supplying an example query image. The system then retrieves images with texture measures most Similar in value to the query. Ma and Manjunath [11] introduced a new technique of texture thesaurus that retrieves images based on similarity. 2.1.3 Shape Based Retrieval Shape is an important feature in content-based image retrieval. However, shape is not a well-defined concept mathematically. Because of uncertainty in shape representations, the retrieval system may work well only in certain environments. Shapes can be represented either globally such as with aspect ratio, perimeters and moments, or locally such as with boundary segments. An alternative method for shape matching is elastic deformation of templates by Pentland et a1 [12]. Queries to shape retrieval systems are 7 made either by identifying an example image to act as the query, or as a user-drawn sketch [l3]. Shape matching of three-dimensional objects is a more challenging task - particularly where only a single 2-D view of the object in question is available. While no general solution to this problem is known, some useful inroads have been made into the problem of identifying at least some instances of a given object from different viewpoints. One approach has been to build up a set of 3-D models from the available 2- D image, and match them with other models in the database [14]. 
Another is to generate a series of alternative 2-D views of each database object, each of which is matched with the query image [15] 2.1.4 Retrieval by other Types of Primitive Feature One of the oldest established means .of accessing pictorial data is retrieval by its position within an image. Accessing data by spatial location is an essential aspect of geographical information systems and efficient methods to achieve this have been around for many years [16]. A similar technique was proposed by Gudivada and Raghavan [17] to allow users to search for images containing objects in defined spatial relationships with each other. Several other types of image feature have been proposed as a basis for CBIR. Most of these depend on complex transformations of pixel intensities. These techniques aim to extract features, which reflect some part of image similarity. The most well known technique of this type uses the wavelet transform to model an image at several different resolutions. Promising retrieval results have been reported by matching wavelet features computed from query and stored images [18]. 8 Retrieval by appearance is another method that gives interesting results. Two versions of this method have been developed, one for whole-image matching and one for matching selected parts of an image. The part-image technique involves filtering the image with Gaussian derivatives at multiple scales [l9], and then computing differential invariants; the whole-image technique uses distributions of local curvature and phase [20]. The advantage of all these techniques is that they can describe an image at varying levels of detail (useful in natural scenes where the objects of interest may appear in a variety of appearances), and avoid the need to segment the image into regions of interest before shape descriptors can be computed. One early system was designed to interpret and retrieve line drawings of objects within a narrow predefined domain, such as floor plans for domestic buildings. The system analysed object drawings, labelling each with a set of possible interpretations and their probabilities. These were then used to derive likely interpretations of the scene within which they appeared. GRIM_DBMS [21]. One system recently introduced by Qasim Iqbal and I. K Aggarawal [22], used higher level and lower level vision methodology to retrieve manmade objects. Higher level vision was used to extract features such as ‘L’ junctions, ‘U’ junctions and parallel groups. Lower-level analysis is performed globally by using Gabor filters to extract texture features. A manmade object region of interest extracted by using perceptual grouping is used as a frame for conducting lower-level analysis, but the method of image retrieval is weak. Instead of comparing color, texture and shape features, these features are weighted and used for comparison. Object recognition has also been an area of interest to the computer vision community for many years. Techniques are now being developed for recognizing and classifying objects with database retrieval in mind. David et a1 [23] have attracted considerable publicity for themselves by developing a technique for recognizing naked human beings within images, though their approach has been applied to a much wider range of objects, including horses and trees. Haering et al [24] also developed a method for identifying deciduous trees via their foliage. 
All such techniques are based on the idea of developing a model of each class of object to be recognized, identifying image regions which might contain examples of the object, and building up evidence to confirm or rule out the object's presence. Evidence will typically include both features of the candidate region itself (color, shape or texture) and contextual information such as its position and the type of background in the image. In contrast to these fully automatic methods is a family of techniques which allow systems to learn associations between semantic concepts and primitive features from user feedback. The earliest such system was FourEyes from Minka [25]. It encourages the user to annotate selected regions of an image, and then proceeds to apply similar semantic labels to areas with similar characteristics. The system is capable of improving its performance with user feedback. Another approach is the concept of the semantic visual template introduced by S. F. Chang et al. [26]. Here, the user is asked to identify a possible range of color, texture, shape or motion parameters to express his or her query, which is then refined using relevance feedback techniques. When the user is satisfied, the query is given a semantic label (such as "sunset") and stored in a query database for later use.

2.2 Practical Applications of CBIR

A wide range of possible applications for CBIR technology has been identified, e.g.:

- Crime prevention
- The military
- Intellectual property
- Architectural and engineering design
- Fashion and interior design
- Journalism and advertising
- Medical diagnosis
- Geographical information and remote sensing systems
- Cultural heritage
- Education and training
- Home entertainment
- Web searching
- Robotics
- Industries

3. Methodology

This chapter focuses on the development of a methodology for feature extraction using higher-level image analysis. Manmade objects like buildings generally have sharp edges and straight boundaries. The prominent characteristics of a building are the apparent regularity and the relationships of its component features. An image containing large manmade objects such as buildings, bridges and roads will exhibit a large number of significant edges, junctions and parallel lines, compared to an image with predominantly non-manmade objects (such as a scene of vegetation). These features are generated by components of the objects, such as doors, windows, pillars, and the boundaries of the buildings. These higher-level features exhibit apparent regularity and relationship, and are strong evidence of structure in a scene. The presence of these distinguishing features in an image can be utilized for image comparison and subsequently for image retrieval.

3.1 Data Representation

To compare different images, the following higher-level features are extracted from each image:

- Straight lines
- Ribbons (parallel lines)
- Corners

Manmade objects are generally rigid; therefore, these representations adequately define their shape descriptions. The extraction process is hierarchical. The first stage is extraction of edges from the image, followed by extraction of straight lines, which tend to be small fragments that are grouped to form longer lines. Lines are then used to find corners and ribbons. A set of lines terminating at a common point is called a corner. The corner is an important relation. According to the proximity rule of perceptual grouping, the human visual system easily groups corners.
In fact, it has been suggested that a major function of eye movement is to determine corners and edges [27]. The straight-line segments are also used to extract parallel lines. A large number of manmade objects contain parallel structures such as pillars, beams, doors and windows. Human vision can rapidly identify parallel lines [28]. A parallel relation is a non-accidental relationship that can be used to infer relationships. These parallel lines are grouped together to find parallel groups, which are then used to extract significant parallel groups [29]. The rationale for the grouping relations described above is the following:

- Spatially close primitive structures are likely to be related and to reflect meaningful structures.
- Accidental image relations of natural objects may cause some primitive structures. For example, line segments extracted from a cluster of tree leaves may accidentally form a ribbon or corner. Such ribbons or corners tend to be randomly and sparsely distributed and are unlikely to form meaningful structures.
- Manmade objects usually consist of regular structures.

3.2 Image Matching Process

Before explaining each step in detail, I would like to give an overall view of my image retrieval methodology. I have divided the matching process into three main parts, namely:

- Feature Extraction Module
- Features Representation Module, and
- Graph Matching Module

[Figure 3-1: Image A and Image B each pass through the Feature Extraction Module and the Features Representation Module; the Graph Matching Module compares the two representations and produces the output result.]
Figure 3-1 Overall Methodology for Image Comparison

Figure 3-1 shows the overall methodology for the image retrieval system. In Figure 3-1 the feature extraction module is actually a combination of a number of other processes (i.e. the Canny edge detector, the Hough transform, a line-fit routine that joins small line segments into longer lines, reordering of the lines, the corner detector and the ribbon detector). These processes are responsible for extracting higher-level features such as lines, corners and ribbons from the given gray-level images. After extraction of the higher-level features, the feature representation process collects several other parameters of these features (Table 3-1) before storing them into graph data structures. Once these parameters are calculated, they are all stored into the respective graph data structures. The graph-matching algorithm then compares the query image with a stored image from the database.

Features   Parameters
Lines      Slope of Line, Edge Gradient, Line Orientation, Distance between Lines
Corners    Angle of a Corner, Corner Orientation, Corner Precincts, Distance between Corners, Angle between Corners
Ribbons    Ribbon Width, Slope of Line, Edge Gradient, Type of Ribbon, Ribbon Orientation, Distance between Ribbons

Table 3-1 Features and their Parameters
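As a concrete illustration of how the entries of Table 3-1 might be held per image before being written to a MAT file, the following MATLAB sketch uses one struct per image with a field per feature type. The field names and values are illustrative assumptions, not the actual data structures used in the implementation; those are described in Chapter 5.

% Illustrative sketch (not the thesis code): one record per image, saved to a
% MAT file, holding the features and relational attributes listed in Table 3-1.
img.name = 'small_building.pgm';             % hypothetical file name

img.lines(1).endpoints   = [12 30; 12 110];  % [r1 c1; r2 c2]
img.lines(1).slope       = 0;                % slope of the fitted line
img.lines(1).gradient    = 90;               % average edge gradient (degrees)

img.corners(1).location  = [12 30];          % sub-pixel intersection point
img.corners(1).angle     = 88;               % angle between the two arms
img.corners(1).lines     = [1 4];            % indices of the lines forming it

img.ribbons(1).lines     = [1 2];            % the two parallel lines
img.ribbons(1).width     = 9;                % distance between the lines
img.ribbons(1).type      = 1;                % Type-1 or Type-2 (Section 4.4.2)

% Pairwise relations (distance, relative orientation, ...) can be kept in
% adjacency-style matrices, one matrix per relation:
img.lineDistance    = zeros(numel(img.lines));
img.lineOrientation = zeros(numel(img.lines));

save('small_building_graph.mat', 'img');     % offline step; matching later reads these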
4. Feature Extraction Module

Feature extraction is one of the most important steps in our image retrieval system, as all other steps depend on this process. The process is divided into sub-processes so that each step can be analysed critically. Figure 4-1 illustrates the processing chain that each image ultimately passes through for feature extraction.

[Figure 4-1: edge detection, followed by line detection and a line fit and line join stage, feeding the corner detector and the ribbon detector, whose outputs go to the features representation process.]
Figure 4-1 Process for feature extraction

Because of this processing sequence and the number of parameters at each stage, one must be conscious of a natural dependency. Namely, no matter how robust the corner and ribbon detectors are, they depend upon the line detector, which in turn depends upon the edge detector. Thus, the number of parameters that influence the last two detectors may be larger than desired. With this in mind, the edge and line detectors must exhibit acceptable performance over a wide range of images, balanced with the need for parameter optimisation and automation.

4.1 Edge Detection

The early stages of vision processing identify features in images that are relevant to estimating the structure and properties of objects in a scene. Edges are one such feature. Edges are significant local changes in the intensity image and are important features for analyzing images. Edges typically occur on the boundary between two different regions in the image. Edge detection is frequently the first step in recovering information from images, and due to its importance it continues to be an active research area. At its simplest, an edge is a sharp discontinuity in a gray-level profile. However, the situation is complicated by the presence of noise and by image resolution. An edge is specified by its magnitude and its direction. A number of linked edge points may be better approximated by a linear segment called an edgel [30]. There are many types of edges; they may be classified into three classes: step edge, roof edge and linear edge. A good edge operator must have the following properties [30]; it must

- operate locally,
- be efficient,
- be sensitive to the orientation and magnitude of an edge,
- work in the presence of noise, and
- be insensitive to the threshold value.

A further condition imposed by Canny [Canny 86] is that the operator must not have multiple responses to a single edge. To detect these different types of edges, many edge operators have been suggested in the computer vision literature, e.g. Sobel [Prewitt 70], Roberts [Roberts 65], Kirsch [Kirsch 71], and Marr and Hildreth [Marr 80]. These edge operators possess some of the properties listed above, but not all, and each tends to be good for one type of edge and not for the others. Canny in 1986 developed an edge operator which extracts not only step edges but also ridge and roof edges. Chen et al. [31] have evaluated many of these algorithms for edge detection on brain images obtained from various sources such as computed tomography (CT), magnetic resonance imaging (MRI), and positron emission tomography (PET). Their results indicate that none of the edge detectors mentioned are universally applicable. Since the first step in our feature extraction sequence is to extract edges from the grey-scale image, the use of the Canny edge detector for this step is justified, as the Canny operator gives a consistent response with a single smoothing parameter. The Canny edge detector used here is the Canny algorithm described by Stockman and Shapiro [3].

4.1.1 Canny Edge Detector

The original gray-scale image I[i, j] is smoothed with a Gaussian filter G[i, j, σ], where σ is the spread of the Gaussian that controls the degree of smoothing.
S[i, j] = G[i, j, σ] * I[i, j]                (4.1)

The gradient magnitude M[i, j] and direction θ[i, j] are then computed from the smoothed image S[i, j]. The magnitude image is used to provide votes for possible edge pixels. If the magnitude is higher than a given low threshold, then the pixel is classified as a possible edge. Next, the gradient direction is used in a step called non-maximal suppression. The purpose of this step is to thin a possible edge by suppressing any pixel response that is not higher than the two neighbouring pixels on either side of it along the direction of the gradient. Finally, the magnitude of each pixel in the image of possible edges is compared against a given high threshold. If a pixel passes the test, it is classified as an edge pixel, and the magnitudes of its neighbours are recursively tested against the low threshold.

4.1.2 Experiment and Results with the Canny Edge Detector

Edge detection is the first stage of the segmentation chain, and the spread parameter σ is perhaps the most important of all the parameters to be encountered. For example, Figure 4-2 shows the edge image of different shapes with the default parameter σ = 1.0, and Figure 4-3 shows the edge image with σ = 2.0.

Figure 4-2 Edge image of different shapes with default parameter σ = 1.0
Figure 4-3 Edge image of different shapes with σ = 2.0

In Figure 4-3 corners are somewhat distorted when spread σ = 2.0 was used. The smoothing parameter σ may also have to be adjusted in the presence of noise. Figure 4-4 to Figure 4-7 show the results of adding Gaussian noise to the test image. Notice that adjusting σ yields the proper edges, and these edges are still as 'clean' as those detected from the noise-free image using the same σ. The image with medium noise does not suffer from using a small σ because the edges are still 'clean'. The image with high noise will undoubtedly suffer in later stages of detection, due to the jagged edges and noise artifacts. One should not be alarmed at such results, since the amount of noise that has been added is much greater than what is expected in practice. A more reasonable amount of noise can be seen in the original Thick W, Taj Mahal and Small Building images (see Figure 4-8 to Figure 4-10). This noise was added during the scanning process, and it will be shown that it does not affect the overall performance of the detection system.

Figure 4-4 (a) Test image with Gaussian noise of 0.01. (b) Edge image with σ = 1.0
Figure 4-5 (a) Test image with Gaussian noise of 0.02. (b) Edge image with σ = 1.0
Figure 4-6 (a) Test image with Gaussian noise of 0.05. (b) Edge image with σ = 1.0
Figure 4-7 Edge image with σ = 3.0
Figure 4-8 (a) Thick W with original scanner noise. (b) Edge image of Thick W with σ = 1.0
Figure 4-9 (a) Taj Mahal with original scanner noise. (b) Edge image of Taj Mahal with σ = 1.0
Figure 4-10 (a) Small Building with original scanner noise. (b) Edge image with σ = 1.0
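The smoothing and gradient stages of Section 4.1.1 are easy to reproduce. The following MATLAB sketch is my illustration, not the thesis code: it computes S, M and θ as in Equation 4.1 and then lets the Image Processing Toolbox edge function perform the non-maximal suppression and hysteresis thresholding, so the effect of σ seen in Figures 4-2, 4-3 and 4-7 can be checked directly. The file name is a placeholder.

% Sketch only: reproduce the effect of the spread parameter sigma.
I = im2double(imread('test_shapes.png'));    % hypothetical test image
if size(I,3) == 3, I = rgb2gray(I); end

sigma = 1.0;                                 % try 1.0, 2.0, 3.0 as in Figs 4-2, 4-3, 4-7
G = fspecial('gaussian', 2*ceil(3*sigma)+1, sigma);
S = imfilter(I, G, 'replicate');             % Equation 4.1: S = G * I

[Sx, Sy] = gradient(S);                      % gradient of the smoothed image
M     = hypot(Sx, Sy);                       % gradient magnitude M[i,j]
theta = atan2(Sy, Sx);                       % gradient direction theta[i,j]

% Non-maximal suppression and hysteresis thresholding as described above;
% here the toolbox Canny detector is used for those two stages.
BW = edge(I, 'canny', [], sigma);
imshow(BW);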
4.2 Line Detection

The Hough Transform (HT) is used in computer vision and pattern recognition for detecting geometric shapes in digital images that can be defined by parametric equations. It is a mapping from the image plane to the parameter space, and essentially consists of a voting process followed by a peak detection stage. The advantages of the transform are its robustness to noise and to discontinuities in the image, while the disadvantages are its demand for a large amount of computing time and storage (although with present advances in computer memory and computing power this is less of a problem). The time and space requirements depend on the number of parameters required to describe the pattern, the resolution of the accumulator array, and the image resolution. A straight line is the simplest of all geometric patterns and can be described with only two parameters (see Equation 4.2). Different aspects of the HT have been investigated and reported in the literature [3,30,32,34]. The compute-bound nature of the HT has inspired the development of efficient algorithms and implementations of the transform. Performance of the transform with respect to accuracy of detection has been discussed in the literature. A formal mathematical definition of the transform, its properties and relationships appear in [3,30,33,34]. Commonly used parameterizations of a straight line are the slope-intercept and the normal forms. When the HT is used for the detection of straight lines in an image, only the parameters of the infinite line are obtained. It does not provide any information regarding the length, position and endpoints of the line segment in the image plane. However, in computer vision applications, the length and exact position of a line segment, in addition to its normal parameters, are required for locating the line segment. The parameters, the length, and the coordinates of the end points of a line segment constitute the complete line segment description. Techniques to find the complete line segment description using the Hough Transform have not been thoroughly studied, although a few algorithms are available in the literature [33]. The available methods are based on the projection of the line on either the x or y axis in the image plane. I also used a least-squares error fit to estimate the line parameters more precisely.

In order to detect and report on the line detection process, one must first decide on a proper definition of a line. There are certain unavoidable problems that occur when detecting lines in an image. These problems stem from the fact that, even though a digital line is not the same as its mathematical counterpart, the mathematical definition is used for detection purposes. In order to take advantage of both concepts of a line, or more properly a line segment, I define a line to be a set of pixels that fit a mathematical line model. For the purposes of detection, the model used for a line is

d = x cos(θ) + y sin(θ)                (4.2)

where d is the distance from the line to the origin and θ is the angle between the vector perpendicular to the line and the x-axis (see Figure 4-11).

Figure 4-11 The parameters d and θ used in the equation 4.2 of a straight line

The line detection algorithm employed is a variation of the Hough Transform with the O'Gorman and Clowes method for extracting straight lines. Refer to Chapter 10 of Stockman and Shapiro [3] for a detailed description of this method. The idea behind the Hough transform is to build evidence for the existence of lines using the model

d = c cos(θ) + r sin(θ)                (4.3)

where r and c are the respective row and column locations of a pixel, and d and θ are the length of the normal and the angle of the normal with respect to the positive x-axis. The angle θ was computed using a 5 x 5 Sobel gradient operator.
The distance from a potential line to the origin can thus be computed by

d = | c cos(θ) + r sin(θ) |, so that 0 ≤ d ≤ d'                (4.4)

where d' is half the length of the main diagonal of the image. The distance d and angle θ are then quantized for each feature point in the image, i.e. initially d is assigned to the nearest multiple of 3, and θ is assigned to the nearest multiple of 10. A 36 x d' accumulator array, V, is used to gather votes from pixels. For example, if a pixel has a quantized angle θ of 180° and a quantized distance d of 240, the value in V[18][80] is incremented. An array of point lists is also kept to store the list of pixels that voted for a particular line. Once the accumulator array has been filled, it is searched for the line with the greatest number of votes. Then, the set of pixels that voted for the line are processed to find neighbors whose angles are within ten degrees of the given angle. If such a neighbor is found, it is deleted from its point list, added to the current point list, and the corresponding value in the accumulator array is decremented. This process is referred to as the Merge stage of the Hough Transform. The process of choosing the greatest bin and merging is continued as long as the size of the bin in the accumulator array is greater than a chosen length parameter. This parameter dictates the length of the lines that can be detected.
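Before looking at a worked example, the bookkeeping just described can be summarized in a short MATLAB sketch. This is an illustration under assumed inputs, not the thesis implementation; in particular, the origin is placed at the image corner here, so d is bounded by the full diagonal rather than the half-diagonal used above.

% Illustrative sketch of the voting stage (not the thesis code). Assumed inputs:
% r, c         - column vectors with the row/column of each edge pixel
% thetaDeg     - gradient angle of each edge pixel, in degrees
% nRows, nCols - the image dimensions
dQuant     = 3;                          % d assigned to the nearest multiple of 3
thetaQuant = 10;                         % theta assigned to the nearest multiple of 10
dMax       = ceil(hypot(nRows, nCols));
nD         = ceil(dMax / dQuant);

V   = zeros(36, nD);                     % accumulator: 36 angle bins x nD distance bins
pts = cell(36, nD);                      % the pixels that voted for each bin

for k = 1:numel(r)
    th = mod(round(thetaDeg(k) / thetaQuant) * thetaQuant, 360);  % 0, 10, ..., 350
    d  = abs(c(k) * cosd(th) + r(k) * sind(th));                  % Equation 4.4
    iTheta = th / thetaQuant + 1;                                 % angle bin 1..36
    iD     = max(1, round(d / dQuant));                           % distance bin
    V(iTheta, iD) = V(iTheta, iD) + 1;
    pts{iTheta, iD}(end+1, :) = [r(k) c(k)];
end
% The merge stage then repeatedly takes the largest remaining bin, pulls in the
% voters of neighbouring bins whose angles lie within ten degrees, and stops
% when the largest bin falls below the chosen length parameter.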
[Figure 4-12: a portion of the accumulator array, with quantized θj along one axis and di along the other; the dark entries mark the significant peaks and the lighter, scattered entries are the votes of a diagonal line.]
Figure 4-12 Accumulator Array showing the peaks
Figure 4-13 Test image

Figure 4-12 is a portion of the accumulator array for Figure 4-13. The array size in Figure 4-12 is di x θj, where i = 1 to maximum_distance/3 and j = 1 to 36. The entries marked with a dark color show the peaks in the accumulator array, which are quite significant and can be detected easily; but if there is a diagonal line in the image plane with an irregular slope, it will be hard to detect. The votes marked with a light color actually represent a diagonal line, and we can observe from the accumulator array that its votes are spread from top to bottom over different peaks, making it very hard to detect the complete diagonal line.

Merge the peaks in accumulator array A
    A is the accumulator array; [x, y] = size of A
    for d = 1 to x
        for theta = 1 to y
            if A[d, theta] >= threshold
                [i, j] = Search_Highest_Peak(A, d, theta)
                k = i
                while A[i+1, j] > 0
                    Merge A[k, j] and A[i+1, j]    // also merge the point lists
                    A[i+1, j] = 0
                    i = i + 1
                end while
                i = k
                while A[i-1, j] > 0
                    Merge A[k, j] and A[i-1, j]    // also merge the point lists
                    A[i-1, j] = 0
                    i = i - 1
                end while
            end if
        end for
    end for

Search_Highest_Peak(A, d, theta)
    best = A[d, theta];  iBest = d;  j = theta
    i = d
    while A[i, j] > 0                  // scan downwards in the d direction
        if A[i, j] > best
            best = A[i, j];  iBest = i
        end if
        i = i + 1
    end while
    i = d
    while A[i, j] > 0                  // scan upwards in the d direction
        if A[i, j] > best
            best = A[i, j];  iBest = i
        end if
        i = i - 1
    end while
    return [iBest, j]

Algorithm 4.1 Algorithm to merge peaks in the accumulator array

The algorithm I designed, given in Algorithm 4.1, can detect diagonal lines; it is a modified version of the technique proposed by M. Atiquzzaman and M. W. Akhtar [34]. The algorithm searches for a peak in the accumulator array V(d, θ) from left to right and top to bottom by incrementing θj and di respectively. When it finds the first peak in the accumulator array that meets the threshold, it calls a routine to search for the highest peak in the d direction, searching both downwards and upwards, and returns the position of the highest peak. The calling program then merges all the adjacent peaks above and below into the highest peak, making it more prominent.

[Figure 4-14: the same portion of the accumulator array after merging; the scattered votes of the diagonal line have been collapsed into a few prominent peaks.]
Figure 4-14 Accumulator Array showing the merged peaks

I ran this algorithm on Figure 4-12, which is a portion of the accumulator array of Figure 4-13, setting the threshold to 4. The resultant accumulator array is shown in Figure 4-14. The algorithm scans from left to right and top to bottom. The first peak in Figure 4-12 is 4; since it equals the threshold, the second part of the algorithm is activated and searches for the highest peak in that column from top to bottom. When the algorithm finds the highest peak in the column, such as 292, it starts merging the column entries until it finds a zero. The peak in the first column then becomes 299. Similarly, the rest of the smaller peaks are merged into other larger peaks, which makes them quite prominent.

Figure 4-15 (a) Line detection before applying Algorithm 4.1 and (b) line detection after applying Algorithm 4.1

In Figure 4-15 (a) the diagonal lines are not completely detected. Diagonal lines 7 and 8 should have been merged together but were not; similarly, lines 12 and 13 should have been merged. Horizontal lines are completely detected, whereas diagonal lines are not, and the reason for the incomplete detection is the same: their peaks are spread over the accumulator array. Line segments 7 and 8 in Figure 4-15 (a) are merged into the single line segment 1 in Figure 4-15 (b); similarly, line segments 12 and 13 in Figure 4-15 (a) are merged into the single line segment 7 in Figure 4-15 (b).

4.2.1 Line Joining

It is well known that the HT may be confused in detecting lines when the image size is more than 250x250. To be consistent in line detection, images greater than 192x192 are processed in smaller frames of 128x128 with an overlap of 8 pixels. The overlap of 8 pixels is chosen so that information present at the intersection of the frames is not lost. To rejoin these smaller frames into one image, a precise joining routine is required. During the joining process one must keep certain things in mind: first, vertical or horizontal edges that lie in the overlap area, Figure 4-16 (a), must be detected only once, and diagonal lines, Figure 4-16 (b), which continue from frame 1 to frame 2, must be joined perfectly.

Figure 4-16 (a) Vertical overlapping edge (b) Diagonal overlapping edge
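The framing step itself is straightforward; the following MATLAB sketch cuts an image into 128x128 frames with the 8-pixel overlap described above. It is an illustration under my own assumptions rather than the thesis code; the frame offsets it records are what a joining routine would later use to map line endpoints back into full-image coordinates.

% Illustrative framing of a large image into 128x128 frames with 8-pixel overlap.
frameSize = 128;
overlap   = 8;
step      = frameSize - overlap;            % 120-pixel stride between frame origins

[nRows, nCols] = size(I);                   % I: the image to be processed framewise
frames = {};
for r0 = 1:step:max(1, nRows - overlap)
    for c0 = 1:step:max(1, nCols - overlap)
        r1 = min(r0 + frameSize - 1, nRows);
        c1 = min(c0 + frameSize - 1, nCols);
        frames{end+1} = struct('rows', [r0 r1], 'cols', [c0 c1], ...
                               'data', I(r0:r1, c0:c1));   % frame plus its offset
    end
end
% Lines detected inside frames{k}.data are shifted by the recorded offsets before
% the joining routine removes duplicates in the 8-pixel overlap bands.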
Not only does the framing problem require the line-joining routine, but real images such as Taj Mahal and Small Building may also require the least-squares error fit and line-joining routines to join any broken line segments detected by the Hough Transform. In practice, the line segments obtained from the Hough Transform may be fragmented and must be grouped together to obtain longer line segments. To accomplish this task, a set of closely bunched and similarly oriented straight-line segments must be found and joined together. I used least-squares error fitting to join those line segments which are close to each other. Before joining, the line segments are labeled in sequential order to reduce the computational time, as we then have to compare only neighboring lines and not all the lines in the image.

K(Li, Lj) < εL                (4.2)

K is a function that calculates the least-squares fit and joins the pair of lines [Li, Lj] if the least-squares error is less than the threshold εL, where i and j are line labels, i ≠ j and j ∈ [i - 5, i + 5].
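As an illustration of the K test above (again a sketch with an assumed segment representation, not the thesis code), two nearby segments can be joined by fitting a single least-squares line through their combined pixels and accepting the merge only when the residual error stays below the threshold.

function [ok, joined] = try_join(Li, Lj, epsL)
% Illustrative only. Li, Lj: structs with a field 'pts' (an N x 2 list of the
% [row col] pixels belonging to each segment); epsL: the threshold of Equation 4.2.
    P = [Li.pts; Lj.pts];
    r = P(:, 1);  c = P(:, 2);
    % Fit r = a*c + b, unless the combined segment is closer to vertical,
    % in which case fit c = a*r + b so that the residuals stay meaningful.
    if (max(c) - min(c)) >= (max(r) - min(r))
        coef = polyfit(c, r, 1);
        err  = mean((r - polyval(coef, c)).^2);   % mean squared residual
    else
        coef = polyfit(r, c, 1);
        err  = mean((c - polyval(coef, r)).^2);
    end
    ok = err < epsL;                              % K(Li, Lj) < eps_L
    if ok
        joined.pts = P;                           % the merged, longer segment
    else
        joined = Li;                              % keep the original segment
    end
end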
4.2.2 Line Detection Results

The results of the line detection procedure for all of the test images can be found in Appendix 3. The line detection for every image in this appendix uses the default parameter of 5 pixels for the line length; any line segment shorter than 5 pixels is taken as noise. In this section, I will discuss the results on specific images that exemplify both the positive and negative aspects of the line detector, and in particular examine how the line length parameter influences the number of lines that are correctly detected.

The first images that I will investigate are hand-drawn images and test patterns. Since these images are not as complicated as natural scenes, one should expect the line detector to perform with a high degree of accuracy. Figure 4-17 (a) and (b) show the results of line detection on two of the test images. I consider the line detection for these images to be highly accurate. Every line that one may expect to be detected has been detected. Figure 4-18 (a) and (b) show line detection results that are only partly successful. In Figure 4-18 (a), the lowermost and upper right horizontal lines are missed, and in Figure 4-18 (b), the lowermost horizontal line is split into two lines and the lines in the minarets are also not joined. The reason that these potential segments escaped detection is that their individual bins did not receive enough votes to warrant line fitting.

Figure 4-17 (a) and (b) Successful line detection
Figure 4-18 (a) and (b) Incomplete line detection

As with the test images, the set of scene images exhibits both successes and failures with respect to line detection. Some of the positive line extraction examples can be seen in Figure 4-19 (a) and (b). In particular, the vertical lines formed by the columns in Figure 4-19 (a) and the broken lines of Figure 4-19 (b) have been successfully identified.

Figure 4-19 (a) Lincoln Memorial and (b) Big Taj, line length = 5

The line extraction results from Figure 4-20 (a) and Figure 4-20 (b) show both successful and unsuccessful line detection. While many lines are correctly identified in Figure 4-20 (a), some of the most important lines, namely those formed by the crossbeams, are not detected. This can be attributed to the soft shadows present beneath the crossbeams. The dismal lack of line detection in Figure 4-20 (b) is due to the size of the image, only 150 x 150 pixels, and the distance of the structure from the viewer.

Figure 4-20 (a) Narobi 1 and (b) Small Taj, line length = 5

4.3 Corner Detection

A corner is defined to be the point of intersection of two distinct proximal line segments making the shape of an "L"; junctions of the "⊥" form are not considered here. The attributes of a corner consist of an angle and a location. Since lines are fit in a non-quantized sense, it is possible to define the location of a corner with sub-pixel accuracy. In order to detect corners, the line lists found in the line detection stage must be processed.

K(Li, Lj) < εc                (4.3)

K is a function that finds the corner between the pair of lines [Li, Lj] if its measure is less than the threshold εc, where i and j are line labels, i ≠ j and j ∈ [i - 10, i + 10]. The slope of a line Li with end points (x1, y1) and (x2, y2) is m = (y1 - y2) / (x1 - x2). First, the slope of each line Li is computed and then compared against the slope of Lj (where i ≠ j and j ∈ [i - 10, i + 10]). If the slopes are commensurate, the point of intersection is computed. Because the points of interest occur as the intersection of line segments rather than lines, the computed intersection point must be compared against the initial and terminal points of each line segment. If the point is within a certain distance, the corner threshold, of either the initial or the terminal point of both segments, or if the point lies on both segments, the pair is classified as a corner. The default distance for the corner threshold is ten pixels. Essentially, the corner threshold is the distance that each line segment is extended in both directions before testing for intersections. After detecting the corner point, the measure of the corner angle is computed with equation 4.4.

Angle(Li, Lj) = tan⁻¹((mj - mi) / (1 + mi·mj))                (4.4)

where mi is the slope of Li, mj is the slope of Lj, and i ≠ j.
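A compact MATLAB sketch of this intersection test follows. It is illustrative only: the line representation (two endpoints per segment) and helper names are assumptions, vertical lines are ignored, and the "point lies on both segments" alternative is omitted for brevity.

function [isCorner, P, angleDeg] = corner_test(Li, Lj, cornerThresh)
% Li, Lj: structs with fields p1, p2 (each a [x y] endpoint). cornerThresh: pixels.
    if nargin < 3, cornerThresh = 10; end        % default corner threshold
    m1 = slope(Li);  m2 = slope(Lj);
    isCorner = false;  P = [NaN NaN];  angleDeg = NaN;
    if abs(m1 - m2) < 1e-6, return; end          % parallel segments cannot intersect
    % Intersection of the two infinite lines y = m*x + b.
    b1 = Li.p1(2) - m1*Li.p1(1);
    b2 = Lj.p1(2) - m2*Lj.p1(1);
    x  = (b2 - b1) / (m1 - m2);
    y  = m1*x + b1;
    P  = [x y];
    % Accept only if P lies near an endpoint of BOTH segments (Section 4.3).
    nearI = min(norm(P - Li.p1), norm(P - Li.p2)) <= cornerThresh;
    nearJ = min(norm(P - Lj.p1), norm(P - Lj.p2)) <= cornerThresh;
    isCorner = nearI && nearJ;
    angleDeg = atand((m2 - m1) / (1 + m1*m2));   % Equation 4.4
end

function m = slope(L)
    m = (L.p2(2) - L.p1(2)) / (L.p2(1) - L.p1(1));
end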
4.3.1 Corner Detection Observations

Since corner detection depends upon both edge detection and line detection, three parameters affect its performance: the Gaussian spread, the line-length tolerance, and the corner threshold. Because there are three parameters, the reasons for incorrect corner detection may be difficult to pinpoint, and it may be difficult to find the best configuration of these parameters for a particular image.

Upon referring to Appendix 4, the reader can verify that, for the majority of the examples, both false positives and false negatives (non-detection of corners) can be attributed to faulty line detection. A few exceptions to this, as well as some of the more successful attempts at corner detection, can be seen in Figure 4-21 to Figure 4-23.

Figure 4-21: Test image with all the corners detected.

Figure 4-22: Test pattern with one corner missing (lower left corner of the lower middle rectangle).

Figure 4-23: Corner detection results for (a) Lincoln, 25 corners, and (b) Taj Mahal, 60 corners.

In Figure 4-21 all of the possible corners are detected, but in Figure 4-22 the corner at the lower left of the lower middle rectangle is missed, which is due to the corner threshold. In Figure 4-23 (a), the Lincoln Memorial scene displays very strong corner evidence in the upper left and upper right sides of the building. While this is encouraging, there are also corners that are incorrectly detected because the edges of the columns extend into the strong line at the bottom of the building. This can be directly attributed to the default corner threshold; however, if the threshold were lowered, the strong corners at the top of the monument might not be detected. The corner detection for the Figure 4-23 (b) Big Taj scene also displays mixed results. The points of intersection on the minarets and on the windows are promising, but not all of the window corners are successfully detected, and there are many false positives amongst the crowd of people.

4.3.2 Corner Detection Results

The general observations made in the previous section provide evidence of the strengths and weaknesses of the line-driven corner detector. In this section, the results of the corner detector are quantified so that an objective report of its accuracy can be made. The accuracy of the detector is determined by the confidence ratio

R = (C - F) / E    (4.6)

where C is the number of corners correctly detected, F is the number of false alarms, and E is the number of corners expected. Clearly, the number of corners that one "expects" to find in an image is not well defined. While some might define a corner to be a point of high curvature, others, including myself, consider a corner to be the potential point of intersection between two line segments. It is impossible to make an objective decision about expected corners in scene images; therefore, confidence ratings are reported only for test and hand-drawn images. The corner detection results for these images are given in Table 4-1, using the default values sigma = 1.0, line length = 5, and corner threshold = 10.

Table 4-1: Confidence rating for corner detection of hand-drawn images

  Image          Detected Corners   False Alarms   Expected Corners   Confidence Rating
  Parallelogram  16                 0              16                 100%
  Test Image     35                 4              32                 97%
  Thick W        9                  1              12                 67%
  Thin W         6                  0              6                  100%
  Line Taj       16                 0              25                 64%

4.4 Ribbon Detection

Stockman and Shapiro define a ribbon as "an elongated region that is approximately symmetrical about its major axis" [3]. This definition encompasses many varied objects, including picture frames, bottles, columns, and any 2-dimensional object that possesses symmetry. When the images of such objects are projected onto a plane, the symmetry is translated into curves with reflective symmetry about an axis. When an actual cylindrical object is projected onto a plane, its symmetry is translated into parallel lines.
The ribbon detection algorithm identifies only those ribbons that result from projected cylinders and rectangular prisms, namely straight ribbons. In this section, I describe the algorithm for straight ribbon detection and explore the advantages and potential difficulties of ribbon detection.

4.4.1 Ribbon Definition

For the purpose of identifying straight ribbons, a more practical definition of a ribbon is needed. To this end, I define a straight ribbon to be two lines whose edge gradients differ by 180° ± 10°, that have approximately the same length and approximately the same slope, and whose width is at most 1.5 times the line length. To be more precise, two lines L1 and L2 form a ribbon if and only if the following conditions are satisfied:

- 170° ≤ |θ1 - θ2| ≤ 190°, where θ1 and θ2 are the respective average edge gradients of L1 and L2.
- l(L_i) ≥ l(L_j) / 2 for 1 ≤ i, j ≤ 2, where l(L) is the length of line segment L.
- The projection of line L1 should overlap L2 and vice versa.

A detected ribbon inherits the attributes of each of its lines, as well as a width, a length, and one of two possible types. The length of the ribbon is defined to be the average of the lengths of the two lines. The width of the ribbon is the distance between the lines.

4.4.2 Types of Ribbons

Ribbons are divided into two types, Type-1 and Type-2. If the edge gradient θ1 at line L1 is greater than the edge gradient θ2 at line L2, the ribbon is Type-1; otherwise it is Type-2. Essentially, a Type-1 ribbon is one in which the gradient vectors of the lines point away from each other, and a Type-2 ribbon is one in which the gradient vectors point towards each other. Figure 4-24 gives examples of the two types.

Figure 4-24: (a) Type-1 ribbon and (b) Type-2 ribbon (L1 must be left of L2).
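A minimal MATLAB sketch of the straight-ribbon test above follows; it assumes each line is stored as a small structure with endpoints and an average edge gradient (the field and function names are illustrative, not the thesis code).

    function tf = is_ribbon(A, B, gradTol)
        % A, B : lines as structs with fields p1, p2 ([x y] endpoints) and grad
        %        (average edge gradient in degrees);  gradTol : tolerance (default 10).
        if nargin < 3, gradTol = 10; end
        dg = mod(abs(A.grad - B.grad), 360);
        if dg > 180, dg = 360 - dg; end
        oppositeGradients = abs(dg - 180) <= gradTol;       % gradients differ by 180 +/- tol
        lenA = norm(A.p2 - A.p1);  lenB = norm(B.p2 - B.p1);
        similarLength = min(lenA, lenB) >= 0.5 * max(lenA, lenB);
        u  = (A.p2 - A.p1) / lenA;                          % unit direction of line A
        tB = sort([(B.p1 - A.p1) * u.', (B.p2 - A.p1) * u.']);
        overlaps = (tB(2) > 0) && (tB(1) < lenA);            % projected spans overlap
        n  = [-u(2) u(1)];                                   % unit normal of line A
        width = abs((B.p1 - A.p1) * n.');                    % perpendicular separation
        narrowEnough = width <= 1.5 * max(lenA, lenB);
        tf = oppositeGradients && similarLength && overlaps && narrowEnough;
    end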
4.4.3 Ribbon Detection Results

The confidence rating defined for corner detection can also be used to assess the general accuracy of the ribbon detection algorithm. As with corner detection, the concept of an expected ribbon is not well defined, and the determination of such ribbons in images is fuzzy at best. For this reason, confidence ratings are reported only for the class of hand-drawn images and images that have some obvious ribbons (see Table 4-2).

Table 4-2: Confidence rating for ribbon detection

  Image          Detected Ribbons   False Alarms   Expected Ribbons   Confidence Rating
  Test Pattern   8                  0              8                  100%
  Thick V        2                  0              2                  100%
  Thick W        4                  0              4                  100%
  Thin W         4                  0              4                  100%
  Line Taj       9                  0              9                  100%
  Taj Mahal      24                 6              26                 69%
  Lincoln scene  44                 29             24                 62%
  Narobi 2       21                 5              26                 61%

Appendix 5 contains the results of ribbon detection for each of the test images. Images that yielded perfect ribbon detection are Test Pattern, Thick V, Thick W, Thin W, and Line Taj. In the Thin W image, one would expect the detection of four distinct ribbons, and all four have been detected successfully, as shown in Figure 4-25. Ribbon detection for the remaining hand-drawn images performs as one might expect: if the corresponding lines are detected, then a ribbon is detected. Non-detection occurs only when there is incomplete line evidence.

Figure 4-25: Thin W ribbon detection.

The results of applying the ribbon detector to the set of scene images show varying levels of success. While most of the obvious ribbons are detected, some surprising detections are made as well. Figure 4-26 displays some of the more promising detection results. In Figure 4-26 (a), Narobi 1, the strong Type-1 ribbon is clearly detected, as are the minarets in Figure 4-26 (b), Taj Mahal. The ribbons for all the minarets in Figure 4-26 (b) have been detected successfully, but the main door and some of the window ribbons are not detected. The most encouraging example is Figure 4-26 (c), the Lincoln scene, in which vertical ribbons of both types are detected.

Figure 4-26: Detected ribbons for (a) Narobi 1, (b) Taj Mahal, and (c) Lincoln scene.

Figure 4-27: Narobi 2, showing some successful ribbon detection.

The scene in Figure 4-27, Narobi 2, has strong ribbon evidence in the crossbeams, and one would hope that they would be detected, but this is not the case. The problem is that some of the lines in question were never detected. Other interesting ribbons of Narobi 2 are displayed in Figure 4-28.

Figure 4-28: (a), (b), and (c) Some successful ribbons of Narobi 2.

5. Image Representations

Many types of features, global, local, and relational, can be used for object recognition and matching. Global features are region based and can be obtained either by considering all points inside a region or only those points on its boundary; in either case, the purpose is to find the locations, intensity characteristics, color, and spatial relations of these points. Local features usually represent a small area of a region; examples are corners and boundary segments. Relational features are based on the relative positions of different entities, such as distances between features, relative angles, orientations, and precincts. These features are very important in object recognition because changes in lighting conditions, view angle, and noise do not greatly influence them. Line segments and orientation were used for object recognition by B. Huet et al. [35]. We will use relational features to define our data structures in order to get optimal and consistent performance from the graph-matching algorithm.

Graphs are important data structures for computer vision and are widely used to represent the neighborhood relations that exist in an image. Relational graph matching with model-based segmentation was used by Ozer et al. [36] to decide on human presence in an image as well as for posture recognition. A similarity-based aspect-graph approach was used by Christopher et al. [37] to recognize 3D objects.

Section 5.1 contains the basic definitions from graph theory that will be used in the rest of this chapter. Section 5.2 discusses the line structure representation, Section 5.3 the corner structure representation, and Section 5.4 the ribbon structure representation.

5.1 Basic Definitions

The definitions from graph theory contained in this section can be found in standard textbooks such as [38, 39, 40, 41].

Definition 5-1: A graph G = (V, E) consists of two sets: a finite set V of elements called vertices and a finite set E of elements called edges. Each edge creates a binary relation between a pair of vertices.

Definition 5-2: The vertices v_i and v_j associated with an edge e are called the end vertices of e; e is said to be incident to its end vertices. An edge is denoted e = (v_i, v_j).

Definition 5-3: Two edges are adjacent if they have a common end vertex.

Definition 5-4: Two vertices are adjacent if they are the end vertices of the same edge.
Definition 5-5: A path K in a graph is a finite alternating sequence of vertices and edges such that
- vertices v_{i-1} and v_i are the end vertices of the edge e_i, where 1 ≤ i ≤ k,
- all edges are distinct, and
- all vertices are distinct.

The vertices v_0 and v_k are called the end vertices of the path, and we refer to it as a v_0-v_k path. The number of edges in the path is called the length of the path.

Definition 5-6: A graph G is connected if there exists a path between every two vertices in G.

Definition 5-7: Consider a graph G = (V, E). G' = (V', E') is a sub-graph of G if V' and E' are, respectively, subsets of V and E such that an edge (v_i, v_j) is in E' only if v_i and v_j are in V'.

Definition 5-8: An n-vertex graph with n(n-1)/2 edges is a complete graph.

Definition 5-9: The adjacency matrix of an n-vertex graph G = (V, E) is an n x n matrix A in which each element is either zero or one. If G is an undirected graph and V = {1, 2, 3, ..., n}, then the elements of A are defined by A(i, j) = 1 if (i, j) ∈ E or (j, i) ∈ E, and A(i, j) = 0 otherwise.

5.1.1 Adjacency Matrix

Since I used MATLAB 6.12 for my implementation, it is important to highlight some MATLAB features. As described by The MathWorks [42], "MATLAB is a high-level matrix/array language with control flow statements, functions, data structures, input/output, and object-oriented programming features. It allows both programming in the small to rapidly create quick and dirty throw-away programs, and programming in the large to create complete large and complex application programs."

MATLAB does not support pointers [42]; therefore, it is hard to implement a graph data structure in the classic manner. MATLAB is, however, well suited to matrix manipulation, and a graph can also be represented in matrix form as an adjacency matrix. Therefore, I use the adjacency matrix as the graph data structure. The following example makes this clearer.

Figure 5-1: Pictorial representation of a graph G(V, E). Vertices are represented by black dots and edges by straight lines.

Figure 5-1 shows an example of a graph with nine vertices, v_1 to v_9; its adjacency matrix therefore has nine rows and nine columns. Two adjacency matrices for Figure 5-1 are shown in Figure 5-2 and Figure 5-3.

Figure 5-2: Adjacency matrix of Figure 5-1 (full, symmetric).

Figure 5-3: Adjacency matrix of Figure 5-1 (upper triangle only).

The problem with an adjacency matrix is that it takes n x n memory space, and if we want sub-pixel accuracy, for example for the distance between two lines or a corner point, then we have to declare a float type, which takes 8 bytes of memory per value. The total memory that the matrix occupies in that case is 8n², where n is the number of vertices. To reduce this we can, first, declare a non-float type such as unsigned character, which takes one byte per value and ranges from 0 to 255, and second, eliminate the lower triangle of Figure 5-2, since it is identical to the upper triangle. To reduce memory storage further, the adjacency matrix shown in Figure 5-3 can be converted into a sparse matrix [40, 42].
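A small MATLAB sketch of this representation follows (the nine-vertex edge list is illustrative, not the exact graph of Figure 5-1); it stores the adjacency matrix as uint8, keeps only the upper triangle, and converts the result to a sparse matrix.

    % Build the adjacency matrix of a 9-vertex undirected graph (edge list illustrative).
    n = 9;
    edges = [1 2; 1 4; 2 3; 2 4; 2 5; 3 5; 3 6; 4 7; 5 6; 5 7; 5 8; 6 9; 7 8; 8 9];
    A = zeros(n, 'uint8');                       % one byte per entry instead of eight
    for k = 1:size(edges, 1)
        A(edges(k,1), edges(k,2)) = 1;           % undirected graph: set both entries
        A(edges(k,2), edges(k,1)) = 1;
    end
    U = triu(A);                                 % upper triangle only (A is symmetric)
    S = sparse(double(U));                       % sparse storage keeps only the non-zeros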
5.2 Line Structure Representation

I have divided the line structure into two groups of properties: descriptors and relationships. Descriptors represent properties of a line itself, such as line slope and edge gradient; relationships represent properties between lines, such as the distance between line segments and relative line orientation. For each descriptor and relationship the spatial relationship is computed as described in the subsequent sections. A descriptor is a good candidate for a vertex property and a relationship is a good candidate for an edge property of the graph. A pictorial representation of the line structure is shown in Figure 5-4.

Figure 5-4: Pictorial representation of the line structure. Properties 'a', 'b', and 'c' belong to the vertices and represent line length, line slope, and edge gradient respectively; 'd' and 'e' are properties of the graph edges and represent the distance between two lines and the relative line orientation respectively.

5.2.1 Line Slope

Line slope is used as one of the attributes of a line segment.

5.2.2 Edge Gradient

The edge gradient is one of the primitives of computer vision, and it is used here as a property of lines. Since a graph-matching algorithm is used to match different lines and line structures, the edge gradient serves as one of the descriptors of a line. The edge gradient normally changes with lighting conditions; therefore, for consistent matching results a high threshold is used for comparison. The average edge gradient is computed with Equation 5.3, averaging the gradient over the line's pixels but excluding the three pixels nearest each end point:

L_iG = ( Σ_{n = S+3}^{E-3} A(n) ) / (LineLength - 6)    (5.3)

where L_iG is the average edge gradient, i is the line number, S and E are the start and end points of the line, n ranges over the (row, column) pixel coordinates of the line, and A is the gradient of the edge at a pixel.

The Sobel gradient operator normally distorts the gradient at the end points of an edge, and the degree of degradation depends on the size of the operator. I use a 5 x 5 Sobel operator, and to obtain a more accurate and consistent value the average gradient is computed between the end points as in Equation 5.3.
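A sketch of Equation 5.3 in MATLAB is given below (assumed inputs: a gradient-direction image and the ordered list of pixels along the line; the names are illustrative and the line is assumed to be longer than six pixels).

    function g = avg_edge_gradient(gradImg, linePix)
        % gradImg : gradient-direction image (degrees);  linePix : N x 2 [row col]
        % pixel coordinates along the line, ordered from the start point to the end point.
        N    = size(linePix, 1);
        keep = linePix(4:N-3, :);                           % skip 3 pixels at each end
        idx  = sub2ind(size(gradImg), keep(:,1), keep(:,2));
        g    = sum(gradImg(idx)) / (N - 6);                 % Equation 5.3
    end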
Figure 5-5 (a) shows a test image and Figure 5-5 (b) the line segments resulting from the Hough transform and the line-joining routine.

Figure 5-5: (a) Test image, (b) lines extracted from (a).

Table 5-1 lists the descriptors, line length, line slope, and edge gradient, for the lines of Figure 5-5 (b). The lengths of the horizontal lines are 30 or 32 pixels and their slope is 0°; the only attribute that differentiates among these horizontal lines is their edge gradient, which divides them into two distinct groups, one with edge gradient below 100° and one with edge gradient above 200°. Similarly, for the diagonal lines, line lengths range from 42 to 48 pixels and line slopes from 73° to 79°. The differences in line length and line slope are less than 10 pixels and 10° respectively, which is below the thresholds set for matching (thresholds for these descriptors are discussed in Chapter 6). Here, too, the edge gradient plays an important role by dividing these lines into groups: diagonal lines 1 and 7, lines 6 and 13, and lines 3, 10, 11, and 15 are grouped together; similarly, horizontal lines 2, 5, 8, and 12 form one group and lines 4, 9, 14, and 16 another.

Table 5-1: Line descriptors

  Line #   Length (pixels)   Slope (deg)   Edge Gradient (deg)   Remarks
  1        46                77            40                    Diagonal
  2        32                0             94                    Horizontal
  3        48                79            195                   Diagonal
  4        32                0             267                   Horizontal
  5        32                0             96                    Horizontal
  6        42                73            18                    Diagonal
  7        47                77            46                    Diagonal
  8        32                0             90                    Horizontal
  9        32                0             270                   Horizontal
  10       44                74            200                   Diagonal
  11       48                79            190                   Diagonal
  12       32                0             95                    Horizontal
  13       43                75            25                    Diagonal
  14       32                0             269                   Horizontal
  15       46                74            198                   Diagonal
  16       30                0             267                   Horizontal

5.2.3 Distance between Line Segments

The distance between lines is another parameter that refines the matching process. It is computed with Equation 5.4:

L_ijD = sqrt( (x_i - x_j)² + (y_i - y_j)² ),  i ≠ j, j ∈ [i - 10, i + 10]    (5.4)

where L_ijD is the line-distance adjacency matrix, i and j are line numbers, the distance is measured in pixels, and (x_i, y_i) and (x_j, y_j) are the midpoints of lines i and j respectively.

5.2.4 Relative Line Orientation

Line orientation means the relative position of one line with respect to another: 'left', 'right', 'up', or 'down'. Midpoints are used to relate the positions.

Figure 5-6: Line orientation.

In Figure 5-6, 360° is divided into four equal angles of 90° each: up is from 45° to 135°, left from 135° to 225°, down from 225° to 315°, and right from 315° to 45°. In Figure 5-6, line L2 is to the right of line L1.

L_ijO = K(L_i, L_j),  i ≠ j, j ∈ [i - 10, i + 10]    (5.5)

where L_ijO is the line-orientation adjacency matrix, i and j are line segment labels, and K is a function that returns the orientation of line i with respect to line j.

5.2.5 Behavior of Line Descriptors and Relationships

The predicted behavior of the line descriptors and relationships is shown in Table 5-2; the observed behavior is analyzed and reported in Chapter 7.

Table 5-2: Behavior of line descriptors and relationships

  Descriptor / Relationship        Rotation Inv.   Scale Inv.   Illumination Inv.   Occlusion Inv.
  Line Slope                       No              Yes          Yes                 Yes
  Edge Gradient Direction          No              Yes          No                  Yes
  Edge Gradient Magnitude          Yes             Yes          No                  Yes
  Distance between Line Segments   Yes             No           Yes                 No
  Relative Line Orientation        No              Yes          Yes                 Yes
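The quadrant test of Section 5.2.4, reused later for corners (Section 5.3.3) and ribbons (Section 5.4.4), can be sketched in MATLAB as follows; the midpoint inputs and the assumption that angles are measured counter-clockwise from the positive x-axis with y increasing upward are illustrative.

    function o = relative_orientation(mi, mj)
        % mi, mj : midpoints [x y] of lines i and j.
        a = mod(atan2d(mj(2) - mi(2), mj(1) - mi(1)), 360); % angle of j as seen from i
        if a >= 45 && a < 135
            o = 'up';
        elseif a >= 135 && a < 225
            o = 'left';
        elseif a >= 225 && a < 315
            o = 'down';
        else
            o = 'right';                                    % 315..360 and 0..45 degrees
        end
    end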
5.3 Corner Structure Representation

The geometric descriptor of the corner structure is the angle of the corner, and the relations that a corner makes with other corners are:

- distance between corners,
- corner orientation,
- corner precincts, and
- angle between corners.

The attributes of the corner structure are shown in Figure 5-7.

Figure 5-7: Pictorial representation of the corner structure. Property 'a' belongs to the vertices and represents the angle of a corner; 'b', 'c', 'd', and 'e' are properties of the graph edges and represent the distance between two corners, corner orientation, corner precincts, and the angle between corners respectively.

5.3.1 Angle of a Corner

The procedure to detect a corner and compute its angle was defined in Section 4.3. The results obtained from Equation 4.5 are stored in the matrix C_A.

5.3.2 Distance between Corners

The distance between corners is an important relationship that differentiates the corners of doors and windows from those of a building: as can be seen in Figure 5-8 (a) and (b), the corners of a door or window are relatively closer together than the corners of the building. We require that objects match under small changes in view and resolution. For instance, Figure 5-8 (a) and (b) show the same building with a change in view and scale but the same image resolution.

Figure 5-8: Corners detected by the corner detection algorithm.

To compute the distance between corners, Equation 5.6 is used:

C_ijD = K(C_i, C_j) < δ,  i ≠ j, j ∈ [i - 10, i + 10]    (5.6)

where K is a function that computes the Euclidean distance between corners C_i and C_j. If this distance is less than the threshold δ, it is stored in the adjacency matrix C_ijD.

5.3.3 Corner Orientation

As already discussed for line orientation in Section 5.2.4, corner orientation is also used to refine the matching results. Unlike the distance between corners, corner orientation is not often affected by view and resolution.

Figure 5-9: Corner orientation.

In Figure 5-9, 360° is divided into four equal angles of 90° each: up is from 45° to 135°, left from 135° to 225°, down from 225° to 315°, and right from 315° to 45°. In Figure 5-9, corner C2 is to the right of corner C1.

C_ijO = K(C_i, C_j),  i ≠ j, j ∈ [i - 10, i + 10]    (5.7)

where C_ijO is the corner-orientation adjacency matrix, i and j are corner labels, and K is a function that returns the orientation of corner C_i with respect to C_j.

5.3.4 Corner Precincts

One of the most important relationships among corner structures is the corner precinct. A precinct is the boundary of a corner formed by its arms (lines). In Figure 5-10 the precincts are shown with extended dotted lines. We assume that any two corners of a window or door will enclose each other within their boundaries; in Figure 5-10, corners C1 and C2 enclose each other in their bounds.

Figure 5-10: Example of corner bounds. C1, C2, and C3 are three corners; thick dark lines show the line segments and dotted lines show the bounds of the corners. C1 and C2 bound each other, whereas C3 bounds C2 but C2 does not bound C3.

Corner points A and B can be any of the corners shown in Figure 5-10, and A1 and A2 are the distant endpoints of the arms of corner A. Suppose A is C1 and B is C2. First we decide whether corner point B is inbound to corner point A. With corner point A as the origin, the dot product and cross product are taken between corner point B and arm A1, and likewise between corner point B and arm A2. If the signs of the two cross products are the same, corner point B is outside the bounds of A; if the signs are opposite, corner point B is inbound (see Table 5-3). In the same way we determine whether corner point A is inside the bounds of corner point B. With both results, Table 5-4 gives the type of bound. In this example the signs are opposite for C2; therefore corner C2 is inbound to corner C1, and similarly C1 is inbound to C2.

Table 5-3: Decision table for the arm orientation of a corner

  A1 -> B angle   A1 -> B sign   A2 -> B angle   A2 -> B sign   Decision
  θ1              +1             θ2              +1             Outside
  θ1              +1             θ2              -1             Inside
  θ1              -1             θ2              +1             Inside
  θ1              -1             θ2              -1             Outside

Table 5-4: Type of bound the corners make

  Corner A   Corner B   Type Code
  Outside    Outside    1
  Outside    Inside     2
  Inside     Outside    3
  Inside     Inside     4
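A minimal MATLAB sketch of the inbound test of Table 5-3 is given below (illustrative names; corner points and arm endpoints are [x y] row vectors).

    function inside = inbound(A, A1, A2, B)
        % A      : corner point [x y];   A1, A2 : far endpoints of the two arms of A;
        % B      : the other corner point.  Returns true if B is inbound to A.
        v1 = A1 - A;  v2 = A2 - A;  w = B - A;
        s1 = sign(v1(1)*w(2) - v1(2)*w(1));   % cross-product sign: arm A1 versus B
        s2 = sign(v2(1)*w(2) - v2(2)*w(1));   % cross-product sign: arm A2 versus B
        inside = (s1 ~= s2);                  % opposite signs -> B lies between the arms
    end

Calling the function twice, inbound(A, A1, A2, B) and inbound(B, B1, B2, A), gives the two results looked up in Table 5-4.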
5.3.5 Angle between Corners

The angle between two corners also plays an important role in matching corner points. In Figure 5-10, C2 makes an angle of 0° with C1, whereas C1 makes an angle of 180° with C2, and C1 and C2 relate with bound type 4. From this we can conclude that if the difference between the two angles is 180° and the bound type is 4, then there is strong evidence that these corners belong to a window or a door.

C_ijC = K(C_i, C_j),  i ≠ j, j ∈ [i - 10, i + 10]    (5.8)

where C_ijC is an adjacency matrix, i and j are corner labels, and K is a function that returns the angle between corners C_i and C_j; K uses the dot product to calculate the angle.

5.3.6 Behavior of Corner Descriptors and Relationships

The predicted behavior of the corner descriptors and relationships is shown in Table 5-5; the observed behavior is analyzed and reported in Chapter 7.

Table 5-5: Behavior of corner descriptors and relationships

  Descriptor / Relationship      Rotation Inv.   Scale Inv.   Illumination Inv.   Occlusion Inv.
  Angle of a Corner              Yes             Yes          Yes                 Yes
  Relative Corner Orientation    No              Yes          Yes                 Yes
  Distance between Corners       Yes             No           Yes                 Yes
  Corner Precincts               Yes             Yes          Yes                 Yes
  Angle between Corners          Yes             Yes          Yes                 Yes

5.4 Ribbon Structure Representation

The detection of ribbons is an important step towards the ultimate goal of image matching with the graph-matching algorithm. The ribbon structure required by the matching algorithm mainly uses descriptors of the line structure, because a ribbon is defined as a pair of parallel lines whose edge gradients differ by 180°, that are in the same relative position, and that overlap each other over at least part of their length. The descriptors used for the ribbon structure are:

- width of the ribbon,
- line slope,
- edge gradient, and
- type of ribbon.

Except for the type and width of the ribbon, these descriptors are the same as those explained in Section 5.2 and are not discussed again here. The relationships between ribbons are:

- distance between ribbons, and
- relative ribbon orientation.

Figure 5-11: Pictorial representation of the ribbon structure. Properties 'a', 'b', 'c', and 'd' belong to the vertices and represent the width of the ribbon, line slope, edge gradient, and type of ribbon respectively; 'e' and 'f' are properties of the graph edges and represent the distance between two ribbons and the relative ribbon orientation respectively.

5.4.1 Width of a Ribbon

Figure 5-12 shows two ribbons, A and B, with the area between their two lines shaded. In both ribbons the lines partially overlap each other and the difference in edge gradient is 180°. The ribbon width w is the shortest distance between the two parallel lines.

Figure 5-12: Ribbon width (w is the shortest distance between the two lines).

5.4.2 Types of a Ribbon

The types of a ribbon were described in Section 4.4.2.

5.4.3 Relative Distance between Ribbons

The relative distance between two ribbons is defined as the distance between the midpoints of the ribbons. In Figure 5-13, distance d is the shortest distance between the two ribbons. Equation 5.9 generates the adjacency matrix for the distance between ribbons:

R_ijD = K(R_i, R_j),  i ≠ j, j ∈ [i - 10, i + 10]    (5.9)

where R_ijD is an adjacency matrix, i and j are ribbon labels, and K is a function that returns the distance between ribbons R_i and R_j.

Figure 5-13: Distance between ribbon A and ribbon B.
5.4.4 Ribbon Orientation

Like corner and line orientation, the orientation of a ribbon plays a vital role in graph matching, as it helps to narrow the match. Ribbon orientation means the position of a ribbon relative to other adjacent ribbons, 'left', 'right', 'up', or 'down', taking the midpoint of each ribbon as the reference.

Figure 5-14: Ribbon orientation.

In Figure 5-14, 360° is divided into four equal angles of 90° each: up is from 45° to 135°, left from 135° to 225°, down from 225° to 315°, and right from 315° to 45°. In Figure 5-14, ribbon A is to the right of ribbon B.

R_ijO = K(R_i, R_j),  i ≠ j, j ∈ [i - 10, i + 10]    (5.10)

where R_ijO is the ribbon-orientation adjacency matrix, i and j are ribbon labels, and K is a function that returns the orientation of ribbon R_i with respect to R_j.

5.4.5 Behavior of Ribbon Descriptors and Relationships

The predicted behavior of the ribbon descriptors and relationships is shown in Table 5-6; the observed behavior is analyzed and reported in Chapter 7.

Table 5-6: Behavior of ribbon descriptors and relationships

  Descriptor / Relationship      Rotation Inv.   Scale Inv.   Illumination Inv.   Occlusion Inv.
  Width of a Ribbon              Yes             No           Yes                 Yes
  Line Slope                     No              Yes          Yes                 Yes
  Edge Gradient                  No              Yes          No                  Yes
  Type of Ribbon                 Yes             Yes          Yes                 Yes
  Distance between Ribbons      Yes             No           Yes                 Yes
  Relative Ribbon Orientation    No              Yes          Yes                 Yes

6. Graph Matching Algorithm

In most image retrieval algorithms, researchers have used only one image feature in isolation, such as lines or corners, with only two attributes. Lines with two attributes, line length and gradient, were used by Benoit Huet and Edwin R. Hancock [43] for a sensitivity analysis of the problem of recognizing line patterns from large structural libraries. For object-based image retrieval, Yi Tao and William I. Grosky [44] also used a single feature (corners) with two attributes, location and the color histogram at the center of mass. This thesis develops a graph-matching algorithm using three structures: (1) lines with four attributes, (2) corners with five attributes, and (3) ribbons with six attributes. For details of these features and their attributes see Chapter 5 and Table 3-1. Although graph matching is slower than other matching techniques, its accuracy is superior. In this chapter I show that, with such a large number of feature points, attributes, and relationships, graph matching is robust. The matching algorithm is divided into three routines, one for each feature type. This chapter is divided into four sections: the overall methodology of graph matching is discussed in Section 6.1, and in the remaining three sections each matching algorithm is discussed along with the results obtained.

6.1 Graph Matching

The term graph matching refers to the process of comparing two or more graphs with each other; in our case, a query graph is matched against a database graph. Before the matching algorithm is explained, some basic terms are introduced. If G1 is the graph for a database image and G2 is the graph for the query image (G1 = {V1, E1} and G2 = {V2, E2}), then we can match graph G1 with G2 and G2 with G1. Since an image is represented by three sets of structures (lines, corners, and ribbons), a separate graph G is defined for each structure; for details refer to Chapter 5.
6.2 Proposed Graph Matching Algorithm

The algorithm shown in Algorithm 6-1 is a general algorithm that provides the framework for the graph matching performed in the subsequent sections.

Algorithm 6-1: General graph matching algorithm

  Input:  Graph G1 and Graph G2
  Output: List of matching vertices

  For each vertex Vi of graph G2
      Find a matching vertex Vj of graph G1 (by attributes)
      If (match found)
          For all adjacent edges of G2 and G1
              Compare the vertices at the other ends
              If (match found)
                  Increment Vote
              End If
          End For
          Save vertices of G1, G2 and Votes
      End If
  End For

The algorithm takes two graph data structures, G1 and G2, as input, compares each vertex of G1 with every vertex of G2, and returns an array of matched vertices. The complexity of the algorithm is n x m, where n is the number of vertices of G1 and m is the number of vertices of G2. Using the same concept, the following sections develop particular graph-matching algorithms that match images using specific image structures.

6.3 Line-Graph Matching

The goal in this section is to incorporate the line structure defined in Chapter 5 into the matching algorithm. Before proceeding to the line matching experiments, it is important to understand how the matching algorithm works and the notation used in the subsequent paragraphs. If G = (V, E) is a line graph, then V = { L_iG, L_iS } are the descriptors of the vertices and E = { L_ijO, L_ijD } are the relationships that the vertices make with each other; L_iG and L_iS represent the edge gradient and line slope respectively, and L_ijO and L_ijD represent the relative line orientation and the distance between two line segments, where i and j index the vertices and i ≠ j.

6.3.1 Matching Algorithm

Algorithm 6-2 shows the line matching routine. It takes two line-structure graphs, G1 and G2, as input and returns the line-matching result. To compare vertices, Algorithm 6-2 calls the sub-routine match_vertex(.) given in Algorithm 6-4, which compares all descriptors of vertex Vi of graph G1 with those of vertex Vj of graph G2 and returns 1 if they match and 0 otherwise. Based on this result, the match_edge(.) sub-routine of Algorithm 6-3 is called; it compares the descriptors of the edges of Vi in graph G1 with those of the edges of Vj in graph G2. If an edge of Vi matches an edge of Vj, match_edge(.) calls match_vertex(.) to match the vertices at the other ends of the edges, and on a successful vertex match the variable 'vote' is incremented. The loop continues until all the edges of Vi have been compared with those of Vj, and the accumulated vote is returned. On return from match_edge(.), LineMatchingRoutine(.) stores lines Li and Lj in the 'match' array, provided the vote is greater than zero.

Algorithm 6-2: Line matching routine

  // G = (V, E) where V = { L_G, L_S } and E = { L_O, L_D }
  LineMatchingRoutine( G1, G2 )
  {
      n1 = size of graph G1,  n2 = size of graph G2
      vote = 0
      k = 1
      for i = 1 to n1
          for j = 1 to n2
              if ( match_vertex( Vi, Vj ) )          // match vertex Vi with Vj
                  vote = match_edge( Ei(i,:), Ej(j,:), i, j )
                  if vote > 0
                      match(k,:) = [ Li  Lj  vote ]  // store the matching result
                      k = k + 1
                  end if
              end if
          end for
      end for
      FinalMatch = line_sel_by_vote( match )
      return ( FinalMatch )
  }
Algorithm 6-3: Edge matching sub-routine

  // e1: edges of Vi of G1;  e2: edges of Vj of G2
  // I: index of vertex Vi;  J: index of vertex Vj
  match_edge( e1, e2, I, J )
  {
      vote = 0
      for i = I-5 to I+5                  // compares a maximum of 10 edges
          for j = J-5 to J+5              // compares a maximum of 10 edges
              if ( L_iO1 == L_jO2  &  | L_iD1 - L_jD2 | < δn )
                  if ( match_vertex( Vi, Vj ) )
                      vote = vote + 1
                  end if
              end if
          end for
      end for
      return ( vote )
  }

Algorithm 6-4: Vertex matching sub-routine

  match_vertex( v1, v2 )
  {
      if ( | L_1S - L_2S | < δ1  &  | L_1G - L_2G | < δ2 )   // compare the vertices
          return ( 1 )
      else
          return ( 0 )
      end if
  }

Algorithm 6-5: Line selection sub-routine

  line_sel_by_vote( LinesMatch )
  {
      sort LinesMatch by column 1                 // the lines of G1
      for each group of rows with the same line in column 1
          keep the row with the maximum vote and delete the others
      end for
      sort LinesMatch by column 2                 // the lines of G2
      for each group of rows with the same line in column 2
          keep the row with the maximum vote and delete the others
      end for
      return ( LinesMatch )
  }

After the comparison of all the lines of graphs G1 and G2 is complete, the sub-routine line_sel_by_vote(.) given in Algorithm 6-5 is called. This sub-routine examines the 'match' array; if any line has more than one match, it deletes all but the match with the maximum vote and returns the final line-matching results to the parent routine. The thresholds used in the algorithms are given in Table 6-1.

Table 6-1: Thresholds used in the line structure graph algorithms

  Algorithm       Threshold   Explanation                  Default Setting
  Algorithm 6-3   δn          Distance between two lines   10 pixels
  Algorithm 6-4   δ1          Slope of the line            10°
  Algorithm 6-4   δ2          Edge gradient of the line    30°
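For reference, the following is a condensed, runnable MATLAB sketch of Algorithms 6-2 to 6-4 combined. The struct layout, with G.V holding [slope gradient] rows and G.D and G.O holding the distance and orientation adjacency matrices, is an assumption made for this example, not the thesis data structure; the default thresholds follow Table 6-1.

    function match = line_graph_match(G1, G2, tSlope, tGrad, tDist)
        % G1, G2 : line graphs; G.V is n x 2 [slope gradient], G.D and G.O are n x n
        % distance and orientation adjacency matrices.
        if nargin < 3, tSlope = 10; end
        if nargin < 4, tGrad  = 30; end
        if nargin < 5, tDist  = 10; end
        match = zeros(0, 3);                                  % rows are [i j votes]
        n1 = size(G1.V, 1);  n2 = size(G2.V, 1);
        for i = 1:n1
            for j = 1:n2
                if ~vertex_ok(G1.V(i,:), G2.V(j,:), tSlope, tGrad), continue, end
                v = 0;
                for a = max(1, i-5):min(n1, i+5)              % neighbouring edges only
                    for b = max(1, j-5):min(n2, j+5)
                        if a == i || b == j, continue, end
                        if G1.O(i,a) == G2.O(j,b) && abs(G1.D(i,a) - G2.D(j,b)) < tDist ...
                                && vertex_ok(G1.V(a,:), G2.V(b,:), tSlope, tGrad)
                            v = v + 1;                        % one vote per matching edge
                        end
                    end
                end
                if v > 0
                    match(end+1, :) = [i j v];                %#ok<AGROW>
                end
            end
        end
    end

    function ok = vertex_ok(v1, v2, tSlope, tGrad)
        % Algorithm 6-4: two line vertices match when slope and gradient are close.
        ok = abs(v1(1) - v2(1)) < tSlope && abs(v1(2) - v2(2)) < tGrad;
    end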
6.3.2 Matching Results for Line Segments

In this section we investigate the robustness and quality of the graph-matching algorithm. Experiments were conducted on an image database consisting of 97 images of different resolutions; Gaussian noise was added to some of the images. The images and their resolutions are given in Appendix A. The images in the database are divided into six groups depending on their shapes and structural types. Four query images were taken from these groups and the graph-matching algorithm for lines was applied. The results are shown in Figure 6-1 to Figure 6-4: the first image is the query image and the remaining images are the results from the graph-matching algorithm, shown in descending order of percentage similarity from left to right and top to bottom.

In Figure 6-1, the hand-drawn Taj was given as the query image. The graph-matching algorithm for the line structure successfully retrieved all the images of the hand-drawn Taj, even those with Gaussian noise and occlusion; the Taj Mahal images were also retrieved, along with some other images. This was the most successful experiment, as seven out of eight hand-drawn Taj images were retrieved. In Figure 6-2, the Taj Mahal was given as the query; here the algorithm successfully retrieved four out of nine Taj Mahal images and, encouragingly, the retrieved images differ in scale, resolution, view, and noise level. Results from experiment 3 are shown in Figure 6-3: a Small Building was given as the query and three images were successfully retrieved from its group. In experiment 4 the query image, Spartan Village Apartments, was from the same group of small buildings; this time the algorithm retrieved six relevant images from the group.

Figure 6-1: Query image Drawn Taj and retrieved images.

Figure 6-2: Query image Taj Mahal and retrieved images.

Figure 6-3: Query image Small Building and retrieved images.

Figure 6-4: Query image Spartan Village Apartments and retrieved images.

6.4 Corner-Graph Matching

The goal in this section is to incorporate the corner structure defined in Chapter 5 into the matching algorithm and to study its effect. Before proceeding to the corner matching experiments, it is important to understand how the algorithm works and the notation used in the subsequent paragraphs. If G = (V, E) is a corner graph, then V = { C_iA } is the descriptor of a vertex and E = { C_ijD, C_ijO, C_ijP, C_ijC } are the relationships that the vertices make with each other; C_iA represents the corner angle, and C_ijD, C_ijO, C_ijP, and C_ijC represent the distance between corners, the corner orientation, the corner precinct, and the angle between corners, where i ≠ j.

6.4.1 Matching Algorithm

The graph-matching algorithm for corner structures is identical to that for the line structure except for the vertex and edge attributes used in the comparison; for a detailed explanation see Section 6.3.1. The thresholds used are given in Table 6-2.

Algorithm 6-6: Corner matching routine

  // G = (V, E) where V = { C_A } and E = { C_D, C_P, C_O, C_C }
  CornerMatchingRoutine( G1, G2 )
  {
      n1 = size of graph G1,  n2 = size of graph G2
      vote = 0
      k = 1
      for i = 1 to n1
          for j = 1 to n2
              if ( match_vertex( Vi, Vj ) )          // match vertex Vi with Vj
                  vote = match_edge( Ei(i,:), Ej(j,:), i, j )
                  if vote > 0
                      match(k,:) = [ Ci  Cj  vote ]  // store the matching result
                      k = k + 1
                  end if
              end if
          end for
      end for
      FinalMatch = corner_sel_by_vote( match )
      return ( FinalMatch )
  }

Algorithm 6-7: Vertex matching sub-routine

  match_vertex( v1, v2 )
  {
      if ( | C_1A - C_2A | < δn )                    // compare the corner angles
          return ( 1 )
      else
          return ( 0 )
      end if
  }

Algorithm 6-8: Edge matching sub-routine

  match_edge( e1, e2, I, J )
  {
      vote = 0
      for i = I-5 to I+5
          for j = J-5 to J+5
              if ( C_iP1 == C_jP2  &  C_iO1 == C_jO2  &
                   | C_iD1 - C_jD2 | < δ1  &  | C_iC1 - C_jC2 | < δ2 )
                  if ( match_vertex( Vi, Vj ) )
                      vote = vote + 1
                  end if
              end if
          end for
      end for
      return ( vote )
  }

Algorithm 6-9: Corner selection sub-routine

  corner_sel_by_vote( CornerMatch )
  {
      sort CornerMatch by column 1 and, for each corner of G1, keep only the row
      with the maximum vote; then sort by column 2 and do the same for the
      corners of G2; return CornerMatch
  }

Table 6-2: Thresholds used in the corner structure graph algorithms

  Algorithm       Threshold   Explanation                Default Setting
  Algorithm 6-7   δn          Corner angle               10°
  Algorithm 6-8   δ1          Distance between corners   10 pixels
  Algorithm 6-8   δ2          Angle between corners      10°

6.4.2 Matching Results for Corners

In this section we investigate the robustness and quality of the graph-matching algorithm for corners. We conducted four experiments on the same query images as in Section 6.3.2.
The results from the first experiment are shown in Figure 6-5. This was one of the best experiments and gave 100% recall: all the images from the group were successfully retrieved. In the second experiment, shown in Figure 6-6, the algorithm successfully retrieved six out of nine Taj Mahal images; the retrieved images differ in scale, resolution, and view, and some contain Gaussian noise. In experiments three and four the results are much better than those of experiments three and four for the line graph-matching algorithm; they are shown in Figure 6-7 and Figure 6-8 respectively.

Figure 6-5: Query image Drawn Taj and retrieved images.

Figure 6-6: Query image Taj Mahal and retrieved images.

Figure 6-7: Query image Small Building and retrieved images.

Figure 6-8: Query image Spartan Village Apartments and retrieved images.

6.5 Ribbon-Graph Matching

The goal in this section is to incorporate the ribbon structure defined in Chapter 5 into the matching algorithm and to study its effect. Before proceeding to the ribbon matching experiments, it is important to understand how the algorithm works and the notation used in the subsequent paragraphs. If G = (V, E) is a ribbon graph, then V = { R_iW, R_iG, R_iS, R_iT } are the descriptors of the vertices and E = { R_ijD, R_ijO } are the relationships that the vertices make with each other; R_iW, R_iG, R_iS, and R_iT represent the width of the ribbon, the edge gradient, the line slope, and the ribbon type respectively, and R_ijD and R_ijO represent the distance between ribbons and the ribbon orientation, where i ≠ j.

6.5.1 Matching Algorithm

The graph-matching algorithm for the ribbon structure is identical to the line and corner structure algorithms except for the vertex and edge attributes; for details see Section 6.3.1. The thresholds used in the algorithms are given in Table 6-3.
Table 6-3: Thresholds used in the ribbon structure graph algorithms

  Algorithm        Threshold   Explanation                Default Setting
  Algorithm 6-11   δ1          Distance between ribbons   20 pixels
  Algorithm 6-12   δ2          Width of a ribbon          10 pixels
  Algorithm 6-12   δ3          Slope of lines             10°
  Algorithm 6-12   δ4          Edge gradient of lines     30°

Algorithm 6-10: Ribbon matching routine

  // G = (V, E) where V = { R_W, R_G, R_S, R_T } and E = { R_D, R_O }
  RibbonMatchingRoutine( G1, G2 )
  {
      n1 = size of graph G1,  n2 = size of graph G2
      vote = 0
      k = 1
      for i = 1 to n1
          for j = 1 to n2
              if ( match_vertex( Vi, Vj ) )          // match vertex Vi with Vj
                  vote = match_edge( Ei(i,:), Ej(j,:), i, j )
                  if vote > 0
                      match(k,:) = [ Ri  Rj  vote ]  // store the matching result
                      k = k + 1
                  end if
              end if
          end for
      end for
      FinalMatch = ribbon_sel_by_vote( match )
      return ( FinalMatch )
  }

Algorithm 6-11: Edge matching sub-routine

  match_edge( e1, e2, I, J )
  {
      vote = 0
      for i = I-5 to I+5
          for j = J-5 to J+5
              if ( R_iO1 == R_jO2  &  | R_iD1 - R_jD2 | < δ1 )
                  if ( match_vertex( Vi, Vj ) )
                      vote = vote + 1
                  end if
              end if
          end for
      end for
      return ( vote )
  }

Algorithm 6-12: Vertex matching sub-routine

  match_vertex( v1, v2 )
  {
      if ( R_1T == R_2T  &  | R_1W - R_2W | < δ2  &
           | R_1S - R_2S | < δ3  &  | R_1G - R_2G | < δ4 )   // compare the vertices
          return ( 1 )
      else
          return ( 0 )
      end if
  }

Algorithm 6-13: Ribbon selection sub-routine

  ribbon_sel_by_vote( RibbonMatch )
  {
      sort RibbonMatch by column 1 and, for each ribbon of G1, keep only the row
      with the maximum vote; then sort by column 2 and do the same for the
      ribbons of G2; return RibbonMatch
  }

6.5.2 Matching Results for Ribbons

In this section we investigate the robustness and quality of the graph-matching algorithm for ribbons. We conducted four experiments on the same query images as in Sections 6.3.2 and 6.4.2. The results from the first experiment are shown in Figure 6-9. This was the best of the four experiments and gave 100% recall: all the images from the group were successfully retrieved. In the second experiment, shown in Figure 6-10, the algorithm successfully retrieved three out of nine Taj Mahal images; the retrieved images differ in scale, resolution, and view, and some contain Gaussian noise. In experiments three and four the results are also much better than those of experiments three and four for the line and corner matching; they are shown in Figure 6-11 and Figure 6-12 respectively. The overall results of the graph-matching algorithm for ribbons are much better than those for lines and corners.

Figure 6-9: Query image Drawn Taj and retrieved images.

Figure 6-10: Query image Taj Mahal and retrieved images.

Figure 6-11: Query image Small Building and retrieved images.

Figure 6-12: Query image Spartan Village Apartments and retrieved images.

6.6 Combined Results of the Graph Matching Algorithm

To justify the goal of employing a graph-matching algorithm to retrieve images from a database, consider Figure 6-13 to Figure 6-16. The results obtained from the line structure, corner structure, and ribbon structure are each weighted by 1/3 and then added to obtain the combined results.
The results indicate that by combining the individual results we can improve precision.

Figure 6-13: Query image Drawn Taj and retrieved images (combined results).

Figure 6-14: Query image Taj Mahal and retrieved images (combined results).

Figure 6-15: Query image Small Building and retrieved images (combined results).

Figure 6-16: Query image Spartan Village Apartments and retrieved images (combined results).

6.7 Results for Aerial Images

Fifteen aerial images were added to the image database. Three separate experiments were conducted, matching the line, corner, and ribbon structures respectively. The results are shown in Figure 6-17 to Figure 6-24. They show that the line and ribbon matches perform better than the corner match; all the relevant images were retrieved within 25% image similarity.

Figure 6-17: Query image 'img92', line matching results.
Figure 6-18: Query image 'img88', line matching results.
Figure 6-19: Query image 'img92', corner matching results.
Figure 6-20: Query image 'img88', corner matching results.
Figure 6-21: Query image 'img92', ribbon matching results.
Figure 6-22: Query image 'img88', ribbon matching results.
Figure 6-23: Query image 'img92', combined results.
Figure 6-24: Query image 'img88', combined results.
7. Discussion and Future Work

7.1 Analysis

The overall performance of the graph-matching algorithm is comparable to other CBIR matching algorithms. Although it was assumed that the performance of the proposed matching algorithm would depend solely on the feature detection process, meaning that feature extraction would be of vital importance, the results show that this is not the case. When different levels of noise were added to some of the images, the feature extraction results suffered, but the correct images were still retrieved. For instance, we added Gaussian noise of 1%, 2%, and 4% to the Drawn Taj and Taj Mahal images, and the graph-matching algorithm still retrieved them successfully.

To test the matching algorithm's tolerance to changes in perspective and view, we also added images with different perspectives to the database, such as the Spartan Village Apartments, Fifth Third Bank, and Small Building. The results are very encouraging: when one of the Spartan Village Apartments images is submitted as the query, the matching algorithm retrieves eleven out of thirteen relevant images, and for the Small Building query six out of seven relevant images are retrieved. We also tested the algorithm for intensity variations and successfully retrieved images of different brightness (see, for example, the Taj Mahal results). The algorithm was also tested for occlusion: we removed some of the minarets from the Drawn Taj and submitted the query, and the algorithm still retrieved the relevant images.

The results from all three structures used in the graph-matching algorithm are quite promising, and the combined results are even better. Although combinations of two features were not tried, it is reasonable to assume that their results would also be better than the individual results. Table 7-1 and Table 7-2 summarize the results using the combination of the three features.

Table 7-1 shows the results for exact matching. By exact matching we mean that the same building, with a different perspective or with added noise, is used as the query. The first row of the table shows the four similarity levels at which we evaluate image retrieval; the remaining five rows show the percentage of relevant images recalled at each of the four levels. For instance, for the Drawn Taj query, one of the seven relevant images was retrieved at the 100% matching level (14%), whereas all seven relevant matches were retrieved at the 75% matching level (100%). Table 7-1 shows that a similarity level of 25% may have to be used to achieve a high recall value.

Table 7-1: Results for exact matching

  Query Images                 100%   75%    50%    25%    Total Images
  Drawn Taj                    14%    100%   100%   100%   7
  Fifth Third Bank             25%    25%    25%    75%    -
  Small Building               -      12%    12%    88%    8
  Spartan Village Apartments   -      8%     8%     84%    13
  Taj Mahal                    -      25%    25%    50%    4

Table 7-2 shows the results for images that are similar in shape. For instance, the Drawn Taj is similar to the Taj Mahal: 54% of the relevant images were retrieved within 75% similarity and 100% within 25% similarity. Again, the 25% similarity level recalls a significant percentage of the target images. These results also show that precision will go down if exact matches are wanted.
Table 7-2: Results for similar matching

  Query Images                 100%   75%    50%    25%    Total Images
  Drawn Taj                    8%     54%    54%    100%   14
  Fifth Third Bank             11%    11%    11%    45%    9
  Small Building               -      6%     67%    84%    15
  Spartan Village Apartments   -      8%     4%     84%    13
  Taj Mahal                    -      12%    12%    25%    8

Like every algorithm, the graph-matching algorithm also has weak points. It is slow compared to some other types of matching, such as histogram matching. The complexity of the graph-matching algorithm is n x m, where n and m are the numbers of features of the test and query images respectively. Table 7-3 lists the sizes of some of the data structures; from these one can anticipate the relative compute time required for the corresponding graph matching. Table 7-4 gives the total time in seconds for comparing a query image with the 97 images of the database.

Table 7-3: Graph sizes (number of vertices) for different data structures and images

  Query Images                 Line Structure   Corner Structure   Ribbon Structure
  Drawn Taj                    21               17                 16
  Fifth Third Bank             187              76                 55
  Small Building               96               41                 60
  Spartan Village Apartments   128              47                 28
  Taj Mahal                    136              60                 24

Table 7-4: Run-time for image retrieval

  Query Images                 Time Taken for Matching (seconds)
  Drawn Taj                    165
  Fifth Third Bank             3140
  Small Building               1940
  Spartan Village Apartments   2100
  Taj Mahal                    1880

7.2 Conclusions

Content-Based Image Retrieval (CBIR) is an important problem for computer vision and computer science. This thesis presents a new geometric method for a CBIR system that internally uses graph matching for image identification and retrieval. The retrieval system is divided into three sub-processes: feature extraction, data representation, and the graph-matching algorithm. Feature extraction and data representation are pre-processing steps, while the graph matching is done online. Initially, features such as lines, corners, and ribbons are extracted and saved in a graph data structure. Subsequently, when a query is submitted to the graph-matching algorithm, it first undergoes feature extraction; the algorithm then uses the extracted query feature space along with the existing data-structure database for image matching and retrieval.

The graph-matching algorithm proposed in this thesis is robust to the presence of noise, occlusion, small changes in perspective, and changes in image brightness. The performance of the algorithm against such adverse effects has been measured via MATLAB simulations. Further, the proposed algorithm can detect abrupt changes in view and perspective; therefore, it can also be used for automatic view-change detection in continuous video. The current implementation is too slow to be practical; improvements are suggested below.

7.3 Future Work

Although all the desired goals of this work have been met, several new issues have been identified that can be addressed in future research. To improve the retrieval performance of this prototype image retrieval system, color histogram matching and texture matching need to be integrated into the implementation. Further, the existing algorithm could be enhanced for rotation invariance if needed by the user. The ability to match rotated images is critical for some aerial imagery applications; the current algorithm can handle only small rotations in perspective for ground structures such as buildings. The performance of the algorithm for video has yet to be evaluated.
Further, it is evident from the current results that the system can be used for object recognition and localization. It can be used for security purposes, where the finding and matching of exact buildings or other geometric features in a database of aerial imagery may be required. Of course, run-time performance must be improved; reprogramming in C and using hierarchical matching are two sure methods for significantly speeding up matching.

Appendix A

Thumbnails of the images in the test database (img1, img10, img11, ..., img96), each labeled with its pixel dimensions and JPEG format.

Appendix B

Line detection results (a), corner detection results (b) and ribbon detection results (c) for selected database images: Img 1, 12, 16, 29, 48, 52, 57, 60, 65 and 73.