MULTI-CHANNEL FILTERING TECHNIQUES FOR TEXTURE SEGMENTATION AND SURFACE QUALITY INSPECTION

By Farshid Farrokhnia

A DISSERTATION submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY, Department of Electrical Engineering, 1990

Abstract

Multi-Channel Filtering Techniques for Texture Segmentation and Surface Quality Inspection

By Farshid Farrokhnia

This dissertation focuses on the multi-channel filtering approach to texture analysis. We combine this biologically motivated approach with analytical and signal analysis considerations to develop powerful, generally applicable texture analysis techniques. First, a detailed texture segmentation algorithm is proposed that uses a bank of even-symmetric Gabor filters to represent the channels. This representation is augmented with a systematic filter selection scheme based on an intuitive least squares error criterion. By introducing a nonlinear stage following the linear filtering operations, a multi-scale 'blob detection' mechanism is created. 'Feature images' are then obtained by computing the "energy" in a small neighborhood around each pixel, in each 'response image'. These energy features capture the attributes of the blobs without the need for extracting them. The texture segmentation experiments show that these features can discriminate among a large number of textures, including some artificially generated texture pairs with identical second- and third-order statistics. Both unsupervised and supervised texture segmentation experiments are reported. In the supervised segmentation experiments a feed-forward neural network is used, in addition to several other classifiers.

We also develop a new technique to obtain an edge-based segmentation by combining the magnitude responses of Canny edge detectors applied to the feature images. The region-based and edge-based segmentation techniques each have certain weaknesses. To eliminate these weaknesses we propose an integrated approach that combines the region- and edge-based segmentations to produce a new, improved segmentation. The integrated approach results in a truly unsupervised segmentation technique by eliminating the need for knowing the "true" number of texture categories.

Finally, we address a practical problem involving automated visual inspection of the textural appearance of automotive metallic finishes. We address imaging and preprocessing requirements and demonstrate that a multi-channel filtering technique can be used to successfully characterize the finish texture. Two alternative methods for grading the degree of uniformity of the finish texture are developed. The 'texture grading' experiments show that there is a high correlation between the texture uniformity grade and the visual scaling of the finish samples by finish inspection experts.

To my parents Parikh-Niaz and Doursun-Bibi, and my wife Katy

Acknowledgements

I would like to express my sincere gratitude to my advisor Prof. Anil K. Jain. His guidance and constant encouragement were crucial for keeping research objectives in perspective and keeping me motivated. I also would like to thank Professors Mihran Tüceryan, John Deller Jr., Hassan Khalil, and V. Mandrekar for serving on the guidance committee.
Their criticisms and recommendations have significantly improved the content and presentation of this dissertation. I am particularly grateful for the many constructive discussions I had with Prof. Tüceryan. I also consider myself very fortunate for being part of the Pattern Recognition and Image Processing (PRIP) Group. I would like to thank many current and former graduate students in the PRIP Group for their support. Thanks to Dr. Chaur-Chin Chen, Dr. Sei-Wang Chen, Dr. Patrick Flynn, Dr. Joseph Miller, Debby Trytten, Satish Nadabar, Narayan Raja, Sally Howden, Greg Lee, Tim Newman, Jian-Chang Mao, Sushil Bhattacharjee, and David Marks, our lab manager, for making the PRIP Laboratory a friendly place to work and a great environment in which to do research.

During my graduate studies at Michigan State University, I enjoyed and benefited from the friendship, help, and encouragement of my long-time friends, Dr. Farzad Esfandiari, Dr. Mani Azimi, Carl Erickson, Javad Kalantar, Dr. Mahmud Khodadad, and many others.

I am truly grateful to Dr. David Alman, of the Automotive Products Division at E.I. Du Pont De Nemours & Company Inc., for his support throughout my Ph.D. program. The content of Chapter 5 on quality inspection of metallic finish texture is part of an ongoing research program supported by a grant from Du Pont. The support from the National Science Foundation infrastructure grant CDA-8806599 was also critical. The computing facilities acquired through this grant were instrumental for carrying out this research.

Last, but not least, I would like to thank my wife, Katy, whose love and support were crucial for completing this dissertation successfully.

Table of Contents

List of Tables
List of Figures

1 Introduction
1.1 Vision in Man and Machine
1.2 What Is Texture?
1.3 Texture Analysis: A Taxonomy
1.3.1 Statistical Approach
1.3.2 Structural Approach
1.3.3 Model-Based Approach
1.3.4 Multi-Channel Filtering Approach
1.4 Texture Analysis Tasks
1.5 Outline of the Dissertation

2 Multi-Channel Filtering
2.1 Biological Motivations
2.2 Analytical Considerations
2.3 Existing Techniques
2.4 Summary

3 Texture Segmentation Using Gabor Filters
3.1 Characterizing the Channels
3.1.1 Choice of Filter Parameters
3.1.2 Filter Selection
3.2 Computing Feature Images
3.3 Unsupervised Texture Segmentation
3.3.1 How Many Categories?
3.3.2 Performance Evaluation
3.3.3 Experimental Results
3.4 Supervised Texture Segmentation
3.4.1 Segmentation Using a Neural Network Classifier
3.4.2 Comparison with Other Classifiers
3.5 Summary

4 Integrating Region- and Edge-Based Texture Segmentations
4.1 Edge-Based Segmentation
4.1.1 Experimental Results
4.2 Integrated Approach
4.2.1 Experimental Results
4.3 Summary

5 Texture Analysis of Automotive Finishes
5.1 Metallic Finish
5.2 Image Acquisition and Preprocessing
5.2.1 Preprocessing
5.3 Characterization of Finish Texture
5.3.1 Filter Functions and Parameters
5.3.2 Texture Features
5.4 Grading Finish Texture Uniformity
5.4.1 Reference-Based Grading
5.4.2 Regression-Based Grading
5.4.3 Mottle, Flake-Size, and Blotchy Components
5.5 Summary

6 Conclusions and Future Research

A Generating Filter Functions

Bibliography

List of Tables

3.1 Percentage of pixels misclassified in the segmentation results.
3.2 Percentage of misclassified pixels using a feed-forward neural network classifier. Only 10 training cycles were used in each case.
3.3 Percentage of misclassified pixels using a feed-forward neural network classifier. 100 training cycles were used in each case.
3.4 Percentage of misclassified pixels using the minimum Euclidean distance classifier.
3.5 Percentage of misclassified pixels using the minimum Mahalanobis distance classifier.
3.6 Percentage of misclassified pixels using the 3-NN classifier. Classification errors for 1-NN and 5-NN were essentially the same.
5.1 Visual scale values for texture uniformity, mottle, flake-size, and blotchy appearance of panels in the LBLUE set.
5.2 Visual scale values for texture uniformity, mottle, flake-size, and blotchy appearance of panels in the MBLUE set.
5.3 Panels that were used as references when grading the finish texture uniformity of the LBLUE and the MBLUE sets. Each set contains 13 panels.
5.4 Results of reference-based grading of finish texture uniformity for the LBLUE set. This table shows the "best" feature subsets of size 1-7 and corresponding rank correlations between texture grade and visual scale.
5.5 Results of reference-based grading of finish texture uniformity for the MBLUE set. This table shows the "best" feature subsets of size 1-7 and corresponding rank correlations between texture grade and visual scale.
5.6 Results of regression-based grading of finish texture uniformity for the LBLUE set. This table shows selected variables (texture features) for regression models with 1-7 variables and corresponding coefficients of determination.
5.7 Results of regression-based grading of finish texture uniformity for the MBLUE set. This table shows selected variables (texture features) for regression models with 1-7 variables and corresponding coefficients of determination.
5.8 Grading the MBLUE set using the regression model for the LBLUE set. Correlation = 0.87, Rank Correlation = 0.88.
5.9 Grading the LBLUE set using the regression model for the MBLUE set. Correlation = 0.76, Rank Correlation = 0.74.
5.10 "Best" feature subsets of size 1-7 from the set {f3, ..., f14} explaining the visual scales for the 'mottle' component of finish texture appearance for panels in the LBLUE set.
5.11 "Best" feature subsets of size 1-7 from the set {f3, ..., f14} explaining the visual scales for the 'mottle' component of finish texture appearance for panels in the MBLUE set.
5.12 "Best" feature subsets of size 1-7 from the set {f3, ..., f14} explaining the visual scales for the 'flake-size' component of finish texture appearance for panels in the LBLUE set.
5.13 "Best" feature subsets of size 1-7 from the set {f3, ..., f14} explaining the visual scales for the 'flake-size' component of finish texture appearance for panels in the MBLUE set.
5.14 "Best" feature subsets of size 1-7 from the set {f3, ..., f14} explaining the visual scales for the 'blotchy' component of finish texture appearance for panels in the LBLUE set.
5.15 "Best" feature subsets of size 1-7 from the set {f3, ..., f14} explaining the visual scales for the 'blotchy' component of finish texture appearance for panels in the MBLUE set.

List of Figures

1.1 Simultaneous contrast phenomenon in the human visual system. The perceived luminance, i.e. the brightness, of an object is influenced by the luminance of its background. A square block of constant gray-level of 128 (a) surrounded by gray-level of 32, (b) surrounded by gray-level of 192.
1.2 Some natural and artificial textures. (a) 'Straw matting' (D55) from the Brodatz album [7]. (b) 'Bark of tree' (D12) from the Brodatz album. (c) An artificial texture generated from a Gaussian Markov random field model [14]. (d) A synthetic texture [53].
2.1 An example of sinusoidal gratings used as stimuli in the measurement of the contrast sensitivity function.
2.2 A typical contrast sensitivity function for the human visual system. (Redrawn from Campbell and Robson [9].)
2.3 Examples of spatial filters used by Coggins and Jain [20]. The origin (u, v) = (0, 0) is at (r, c) = (32, 32). (a) A frequency-selective filter tuned to a radial frequency of 16 cycles/image. (b) An orientation-selective filter tuned to 0°.
2.4 (a) An even-symmetric Gabor filter in the spatial domain. The radial frequency and orientation are 8 cycles/image-width and 0°, respectively. (b) The corresponding MTF. The origin is at (r, c) = (32, 32).
3.1 An overview of the texture segmentation algorithm.
3.2 The filter set in the spatial-frequency domain (256 x 256). There are a total of 28 Gabor filters. Only the half-peak support of the filters is shown.
3.3 Examples demonstrating the advantage of nearly uniform coverage of the spatial-frequency domain by the filter set. (a) 'Wood grain' (D68) from the Brodatz album [7]. (b) 'Mandrill'. Top row: original images. Bottom row: reconstructed images. Both images are 128 x 128.
3.4 Examples of filtered images for the 'D55-D68' texture pair (128 x 256). (a) Input image. (b-e) Filtered images corresponding to Gabor filters tuned to 16√2 cycles/image-width. (f-i) Filtered images corresponding to Gabor filters tuned to 32√2 cycles/image-width. All four orientations (0°, 45°, 90°, and 135°) for each frequency are shown.
3.5 (a) Filter selection by reconstruction for the 'D55-D68' texture pair. Note that 13 filters alone, out of a total of 20, account for at least 95% of intensity variations in the original textured image. (b) Filter selection by approximate method.
This method calls for using 15 filters, which include the 13 filters in (a).
3.6 Feature images corresponding to the filtered images in Figure 3.4. A σ = 0.5√2 T was used for the Gaussian averaging windows.
3.7 (a) The 'D55-D68' texture pair. (b) Two-category segmentation obtained using a total of 13 Gabor filters.
3.8 The plot of the MH index versus number of clusters (texture categories) for the 'D55-D68' texture pair.
3.9 (a) A 256 x 256 image ('GMRF-4') containing four Gaussian Markov random field textures. (b) Four-category segmentation obtained using a total of 11 Gabor filters. (c) Same as (b), but with pixel coordinates used as additional features.
3.10 The plot of the MH index versus number of texture categories for the 'GMRF-4' image shown in Figure 3.9(a).
3.11 (a) A 256 x 256 image ('Nat-5') containing five natural textures (D77, D55, D84, D17, and D24) from the Brodatz album. (b) Five-category segmentation obtained using a total of 13 Gabor filters and the pixel coordinates.
3.12 The plot of the MH index versus number of texture categories for the 'Nat-5' image shown in Figure 3.11(a).
3.13 (a) A 512 x 512 image ('Nat-16') containing sixteen natural textures (row 1: D29, D12, D17, D55; row 2: D32, D5, D84, D68; row 3: D77, D24, D9, D4; row 4: D3, D33, D51, D54) from the Brodatz album. (b) 16-category segmentation obtained using a total of 20 Gabor filters and the pixel coordinates.
3.14 The plot of the MH index versus number of texture categories for the 'Nat-16' image shown in Figure 3.13(a).
3.15 Segmentation of texture pairs that have been used in the psychophysical studies of texture perception. All images are 256 x 256. The number of Gabor filters used varied between 8 and 11. (a) 'L and +'. (b) 'Even-Odd'. (c) 'Triangle-Arrow'. (d) 'S and 10'.
3.16 The plot of the MH index versus number of texture categories for the texture pair images shown in Figure 3.15. (a) The plot for 'L and +'. (b) The plot for 'Even-Odd'. (c) The plot for 'Triangle-Arrow'. (d) The plot for 'S and 10'.
3.17 The feed-forward neural network used in our supervised texture segmentation experiments. The network has a single hidden layer with 10 units.
3.18 (a) The 'Even-Odd' texture pair (256 x 256). (b) Supervised segmentation obtained using a feed-forward neural network. (Number of training cycles = 100.)
3.19 (a) The 'GMRF-4' image (256 x 256) containing four Gaussian Markov random field textures. (b) Supervised segmentation obtained using a feed-forward neural network. (Number of training cycles = 100.)
3.20 (a) The 'L and +' texture pair (256 x 256). (b) Supervised segmentation obtained using a feed-forward neural network. (Number of training cycles = 100.)
3.21 (a) The 'Nat-5' image (256 x 256) containing five natural textures (D77, D55, D84, D17, and D24) from the Brodatz album. (b) Supervised segmentation obtained using a feed-forward neural network. (Number of training cycles = 100.)
3.22 (a) The 'Nat-16' image (512 x 512) containing sixteen natural textures (row 1: D29, D12, D17, D55; row 2: D32, D5, D84, D68; row 3: D77, D24, D9, D4; row 4: D3, D33, D51, D54) from the Brodatz album. (b) Supervised segmentation obtained using a feed-forward neural network. (Number of training cycles = 100.)
3.23 Percent misclassified pixels for various classifiers. Here, the (row, col) coordinates of pixels are not used.
3.24 Percent misclassified pixels for various classifiers. Here, the (row, col) coordinates of pixels are used as additional features.
4.1 Canny magnitude images corresponding to the feature images shown in Figure 3.6. A σ = √2 T was used for the Canny edge detectors.
4.2 An example illustrating the edge-based segmentation technique. (a) Input image ('D55-D68'). (b) Total Canny magnitude response to 13 feature images. (c) Edge-based segmentation.
4.3 An example demonstrating some of the shortcomings of the current edge-based segmentation technique. (a) Original input image. (b) Total Canny magnitude response to 13 feature images. (c) Edge-based segmentation. The low and high hysteresis thresholds were 0.5 and 0.8, respectively. (d) Edge-based segmentation. The low and high hysteresis thresholds were 0.5 and 0.7, respectively.
4.4 Region- and edge-based integration results for the 'D55-D68' texture pair (128 x 256). (a) Original input image. (b) Four-category region-based segmentation (over-segmented). (c) Edge-based segmentation. (d) New segmentation after integration.
4.5 Region- and edge-based integration results for the 'GMRF-2' image (128 x 256). (a) Original input image. (b) Four-category region-based segmentation (over-segmented). (c) Edge-based segmentation. (d) New segmentation after integration.
4.6 Region- and edge-based integration results for the 'D17-D77' texture pair (128 x 256). (a) Original input image. (b) Four-category region-based segmentation (over-segmented). (c) Edge-based segmentation. (d) New segmentation after integration.
4.7 Region- and edge-based integration results for the 'D84-in-D68' image (256 x 256). (a) Original input image. (b) Four-category region-based segmentation (over-segmented). (c) Edge-based segmentation. (d) New segmentation after integration.
4.8 Region- and edge-based integration results for the 'GMRF-4' image (256 x 256). (a) Original input image. (b) Six-category region-based segmentation (over-segmented). (c) Edge-based segmentation. (d) New segmentation after integration.
4.9 Region- and edge-based integration results for a 256 x 256 image containing five natural textures ('Nat-5'). (a) Original input image. (b) Seven-category region-based segmentation (over-segmented). (c) Edge-based segmentation. (d) New segmentation after integration.
5.1 The imaging setup used for acquiring images from finish samples. The main components of the setup are a Mole-Richardson light projector and a Panasonic CCD camera with a 90 mm macro lens.
5.2 Multiple imaging from a given panel. The resolution of the acquired images is close to the maximum resolution of the human eye.
5.3 Two examples demonstrating differences in the histograms of the acquired images. (a) Histogram of a finish sample with fine aluminum flakes. (b) Histogram of a finish sample with coarse aluminum flakes.
5.4 (a) A 256 x 256 image of a metallic finish sample. (b-h) Filtered images corresponding to frequency-selective filters with center frequencies at 4, 8, 16, 16√2, 32, 32√2, and 64 cycles/image-width.
5.5 Illustration of reference-based grading in a two-dimensional feature space. The mean feature vector f of the panel to be graded is shown as 'x'.
5.6 Box plot of reference patterns used in grading the texture uniformity of panels in the LBLUE set. There are 16 patterns in each cluster. The "a" and "b" suffixes indicate least-uniform and most-uniform clusters, respectively.
5.7 Box plot of reference patterns used in grading the texture uniformity of panels in the MBLUE set. There are 16 patterns in each cluster. The "a" and "b" suffixes indicate least-uniform and most-uniform clusters, respectively.
6.1 (a) A 256 x 512 image of a scene containing both textured and untextured objects. (b) Two-category segmentation obtained using a total of 16 Gabor filters and the pixel coordinates.

Chapter 1

Introduction

In computer vision or image analysis, an important goal is to "summarize" meaningful information in an image that is otherwise distributed among a large number of pixels. For example, significant research effort is directed toward extracting segments of an image that correspond to objects or other physical entities. For intensity images, differences in average gray value alone are not always sufficient to obtain a segmentation. Rather, one has to rely on differences in the spatial distribution of gray levels of neighboring pixels, that is, on differences in texture.

This dissertation focuses on a particular approach to texture analysis known as the multi-channel filtering approach. This approach is inspired by the multi-channel filtering theory of visual information processing in the early stages of the human visual system. First proposed by Campbell and Robson [9], the theory holds that the visual system decomposes the retinal image into a number of filtered images, each of which contains intensity variations over a narrow range of frequency (size) and/or orientation. In texture analysis, such a decomposition is intuitively appealing, because it allows us to exploit differences in the dominant frequencies and orientations of different textures. When combined with analytical and signal analysis considerations, this biologically motivated approach has the potential to produce powerful, generally applicable techniques for texture analysis. In this dissertation, we develop and evaluate several such techniques for segmenting images on the basis of texture. We model the 'channels' by a bank of even-symmetric Gabor filters, and propose an intuitive least squares error criterion for filter selection.

Texture analysis also plays an important role in industrial quality inspection problems. In many cases, the quality of a surface is best characterized by its texture. Texture analysis techniques have been used for controlling the quality of paper in paper-rolling mills [17], for detecting and classifying common surface defects in wood [23], and for determining the degree of carpet wear [84]. In this dissertation, we also address a practical problem involving automated visual inspection of automotive metallic finishes.
We demonstrate that a multi-channel filtering technique can be used to successfully characterize the finish texture, and develop two alternative methods for grading the degree of uniformity of the finish texture.

The remainder of this chapter discusses visual information processing in both biological and artificial vision systems. We emphasize the role of texture in image analysis or computer vision and give a taxonomy of texture analysis techniques. We conclude the chapter by providing an outline of the dissertation.

1.1 Vision in Man and Machine

It has been said that "a picture is worth a thousand words". An estimated 75% of the information about our environment is obtained through our visual system [17]. With increased reliance on visual information has come the need for visual information processing systems that can 'look at' and 'interpret' various types of imagery. For example, thousands of aerial reconnaissance images are taken every day. These images cannot be screened and analyzed by human experts alone. In medical applications, x-ray, ultrasound, or other kinds of imagery need to be processed and analyzed quickly and reliably. The increased availability and affordability of electronic imaging systems has made it possible to use image analysis techniques to address these problems.

Another reason for the increasing interest in building machines that can process and analyze images is the automation of visual inspection tasks. For industrial products that require visual inspection, increased automation of production lines has turned the inspection stage into a significant bottleneck. Within the next decade, as much as 90% of all industrial visual inspection tasks might be performed by computer vision systems [17]. Using human inspectors does have certain advantages. Humans can adapt to changes in the inspection requirements or to new inspection tasks very quickly. Human versatility and judgement, therefore, make strict and detailed specification of product requirements and tolerances unnecessary. However, humans are usually affected by fatigue, psychological state, and monotony. Machine vision systems, on the other hand, can provide reliable decisions based on objective criteria that are not expected to change. Also, many tasks that involve working in dangerous environments, such as mines, can be done safely and more efficiently by robots that are capable of analyzing visual information.

Although visual information is not necessary for all automated tasks, integrating visual data does have advantages. Most mechanical systems (for example, gauging and surface inspection systems) are being replaced by optical systems that are much faster. In other applications, the integration of visual data with data sensed through other sensors, such as thermal or range imagery, allows for building more robust systems.

A distinction must be made between image processing and image analysis (or computer vision). Digital image processing, in general terms, refers to the processing of a two-dimensional picture (or any two-dimensional data) by a digital computer. In image processing operations, such as image restoration or enhancement, the output is another image. Edge detection is another common image processing operation. Such operations are also referred to as 'low-level' processing, because the information contained in the image is still distributed among individual pixels. For example, in an edge image each pixel is labeled as an edge or non-edge pixel.
The aim of image analysis, on the other hand, goes beyond such operations and involves interpreting the content of the image. When edge pixels in an edge image are grouped together as line segments, for example, one obtains a more compact and meaningful representation. Similarly, grouping pixels into regions to obtain a meaningful segmentation of an image results in a compact description of the image. Sophisticated computer vision systems are expected to interpret or assign labels to regions or surfaces in images. Image analysis therefore involves tasks such as feature extraction, segmentation, classification, and recognition. In these cases the output of the vision system is usually a symbolic description of the input image.

Computer vision techniques invoke concepts from diverse fields such as optics, digital signal processing, estimation theory, information theory, stochastic processes, visual perception, pattern recognition, artificial intelligence, and computer graphics. Computer vision is a rapidly evolving field with growing applications in science and technology. This area holds the possibility of developing machines that can perform many of the visual functions of human beings. While many theoretical and technological breakthroughs are required before we could build such sophisticated vision systems, there is an abundance of vision applications that can be realized through available algorithms and hardware.

In recent years, a major trend in computer vision research has been the integration of biological and psychological studies of vision with image analysis techniques.

Figure 1.1: Simultaneous contrast phenomenon in the human visual system. The perceived luminance, i.e. the brightness, of an object is influenced by the luminance of its background. A square block of constant gray-level of 128 (a) surrounded by gray-level of 32, (b) surrounded by gray-level of 192.

Clearly, computer vision systems are not restricted to the limited range of the electromagnetic spectrum, called visible light, for input. Neither are they restricted to hardware or processing architectures similar to those of biological systems. However, the study of psychological and neurophysiological properties of visual systems in humans and other living beings holds great potential for developing sophisticated algorithms for image analysis. After all, many applications of computer vision involve developing vision systems that can imitate human performance. For example, it is known that two objects with different surroundings may have identical luminance but different brightness values [62, Sec. 6.2] (see Figure 1.1). In other words, the perceived luminance, which we call brightness, depends on the luminance of the surround. This phenomenon is called simultaneous contrast and is caused by the fact that our visual system is sensitive to luminance contrast rather than to absolute luminance values themselves. Such perceptual phenomena need to be considered and incorporated in any computer vision system that is to behave like humans. Psychophysical experiments, such as those by Campbell and Robson [9] and Julesz and his co-workers [53, 56], have contributed a great deal to our understanding of the mechanisms and properties of the human visual system.

As noted before, computer vision or image analysis involves tasks such as feature extraction, image segmentation, and recognition that require interpreting the content of an image.
Image texture, which is the main subject of this dissertation, is one of the richest sources of information for analyzing many images. The following sections discuss different characteristics of image textures, texture analysis approaches, and major texture analysis tasks.

1.2 What Is Texture?

Texture is an intrinsic property of all surfaces. Humans use texture features in analyzing retinal images of a scene. This implies that texture is an easily understood concept. However, it is very difficult to define texture in a concise mathematical sense. A common definition of texture describes a basic local pattern that is repeated in a nearly periodic manner over some image region. This definition is appropriate for 'macrostructure' textures, whose underlying patterns can be easily detected. It does not, however, apply to 'microstructure' textures, whose underlying patterns are not obvious. Surprisingly, even the more random-looking textures seem to possess a distinctive property that is readily identified by the human observer. The lack of a comprehensive definition of texture really stems from the lack of a good understanding of texture and texture models. The proliferation of texture analysis techniques over the last two decades was stimulated by a lack of agreement as to how texture should be measured.

Figure 1.2 shows some natural and artificial textures. Some textures, such as that of a rough wall surface or canvas, are perceived because of the underlying physical structure of the surface. Others, such as the texture of a checkerboard or ruled paper, are perceived because of the design patterns or marks on the surface. In some cases, a collection of objects is viewed as a single textural entity, as in the case of grass or a brick wall. An important characteristic of texture is its dependence on spatial resolution. For example, a tiled floor is perceived to have a nearly regular (cellular) texture whose elements are the individual tiles. But, when attention is focused on a single tile, one perceives the random texture of the tile.

We often correlate visual textures with tactile sensations such as smoothness, coarseness, and graininess. We also describe them with adjectives such as regular, directional, line-like, etc. Intuitively, developing computational measures that correspond to such perceptual attributes of textures is very appealing. Tamura et al. [86] developed six computational measures corresponding to coarseness, contrast, directionality, line-likeness, regularity, and roughness. They compared their measures with psychological data on 16 natural textures. The rank correlations between their measures and the visual judgements by human subjects were between 0.65 and 0.90.

Figure 1.2: Some natural and artificial textures. (a) 'Straw matting' (D55) from the Brodatz album [7]. (b) 'Bark of tree' (D12) from the Brodatz album. (c) An artificial texture generated from a Gaussian Markov random field model [14]. (d) A synthetic texture [53].

Psychological experiments have shown that human beings are capable of discriminating certain types of texture preattentively; that is, by viewing the texture for a very short period of time so as to avoid detailed scrutiny by higher-level processes.
Most of these experiments made use of computer-generated patterns or textures that were devoid of familiar cues. For example, Julesz and his co-workers used a large number of computer-generated texture pairs in their texture discrimination experiments [52, 55, 57]. Such experiments have greatly contributed to our understanding of texture discrimination processes in the human visual system.

Haralick [43] emphasizes the interaction between tone (i.e., gray-level) and texture in an image. He points out that tone and texture are both present in the image at the same time, but depending on circumstances one or the other may dominate. When there is a large variation in the tonal primitives in a relatively small area of an image, texture becomes the dominant property. Obviously, the amount of perceived variation in the tonal primitives depends strongly on the size of the image area being investigated. For example, in the extreme case where the area consists of a single pixel, the texture property is completely suppressed. What characterizes image texture, therefore, is the tonal primitives as well as the spatial relationships between them. What is important about texture is that it is a property of regions or neighborhoods rather than of individual pixels. The interaction between the gray levels of neighboring pixels can therefore be used to characterize textures.

1.3 Texture Analysis: A Taxonomy

In this section we present a general classification of texture analysis approaches and describe them briefly. Several surveys of texture analysis techniques have appeared in the literature. Haralick [43] compiled a comprehensive survey of statistical and structural approaches to texture. Van Gool et al. [91] have also published a survey of texture analysis techniques. Their survey emphasizes texture classification techniques; texture segmentation techniques are covered only briefly. Our classification of existing computational approaches to texture analysis consists of the following categories: 1) statistical approach, 2) structural approach, 3) model-based approach, and 4) multi-channel filtering approach. While an unambiguous and exhaustive classification of texture analysis techniques is impossible, we believe that the above categories represent a compact and descriptive classification. In Chapter 2, we will review multi-channel filtering techniques in more detail.

1.3.1 Statistical Approach

In the statistical approach to texture analysis, the image texture is represented by a point or a pattern in a feature space, where the features are various statistics of the gray level distribution in the image [62, Ch. 9]. Since texture is a property of image regions, as opposed to individual pixels, these statistics try to capture the interactions, or dependencies, among neighboring pixel values. Well-known examples are features computed from gray-level co-occurrence matrices [43], and autocorrelation and power spectrum features.

1.3.2 Structural Approach

Certain textures, in particular 'man-made' textures, possess a high degree of regularity. Textures, such as the one perceived on a brick wall, can be described by their building blocks or primitives and their placement rules. This approach to texture analysis is referred to as the structural approach [62, Ch. 9]. Either the primitives or the placement rules, or both, may have a random component associated with them. Given the primitives and their placement rules, one can generate samples of the texture.
Unfortunately, extracting the primitives from a given texture is not an easy task. This difficulty imposes a serious limitation on the applicability of structural approaches to practical problems.

1.3.3 Model-Based Approach

A third approach to texture analysis, viz. the model-based approach, has received a great deal of attention in recent years. Model-based techniques attempt to capture the dependencies among neighboring pixel values by fitting an analytical function to the textured image. Most model-based techniques treat texture as a realization of a two-dimensional stochastic process, or a random field. Once an appropriate model of a given texture has been found, the parameters of the model completely specify the texture. The ability to represent a texture with a small number of parameters makes the storage and processing of texture images extremely efficient. Some of the well-known model-based techniques for texture classification and segmentation are based on Markov random field (MRF) models [3, 13, 24], mosaic models [1], and fractals [59, 76].

1.3.4 Multi-Channel Filtering Approach

In this dissertation, we focus on a particular approach to texture analysis which is referred to as the multi-channel filtering approach. The multi-channel filtering paradigm in image analysis has received considerable attention in the past decade [62, Ch. 6]. This paradigm is inspired by a multi-channel filtering theory for processing visual information in the early stages of the human visual system. According to the theory [9], the human visual system decomposes the retinal image into a number of filtered images, each of which contains intensity variations over a narrow range of frequency (size) and orientation. The psychophysical experiments that suggested such a decomposition used various grating patterns as stimuli and were based on adaptation techniques [9]. Subsequent psychophysiological experiments provided additional evidence supporting the theory [28, 82].

In texture analysis, the multi-channel filtering approach is intuitively appealing, because it allows us to exploit differences in the dominant frequencies and orientations of different textures. A decomposition of the original textured image based on frequency is also in agreement with the need for a multi-resolution approach to texture analysis. The need for processing images at different scales or resolutions is well recognized in image analysis and computer vision [71]. An important advantage of the multi-channel filtering approach is that one can use simple statistics of gray values in the filtered images as texture features. This simplicity is the direct result of decomposing the original textured image into several filtered images with limited spectral information. In Chapter 2, we will discuss, in more detail, the biological motivations and analytical considerations for the multi-channel filtering approach, and survey existing multi-channel filtering techniques for texture analysis.

1.4 Texture Analysis Tasks

Textural cues are essential for basic image analysis tasks such as image classification and segmentation. In texture classification, the entire image is assigned to one of several known categories on the basis of its textural properties. A statistical approach is often used to represent each image by a feature vector containing various texture measures computed over the entire image. Texture segmentation, on the other hand, involves identifying regions with "uniform" textures in a given image.
Appropriate measures of texture are needed in order to decide whether a given region has uniform texture. Sklansky [85] has suggested the following definition of texture, which is appropriate in the segmentation context: "A region in an image has a constant texture if a set of local statistics or other local properties of the picture are constant, slowly varying, or approximately periodic." Texture segmentation, therefore, has both local and global connotations: it involves detecting the invariance of certain local measures or properties over an image region.

Compared to texture classification, texture segmentation is a much more difficult problem. In texture segmentation, the number of texture categories present in an image and information about the size, shape, and number of textured regions often are not known a priori. In fact, some texture segmentation problems have more than one possible solution, and determining the "correct" segmentation depends on the goal of the analysis and may require additional knowledge of the scene. Texture segmentation may be achieved in one of two ways. A region-based segmentation is obtained by identifying regions with homogeneous textures. This is usually done by computing texture measures for each pixel (or block of pixels) and assigning pixels with similar measures to the same texture category. An edge-based segmentation, on the other hand, is obtained by detecting the boundaries between the textures.

In addition to image classification and segmentation, gradients of texture primitives (density gradient, area gradient, and aspect-ratio gradient) can be used to estimate the orientation of a surface patch in the scene, that is, to extract three-dimensional information from a two-dimensional image. This so-called problem of shape-from-texture is a difficult and challenging problem, because it requires that both texture and the change (gradient) in texture be characterized simultaneously. In a recent paper, Blostein and Ahuja [4] review the problem of shape-from-texture and propose a new technique that integrates the extraction of texture elements with the estimation of surface orientation. Coggins and Jain [21] have explored the effect of texture gradients on texture measures obtained using a multi-channel filtering technique.

1.5 Outline of the Dissertation

The remainder of this dissertation is organized as follows. In Chapter 2, we discuss biological motivations and analytical considerations for a multi-channel filtering approach to texture analysis, and survey the existing techniques. Chapter 3 describes the main components of our proposed texture segmentation algorithm. The choice of filter parameters, filter selection, computation of texture features, and the procedure used to integrate the feature images are described. We report both unsupervised and supervised texture segmentation experiments. In Chapter 4, we combine our region-based texture segmentation technique with an edge-based segmentation technique to eliminate the need for knowing the exact number of texture categories. Automated visual inspection of metallic finish texture is described in Chapter 5. Finally, in Chapter 6, we summarize the results and contributions of the thesis and discuss future research directions.

Chapter 2

Multi-Channel Filtering

2.1 Biological Motivations

In psychophysics, early attempts to model the human visual system focused on its overall input/output characteristics [62, Sec. 4.3].
To measure its "transfer function", for example, sinusoidal (sine-wave) gratings with different spatial frequencies were used as visual stimuli (spatial frequencies are commonly given in cycles per degree of visual angle, although cycles per centimeter of test pattern or cycles per millimeter on the retina may be used); see Figure 2.1. Optical systems are often characterized by their modulation transfer function (MTF), which is determined by comparing some quantitative measurement of the input with that of the output. For sinusoidal gratings, a common measure is contrast, C, which is defined by

C = (I_max - I_min) / (I_max + I_min),

where I_max and I_min, respectively, are the maximum and minimum intensities of the grating. Thus, one way to define the MTF of the human visual system is

H(u) = output contrast / input contrast.

Unfortunately, it is not possible to measure the output contrast! The practical alternative is to use a psychological measurement known as contrast sensitivity. Experimentally, the contrast sensitivity is measured as follows [62]. The subject views stationary sinusoidal gratings on a display which allows him/her to vary the contrast without changing the average intensity. For each frequency u, the threshold contrast C_t(u) necessary to barely distinguish the grating from a uniform illumination is measured. The contrast sensitivity function (CSF) is then defined as:

CSF(u) = 1 / C_t(u).

Figure 2.2 shows a typical CSF. Clearly, the CSF is an oversimplified, "black box" representation of the human visual system. Nonetheless, it serves as a useful qualitative measure of the sensitivity of the human visual system to visual patterns of different frequencies.

Figure 2.1: An example of sinusoidal gratings used as stimuli in the measurement of the contrast sensitivity function. Figure 2.2: A typical contrast sensitivity function for the human visual system. (Redrawn from Campbell and Robson [9].)

The evidence for multiple 'channels', as opposed to a single channel, in the human visual system comes from psychophysical as well as psychophysiological experiments. Campbell and Robson [9] proposed that the visual system decomposes the retinal image into a number of filtered images, each of which contains intensity variations over a narrow range of frequency (size) and orientation. The psychophysical experiments that suggested such a decomposition used various grating patterns as stimuli and were based on adaptation techniques [9]. Other experiments verified the frequency and orientation tuning properties of certain cells in the visual cortex of some mammals. De Valois et al. [28], for example, recorded the response of simple cells in the visual cortex of the Macaque monkey to sinusoidal gratings with different frequencies and orientations. It was observed that each cell responds to a narrow range of frequency and orientation only. Therefore, it appears that there are mechanisms in the visual cortex of mammals that are 'tuned' to combinations of frequency and orientation in a narrow range. These mechanisms are often referred to as 'channels' and are appropriately interpreted as band-pass filters.

More recently, Beck et al. [2] reported psychophysical experiments on texture segmentation using patterns containing squares with different gray levels or different colors. They conclude that the results of their experiments "support the argument that the higher order processes in texture segregation have access to information corresponding to the outputs of the spatial frequency channels".
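To make the grating stimulus and the Michelson contrast measure defined earlier in this section concrete, the following minimal Python sketch (our illustration, not part of the dissertation; the function names and parameter values are arbitrary choices) generates a sinusoidal grating of a given spatial frequency and verifies its contrast.

```python
import numpy as np

def sinusoidal_grating(size=256, cycles=8, orientation_deg=0.0,
                       contrast=0.5, mean=0.5):
    """Generate a sinusoidal grating with a given Michelson contrast.

    `cycles` is the spatial frequency in cycles per image width.
    """
    theta = np.deg2rad(orientation_deg)
    y, x = np.mgrid[0:size, 0:size] / float(size)
    # Sinusoidal plane wave along the chosen orientation.
    phase = 2.0 * np.pi * cycles * (x * np.cos(theta) + y * np.sin(theta))
    return mean * (1.0 + contrast * np.cos(phase))

def michelson_contrast(image):
    """C = (I_max - I_min) / (I_max + I_min), as defined in the text."""
    i_max, i_min = image.max(), image.min()
    return (i_max - i_min) / (i_max + i_min)

grating = sinusoidal_grating(cycles=16, contrast=0.3)
print(michelson_contrast(grating))   # approximately 0.3, as requested
```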
Earlier experiments aimed at characterizing the channels focused on measuring the center frequencies and the frequency and orientation bandwidths of the channels. Several functional characterizations of channels in the frequency domain evolved from these experiments [38, 83]. More recent characterizations, however, have been largely based on psychophysiological data. In particular, some filter characteristics have been obtained by fitting band-limited functions to the receptive field profiles of simple cells in the visual cortex of some mammals [25, 70, 95]. In signal processing and systems theory terminology, a receptive field profile can be interpreted as the impulse response of a cell.

2.2 Analytical Considerations

The significance of frequency (size) and orientation cues for analyzing textures motivates the following view of the problem of texture analysis: texture analysis, in the early stages, relies on local frequency and orientation measurements. Once we accept this view, the question becomes "how do we measure them?"

For simplicity, we present our analysis in one-dimensional space. The extension to two dimensions is usually straightforward, since it can be viewed as one-dimensional frequency estimation along different orientations. The 1-D analysis, therefore, can readily be extended to 2-D. Let s(x) be the 1-D textured "image". To further facilitate our analysis, let us assume that s(x) is continuous and may have infinite extent. The Fourier decomposition (transform) of s(x) provides one way to represent its frequency content:

S(u) = ∫_{-∞}^{+∞} s(x) e^{-j2πux} dx.   (2.1)

However, the Fourier transform estimates frequency globally. As seen in its definition, each frequency is influenced by s(x) at all x values. That is, we cannot tell the location from which the frequencies come. For texture analysis tasks, such as texture segmentation, we are interested in the frequency content in small regions around each pixel. One way to localize the estimation of frequencies is to use a window Fourier transform, which is defined by

S_w(u, ζ) = ∫_{-∞}^{+∞} s(x) w(x - ζ) e^{-j2πux} dx,   (2.2)

where w(x) is a low-pass function. When w(x) is a Gaussian function, the window Fourier transform is referred to as a Gabor transform [36]. The notion of localized frequency measurement is closely related to combined space-frequency image representations. Porat and Zeevi [79] have provided a thorough analysis of image representation using Gabor elementary functions (GEF). Bovik [5] addresses optimality criteria for channel filters, where each "narrow-band" filter is expressed as the product of an equivalent low-pass filter with a complex sinusoidal plane wave. As a result, the filtering operations are reminiscent of window Fourier transforms. Also, Reed and Wechsler [80] use a different "spatial/spatial-frequency" representation, based on the Wigner distribution, to study the texture segmentation and clustering/grouping problems.

The ability to localize frequency estimation comes at the expense of more complexity. In particular, window Fourier transforms do not result in an orthogonal decomposition of s(x). Computing such decompositions, therefore, is not straightforward [68].
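As a concrete illustration of the localized estimate in Eq. (2.2), here is a minimal discrete sketch (ours, not the dissertation's; it assumes NumPy, and the signal and window parameters are arbitrary). It evaluates a Gaussian-windowed (Gabor) transform coefficient of a 1-D signal and shows that, unlike the global Fourier transform, it separates two regions with different frequencies.

```python
import numpy as np

def gabor_transform(s, u, zeta, sigma):
    """Discrete approximation of Eq. (2.2) with a Gaussian window,
    i.e. a Gabor transform coefficient of the 1-D signal s at
    frequency u (cycles/sample) and window center zeta (sample index)."""
    x = np.arange(len(s))
    window = np.exp(-0.5 * ((x - zeta) / sigma) ** 2)
    return np.sum(s * window * np.exp(-2j * np.pi * u * x))

# A signal whose frequency changes halfway: 0.05 then 0.15 cycles/sample.
x = np.arange(512)
s = np.where(x < 256,
             np.cos(2 * np.pi * 0.05 * x),
             np.cos(2 * np.pi * 0.15 * x))

# The localized estimate responds strongly only to the frequency present
# near the window center; a global Fourier transform would mix the two.
for zeta in (128, 384):
    print(zeta,
          abs(gabor_transform(s, 0.05, zeta, sigma=32)),
          abs(gabor_transform(s, 0.15, zeta, sigma=32)))
```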
Daugman [27] has proposed a neural network architecture for computing optimal coefficients in arbitrary two-dimensional transforms.

It is well known in signal analysis that there is a trade-off between the effective width of a localized signal (pulse) in the time domain and its bandwidth in the frequency domain [45]. Signals with short durations in the time domain have large bandwidths in the frequency domain, and vice versa. The bandwidth and duration of a signal can be defined in several different ways. However, the inverse relationship between duration and bandwidth applies irrespective of these definitions [45, p. 40]. This inverse relationship implies a trade-off between the spread or "uncertainty" of a localized signal in the time and the frequency domains. A similar trade-off applies to two-dimensional signals. The uncertainty principle [26] relates the detection and localization performance of a filter. Texture analysis tasks such as segmentation require simultaneous measurements both in the spatial and the spatial-frequency domains. High resolution in the spatial-frequency domain is desirable because it allows us to make finer distinctions among different textures. On the other hand, accurate localization of texture boundaries requires high resolution in the spatial domain.

In a window Fourier transform, the effective width and bandwidth of the basis functions are determined by the window function w(x). A Gaussian window function minimizes the joint uncertainty in the time and the frequency domains [36]. Texture analysis considerations, therefore, point to the Gabor transform (decomposition) as the ideal means of frequency estimation. In the context of texture segmentation, however, a smaller width for basis functions with higher spatial frequencies will result in better localization of the texture boundaries. This suggests that the widths of the basis functions should be inversely proportional to their frequency. In other words, they should have a constant bandwidth on the logarithmic scale. When this is the case, the window Fourier transform becomes a wavelet transform [67, 68], defined as follows:

S_w(u, ζ) = ∫_{-∞}^{+∞} s(x) √u w(u(x - ζ)) e^{-j2πux} dx.   (2.3)

In Chapter 3, we propose a texture segmentation algorithm that models the 'channels' by a set of two-dimensional Gabor filters. It will be shown that the filters constitute an approximate orthogonal basis for a wavelet transform, with the Gabor function as the wavelet.

2.3 Existing Techniques

The main issues involved in the multi-channel filtering approach to texture analysis in general, and texture segmentation in particular, are:

1. functional characterization of the channels and the number of channels,
2. extraction of appropriate texture features from the filtered images,
3. the relationship between channels (dependent vs. independent),
4. integration of texture features from different channels to produce a segmentation, and
5. segmentation method (region-based vs. edge-based).

Different multi-channel filtering techniques proposed in the literature vary in their approach to one or more of the above issues.

In this section, we survey existing multi-channel filtering techniques. Our emphasis is on various characterizations of the 'channels' and on texture segmentation algorithms. The background developed in this chapter will set the stage for presenting our proposed texture segmentation algorithm in Chapter 3, which uses a bank of Gabor filters.
In Chapter 5, we use isotropic frequency-selective filters [19, 20] to analyze the textural appearance of metallic finishes. Gabor filters and the isotropic frequency-selective filters are both described in this section.

Earlier multi-channel filtering techniques used spatial-frequency domain characterizations of the channels based on psychophysical and psychophysiological data. Faugeras [33] used the spatial-frequency domain characterization of the channels by Sakrison [83], which consisted of bandpass filters with both frequency- and orientation-selective properties. The modulation transfer function (MTF) of these filters (the MTF of a filter specifies the amount by which it modulates the magnitude of each frequency component of the input image), in polar coordinates, is given by

H(f, θ) = H_r(f) H_θ(θ),   (2.4)

where

H_r(f) = exp{ -(1/2) [(f - f₀)/w]² },
H_θ(θ) = exp{ -(1/2) [(θ - θ₀)/b]² } + exp{ -(1/2) [(θ - θ₀ - π)/b]² }.

Here f₀ and θ₀ determine the center radial frequency and orientation of the filter, while w and b determine the radial and angular bandwidths, respectively. Faugeras computed the texture features by taking the sixth-order norm of the pixel values in a filtered image and then averaging them over the entire image. He chose the sixth-order norm because it contains "information about the phase" in the input image, and because it offers a good compromise between too much averaging of details (corresponding to the Euclidean norm) and the ability to detect isolated noise spikes (corresponding to the infinity norm). Faugeras used a total of 27 filters (three radial frequencies and nine orientations). He showed the potential of these features by constructing texture classification experiments, but he did not give any algorithm for texture segmentation.

Coggins [19] used a different set of filters that are also specified in the spatial-frequency domain. Each filter has either a frequency-selective or an orientation-selective property only. The MTFs of the frequency-selective filters are given by

H(u, v) = exp{ -(1/2) [(ln √(u² + v²) - ln μ) / σ₁]² },  (u, v) ≠ (0, 0),   (2.5)

where μ is the center radial frequency and σ₁ determines the bandwidth of the filter. Note that these filters are defined on a logarithmic scale. The MTFs of the orientation-selective filters, on the other hand, are given by

H(u, v) = exp{ -(1/2) [Λ(u, v) / σ₂]² },  (u, v) ≠ (0, 0),   (2.6)

where

Λ(u, v) = min{ |tan⁻¹(v/u) - α|, |tan⁻¹(v/u) - (α + π)| }.

Here, 0 ≤ tan⁻¹(·) < π, α (in radians) is the center orientation, and σ₂ determines the orientation bandwidth of the filter. The value of the MTFs at (u, v) = (0, 0), for both types of filters, was set to 1, so the mean gray value of each filtered image is the same as that of the input image.

The filter set used by Coggins [19] and Coggins and Jain [20] contained four orientation-selective filters tuned to 0°, 45°, 90°, and 135°. The number of frequency-selective filters in the filter set depended on the size of the image array. For a 128 x 128 image array, for example, they used seven frequency-selective filters with center frequencies at 1, 2, 4, 8, 16, 32, and 64 cycles/image. Two examples of these filters are shown in Figure 2.3. Coggins and Jain demonstrated the utility of these filters for texture classification and segmentation. For texture classification, they use the average absolute deviation (AAD) from the mean gray value of each filtered image as texture features. The AAD feature for filtered image o_k(x, y) is computed as follows:

f_k = (1 / (N_r N_c)) Σ_{a=1}^{N_r} Σ_{b=1}^{N_c} |o_k(a, b) - g_k|,   (2.7)

where N_r and N_c are the number of rows and columns of the image array, and g_k is the mean gray value of the filtered image (which, as pointed out earlier, is the same as that of the input image). Clearly, the number of texture features used depends on the number of filters, since there is one AAD feature corresponding to each filtered image.
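The following sketch (our illustration, assuming NumPy; the test image and the bandwidth value are arbitrary) shows one way to realize the frequency-selective MTF of Eq. (2.5) on a discrete grid, apply it in the Fourier domain, and compute the AAD feature of Eq. (2.7) for a bank of center frequencies.

```python
import numpy as np

def ring_mtf(size, mu, sigma1):
    """Frequency-selective MTF of Eq. (2.5): a Gaussian in log radial
    frequency, centered at mu cycles/image, with H(0, 0) = 1."""
    u = np.fft.fftfreq(size) * size            # cycles/image along each axis
    uu, vv = np.meshgrid(u, u, indexing='ij')
    radius = np.sqrt(uu**2 + vv**2)
    h = np.ones((size, size))
    nz = radius > 0                            # leave the DC term at 1
    h[nz] = np.exp(-0.5 * ((np.log(radius[nz]) - np.log(mu)) / sigma1) ** 2)
    return h

def filter_image(image, mtf):
    """Apply an MTF by pointwise multiplication in the Fourier domain."""
    return np.real(np.fft.ifft2(np.fft.fft2(image) * mtf))

def aad_feature(filtered):
    """AAD feature of Eq. (2.7): mean absolute deviation from the mean."""
    return np.mean(np.abs(filtered - filtered.mean()))

rng = np.random.default_rng(0)
image = rng.normal(size=(128, 128))            # stand-in textured image
features = [aad_feature(filter_image(image, ring_mtf(128, mu, sigma1=0.6)))
            for mu in (2, 4, 8, 16, 32, 64)]
print(features)                                # one AAD feature per filter
```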
For texture classification, Coggins and Jain use the average absolute deviation (AAD) from the mean gray value of each filtered image as a texture feature. The AAD feature for filtered image o_k(x, y) is computed as follows:

$$f_k = \frac{1}{N_r N_c} \sum_{a=1}^{N_r} \sum_{b=1}^{N_c} \left| o_k(a, b) - g_k \right|, \qquad (2.7)$$

where N_r and N_c are the number of rows and columns of the image array, and g_k is the mean gray value of the filtered image³. Clearly, the number of texture features used depends on the number of filters, since there is one AAD feature corresponding to each filtered image.

Figure 2.3: Examples of spatial filters used by Coggins and Jain [20]. The origin (u, v) = (0, 0) is at (r, c) = (32, 32). (a) A frequency-selective filter tuned to a radial frequency of 16 cycles/image. (b) An orientation-selective filter tuned to 0°.

Coggins and Jain also use the AAD features in their texture segmentation algorithm. However, instead of averaging over the entire image array, the AAD feature is computed over small overlapping windows around each pixel and is assigned to the center pixel. This local averaging process results in one 'feature image', e_k(x, y), corresponding to each filtered image, o_k(x, y). That is,

$$e_k(x, y) = \frac{1}{M^2} \sum_{(a, b) \in W_{xy}} \left| o_k(a, b) - g_k \right|, \qquad (2.8)$$

where W_{xy} is an M x M window centered at location (x, y). The collection of feature images, therefore, defines one feature vector (pattern) for each pixel in the original image. The following two-step procedure is used to obtain a segmentation. First, a pattern clustering algorithm is used to group a small subset of these patterns into a given number of clusters, and a generic label is assigned to the patterns in each cluster. These labeled patterns are then used as 'training patterns' to classify all patterns (pixels). Coggins [19] and Coggins and Jain [20] successfully applied this algorithm to segment images containing natural as well as artificial textures. Jain [48] demonstrated the ability of the algorithm to segment images that contained artificially generated texture pairs with identical second- and third-order statistics.

³As pointed out earlier, the mean gray value (g_k) of each filtered image is the same as that of the input image.
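A sketch of both AAD computations, under the assumption that the local window average in (2.8) can be realized with a uniform filter; function names and the window size are illustrative.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def aad_global(o):
    # Global AAD feature of Eq. (2.7): one scalar per filtered image.
    g = o.mean()
    return np.mean(np.abs(o - g))

def aad_local(o, M):
    # Local AAD feature image of Eq. (2.8): the mean absolute deviation from
    # the filtered image's mean, over an M x M window centered at each pixel.
    g = o.mean()
    return uniform_filter(np.abs(o - g), size=M, mode='reflect')

o = np.random.default_rng(0).normal(size=(128, 128))  # stand-in filtered image
f_k = aad_global(o)      # one texture-classification feature
e_k = aad_local(o, 15)   # one feature image for segmentation
```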
More recently, a number of texture segmentation algorithms have been proposed that use two-dimensional Gabor filters. A Gabor function consists of a sinusoidal plane wave of some frequency and orientation, modulated by a two-dimensional Gaussian envelope. A "canonical" Gabor filter in the spatial domain is given by

$$h(x, y) = \exp\left\{-\frac{1}{2}\left(\frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2}\right)\right\} \cos(2\pi u_0 x + \phi), \qquad (2.9)$$

where u_0 and φ are the frequency and phase of the sinusoidal plane wave along the x-axis (i.e., the 0° orientation), and σ_x and σ_y are the space constants of the Gaussian envelope along the x- and y-axis, respectively. A Gabor filter with arbitrary orientation, θ_0, can be obtained via a rigid rotation of the x-y coordinate system. These two-dimensional functions have been shown to be good fits to the receptive field profiles of simple cells in the striate cortex [70, 25]. As a spatial filter, we are interested in the frequency- and orientation-selective properties of a Gabor filter. These properties are more explicit in the frequency domain representation of the filter. With φ = 0, the Fourier transform of the Gabor function in (2.9) is real-valued and is given by

$$H(u, v) = A\left(\exp\left\{-\frac{1}{2}\left[\frac{(u - u_0)^2}{\sigma_u^2} + \frac{v^2}{\sigma_v^2}\right]\right\} + \exp\left\{-\frac{1}{2}\left[\frac{(u + u_0)^2}{\sigma_u^2} + \frac{v^2}{\sigma_v^2}\right]\right\}\right), \qquad (2.10)$$

where σ_u = 1/2πσ_x, σ_v = 1/2πσ_y, and A = 2πσ_xσ_y. Figure 2.4 shows an even-symmetric Gabor filter and its MTF, in a 64 x 64 array.

Figure 2.4: (a) An even-symmetric Gabor filter in the spatial domain. The radial frequency and orientation are 8 cycles/image-width and 0°, respectively. (b) Corresponding MTF. The origin is at (r, c) = (32, 32).

An important property of Gabor filters is that they simultaneously achieve optimal joint localization, and hence resolution, in both the spatial domain and the spatial-frequency domain. Gabor [36] showed that one-dimensional Gabor functions uniquely achieve the lower bound of the uncertainty relationship Δx Δu ≥ 1/4π, where Δx and Δu are the effective width and bandwidth of the signal in the one-dimensional spatial domain and the spatial-frequency domain, respectively (measured by the square root of the variance of the energy functions). Daugman [26] extended this result to two-dimensional Gabor functions, by showing that they uniquely achieve the lower bounds in the uncertainty relationships Δx Δu ≥ 1/4π and Δy Δv ≥ 1/4π. Here, Δx and Δy are the effective widths in the spatial domain, and Δu and Δv are the bandwidths in the spatial-frequency domain. Texture analysis tasks such as segmentation require simultaneous measurements in both the spatial and the spatial-frequency domains. This optimality suggests that the Gabor filter is an ideal "tool" for analyzing textures. (See Section 2.2.)

Turner [90] used a set of Gabor filters and demonstrated their potential for texture discrimination. The filters had four different frequencies, four orientations, and two quadrature phase pairs for each combination of frequency and orientation, for a total of 32 filters. The filters were generated in the spatial domain. The spatial extent of all the filters was the same; they all had identical, circularly symmetric, Gaussian envelopes. The coefficients of each filter function were adjusted so that the mean gray value of each filtered image was zero. The input image was convolved with each filter function to obtain 32 filtered images. For computational efficiency, the convolution results were computed only at every 16th pixel in a row or column, with the result being assigned to all the pixels in a 16 x 16 block. For a given frequency and orientation, the filtered images o_{k,1}(x, y) and o_{k,2}(x, y), corresponding to a pair of filters with a quadrature phase relationship, were combined to obtain a "phase insensitive" response, o_k(x, y):

$$o_k(x, y) = \left[\left(o_{k,1}(x, y)\right)^2 + \left(o_{k,2}(x, y)\right)^2\right]^{1/2}. \qquad (2.11)$$

This combination of pairs of filtered images transformed the initial 32 filtered images into 16 'response images'. In order to demonstrate the effectiveness of the Gabor filters in texture discrimination, Turner summed these response images to obtain a single total response image. A difference in the mean values of this 'total response' in different texture regions was taken as evidence of discrimination. In some cases, however, the difference in the mean values could only be revealed by adding a subset of the response images rather than all of them. By summing the response images, Turner was actually performing a very crude feature extraction. However, adding the components of two feature vectors may result in similar values even when the individual components are very different. Turner's scheme falls short of producing a segmentation; it only demonstrates the potential of Gabor filters to yield features that are capable of discriminating textures.
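The following sketch, a minimal illustration rather than any author's implementation, builds the Gabor filter of (2.9) at an arbitrary orientation and combines a quadrature pair as in (2.11); the parameter values are assumptions.

```python
import numpy as np

def gabor(n, u0, theta0, sigma_x, sigma_y, phi=0.0):
    # Gabor filter of Eq. (2.9), rotated rigidly to orientation theta0.
    # u0 is in cycles/pixel; the grid is centered at the origin.
    c = np.arange(n) - n // 2
    X, Y = np.meshgrid(c, c, indexing='ij')
    Xr = X * np.cos(theta0) + Y * np.sin(theta0)
    Yr = -X * np.sin(theta0) + Y * np.cos(theta0)
    env = np.exp(-0.5 * (Xr**2 / sigma_x**2 + Yr**2 / sigma_y**2))
    return env * np.cos(2 * np.pi * u0 * Xr + phi)

def quadrature_response(img, u0, theta0, sigma_x, sigma_y):
    # Phase-insensitive response of Eq. (2.11) from a quadrature pair
    # (phi = 0 and phi = pi/2), with the convolutions done via the FFT.
    n = img.shape[0]
    F = np.fft.fft2(img)
    o1 = np.fft.ifft2(F * np.fft.fft2(np.fft.ifftshift(
        gabor(n, u0, theta0, sigma_x, sigma_y, 0.0)))).real
    o2 = np.fft.ifft2(F * np.fft.fft2(np.fft.ifftshift(
        gabor(n, u0, theta0, sigma_x, sigma_y, np.pi / 2)))).real
    return np.sqrt(o1**2 + o2**2)

img = np.random.default_rng(1).normal(size=(64, 64))  # stand-in texture
resp = quadrature_response(img, u0=8 / 64, theta0=0.0, sigma_x=4.0, sigma_y=4.0)
```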
The texture segmentation algorithm proposed by Bovik and his co-investigators [6, 18] also uses Gabor filters. Like Turner [90], these authors combine pairs of filtered images corresponding to filters with a quadrature phase relationship. A more compact filter representation is used, however, where each filter pair is treated as a single complex Gabor filter. The real part of each complex filter is an even-symmetric Gabor filter (i.e., φ = 0) and the imaginary part is an odd-symmetric Gabor filter (i.e., φ = π/2). By linearity, the real and imaginary parts of each filtered image are the responses to a pair of (real) Gabor filters with a quadrature phase relationship. Bovik et al. also combine the responses to each pair of filters (i.e., the real and imaginary parts of the response to a complex filter) to obtain a single response. Instead of using the Euclidean norm, however, they use the sum of absolute values. That is,

$$o_k(x, y) = \left| o_{k,1}(x, y) \right| + \left| o_{k,2}(x, y) \right|. \qquad (2.12)$$

These responses are then smoothed by a Gaussian weighted window "to counteract the effects of leakage and noise". The spread, or space constant, of this Gaussian filter is chosen to be slightly wider than the spread of the Gaussian envelope of the corresponding Gabor filter. This smoothing operation can be interpreted as computing a measure of local energy using a weighted averaging window. We will, therefore, refer to these smoothed response images as feature images.

In their segmentation examples, Bovik et al. apply a peak-finding algorithm to the power spectrum of the image in order to determine the center frequencies of the appropriate Gabor filters. In addition, a "limited amount of human intervention" is used in determining the parameters of the Gabor filters. For example, for strongly oriented textures, the most significant spectral peak along the orientation direction is used. For periodic textures, on the other hand, the lower fundamental frequency is chosen.

The segmentation algorithm of Bovik et al. is based on the assumption that each texture has a distinct narrow range of frequencies which is not present in the other texture categories. The algorithm produces a region-based segmentation by labeling each pixel with the index of the complex Gabor filter that has the maximum response at that pixel. Using the indices of the filters as labels implies that the number of texture categories is constrained by the number of complex Gabor filters that are used. (If k filters are used, then k labels are possible.) The algorithm, therefore, requires knowing the true number of texture categories, as well as their distinct narrow ranges of frequencies.
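A minimal sketch of this labeling rule under stated assumptions: each channel response is the |real| + |imaginary| combination of (2.12) from a complex Gabor filter with a circular Gaussian envelope, smoothed with a slightly wider Gaussian, and each pixel is labeled by the strongest channel. The filter parameters and helper names are hypothetical, not those of Bovik et al.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def complex_gabor_response(img, u0, theta0, sigma):
    # Complex Gabor filter: even-symmetric real part, odd-symmetric
    # imaginary part; responses combined as in Eq. (2.12).
    n = img.shape[0]
    c = np.arange(n) - n // 2
    X, Y = np.meshgrid(c, c, indexing='ij')
    Xr = X * np.cos(theta0) + Y * np.sin(theta0)
    Yr = -X * np.sin(theta0) + Y * np.cos(theta0)
    h = np.exp(-0.5 * (Xr**2 + Yr**2) / sigma**2) * np.exp(2j * np.pi * u0 * Xr)
    o = np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(np.fft.ifftshift(h)))
    # Smooth with a Gaussian slightly wider than the filter envelope.
    return gaussian_filter(np.abs(o.real) + np.abs(o.imag), 1.2 * sigma)

def max_response_labels(img, params):
    # Region-based labeling: each pixel gets the index of its strongest channel.
    feats = np.stack([complex_gabor_response(img, *p) for p in params])
    return feats.argmax(axis=0)

img = np.random.default_rng(2).normal(size=(64, 64))
labels = max_response_labels(img, [(8 / 64, 0.0, 4.0), (16 / 64, np.pi / 2, 4.0)])
```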
Tan and Constantinides [87] have used Gabor filters with a quadrature phase relationship for texture classification and segmentation. Like Turner [90], they use (2.11) to obtain the response in each channel. For texture classification, a fixed set of Gabor filters, each tuned to one of four radial frequencies and one of four orientations, is used. The mean and the standard deviation of each response image are used as texture features. For texture segmentation, on the other hand, the number of Gabor filters and their center frequencies are determined by identifying spectral peaks in the spatial-frequency domain. An edge-based segmentation is then obtained by "intensity gradient calculation" in each channel followed by "channel grouping", "thresholding", and "edge thinning". The paper [87] does not give the details of these stages. However, "channel grouping" appears to involve adding the responses of a gradient operator in different channels. The edge-based segmentation is obtained by thresholding this 'total' gradient response and thinning the resulting binary image.

Another texture segmentation algorithm that uses a set of Gabor filters is proposed by Perry and Lowe [77]. The filters have three frequencies (or scales, to use their terminology) corresponding to periods of 1, 2, and 4 pixels; eight orientations: 0, 30, 45, 60, 90, 120, 135, and 150 degrees; and two phases: 0 and 90 degrees. A procedure similar to that of Bovik et al. is used to obtain a set of feature images. Instead of using the feature vectors that are defined by these responses, however, they define two new feature vectors, based on the original feature vector, in an attempt to obtain a "more compact representation".

The first new feature vector is obtained as follows. First, the sum of all filter responses over all orientations is determined for each scale (frequency). The scale with the largest sum is designated as the "max scale". The responses of the orientation filters at the "max scale", together with the "max scale" value itself, form the first new feature vector. The other new feature vector is "a more compact version" of the first and again emphasizes the orientation features. We must note that these new feature vectors are computed at "grid points" that are a few pixels apart and, therefore, represent small blocks of pixels. The typical size of the blocks is 8 x 8.

A distance measure is defined for neighboring grid points by comparing their feature vectors⁴. An iterative procedure is then used to obtain a segmentation as follows. The procedure begins by detecting "seed regions" using an initial threshold on distance. Each seed region is then represented by the mean vector of its components. A small threshold value is used in the beginning, which, as expected, results in over-splitting of less uniform regions. In subsequent iterations, however, the threshold values are allowed to increase depending on a measure of uniformity of each region. This relaxation of the threshold value allows the algorithm to recover from possible over-fragmentation. The procedure is stopped after a prespecified number of iterations (about 20). The authors give only three segmentation examples, and do not discuss the effect of different threshold values.

Malik and Perona [65, 66] have proposed a texture segmentation algorithm that also uses a bank of linear filters. As the functional form for the filters (channels), Malik and Perona choose the Gaussian derivative model proposed by Young [95]. These functional forms are shown by Young to be good fits to cortical receptive field profiles. Both radially symmetric difference of Gaussians (DOG) filters and directionally tuned difference of offset Gaussians (DOOG) filters are used. DOG filters are assumed to model non-oriented simple cells, while DOOG filters model bar-sensitive simple cells. Following the filtering operation, each filtered image is half-wave rectified to obtain a set of "neural" responses.

⁴We note that the authors use two different feature vectors at each grid point, which play different roles in computing the distance between two grid points. However, for simplicity, we will refer to a single feature vector for each point.
These responses are smoothed using spatial averaging. Malik and Perona also use a nonlinear inhibition stage to model "intracortical inhibition". Texture boundaries are then detected by combining the responses of the Canny edge detector [10] applied to the resulting images.

An important advantage of the multi-channel filtering approach, as seen in the above examples, is that one can use simple statistics of the gray values in the filtered images as texture features. This simplicity is the direct result of decomposing the original image into several filtered images with limited spectral information. In contrast, texture features that are based on the statistics of the gray-level distribution in the given image, such as gray-level co-occurrence features [43], are usually very complicated and also lack physical interpretation. As an example, consider an application where rotation-invariant texture features are needed. In the multi-channel filtering approach, such features can be obtained using the isotropic frequency-selective filters of Coggins and Jain [20]. (See Figure 2.3.) Most other techniques for extracting rotation-invariant features, such as that proposed by Kashyap and Khotanzad [58], which uses a "circular symmetric autoregressive model", are less intuitive and require more complicated operations.

2.4 Summary

In this chapter, we discussed biological motivations as well as analytical considerations for the multi-channel filtering approach to texture analysis. In texture analysis, a decomposition of the textured image based on frequency (size) and orientation is intuitively appealing, because size and orientation are strong properties of most natural and artificial textures. Furthermore, these properties are general enough to allow discrimination among a large number of textures.

We emphasized the interpretation of multi-channel filtering as localized frequency estimation, and its relationship to combined space-frequency representations. We also discussed the importance of joint localization in the space and spatial-frequency domains in the context of texture segmentation. Accurate localization of texture boundaries calls for using filters with smaller widths in the higher frequency channels. Also, for a given width in the spatial domain, a two-dimensional Gabor filter has the smallest possible bandwidth in the spatial-frequency domain. These arguments favored a wavelet transform (decomposition) interpretation of the multi-channel filtering operations, with the Gabor function as the wavelet.

We identified the main issues involved in the multi-channel filtering approach to texture segmentation and presented a survey of the existing techniques. One limitation of these techniques is the lack of a systematic method for determining appropriate filter parameters. Furthermore, only limited segmentation results have been provided in the literature. Bovik et al. [6], for example, apply their segmentation technique only to images containing at most two textures. In Chapter 3, we address these limitations and propose a new multi-channel filtering technique that uses a bank of even-symmetric Gabor filters to model the channels.
Chapter 3

Texture Segmentation Using Gabor Filters

In this chapter, we present a multi-channel filtering technique for texture segmentation that uses a bank of Gabor filters to characterize the channels. Figure 3.1 shows an overview of the texture segmentation algorithm. The organization of this chapter is as follows. The choice of the parameters of the Gabor filters in the initial filter set, and a systematic filter selection scheme, are described in Section 3.1. In Section 3.2, we describe how texture features are computed from the filtered images. Section 3.3 describes the process of integrating the feature images to obtain an unsupervised segmentation. Supervised texture segmentation experiments using a feed-forward neural network and several other classifiers are reported in Section 3.4. Section 3.5 concludes with a summary and a general discussion.

Figure 3.1: An overview of the texture segmentation algorithm: the input image is passed through a bank of Gabor filters; each filtered image goes through a nonlinearity and a local 'energy' computation to produce feature vectors, which are augmented with the row and column coordinates and grouped by square-error clustering to yield the segmented image.

3.1 Characterizing the Channels

In our texture segmentation algorithm, we represent the channels with a bank of two-dimensional Gabor filters. The spatial and spatial-frequency domain representations of a 'canonical' Gabor filter were given in (2.9) and (2.10). Psychophysical and psychophysiological studies of biological visual systems have provided us with some clues for the appropriate bandwidth of the channels. However, the choice of the radial frequencies and the amount of overlap between the channels remains unclear. Like Turner [90] and Perry and Lowe [77], we model the channels with a fixed set of Gabor filters. However, our choice of filter parameters results in a filter set that preserves almost all the information in the input image.

3.1.1 Choice of Filter Parameters

Our filter set consists of even-symmetric Gabor filters. In the spatial-frequency domain, these filters are completely specified by their MTF (see (2.10)). In addition to the radial frequency and orientation, the frequency bandwidth B_r and the orientation bandwidth B_θ of a spatial filter are also of interest. For the Gabor filter defined by (2.10), the half-peak magnitude bandwidths are given by

$$B_r = \log_2\left(\frac{u_0 + (2\ln 2)^{1/2}\sigma_u}{u_0 - (2\ln 2)^{1/2}\sigma_u}\right), \quad \text{and} \qquad (3.1)$$

$$B_\theta = 2\tan^{-1}\left(\frac{(2\ln 2)^{1/2}\sigma_v}{u_0}\right), \qquad (3.2)$$

where B_r is in octaves and B_θ is in degrees. (The frequency bandwidth, in octaves, from frequency f_1 to frequency f_2 is given by log_2(f_2/f_1).) We implement each even-symmetric Gabor filter by direct sampling of the MTF in (2.10). Details of the implementation are provided in Appendix A.

We use four values of orientation θ_0: 0°, 45°, 90°, and 135°. For an image array with a width of N_c pixels, where N_c is a power of 2, the following values of radial frequency u_0 are used:

$$1\sqrt{2},\ 2\sqrt{2},\ 4\sqrt{2},\ \ldots,\ (N_c/4)\sqrt{2} \quad \text{cycles/image-width}.$$

Note that the radial frequencies are 1 octave apart. The above choice of radial frequencies guarantees that the passband of the filter with the highest radial frequency, viz. (N_c/4)√2 cycles/image-width, falls inside the image array¹. We let the orientation and frequency bandwidths of each filter be 45° and 1 octave, respectively. Several experiments have shown that the frequency bandwidth of simple cells in the visual cortex is about 1 octave [78]. Figure 3.2 shows the filter set used for segmenting 256 x 256 images.

¹In psychophysics, frequencies are expressed in cycles per degree of visual angle subtended on the eye. The frequencies in cycles/image-width can be converted to cycles/degree if the width of the image in degrees of visual angle is known.
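As an illustration of these choices, the sketch below inverts (3.1) and (3.2) for σ_u and σ_v given 1-octave and 45° half-peak bandwidths, and enumerates the (u_0, θ_0) pairs of the filter set. It is a straightforward reading of the equations, with k = √(2 ln 2) denoting the half-peak factor; the function name is illustrative.

```python
import numpy as np

def gabor_bank(Nc, Br=1.0, Btheta_deg=45.0):
    # Inverting Eqs. (3.1)-(3.2) for the half-peak bandwidth parameters:
    #   sigma_u = u0 (2^Br - 1) / (k (2^Br + 1)),  sigma_v = u0 tan(Btheta/2) / k,
    # with k = sqrt(2 ln 2). Frequencies are in cycles/image-width.
    k = np.sqrt(2.0 * np.log(2.0))
    bank = []
    u0 = np.sqrt(2.0)
    while u0 <= (Nc / 4) * np.sqrt(2.0):
        sigma_u = u0 * (2.0**Br - 1.0) / (k * (2.0**Br + 1.0))
        sigma_v = u0 * np.tan(np.radians(Btheta_deg) / 2.0) / k
        for theta0 in (0.0, 45.0, 90.0, 135.0):      # degrees
            bank.append((u0, theta0, sigma_u, sigma_v))
        u0 *= 2.0                                    # one octave apart
    return bank

print(len(gabor_bank(256)))  # 28 filters: 7 radial frequencies x 4 orientations
```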
Psychophysical experiments show that the resolution of the orientation tuning ability of the human visual system is as high as 5°. Therefore, in general, a finer quantization of orientation will be needed. The restriction to four orientations is made for computational efficiency in the current implementation of the algorithm, and is sufficient for discriminating many textures.

The total number of Gabor filters in the filter set is given by 4 log_2(N_c/2). For an image with 256 columns, for example, a total of 28 filters can be used: 4 orientations and 7 radial frequencies. For some textures, however, filters with low radial frequencies (e.g., 1√2 and 2√2 cycles/image-width) are not very useful, because these filters capture spatial variations that are too large to explain textural variations in an image. Therefore, we do not use these filters in the texture segmentation experiments.

Figure 3.2: The filter set in the spatial-frequency domain (256 x 256). There are a total of 28 Gabor filters. Only the half-peak support of the filters is shown.

In order to ensure that the filters do not respond to regions of constant intensity, we have set the MTF of each filter at (u, v) = (0, 0) to zero. As a result, each filtered image has a mean of zero. Furthermore, the FFT algorithm that is used to perform the convolutions requires that the dimensions of the input image be powers of two. When this requirement is not met, the input image can be padded with zeros to obtain a rectangular image with appropriate dimensions.

The set of filters used in the algorithm results in nearly uniform coverage of the spatial-frequency domain (Figure 3.2). A decomposition obtained by the filter set is nearly orthogonal, as the amount of overlap between the filters (in the spatial-frequency domain) is small. One way to demonstrate this property is through reconstruction of an image from the filtered images. Figure 3.3 shows two 128 x 128 images and their reconstructed versions. The original images are shown in the top row. The reconstructed images, obtained by adding all 24 filtered images, are in the bottom row. After adding all the filtered images, the gray values were linearly mapped to the 0 to 255 interval.

Figure 3.3: Examples demonstrating the advantage of nearly uniform coverage of the spatial-frequency domain by the filter set. (a) 'Wood grain' (D68) from the Brodatz album [7]. (b) 'Mandrill'. Top row: original images. Bottom row: reconstructed images. Both images are 128 x 128.

From a signal analysis point of view, our filter set constitutes an approximate orthogonal basis for a wavelet transform, with the Gabor function as the wavelet. (See Section 2.2.) Intuitively, a wavelet transform can be interpreted as a bandpass filtering operation on the input image. The Gabor function is an admissible wavelet; however, it does not result in an orthogonal decomposition. This means that a wavelet transform based on the Gabor wavelet is redundant [67]. The filtering operations using the filter set can be interpreted as computing the wavelet transform of the input image at selected spatial frequencies (frequency and orientation pairs).
The ability to reconstruct good approximations of the input image from the filtered images demonstrates that the filter set forms an almost complete basis for the wavelet transform. Figure 3.4 shows examples of filtered images for an image containing 'straw matting' (D55) and 'wood grain' (D68) textures from the photographic album of textures by Brodatz [7]. To maximize visibility, each filtered image has been scaled to full contrast. (Note that this scaling does not affect the relative differences in the strength of the responses in different regions.) The ability of the filters to exploit differences in frequency (size) and orientation in the two textures is evident in these images. The difference in the strength of the responses in regions with different textures is the key to the multi-channel approach to texture analysis.

Figure 3.4: Examples of filtered images for the 'D55-D68' texture pair (128 x 256). (a) Input image. (b-e) Filtered images corresponding to Gabor filters tuned to 16√2 cycles/image-width. (f-i) Filtered images corresponding to Gabor filters tuned to 32√2 cycles/image-width. All four orientations (0°, 45°, 90°, and 135°) for each frequency are shown.

3.1.2 Filter Selection

We now describe a systematic filter selection scheme based on an intuitive least squares error criterion. Using only a subset of the filtered images can reduce the computational burden at later stages, because this directly translates into a reduction in the number of texture features. Let s(x, y) be the reconstruction of the input image obtained by adding all the filtered images. (We have demonstrated that s(x, y) is a good approximation of the original input image.) Let ŝ(x, y) be the partial reconstruction of s(x, y) obtained by adding a subset A of the filtered images. That is,

$$\hat{s}(x, y) = \sum_{j \in A} r_j(x, y), \qquad (3.3)$$

where r_j(x, y) is the jth filtered image. The error involved in using ŝ(x, y) instead of s(x, y) can be measured by

$$SSE = \sum_{x, y} \left[ s(x, y) - \hat{s}(x, y) \right]^2. \qquad (3.4)$$

The fraction of intensity variations in s(x, y) that is explained by ŝ(x, y) can be measured by the coefficient of determination² (COD)

$$R^2 = 1 - \frac{SSE}{SSTOT}, \qquad (3.5)$$

where

$$SSTOT = \sum_{x, y} \left[ s(x, y) \right]^2. \qquad (3.6)$$

Note that s(x, y) has a mean of zero, since the mean gray value of each filtered image is zero. The motivation behind the filter selection scheme is to use only a subset of filtered images that together explain a "significant" portion of the intensity variations in s(x, y). We determine the "best" subset of the filtered images (filters) by the following sequential forward selection procedure [30]:

1. Select the filtered image that best approximates s(x, y), i.e., results in the highest R² value.

2. Select the next filtered image that, together with the previously selected filtered image(s), best approximates s(x, y).

3. Repeat Step 2 until R² ≥ 0.95.

Since adding all the filtered images gives s(x, y), the value of R² when all filters are used is 1.0. A minimum value of 0.95 for R² means that we will use only as many filtered images as necessary to account for at least 95% of the intensity variations in s(x, y). Note that the above sequential forward selection scheme is not optimal. Determining the best subsets of filtered images requires examination of all possible subsets of all possible sizes. An exhaustive search, however, is computationally prohibitive.

²The terminology used here is borrowed from linear regression analysis.
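A minimal sketch of this greedy procedure, assuming the filtered images are available as a list of equally sized arrays; the names are illustrative.

```python
import numpy as np

def forward_select(filtered, r2_min=0.95):
    # Greedy forward selection of Section 3.1.2: repeatedly add the filtered
    # image that yields the largest R^2 (Eq. 3.5) until R^2 >= r2_min.
    s = np.sum(filtered, axis=0)            # full reconstruction s(x, y)
    sstot = np.sum(s**2)                    # Eq. (3.6)
    chosen, remaining = [], list(range(len(filtered)))
    s_hat, r2 = np.zeros_like(s), 0.0
    while r2 < r2_min and remaining:
        # Adding the candidate with the smallest SSE (Eq. 3.4) maximizes R^2.
        sse = [np.sum((s - s_hat - filtered[j])**2) for j in remaining]
        best = remaining[int(np.argmin(sse))]
        remaining.remove(best)
        chosen.append(best)
        s_hat = s_hat + filtered[best]
        r2 = 1.0 - np.sum((s - s_hat)**2) / sstot
    return chosen, r2
```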
An important point to bear in mind is that the least squares error criterion in (3.5) only reflects convergence in the mean of s(x, y) and ŝ(x, y). A large R² value, therefore, does not necessarily guarantee a good fit at every point. If there are texture categories in the input image that occupy very small portions of the image, it is recommended that a larger minimum value for R² (e.g., 0.99) be used.

Figure 3.5 illustrates the filter selection results for the 'D55-D68' texture pair shown in Figure 3.4(a). Based on the forward selection procedure, only 13 filters, out of a total of 20, explain more than 95% of the intensity variations.

Figure 3.5: (a) Filter selection by reconstruction for the 'D55-D68' texture pair. Note that 13 filters alone, out of a total of 20, account for at least 95% of the intensity variations in the original textured image. (b) Filter selection by the approximate method. This method calls for using 15 filters, which include the 13 filters in (a).

Approximate Method

Again, let r_j(x, y) be the jth filtered image, and R_j(u, v) be its discrete Fourier transform (DFT). The amount of overlap between the MTFs of the Gabor filters in the filter set is small. (See Figure 3.2.) Therefore, the total energy E in s(x, y) can be approximated by

$$E \approx \sum_{j=1}^{n} E_j, \qquad (3.7)$$

where

$$E_j = \sum_{x, y} \left[ r_j(x, y) \right]^2 = \sum_{u, v} \left| R_j(u, v) \right|^2, \qquad (3.8)$$

and n is the total number of filters (typically 20). Now, it is easily verified that for any subset A of filtered images,

$$R^2 \approx \frac{\sum_{j \in A} E_j}{E}. \qquad (3.9)$$

An approximate filter selection then consists of computing E_j for j = 1, ..., n. These energies can be computed in the Fourier domain, hence avoiding unnecessary inverse DFTs. We then sort the filters (channels) based on their energy and pick as many filters as needed to achieve R² ≥ 0.95. Computationally, this procedure is much more efficient than the sequential forward selection procedure described before. The inclusion of filters (channels) with higher energy is intuitively appealing. On the other hand, if an input image does not contain frequency components that fall in the passband of a Gabor filter, then that filter will not be very useful for discriminating the textures in the image.
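A sketch of this energy-based shortcut, assuming the sampled filter MTFs and the input image's DFT are at hand, so that each E_j of (3.8) is computed in the Fourier domain (any constant Parseval factor cancels in the ratio (3.9)).

```python
import numpy as np

def approx_select(F, mtfs, r2_min=0.95):
    # F: DFT of the input image; mtfs: list of sampled filter MTFs.
    # Channel energies E_j of Eq. (3.8), computed without inverse DFTs.
    E = np.array([np.sum(np.abs(F * H)**2) for H in mtfs])
    order = np.argsort(E)[::-1]              # highest-energy channels first
    frac = np.cumsum(E[order]) / E.sum()     # running R^2 of Eq. (3.9)
    n_keep = int(np.searchsorted(frac, r2_min) + 1)
    return order[:n_keep].tolist()
```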
3.2 Computing Feature Images

An important goal of research in texture analysis is to develop a set of texture measures (features) that can successfully discriminate arbitrary textures. Here, we present one such set, which captures the attributes of "blobs" detected in the Gabor filtered images.

Psychophysical studies of texture perception suggest that preattentive texture discrimination may be explained by differences in the attributes of a few conspicuous local features, called textons [55]. Some features identified as textons include elongated blobs (e.g., rectangles, ellipses, line segments) with specific colors, angular orientations, widths, and lengths; line segment terminators (ends of lines); and line segment crossings. Julesz and his co-investigators [54, 55] demonstrate the ability of textons to predict and to explain texture discrimination in numerous artificially generated texture pairs.

The main criticism of the texton theory has been that it does not describe how textons are extracted from natural (grayscale) textured images. Voorhees and Poggio [93] have proposed an algorithm for extracting blob textons from grayscale images by using a Laplacian of Gaussian (LoG) operator, followed by thresholding at a small positive value and applying morphological operations. Differences in the statistical distribution of the attributes of the blobs, such as contrast, orientation, width, length, area, and area density, in small windows are then used to detect texture boundaries. Tuceryan and Jain [89] have used a similar approach to extract texture primitives that they call "tokens". Voorhees and Poggio contend that line segment crossings and terminators are not textons, and that texture discrimination can be explained using blob textons only. As recognized by the authors, their segmentation algorithm does not address the problem of determining the appropriate scale(s) for detecting the blobs. As we will see, the multi-resolution nature of our segmentation algorithm offers one possible solution to this problem. Furthermore, the computation of texture features in our approach does not require extraction of explicit texture primitives such as textons or tokens.

Several investigators have speculated on a possible relationship between Gabor filters and texton detection [18, 90]. However, no clear procedure has been set forth that describes how Gabor filters act as texton detectors or how texton attributes are captured by them. Our feature extraction scheme, which involves a nonlinear stage, provides a clearer explanation of the purported role of Gabor filters as blob detectors. Some of the experiments (see Section 3.3.3) support the position taken by Voorhees and Poggio [93] that differences in the attributes of blob textons alone can explain texture discrimination.

We use the following procedure to compute features from each filtered image. First, each filtered image is subjected to a nonlinear transformation. Specifically, we use the bounded nonlinearity

$$\psi(t) = \tanh(\alpha t) = \frac{1 - e^{-2\alpha t}}{1 + e^{-2\alpha t}}, \qquad (3.10)$$

where α is a constant. This nonlinearity is similar to the sigmoidal activation function used in artificial neural networks [63]. In our experiments, we have used an empirical value of α = 0.25, which results in a rapidly saturating, threshold-like transformation. The application of this nonlinearity transforms the sinusoidal modulations in the filtered images into square modulations and, therefore, can be interpreted as a blob detector. However, the detected blobs are not binary, and unlike the blobs detected by Voorhees and Poggio [93], they are not necessarily isolated from each other. Also, since each filtered image has a zero mean and the nonlinearity in (3.10) is odd-symmetric, both dark and light blobs are detected.

Instead of identifying individual blobs and then measuring their attributes, we capture their attributes by computing the average absolute deviation (AAD) from the mean value in a small window around each pixel in the 'response images' (the outputs of the nonlinear stages). This is similar to the 'texture energy' measure first proposed by Laws [61]. Formally, the feature image e_j(x, y) corresponding to the filtered image r_j(x, y) is given by

$$e_j(x, y) = \frac{1}{M^2} \sum_{(a, b) \in W_{xy}} \left| \psi\left(r_j(a, b)\right) \right|, \qquad (3.11)$$

where ψ(·) is the nonlinear function in (3.10) and W_{xy} is an M x M window centered at the pixel with coordinates (x, y).
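A sketch of (3.10)-(3.11), under the assumption of an unweighted M x M window; the next paragraphs refine this with a Gaussian-weighted window whose size tracks the filter's center frequency.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def feature_image(r, alpha=0.25, M=15):
    # Eq. (3.10): bounded, threshold-like nonlinearity (a blob detector).
    psi = np.tanh(alpha * r)
    # Eq. (3.11): local 'energy' as the mean of |psi| in an M x M window,
    # with a mirror-extended image boundary.
    return uniform_filter(np.abs(psi), size=M, mode='reflect')

r = np.random.default_rng(3).normal(size=(128, 128))  # stand-in filtered image
e = feature_image(r)
```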
The size, M, of the averaging window in (3.11) is an important parameter. More reliable measurement of texture features calls for larger window sizes. On the other hand, more accurate localization of region boundaries calls for smaller windows, because averaging blurs the boundaries between textured regions. Furthermore, using Gaussian weighted windows, rather than unweighted windows, will minimize distortions due to the Gibbs phenomenon. Gaussian-weighted windows are also likely to result in more accurate localization of texture boundaries. Therefore, for each filtered image we use a Gaussian window whose space constant σ is proportional to the average size of the intensity variations in the image. For a Gabor filter with center radial frequency u_0, this average size is given by

$$T = N_c / u_0 \quad \text{pixels}, \qquad (3.12)$$

where N_c is the width (number of columns) of the image. We found σ = 0.5√2 T to be appropriate in most of the segmentation experiments. Note that although we use different window sizes for different filtered images, they are all specified by a single parameter: the proportionality constant. When computing the texture features for pixels near the image boundary, we assume that the image is extended by its mirror image, a condition often referred to as the even reflection boundary condition [22]. Figure 3.6 shows the feature images corresponding to the filtered images shown in Figure 3.4.

Figure 3.6: Feature images corresponding to the filtered images in Figure 3.4. A space constant of σ = 0.5√2 T was used for the Gaussian averaging windows.

3.3 Unsupervised Texture Segmentation

Having obtained the feature images, the main question is how to integrate the features corresponding to different filters to produce a segmentation. Let us assume that there are K texture categories, C_1, ..., C_K, present in the image. If our texture features are capable of discriminating these categories, then the patterns belonging to each category will form a cluster in the feature space which is "compact" and "isolated" from the clusters corresponding to other texture categories. Pattern clustering algorithms are ideal vehicles for recovering such clusters in the feature space.

A segmentation algorithm based on clustering pixels using their associated feature vectors alone suffers from an important shortcoming: it does not utilize the spatial (contextual) information. In texture segmentation, neighboring pixels are very likely to belong to the same texture category. One possible approach to incorporating this contextual information is to use a relaxation labeling technique; that is, first obtain an initial labeling by clustering patterns in the feature space, and then enforce the spatial constraints using relaxation [47]. Instead, we propose a simple method that incorporates the spatial adjacency information directly in the clustering process. This is achieved by including the spatial coordinates of the pixels as two additional features (see Figure 3.1). The spatial coordinates of pixels have been used by Hoffman and Jain [46] for segmentation of range images. The inclusion of spatial coordinates in the computation of the distance between feature vectors encourages neighboring pixels to cluster together. As a result, over-fragmentation of otherwise uniform texture regions is avoided.

In our texture segmentation experiments, we have used a square-error clustering algorithm known as CLUSTER [49]. The algorithm iterates through two phases.
Phase 1 (the K-means pass) creates a sequence of clusterings containing 2, 3, ..., k_max clusters, where k_max is specified by the user. Phase 2 (the forcing pass) then creates another set of clusterings by merging existing clusters two at a time to see if a better clustering can be obtained. After each pass through phase 1 and phase 2, the square errors of the clusterings are compared with the square errors of the clusterings that existed before that pass. (Each new clustering is compared with the old clustering having the same number of clusters.) If any of the square errors are smaller than before, another pass through phases 1 and 2 is initiated. This continues until the square error cannot be decreased.

3.3.1 How Many Categories?

Determining the number of texture categories present in an image is a difficult problem. Relative indices provide a means of comparing clusterings with different numbers of clusters and deciding which clustering is "best". In our segmentation algorithm, we rely on the modified Hubert (MH) index proposed by Dubes [32]. For a given clustering, the MH index is computed as follows. Let L(i) be the label function, with L(i) = l if pattern i is in cluster l, and let d_{pq} be the Euclidean distance between cluster centers p and q. Define Y(i, j) = d_{L(i),L(j)}. The (normalized) MH index is then given by

$$MH = \frac{1}{M} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \left[ X(i, j) - m_x \right]\left[ Y(i, j) - m_y \right] \Big/ s_x s_y, \qquad (3.13)$$

where X(i, j) is the Euclidean distance between patterns i and j, n is the total number of patterns, M = n(n - 1)/2, and

$$m_x = \frac{1}{M} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} X(i, j), \qquad m_y = \frac{1}{M} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} Y(i, j),$$

$$s_x^2 = \frac{1}{M} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} X^2(i, j) - m_x^2, \qquad s_y^2 = \frac{1}{M} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} Y^2(i, j) - m_y^2.$$

The MH index is, therefore, the point serial correlation between the entries of the X and Y matrices. Unusually large values of MH suggest that corresponding entries in the two matrices are close to each other. Intuitively, the cluster centers are assumed to be the true representation of the texture categories, and any deviations from the centers are assumed to be due to errors in measurement and distortions. Note that the MH index will be 1 for the trivial clustering in which each pattern is an individual cluster, and it is not defined for the clustering in which all patterns are in the same cluster.

The "true" number of texture categories is estimated as follows. First, a sequence of clusterings is obtained using the CLUSTER algorithm. We assume k_max is known, or can be reliably estimated, and plot MH(k) for k = 2, ..., k_max. When the data contain a strong clustering, MH(k) first increases and then levels off, and a "significant" knee is formed at the true number of clusters. The following intuitive justification for this behavior is suggested by Dubes [32]. Suppose the true number of clusters is k*. The clusterings with k > k* will then be formed by breaking the true clusters into smaller ones; as a result, the correlation between the entries of the X and Y matrices will remain high. The clusterings with k < k* clusters, however, will be formed by merging the true clusters, hence reducing the correlation. Therefore, assuming that our texture features provide strong discrimination between different texture categories, we should see a significant knee in the plot of MH(k) at the true value of k. A major difficulty with clustering indices is that it is hard to determine the significance of an observed index. In our segmentation experiments, the significant knee in the plot of MH(k) is determined visually.
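A compact sketch of (3.13), under the assumption that substituting each pattern's cluster center for the pattern itself lets Y(i, j) = d_{L(i),L(j)} come out of a single pairwise-distance call; plotting this index over clusterings for k = 2, ..., k_max and looking for the knee reproduces the procedure described above.

```python
import numpy as np
from scipy.spatial.distance import pdist

def modified_hubert(X, labels, centers):
    # x holds X(i, j) and y holds Y(i, j) = d_{L(i), L(j)} for all i < j;
    # labels is an integer array indexing the rows of centers.
    x = pdist(X)
    y = pdist(centers[labels])
    mx, my = x.mean(), y.mean()
    sx = np.sqrt((x**2).mean() - mx**2)
    sy = np.sqrt((y**2).mean() - my**2)
    # Eq. (3.13): point serial correlation between the two distance matrices.
    return np.mean((x - mx) * (y - my)) / (sx * sy)
```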
When such a knee is hard to identify, we simply assume that the "true" number of texture categories is known a priori. In Chapter 4, we propose an alternative, integrated approach that eliminates the need for knowing the true number of texture categories.

Some implementation details must be explained. Prior to clustering, we normalize each feature to have a mean of zero and a constant variance (= 10.0). When used as additional features, the row and column coordinates are normalized in the same way. (Feature images with very small variances are simply discarded.) This normalization is intended to prevent features with small numerical ranges from being dominated by those with larger ranges³. Clustering a large number of patterns is computationally demanding. The following two-step grouping of pixels is, therefore, adopted for computational efficiency. First, we cluster a small randomly selected subset of patterns into a specified number of clusters. Patterns in each cluster are given a generic category label that distinguishes them from those in other clusters. These labeled patterns are then used as training patterns to classify the patterns (pixels) in the entire image using a minimum distance classifier.

³For a study of standardization strategies in cluster analysis, see the recent article by Milligan and Cooper [73].

3.3.2 Performance Evaluation

The lack of appropriate quantitative measures of the goodness of a segmentation makes it very difficult to evaluate and compare different texture segmentation algorithms. A simple criterion that is often used is the percentage of misclassified pixels. This criterion, however, has certain disadvantages. For example, it often does not reflect the ability of the algorithm to accurately locate the boundaries. By changing the locations of misclassified pixels in the vicinity of a boundary, we can make the boundary "look" less (or more) accurate and still have the same percentage of misclassified pixels. Despite such drawbacks, we use this simple criterion, because it is the only general and practical criterion that is currently available.

3.3.3 Experimental Results

We now apply our texture segmentation algorithm to several images in order to demonstrate its performance. These images are created by collaging subimages of natural as well as artificial textures. We start with a total of 20 Gabor filters in each case. Each filter is tuned to one of the four orientations and one of the five highest radial frequencies. For an image with a width of 256 pixels, for example, the radial frequencies 4√2, 8√2, 16√2, 32√2, and 64√2 cycles/image-width are used. We then use our filter selection scheme to determine a subset of filtered images that achieves an R² value of at least 0.95 (see Section 3.1.2). The number of randomly selected feature vectors that are used as input to the clustering program is proportional to the size of the input image. For a 256 x 256 image, for example, 4000 patterns are selected at random, which is about 6% of the total number of patterns. This percentage is used in all the following experiments. The segmentation results are displayed as gray-level images, where regions belonging to different categories are shown with different gray levels.
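A sketch of the two-step grouping used in these experiments, with scikit-learn's k-means standing in for the CLUSTER algorithm; the sampling fraction matches the ~6% figure above, and the function name is illustrative. The feature array is assumed to be pre-normalized and may include the normalized row/column coordinates as two extra components.

```python
import numpy as np
from sklearn.cluster import KMeans

def two_step_segment(features, k, sample_frac=0.06, seed=0):
    # features: (H, W, d) array of normalized per-pixel feature vectors.
    H, W, d = features.shape
    pats = features.reshape(-1, d)
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(pats), size=int(sample_frac * len(pats)), replace=False)
    # Step 1: cluster a small random subset (k-means stands in for CLUSTER).
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(pats[idx])
    # Step 2: minimum distance classification of every pixel to the centers.
    d2 = ((pats[:, None, :] - km.cluster_centers_[None, :, :]) ** 2).sum(-1)
    return d2.argmin(1).reshape(H, W)
```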
Figure 3.7 shows the segmentation results for the 'D55-D68' texture pair. Only 13 Gabor filters (texture features) are used. As seen in the two-category segmentation, the two textures are successfully discriminated, and the boundary between them is detected quite accurately. The segmentation with pixel coordinates included as additional features was essentially the same and is not shown here. The plot of the modified Hubert index versus the number of texture categories for this image is shown in Figure 3.8. The curve levels off at k = 2, with MH(2) ≈ 0.9, strong evidence for the two-category segmentation.

Figure 3.7: (a) The 'D55-D68' texture pair. (b) Two-category segmentation obtained using a total of 13 Gabor filters.

Figure 3.8: The plot of the MH index versus number of clusters (texture categories) for the 'D55-D68' texture pair.

Figure 3.9(a) shows a 256 x 256 image ('GMRF-4') containing four Gaussian Markov random field (GMRF) textures. These textures have been generated using non-causal finite lattice GMRFs [14] and cannot be discriminated on the basis of their mean gray values. In Figure 3.9 we show the segmentation results for this image. The difference between the segmentations in Figures 3.9(b) and (c) shows the improvement due to the inclusion of pixel coordinates as additional features in the clustering algorithm. The plot of MH(k) is shown in Figure 3.10. The curve levels off at k = 4, following a "significant" knee. The high value of the MH index at k = 4 (≈ 0.80) also strongly supports the four-category segmentation.

Figure 3.9: (a) A 256 x 256 image ('GMRF-4') containing four Gaussian Markov random field textures. (b) Four-category segmentation obtained using a total of 11 Gabor filters. (c) Same as (b), but with pixel coordinates used as additional features.

Figure 3.10: The plot of the MH index versus number of texture categories for the 'GMRF-4' image shown in Figure 3.9(a).

Figure 3.11(a) shows another 256 x 256 image ('Nat-5') containing the natural textures D77, D55, D84, D17, and D24 from the Brodatz album. Only 13 Gabor filters, out of a total of 20, are used. The five-category segmentation of this image is shown in Figure 3.11(b). As seen in Figure 3.12, the plot of MH(k) is not helpful for deciding the true number of categories.

Figures 3.13 and 3.14 summarize the segmentation results for a 512 x 512 image ('Nat-16') containing sixteen natural textures, also from the Brodatz album. Again, it is difficult, if not impossible, to decide the true number of texture categories using the plot of MH(k). Nonetheless, assuming that we know the true number of texture categories, we have shown the 16-category segmentation. The filter selection (with a threshold of 0.95 for R²) indicated that only 14 filtered images are sufficient. However, the resulting segmentations were not very good. The 16-category segmentation in Figure 3.13(b) is obtained using all 20 filtered images (and the pixel coordinates). Compared to the previous examples, where each texture category constituted about 1/2 or 1/4 of the image, in this example each category occupies only 1/16 of the image. Recall that the fitting criterion in our filter selection scheme is computed globally over the entire image. A larger threshold for R² should, therefore, be used if any texture category is expected to occupy only a small fraction of the image.

Figure 3.11: (a) A 256 x 256 image ('Nat-5') containing five natural textures (D77, D55, D84, D17, and D24) from the Brodatz album. (b) Five-category segmentation obtained using a total of 13 Gabor filters and the pixel coordinates.
Figure 3.12: The plot of the MH index versus number of texture categories for the 'Nat-5' image shown in Figure 3.11(a).

Figure 3.13: (a) A 512 x 512 image ('Nat-16') containing sixteen natural textures (row 1: D29, D12, D17, D55; row 2: D32, D5, D84, D68; row 3: D77, D24, D9, D4; row 4: D3, D33, D51, D54) from the Brodatz album. (b) 16-category segmentation obtained using a total of 20 Gabor filters and the pixel coordinates.

Figure 3.14: The plot of the MH index versus number of texture categories for the 'Nat-16' image shown in Figure 3.13(a).

Figures 3.15 and 3.16 summarize the segmentation results for a number of texture pair images that have been used in psychophysical studies of texture perception. The two textures in the 'L and +' texture pair have identical power spectra. The textures in the 'Even-Odd' texture pair [57] have identical third-order statistics. (The 'Even-Odd' nomenclature comes from the fact that, in the two textures, any 2 x 2 neighborhood contains either an even or an odd number of black (or white) pixels.) The textures in the 'Triangle-Arrow' and 'S and 10' texture pairs [57], on the other hand, have identical second-order statistics. The 'Even-Odd' and 'Triangle-Arrow' textures are two counter-examples to the original Julesz conjecture that texture pairs with identical second-order statistics cannot be preattentively discriminated [56]. While the first three texture pairs in Figure 3.15 are easily discriminated, the 'S and 10' texture pair is not preattentively discriminable.

Compared to our previous examples, the observed values of the MH indices for these artificially generated texture pairs are low (≈ 0.6). Moreover, while the plot of the MH index for the 'Triangle-Arrow' texture pair levels off at k = 2, it is not easy to judge the behavior of the curve for the other texture pairs. Nonetheless, assuming that the true number of categories is two, we obtained the two-category segmentation of each image (Figure 3.15). Our algorithm appears to perform as predicted by preattentive texture discrimination in humans: it successfully segments the first three texture pairs, but fails to do so for the 'S and 10' texture pair. The texton theory of Julesz attributes the preattentive discrimination of the 'Triangle-Arrow' texture pair to the difference in the density of termination points [55]. The successful discrimination of this texture pair by our algorithm supports the position taken by Voorhees and Poggio [93] that differences in the attributes of blobs detected in the filtered versions of textures alone can explain the observed discrimination.

Table 3.1 gives the percentage of misclassified pixels for the segmentation experiments reported here. As seen in this table, there is a clear advantage in using the pixel coordinates (spatial information) as additional features.

Table 3.1: Percentage of pixels misclassified in the segmentation results.

  Input image                            Misclassified (%)
  name        size        # categories   without (r,c)   with (r,c)
  D03-D17     128 x 256    2              0.97            0.94
  D03-D68     128 x 256    2              1.07            1.04
  D17-D77     128 x 256    2              0.89            0.81
  D55-D68     128 x 256    2              0.61            0.58
  GMRF-4      256 x 256    4              2.68            1.78
  Nat-5       256 x 256    5              4.87            2.96
  Nat-16      512 x 512   16             12.85            7.47
  L and +     256 x 256    2              2.21            2.21
  Even-Odd    256 x 256    2              3.05            2.60
  Tri-Arr     256 x 256    2              6.12            5.20
Figure 3.15: Segmentation of texture pairs that have been used in psychophysical studies of texture perception. All images are 256 x 256. The number of Gabor filters used varied between 8 and 11. (a) 'L and +'. (b) 'Even-Odd'. (c) 'Triangle-Arrow'. (d) 'S and 10'.

Figure 3.16: The plot of the MH index versus number of texture categories for the texture pair images shown in Figure 3.15. (a) The plot for 'L and +'. (b) The plot for 'Even-Odd'. (c) The plot for 'Triangle-Arrow'. (d) The plot for 'S and 10'.

3.4 Supervised Texture Segmentation

In Section 3.3, we assumed that the texture categories were unknown and relied on a clustering algorithm to identify them. In many instances, such as in remote sensing or medical applications, one has access to previously collected data with known categories. In others, data with known category labels can be obtained from the image itself, for example with the help of a human expert. In our texture segmentation technique, when such training data are already available, we can replace the clustering stage with a classifier. Motivated by the biological plausibility of neural network classifiers, we will use a feed-forward network as our main classifier. To assess the performance of the feed-forward network, we compare it with classifiers used in the pattern recognition literature. Specifically, we use the minimum Euclidean distance, minimum Mahalanobis distance, and k-nearest neighbor (k-NN) classifiers.

In the following supervised texture segmentation experiments, the training data are obtained by randomly sampling the feature images. The randomly sampled patterns make up about 6% of the total number of patterns. (This fraction is the same as that used in Section 3.3, where a small randomly sampled subset of patterns was clustered to identify the texture categories.) For a 256 x 256 image, for example, 4000 patterns are used. The performance of the classifiers is reported as the percentage of misclassified pixels. The method of error estimation is essentially a "hold out" method: we use about 6% of the patterns to train the classifier, then use all the patterns, including the training patterns, to test it.

3.4.1 Segmentation Using a Neural Network Classifier

Several papers have appeared in the literature that address texture segmentation by neural networks [69, 94]. One might question the biological plausibility of the clustering algorithm used to obtain the segmentations in Section 3.3.
However, there are reasons to believe that biological systems are capable of carrying out such clustering or grouping operations [11]. Computational models of the brain are largely characterized by highly interconnected information processing units. This computational paradigm is known as massively parallel computing, connectionist architecture, or neural networks [40].

An important characteristic of neural networks is their learning capability. The supervised pattern classification capability of neural networks has been demonstrated by many researchers. In this section, we will use feed-forward networks, along with the back-propagation training algorithm, to carry out supervised texture segmentation experiments.

A feed-forward network may be regarded as a mapping from the input space to the output space. In this application, the input consists of texture features; therefore, there are as many input units as features. Furthermore, there are as many output units as texture categories. In addition, one or more hidden layers are usually used between the input and output layers. Figure 3.17 shows a feed-forward neural network with one hidden layer.

Networks with no hidden layers, known as perceptrons, have been studied and used extensively. However, the set of mappings from input to output that can be carried out by these networks is restricted [74]. Adding one or more hidden layers allows an internal representation to be formed, which in turn enables the network to carry out arbitrary mappings from input patterns to output patterns. In fact, a feed-forward neural network with only two hidden layers, linear output nodes, and sigmoidal nonlinearities can perform complex nonlinear mappings [64].

Training a feed-forward neural network, using a set of training patterns, is equivalent to finding a set of weights for all the links (connections) such that the proper output unit is activated for the corresponding input pattern. Several algorithms for 'training' neural networks have been proposed, and their utility for pattern classification has been demonstrated. For example, Rumelhart et al. [81] have proposed a network training algorithm based on error propagation, known as back-propagation or the generalized delta rule.

Although there are some guidelines for the minimum number of hidden layers, similar guidelines for the number of units in each layer are not available. In our experiments, we use a single hidden layer consisting of 10 units. For our simulations, we have used the back-propagation routines in the Rochester Connectionist Simulator (RCS) [39]. The training is stopped after a prespecified number of training cycles. More specifically, the performance under a small number (10) and a larger number (100) of training cycles is studied.

Figure 3.17: The feed-forward neural network used in our supervised texture segmentation experiments. The network has a single hidden layer with 10 units.

Figures 3.18-3.22 show several examples of supervised texture segmentation using the feed-forward neural network in Figure 3.17. The filters used in each case account for 95% of the intensity variation captured by an initial filter set with 20 filters. (See Section 3.1.2.)
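A minimal back-propagation sketch of such a network, not the RCS implementation; the squared-error loss, tanh output units, learning rate, and initialization are assumptions chosen for illustration.

```python
import numpy as np

def train_mlp(X, y, n_hidden=10, n_classes=2, cycles=100, lr=0.1, seed=0):
    # One-hidden-layer feed-forward network (cf. Figure 3.17), trained by
    # plain gradient descent on a squared error with tanh units.
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0, 0.1, (X.shape[1], n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0, 0.1, (n_hidden, n_classes)); b2 = np.zeros(n_classes)
    T = np.eye(n_classes)[y]                   # one-hot targets
    for _ in range(cycles):                    # one cycle = one pass over data
        h = np.tanh(X @ W1 + b1)               # hidden activations
        o = np.tanh(h @ W2 + b2)               # output activations
        do = (o - T) * (1 - o**2)              # output deltas
        dh = (do @ W2.T) * (1 - h**2)          # back-propagated hidden deltas
        W2 -= lr * h.T @ do / len(X); b2 -= lr * do.mean(0)
        W1 -= lr * X.T @ dh / len(X); b1 -= lr * dh.mean(0)
    # Return a classifier: the most activated output unit wins.
    return lambda Xt: np.argmax(np.tanh(np.tanh(Xt @ W1 + b1) @ W2 + b2), axis=1)
```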
Figures 3.18-3.22 show several examples of supervised texture segmentation using the feed-forward neural network in Figure 3.17. The filters used in each case account for 95% of the intensity variation captured by an initial filter set with 20 filters. (See Section 3.1.2.) Table 3.2 lists the percentage of misclassified pixels when only 10 training cycles are used. Table 3.3 shows the results for 100 cycles. Even with a small number of training cycles, the feed-forward network's performance is impressive.

Table 3.2: Percentage of misclassified pixels using a feed-forward neural network classifier. Only 10 training cycles were used in each case.

    name       size        # categories   without (r,c)   with (r,c)
    D03-D17    128 × 256   2              1.32            0.80
    D03-D68    128 × 256   2              0.89            1.49
    D17-D77    128 × 256   2              0.63            0.50
    D24-D09    128 × 256   2              5.41            4.41
    D55-D68    128 × 256   2              1.80            0.78
    GMRF-4     256 × 256   4              2.92            1.75
    Nat-5      256 × 256   5              5.64            4.54
    Nat-16     512 × 512   16             12.77           8.43
    L and +    256 × 256   2              3.39            3.61
    Even-Odd   256 × 256   2              3.07            1.29
    Tri-Arr    256 × 256   2              2.88            2.93

Table 3.3: Percentage of misclassified pixels using a feed-forward neural network classifier. 100 training cycles were used in each case.

    name       size        # categories   without (r,c)   with (r,c)
    D03-D17    128 × 256   2              1.11            0.99
    D03-D68    128 × 256   2              1.02            1.20
    D17-D77    128 × 256   2              0.60            0.78
    D24-D09    128 × 256   2              4.29            3.19
    D55-D68    128 × 256   2              0.93            0.79
    GMRF-4     256 × 256   4              2.77            1.65
    Nat-5      256 × 256   5              3.73            2.75
    Nat-16     512 × 512   16             7.32            5.67
    L and +    256 × 256   2              3.32            3.02
    Even-Odd   256 × 256   2              0.83            0.84
    Tri-Arr    256 × 256   2              2.78            2.95

Figure 3.18: (a) The 'Even-Odd' texture pair (256 × 256). (b) Supervised segmentation obtained using a feed-forward neural network. (Number of training cycles = 100.)

Figure 3.19: (a) The 'GMRF-4' image (256 × 256) containing four Gaussian Markov random field textures. (b) Supervised segmentation obtained using a feed-forward neural network. (Number of training cycles = 100.)

Figure 3.20: (a) The 'L and +' texture pair (256 × 256). (b) Supervised segmentation obtained using a feed-forward neural network. (Number of training cycles = 100.)

Figure 3.21: (a) The 'Nat-5' image (256 × 256) containing five natural textures (D77, D55, D84, D17, and D24) from the Brodatz album. (b) Supervised segmentation obtained using a feed-forward neural network. (Number of training cycles = 100.)

Figure 3.22: (a) The 'Nat-16' image (512 × 512) containing sixteen natural textures (row 1: D29, D12, D17, D55; row 2: D32, D5, D84, D68; row 3: D77, D24, D9, D4; row 4: D3, D33, D51, D54) from the Brodatz album. (b) Supervised segmentation obtained using a feed-forward neural network. (Number of training cycles = 100.)

Figure 3.22: (cont'd.)

3.4.2 Comparison with Other Classifiers

How does the feed-forward neural network classifier in Section 3.4.1 compare with commonly used classifiers in the pattern recognition literature? To answer this question on a quantitative basis we carried out supervised texture segmentation experiments using a number of commonly used classifiers. We will briefly describe each classifier. The reader may refer to [30] for more details.

To classify a new pattern, the k-nearest neighbor classifier first determines the k nearest training patterns. It then assigns the pattern to the class that is most heavily represented among the k nearest neighbors. In addition to the k-NN classifier, we will also consider two other conventional classifiers: the minimum Euclidean distance classifier [30] and the minimum Mahalanobis distance classifier. The latter is also known as Fisher's classifier [50]. We used the minimum Euclidean distance classifier in Section 3.3 in our two-step clustering.
(There, we referred to it simply as the minimum distance classifier.) The following experiments will allow us to evaluate possible advantages of using a different classifier in step 2.

For the k-NN classifier, no training is required, but the storage requirement is large. Also, classification of test patterns is computationally expensive. Determining the k nearest neighbors of a test pattern, in general, requires computing its distance from all of the stored training patterns. However, fast algorithms for searching nearest neighbors are available. Fukunaga and Narendra [35], for example, have proposed a branch and bound algorithm for computing k-nearest neighbors. An alternative approach is to reduce the number of training samples by selecting a representative subset. The condensing technique proposed by Hart [44] is only one of many such reduction techniques. When combined with such preprocessing techniques, the complexity of the k-NN classifier compares quite favorably with other classifiers [30]. In our implementation of the k-NN classifier we did not use any preprocessing. However, for computational efficiency, the Manhattan (city block) distance measure, rather than the usual Euclidean distance, is used. Three different values of k (1, 3, and 5) are tried. Since the performance for all three values of k was essentially the same, only the results for the 3-NN classifier are reported here.

Tables 3.4, 3.5, and 3.6 show the percentage of misclassified pixels for different segmentation experiments. For easier comparison, Figures 3.23 and 3.24 show the same data as scatter plots. The image numbers on the horizontal axis correspond to the row numbers in the tables. So, for example, image number 8 refers to the 'Nat-16' image which contains sixteen natural textures.

Table 3.4: Percentage of misclassified pixels using the minimum Euclidean distance classifier.

    name       size        # categories   without (r,c)   with (r,c)
    D03-D17    128 × 256   2              0.94            0.90
    D03-D68    128 × 256   2              0.98            0.96
    D17-D77    128 × 256   2              0.92            0.82
    D24-D09    128 × 256   2              9.48            7.15
    D55-D68    128 × 256   2              0.63            0.60
    GMRF-4     256 × 256   4              2.59            1.75
    Nat-5      256 × 256   5              4.44            2.85
    Nat-16     512 × 512   16             9.20            6.19
    L and +    256 × 256   2              1.97            1.97
    Even-Odd   256 × 256   2              2.20            1.95
    Tri-Arr    256 × 256   2              5.47            4.68

Clearly, the 3-NN classifier outperforms the minimum distance classifiers. It also outperforms the feed-forward neural network used in Section 3.4.1. However, it took more than four days of CPU time on a Sun 4/390 to obtain the segmentation for the 'Nat-16' image. As pointed out before, preprocessing techniques can be used to reduce the computational complexity of the k-NN classifier. Note that, in most cases, the performance of the feed-forward neural network classifier improved when the number of training cycles was increased to 100. It is very likely that using a larger number of training cycles, e.g. 500 cycles, would further improve its performance.

In the two-step clustering scheme in Section 3.3 we used the minimum distance classifier in the second step. The supervised texture segmentation experiments in this section suggest using other classifiers. For example, the minimum Mahalanobis distance classifier, or the k-NN classifier in conjunction with the condensing technique, may result in better segmentations.
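The k-NN classification step itself is simple to state. The following is a minimal sketch, in Python with numpy, of the brute-force rule with the city-block distance used here; no condensing or branch-and-bound preprocessing is applied, and the array names are illustrative.

    import numpy as np

    def knn_classify(train_X, train_y, test_X, k=3):
        labels = np.empty(len(test_X), dtype=train_y.dtype)
        for i, x in enumerate(test_X):
            d = np.abs(train_X - x).sum(axis=1)    # city-block distance to all training patterns
            nearest = train_y[np.argsort(d)[:k]]   # labels of the k nearest neighbors
            vals, counts = np.unique(nearest, return_counts=True)
            labels[i] = vals[np.argmax(counts)]    # majority vote among the k neighbors
        return labels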
Table 3.5: Percentage of misclassified pixels using the minimum Mahalanobis distance classifier.

    name       size        # categories   without (r,c)   with (r,c)
    D03-D17    128 × 256   2              0.70            0.63
    D03-D68    128 × 256   2              0.82            0.81
    D17-D77    128 × 256   2              0.62            0.55
    D24-D09    128 × 256   2              2.89            2.91
    D55-D68    128 × 256   2              0.72            0.45
    GMRF-4     256 × 256   4              1.97            1.51
    Nat-5      256 × 256   5              2.82            2.39
    Nat-16     512 × 512   16             5.41            3.37
    L and +    256 × 256   2              1.26            1.26
    Even-Odd   256 × 256   2              0.88            0.75
    Tri-Arr    256 × 256   2              2.35            2.49

Table 3.6: Percentage of misclassified pixels using the 3-NN classifier. Classification errors for 1-NN and 5-NN were essentially the same.

    name       size        # categories   without (r,c)   with (r,c)
    D03-D17    128 × 256   2              0.37            0.35
    D03-D68    128 × 256   2              0.57            0.53
    D17-D77    128 × 256   2              0.42            0.38
    D24-D09    128 × 256   2              0.65            0.63
    D55-D68    128 × 256   2              0.39            0.45
    GMRF-4     256 × 256   4              1.48            1.07
    Nat-5      256 × 256   5              1.17            1.12
    Nat-16     512 × 512   16             1.23            1.23
    L and +    256 × 256   2              0.82            0.81
    Even-Odd   256 × 256   2              0.45            0.48
    Tri-Arr    256 × 256   2              1.87            1.50

Figure 3.23: Percent misclassified pixels for various classifiers (feed-forward network with 10 and 100 training cycles, minimum Euclidean distance, minimum Mahalanobis distance, and 3-NN). Here, the (row, col) coordinates of pixels are not used.

Figure 3.24: Percent misclassified pixels for various classifiers (feed-forward network with 10 and 100 training cycles, minimum Euclidean distance, minimum Mahalanobis distance, and 3-NN). Here, the (row, col) coordinates of pixels are used as additional features.

3.5 Summary

In this chapter, we presented a multi-channel filtering technique for texture segmentation. The channels were represented by a fixed set of Gabor filters, and a systematic filter selection scheme was proposed based on reconstruction of the original image from the filtered images. As a result, unlike some of the existing techniques, our segmentation algorithm does not require any knowledge of the frequency content of the textures in the input image. Both unsupervised and supervised texture segmentation experiments were conducted and the ability of the "texture energy" features to discriminate among various textures was demonstrated. In particular, we demonstrated the ability of the segmentation technique to discriminate artificially generated texture pairs with identical second- and third-order statistics.

The filtering and feature extraction operations in the algorithm account for most of the required computations. However, these operations can be performed in parallel, regardless of the number of filters. The use of a nonlinear transformation following the linear filtering operations has been suggested as one way to account for the inherently nonlinear nature of biological visual systems [29]. We argued that the localized filtering by the Gabor filter set followed by a "squashing" nonlinear transformation can be interpreted as a multi-scale blob detection operation.

One of the limitations of the texture segmentation algorithm is the lack of a criterion for choosing the value of α in the nonlinear transformation. In the experiments, we used a fixed empirical value. Also, the algorithm assumes that different channels are independent of each other. However, there is psychophysical and physiological evidence indicating inhibitory interactions between different spatial frequency channels [29]. Some researchers have incorporated such interactions in their texture segmentation algorithms [8, 66].
Feature selection or extraction from the initial pool of features is computationally desirable, and may result in more accurate segmentations. Allowing inhibitory interactions among the channels has been shown to have the potential to reduce the effective dimensionality of the feature space [8].

For unsupervised texture segmentation, we tried to use the plot of the modified Hubert index versus the number of texture categories. In most cases, however, it was assumed that the true number of categories is known a priori. In the following chapter we will propose an integrated approach that combines the current region-based segmentation technique with an edge-based technique. The integrated approach requires only a reliable upper bound on the number of texture categories.

In the supervised texture segmentation experiments a feed-forward neural network was used. Some neural network architectures are capable of unsupervised classification. One example is the "self-organizing" and "self-stabilizing" architecture proposed by Carpenter and Grossberg [11]. An attempt was made to use the ART2 architecture [12] as a substitute for the square-error clustering algorithm that was described in Section 3.3. However, the experiments were discontinued because the initial experimental results were not promising.

Chapter 4

Integrating Region- and Edge-Based Texture Segmentations

The texture segmentation technique proposed in Chapter 3 results in a region-based segmentation, as it assigns pixels with similar texture properties to the same region. One of the disadvantages of our region-based segmentation is the need for knowing the "true" number of texture categories ahead of time. In fact, this drawback applies to all region-based techniques. Another method of discriminating regions with different textures is to detect the boundaries between them. The output of a boundary detection operation is commonly referred to as an edge-based segmentation. Knowledge of the true number of texture categories is not necessary for obtaining an edge-based segmentation, where the presence of an edge point is determined locally based on a measure of disparity across the (unknown) boundaries.

In this chapter, we describe a new technique that produces an edge-based segmentation by combining the magnitude responses of the feature images to a common edge detector. As we will see, an important limitation of an edge-based segmentation is that, in practice, the region boundaries are not closed. Some postprocessing is, therefore, required to obtain closed regions. In contrast, the region boundaries in a region-based segmentation are always closed. An integrated approach that combines the advantages of the two methods could result in a better segmentation. We propose one such integrated approach and demonstrate its effectiveness.

In the multi-channel filtering approach, there are two fundamentally different ways in which the integration can take place.

1. Integration takes place separately in different channels. The resulting segmentations are then combined to obtain the final segmentation.

2. A single edge-based segmentation is first obtained, then integrated with a region-based segmentation.

The multidimensional nature of texture representation tells us that the information from all channels should be used simultaneously to obtain a segmentation. That is, the information from a single channel is often insufficient to discriminate different textures or to properly measure the disparity across the boundaries.
As a result, the second approach appears to be more plausible.

4.1 Edge-Based Segmentation

Detecting texture boundaries requires simultaneous consideration of spatial variations in all feature images. The idea is to combine the "evidence" for texture edges in different feature images to obtain a single measure of edge strength at each point. One example of this approach is the method proposed by Khotanzad and Chen [60]. Our edge-based segmentation technique is similar to that of Malik and Perona [66].

The multi-dimensional edge detection is performed as follows. First, we apply the Canny step edge detector [10] to each feature image. The implementation of the Canny edge detector used in our experiments uses the efficient approximation of the optimal detector by the first derivative of a Gaussian. Only one detector with an appropriate operator width is used for each feature image. (The Canny edge detector, in its general form, uses several operator widths whose outputs are then combined using the 'feature synthesis' method.) The width σ of the operator is adapted to each feature image. In our experiments we have used σ = √2 T, where T = N_c/u_0 is the average size of the intensity variations detected by the corresponding Gabor filter. The reason for using different operator widths for different feature images is that the step edges in different feature images have different widths; those in lower frequency channels tend to be wider.

Each feature image is normalized to have a mean of zero and a constant standard deviation (= 30). Again, this normalization is intended to avoid domination of features with smaller numerical ranges by those with larger numerical ranges. Only the magnitude response of the Canny edge detector is computed for each feature image; the nonmaximum suppression and the hysteresis thresholding [10] are not performed. Figure 4.1 shows examples of Canny magnitude images for the 'D55-D68' texture pair. These magnitude images correspond to the feature images shown in Figure 3.6.

Figure 4.1: Canny magnitude images corresponding to feature images shown in Figure 3.6. A σ = √2 T was used for the Canny edge detectors.

To obtain a single 'total' magnitude image, the Canny magnitude responses corresponding to individual feature images are summed, point-by-point. Similarly, the gradient images, one along the x-axis and one along the y-axis, are summed to obtain 'total' gradient images. At this point, the nonmaximum suppression and hysteresis operations are applied to the total magnitude image to obtain an edge-based segmentation.

4.1.1 Experimental Results

Figure 4.2 illustrates the edge-based segmentation for the 'D55-D68' texture pair. Figure 4.2(b) shows the total magnitude response of the Canny edge detector, obtained by adding the magnitude responses of 13 feature images. Lighter gray values in this image indicate higher edge strengths. Figure 4.2(c) shows the edge image obtained by nonmaximum suppression and hysteresis thresholding. The low and high threshold values were 0.5 and 0.8, respectively.

Figure 4.2: An example illustrating the edge-based segmentation technique. (a) Input image ('D55-D68'). (b) Total Canny magnitude response to 13 feature images. (c) Edge-based segmentation.

Other examples of edge-based segmentation are given in part (c) of Figures 4.5 through 4.8. In Section 4.2 we describe how these edge-based segmentations can be used in an integrated segmentation technique. Here we will give a second example to demonstrate the shortcomings of our current edge-based segmentation technique. This example is shown in Figure 4.3 and involves a 256 × 256 image with five natural textures. As seen in Figure 4.3(b), the total magnitude response for some true texture boundaries is much stronger than for others. The primary reason is that some texture boundaries have a strong response in several feature images, while others enjoy a strong response in only a few feature images. Adding the magnitude responses in different channels has some desirable consequences. In particular, it helps suppress noise and allows evidence for a boundary in different channels to accumulate. Unfortunately, as seen in this example, some true texture boundaries are enhanced more than others. Using adaptive hysteresis thresholding should alleviate some of these problems. However, it is our belief that a different method of combining the magnitude responses is needed.

Figure 4.3: An example demonstrating some of the shortcomings of the current edge-based segmentation technique. (a) Original input image. (b) Total Canny magnitude response to 13 feature images. (c) Edge-based segmentation. The low and high hysteresis thresholds were 0.5 and 0.8, respectively. (d) Edge-based segmentation. The low and high hysteresis thresholds were 0.5 and 0.7, respectively.
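For illustration, the per-channel edge-strength computation and the point-by-point summation of Section 4.1 can be sketched as follows, in Python with numpy and scipy. The Canny magnitude stage is approximated by first-derivative-of-Gaussian filtering; nonmaximum suppression and hysteresis are omitted, and the names are illustrative.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def total_edge_magnitude(feature_images, sigmas):
        # feature_images: list of 2-D feature images; sigmas: one operator
        # width per image, e.g. sqrt(2) * N_c / u0 for a channel tuned to u0.
        total = np.zeros_like(feature_images[0], dtype=float)
        for f, sigma in zip(feature_images, sigmas):
            f = (f - f.mean()) / f.std() * 30.0           # zero mean, std = 30
            gx = gaussian_filter(f, sigma, order=(0, 1))  # derivative of Gaussian along x
            gy = gaussian_filter(f, sigma, order=(1, 0))  # derivative of Gaussian along y
            total += np.hypot(gx, gy)                     # accumulate edge evidence
        return total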
4.2 Integrated Approach

The general principle of integration or fusion of information from different sources is well recognized in the computer vision community. Several techniques for simultaneously utilizing the region and edge information in image segmentation have been proposed [16, 41, 72, 75]. A similar integrated approach to texture segmentation, however, has not been emphasized. The multidimensional nature of texture segmentation makes either of the region- and edge-based segmentations complicated, making an integrated approach even more formidable. As we will see, an integrated approach need not necessarily be overwhelming. A proper integration method should suppress the weaknesses and emphasize the strengths of the region- and edge-based segmentation techniques.

The integration of the region-based and the edge-based segmentations is carried out as follows. (Jain and Nadabar [51] have applied a similar integration technique to segmentation of range images.) First, using the estimated upper bound k_max for the number of texture categories, a segmentation is obtained using the algorithm described in Section 3.3. Suppose the true number of categories (clusters) is k*. The segmentation (clustering) with k_max (> k*) categories is always formed by breaking some of the true clusters. Therefore, assuming that our texture features provide strong discrimination between different texture categories, the false splitting of regions will occur within true segments only, not across them. The algorithm described in Section 4.1 is used to obtain an edge-based segmentation (an edge image). For the Canny edge detector, we use 0.5 and 0.8 for the low and high thresholds of the hysteresis, respectively. These relatively low threshold values may generate a large number of spurious edges. However, a perfect edge image is not crucial for the purpose of integration. What is important is that we minimize the likelihood of missing true edges.

Now, in the over-segmented image, for each border between pairs of regions, we compute the fraction of border sites that "coincide" with an edge point in the edge image. We refer to this fraction as the hit-ratio h. The coincidence is determined by examining a small rectangular neighborhood centered at the border site. The typical neighborhood size used in our experiments was 5 × 10. The longer dimension of this rectangular neighborhood is along the local orientation of the border site. We have only considered the 0° and 90° orientations. The border site between two adjacent pixels, one with label l1 and the other with label l2, is said to have a 0° orientation if the pixels are to the east and west of each other. The 90° orientation is formed by pixels to the north or south of each other.

The boundaries in the over-segmented image that do not correspond to true texture boundaries are expected to have a low h value. A threshold h_t is used to decide whether the border between two regions should be preserved. The typical h_t value used in our experiments was 0.5. That is, when the hit-ratio is below 0.5 the border is removed and the corresponding pair of regions are merged. As we will see in the following examples, segmentation results are identical for a wide range of values of the threshold h_t. This indicates the robustness of the integration technique. That is, the integration is not sensitive to noise, because false borders are eliminated even at low values of h_t (e.g., 0.25). On the other hand, true boundaries are preserved even at high values of h_t (e.g., 0.75).
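A minimal sketch of the hit-ratio computation and border removal is given below, in Python with numpy and scipy. For simplicity it uses a square neighborhood around each border site instead of the oriented 5 × 10 window, and performs a single merging pass (the full procedure would recompute borders after each merge); the names are illustrative.

    import numpy as np
    from scipy.ndimage import maximum_filter

    def merge_by_hit_ratio(labels, edges, h_t=0.5, win=7):
        # 1 wherever an edge point lies within the win x win neighborhood.
        hit = maximum_filter(edges.astype(np.uint8), size=win)
        H, W = labels.shape
        total, hits = {}, {}
        for dy, dx in ((0, 1), (1, 0)):            # 0-degree and 90-degree border sites
            a = labels[:H - dy, :W - dx]
            b = labels[dy:, dx:]
            e = hit[:H - dy, :W - dx]
            m = a != b                             # border between two regions
            for la, lb, ev in zip(a[m], b[m], e[m]):
                key = (min(la, lb), max(la, lb))
                total[key] = total.get(key, 0) + 1
                hits[key] = hits.get(key, 0) + int(ev)
        out = labels.copy()
        for (la, lb), n in total.items():
            if hits[(la, lb)] / n < h_t:           # hit-ratio below threshold:
                out[out == lb] = la                # remove the border, merge the regions
        return out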
4.2.1 Experimental Results

As our first example we apply our integration method to the 'D55-D68' texture pair in Figure 4.4(a). A region-based segmentation assuming a maximum of four texture categories is shown in Figure 4.4(b). The edge-based segmentation is shown in Figure 4.4(c). The segmentation after integration is shown in Figure 4.4(d). The same segmentation is obtained for all h_t ∈ (0.12, 0.69).

Figure 4.4: Region- and edge-based integration results for the 'D55-D68' texture pair (128 × 256). (a) Original input image. (b) Four-category region-based segmentation (over-segmented). (c) Edge-based segmentation. (d) New segmentation after integration.

The integration result for another image containing two GMRF textures ('GMRF-2') is illustrated in Figure 4.5. In this case, the correct segmentation is obtained for all h_t ∈ (0.33, 0.89).

Figure 4.5: Region- and edge-based integration results for the 'GMRF-2' image (128 × 256). (a) Original input image. (b) Four-category region-based segmentation (over-segmented). (c) Edge-based segmentation. (d) New segmentation after integration.

Figure 4.6 shows similar results for the 'D17-D77' texture pair. Here, an identical segmentation is obtained for all h_t ∈ (0.23, 0.89), indicating that the integration method is highly robust.

Figure 4.6: Region- and edge-based integration results for the 'D17-D77' texture pair (128 × 256). (a) Original input image. (b) Four-category region-based segmentation (over-segmented). (c) Edge-based segmentation. (d) New segmentation after integration.

Figure 4.7 illustrates the integration method on a 256 × 256 image with one natural texture embedded in another ('D84 in D68'). Again, the correct segmentation is obtained for all h_t ∈ (0.32, 0.95). Identical segmentations over such a wide range of values of the threshold h_t indicate that the integration is highly robust. Figure 4.8 shows the integration results for the 'GMRF-4' image containing four GMRF textures. The correct segmentation is obtained for all h_t ∈ (0.31, 0.69). Note that the edge-based segmentation is far from perfect.
Nonetheless, by combining the imperfect information from two sources, the integration technique is able to produce the correct segmentation. Finally, in Figure 4.9, we show the integration result for the 'Nat-5' texture. In order to extract all true edges, hysteresis thresholds of 0.5 and 0.7 are used, rather than the 0.5 and 0.8 used in all the previous examples. (See the discussion in Section 4.1.1.) Also, instead of a 5 × 10 window, a larger 9 × 18 window is used. With these parameters, the correct segmentation is obtained for all h_t ∈ (0.32, 0.65).

Figure 4.7: Region- and edge-based integration results for the 'D84-in-D68' image (256 × 256). (a) Original input image. (b) Four-category region-based segmentation (over-segmented). (c) Edge-based segmentation. (d) New segmentation after integration.

Figure 4.8: Region- and edge-based integration results for the 'GMRF-4' image (256 × 256). (a) Original input image. (b) Six-category region-based segmentation (over-segmented). (c) Edge-based segmentation. (d) New segmentation after integration.

Figure 4.9: Region- and edge-based integration results for a 256 × 256 image containing five natural textures ('Nat-5'). (a) Original input image. (b) Seven-category region-based segmentation (over-segmented). (c) Edge-based segmentation. (d) New segmentation after integration.

4.3 Summary

In this chapter, we presented an edge-based segmentation technique. The technique detects texture boundaries by combining the magnitude responses of the feature images to the Canny edge detector. We then proposed an integrated approach that combines the strengths of the region- and edge-based segmentations. The integrated approach allowed us to do away with the need for knowing the true number of texture categories, and resulted in a truly unsupervised segmentation technique.

When applying the hysteresis thresholding to the total magnitude image, we used relatively low threshold values. By doing so, we allowed for more spurious edges, but minimized the likelihood of missing true texture edges. As a result, the edge-based segmentations in most of the examples were far from perfect. Nonetheless, by combining imperfect information from two sources, the integration technique was able to produce the correct segmentations. The robustness of the integration method is reflected in identical segmentation results for a wide range of the threshold h_t on the hit-ratio.

The edge-based segmentation technique, in its current form, has certain limitations. We showed that, in some cases, adding magnitude responses in different channels may enhance some true texture boundaries more than others. Consequently, when magnitude responses from different channels are added together, in addition to noise, some true texture boundaries are also suppressed. It is our belief that a different method of combining the magnitude responses is needed.

Chapter 5

Texture Analysis of Automotive Finishes

In recent years, there has been a growing emphasis on machine vision and its applications in manufacturing processes. To achieve higher speed and increased reliability, machine vision systems are being used with increasing frequency to perform various inspection tasks. For example, visual inspection of mass-produced printed circuit boards, integrated circuit chips, and photomasks in the electronics industry is an important area where machine vision techniques are used [15, 42].
Since in many cases the quality of a surface is best characterized by its texture, texture analysis plays an important role in automated visual inspection of surfaces. The texture of paper, for example, controls its printability. This is because the random fiber distribution on the surface of the paper affects the contact area between the paper and the printing medium. Texture analysis techniques are, therefore, useful in controlling the quality of paper in paper-rolling mills [17]. As part of an automated lumber processing system, Conners et al. [23] used texture analysis techniques to detect and classify common surface defects in wood. Visual inspection of product appearance, as assessed by the customer, is another important area where texture analysis techniques have proved to be useful. For example, Siew et al. [84] used textural features to determine the degree of carpet wear. In the food industry, textural appearance is an important factor in determining product quality [37].

In this chapter, we describe a problem involving automated visual inspection of automotive metallic finishes. The appearance of metallic finishes, which are primarily used in the automotive industry, is affected by their color as well as by their visual texture. One of the factors that determines the acceptability of the finish is the degree of "uniformity" of its visual texture. Our goal is to find quantitative measures that capture the characteristics of the metallic finish texture, hereafter called simply finish texture. We use a multi-channel filtering technique to compute texture features that are used to grade the uniformity of metallic finish samples.

The organization of this chapter is as follows. In Section 5.1, we describe metallic finish and the various factors that affect its appearance. We also describe the psychometric experiments which were designed to grade the degree of uniformity of finish textures. In Section 5.2, we address image acquisition and preprocessing requirements. Section 5.3 describes the multi-channel filtering technique that is used to characterize the finish texture. The functional form of the 'channels' (filter functions) and the choice of filter parameters, as well as the definition of texture features, are discussed. In Section 5.4, we propose two alternative ways to grade the degree of uniformity of finish texture. Finally, we conclude with a summary and a general discussion of the results in Section 5.5.

5.1 Metallic Finish

The sparkle and color directionality appearances of metallic automotive finishes are due to metal particles, such as aluminum flakes, that are added to the paint. The non-uniform distribution of position and tilt angle of these flakes within the paint film gives rise to a visual texture which consists of patterns of light and dark color regions. The distribution of the flakes, and hence the perceived finish texture, is influenced by various parameters of the paint itself, and by various paint application parameters such as pot pressure, air pressure, gun distance, and rheology treatment.

Ideally, we would like the finish texture to 'look' uniform. Judging the degree of uniformity of a finish texture, however, is a rather subjective process. Even finish inspection experts, among themselves, tend to have different opinions of uniformity. Over the years, finish inspection experts have adopted various terms to describe the appearance of metallic finishes.
Two frequently used terms are 'mottle' and 'blotchy', which appear to make up two potential components of uniformity. Mottleness refers to a pseudo-random positioning of metallic flakes that creates an accidental patterning effect. The size of these patterns is usually on the order of a millimeter. Blotchiness, on the other hand, refers to the non-uniformity characterized by irregularly spaced areas of color change. The size of these irregularities is usually on the order of an inch.

Metallic finish samples used in our experiments consist of metal panels that are painted under various settings of paint application parameters. Different settings of these parameters give rise to finish textures with different degrees of texture "uniformity". Specifically, two sets of finish samples are analyzed in the following experiments: the light blue set (LBLUE) and the medium blue set (MBLUE). There are 13 4" × 6" (about 10 cm × 15 cm) panels in each set. Panels in each set vary in paint application parameters (flash time, gun distance, and air pressure) as well as in the grade (size) of the aluminum flakes.

A group of paint technicians were asked to judge the uniformity of finish samples in each set. First, 10 observers were asked to rank the panels from most to least uniform. The rank order average was then used as the initial ranking in a paired comparison experiment. Each panel was compared to the eight other panels nearest to its rank order. For example, the panel with rank order 7 was compared to the panels with rank orders 3, 4, 5, 6, 8, 9, 10, and 11. The pairs were presented, in random order, to four observers. Each observer was asked to select the more uniform panel from the pair shown. Each observer performed the comparisons 10 times. Using the results of the paired comparisons, a preference frequency matrix was constructed for each set. (The (i, j) entry in a preference frequency matrix shows the number of times the panel in row i was preferred over the panel in column j.) Ordinal scale values for the panels were then obtained using a scaling technique [88, Ch. 4]. The resulting 'visual scale values' are given in Tables 5.1 and 5.2.

In another visual scaling experiment, a group of 10 paint technicians were asked to grade the finish samples in each set along other visual components that might be related to the perceived uniformity of the finish. These components are 'mottle', 'flake-size', and 'blotchy'. Each technician was asked to place the panels on a scale of 1 to 10, for each of the above components, with 10 indicating a severe mottle effect, extremely coarse flake-size, or a severe blotchy effect. Since the observers had no reference samples to define their base lines, significant individual biases in the resulting values are possible. The rank orders of the scale values, on the other hand, are less likely to suffer from these individual biases. The mean rank values for each of the visual components are given in Tables 5.1 and 5.2. Our goal is to develop quantitative measures of finish texture that 'explain' these subjective data.

Table 5.1: Visual scale values for texture uniformity, mottle, flake-size, and blotchy appearance of panels in the LBLUE set.
    Panel   Uniformity   Mottle    Flake-Size   Blotchy
    1       3.02493      5.71604   4.80         5.8452
    2       0.21339      8.61112   4.65         9.4642
    3       3.62269      3.79093   11.10        3.0198
    4       2.14104      5.90562   6.10         7.3358
    5       2.42717      5.63374   7.75         7.2702
    6       1.32080      8.00890   8.10         8.7932
    7       1.04008      8.38379   5.25         9.7165
    8       3.36581      3.17102   2.25         3.9988
    9       0.24747      9.20337   6.60         9.6850
    10      4.80362      3.34307   9.85         3.3227
    11      0.00000      9.89626   11.30        11.8894
    12      1.39585      6.65147   3.45         7.3092
    13      4.28204      2.51435   9.80         3.3501

Table 5.2: Visual scale values for texture uniformity, mottle, flake-size, and blotchy appearance of panels in the MBLUE set.

    Panel   Uniformity   Mottle   Flake-Size   Blotchy
    1       1.10309      6.35     5.40         4.70
    2       0.60587      10.10    4.25         8.85
    3       2.32165      5.70     12.40        6.05
    4       2.03682      3.85     5.00         3.80
    5       0.00000      11.20    8.05         11.25
    6       1.35577      9.30     8.05         7.95
    7       1.11753      5.40     4.55         8.05
    8       1.77176      3.75     4.10         5.90
    9       0.12917      9.05     5.35         9.10
    10      3.42172      5.20     11.75        4.40
    11      2.28994      5.35     11.85        4.80
    12      0.43111      7.65     3.95         6.95
    13      0.85181      8.10     6.30         9.20

5.2 Image Acquisition and Preprocessing

Although there are some general guidelines for lighting and imaging setups, every machine vision application has its own unique peculiarities that have to be dealt with individually. A crucial issue in any imaging problem is the selection of an appropriate light source that highlights the features of interest. The lighting geometry, i.e. the relative positions of the light source, camera, and object, is equally important.

We have experimented with various types of light sources and concluded that directed lighting, as opposed to diffused lighting, is more appropriate for highlighting the texture of the finish. The specular nature of the metallic finish poses a great challenge in achieving uniform illumination. In general, imaging metallic and specular objects is much harder than imaging lambertian objects. A common technique for dealing with the problem of specular reflections is to use a pair of polarizing filters. The first filter (the polarizer) is used to polarize the light source. The object is then viewed through the second, cross-polarized, filter (the analyzer). Since specular regions do not alter the polarization of the incident light, the reflected light from these regions is blocked by the analyzer. The reflected light from truly diffuse regions, on the other hand, is depolarized and partially passed by the analyzer. This technique, however, proved to be unsuitable for our application. The reason was that almost all the reflected light from the surface of the panel was blocked by the analyzer. This resulted in a significant loss of detail and contrast in the acquired images. We encountered a similar problem when we tried to use dulling spray to cut down the specularity of the finish. (Dulling sprays have been used by professional photographers for eliminating the glare on shiny surfaces.)

In order to alleviate the problems arising from the specular nature of the metallic finish, we were forced to reduce the size of the area on the panel surface that is being imaged. The relatively high image resolution required for analyzing the finish texture also dictated using high magnifications, and hence imaging a small area on the panel surface. Our current imaging setup is shown in Figure 5.1. We use a single light projector to illuminate the finish sample (panel). To minimize the illumination variations, we keep the angle between the axis of the camera and the axis of the light projector as small as possible. Note that too small an angle will result in a very large specular reflection into the camera.
A value between 15° and 20° was found to be a good compromise.

Figure 5.1: The imaging setup used for acquiring images from finish samples. The main components of the setup are a Mole-Richardson light projector and a Panasonic CCD camera with a 90 mm macro lens.

The maximum resolution of the human eye is estimated to be about 60 cycles per degree of visual angle [29]. Assuming a standoff of 0.5 meter, this value translates into 0.073 mm per individual receptor. The resolution of images obtained using our imaging setup is 0.08 mm/pixel, which is close to the above value. We acquired several images from each panel by shifting the panel with respect to a fixed position of the camera and the light projector. The image data base used in the following experiments contains eight 256 × 256 images from each panel in each of the two sets. Each image, therefore, corresponds approximately to a 2.06 cm wide by 1.54 cm high (about 0.81" × 0.61") physical area. (The physical area is not a square, because the CCD camera used for acquiring the images has a 4:3 aspect ratio. The resolution indicated here is along the horizontal direction; the resolution in the vertical direction is higher by a factor of 4/3.) Figure 5.2 illustrates the physical location of the images taken from a given panel.

Figure 5.2: Multiple imaging from a given panel. Eight 256 × 256 images (0.61" × 0.81", resolution about 0.08 mm/pixel) are taken from each 4" × 6" panel. The resolution of the acquired images is close to the maximum resolution of the human eye.

5.2.1 Preprocessing

We use a number of preprocessing operations to compensate for non-uniform illumination of the panels. These operations include an 'image subtraction' stage where the (smoothed) image of the "background" is subtracted from the original image. If the intensity variations inherent to the light source itself, or due to the position of the light source, were known, one could compensate for the resulting non-uniformities in the illumination by subtracting these variations from the acquired images. In practice, these variations can be approximated by an intensity image of the "background". We obtain the background image by imaging an unpainted metal panel. The background image is smoothed to suppress the fine texture of the unpainted metal panel.

Figure 5.3 shows the gray level histograms of two finish samples, one with fine and the other with coarse aluminum flakes. Both histograms are highly symmetric and have a similar shape (they both look like a Gaussian distribution). However, the histogram of the finish sample with coarse aluminum flakes is wider than the other. Such differences in the gray level distribution are caused by variations in lighting, as well as by differences in paint factors such as color and the grade (size) of the aluminum flakes. Further preprocessing is needed in order to compensate for such variations.

Figure 5.3: Two examples demonstrating differences in the histograms of the acquired images. (a) Histogram of a finish sample with fine aluminum flakes. (b) Histogram of a finish sample with coarse aluminum flakes.

Histogram equalization operations (also known as histogram flattening or probability equalization) are often used to remove differences in the first-order statistics of images.
Histogram equalization algorithms achieve this goal by reassigning pixel gray levels so that the population of pixels with a given gray level (or a small range of gray levels) is the same for each gray level (or range of gray levels). However, the similarity in the shapes of the histograms for different finish samples suggests that a linear scaling of the gray levels should be sufficient for suppressing differences in the first-order statistics of the acquired images.

The effective width, or spread, of a histogram can be measured in several different ways. We use the average absolute deviation (AAD) from the mean value for this purpose. Let s(x, y) be the acquired image. The AAD measure is then given by

    f_0 = \frac{1}{N_r N_c} \sum_{a=1}^{N_r} \sum_{b=1}^{N_c} |s(a,b) - \bar{g}|,    (5.1)

where N_r and N_c are the number of rows and columns, and \bar{g} is the mean gray level in the image. Image normalization is achieved by dividing the gray levels in each acquired image by its AAD measure.

In Section 5.3, we will use the AAD measure in (5.1) to compute texture features in the filtered images. There we will show that the above image normalization can be applied equivalently to the texture features obtained by processing the original image.

5.3 Characterization of Finish Texture

We characterize the textural appearance of the metallic finish by using a multi-channel filtering technique. In this section, we describe the functional form of the channels, the choice of the filter parameters, and the definition of texture features. We will use the resulting texture features as input to the texture grading methods described in Section 5.4.

5.3.1 Filter Functions and Parameters

Metallic finish textures do not possess significant orientation tendencies, i.e. they are practically isotropic. The even-symmetric Gabor filters used in our texture segmentation algorithms have both frequency- and orientation-selective properties. Instead of using Gabor filters, therefore, in this application we characterize the channels by the isotropic frequency-selective filters that originated with Coggins [19]. The modulation transfer function (MTF) of these filters is defined in (2.5), which is repeated here for convenience:

    H(u,v) = \exp\left\{ -\frac{1}{2\sigma_1^2} \left( \ln\sqrt{u^2 + v^2} - \ln\mu \right)^2 \right\}, \quad (u,v) \neq (0,0).

Again, μ is the center radial frequency and σ_1 determines the bandwidth of the filter. Note that these filters are defined on a logarithmic scale. We use σ_1 = 0.275 for all filters. This results in a bandwidth of about one octave, which is close to the estimated bandwidth of simple cells in the mammalian visual cortex. (See Section 3.1.) Also, we set the value of the MTFs at (u, v) = (0, 0) to zero so that the mean gray values of the filtered images are zero. (That is, we block the DC component.)

We address the problem of determining the appropriate values for the center frequencies of the filters by considering a large, but finite, number of center frequencies. Specifically, we consider a set of filters whose center frequencies are one half octave apart. The number of filters considered depends on the size of the input image. For a 256 × 256 image, for example, we shall consider a total of fourteen frequency-selective filters tuned to 1, √2, 2, 2√2, 4, 4√2, 8, 8√2, 16, 16√2, 32, 32√2, 64, and 64√2 cycles/image-width.
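Such a filter bank is straightforward to realize in the frequency domain. The following is a minimal sketch in Python with numpy, assuming square images; it samples the MTF of (2.5) on the discrete frequency grid, blocks the DC term, and filters through the FFT. All names are illustrative.

    import numpy as np

    def isotropic_filter(image, mu, sigma1=0.275):
        n = image.shape[0]
        u = np.fft.fftfreq(n) * n               # frequencies in cycles/image-width
        U, V = np.meshgrid(u, u, indexing='ij')
        R = np.hypot(U, V)                      # radial frequency
        H = np.zeros_like(R)
        nz = R > 0                              # H(0,0) = 0 blocks the DC component
        H[nz] = np.exp(-(np.log(R[nz]) - np.log(mu)) ** 2 / (2.0 * sigma1 ** 2))
        return np.real(np.fft.ifft2(np.fft.fft2(image) * H))

    # Half-octave-spaced center frequencies: 1, sqrt(2), 2, ..., 64*sqrt(2).
    mus = [2.0 ** (k / 2.0) for k in range(14)]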
This choice of center frequencies for the filters provides a nearly uniform coverage of the spatial-frequency domain. Any sig- nificant range of spatial-frequencies in the input image should, practically, fall in the passband of one of these filters. In Section 5.4, we will describe procedures that allow us to determine which subset of filters is best suited for a given set of finish samples. The filtering operations are again carried our using a fast Fourier transform (FF T). Figure 5.4 shows an image of a finish sample along with some of the filtered images. The ability of the filters to exploit differences in spatial-frequency (size) is evident in these filtered images. 5.3 .2 Texture Features The texture features are defined as the average absolute deviation (AAD) in the filtered images. The texture feature I, for the j‘h (zero mean) filtered image r_,-(z,y) is computed as follows. 1 N, N; fj = W2 Z |r,-(a,b) I, (5-2) C a=l b=l 94 where N, and Nc are the number of rows and columns in the image. Each filtered image is, therefore, ‘summarized’ by one feature, and there are as many features as filtered images. With a total of ten filters, for example, we will have ten texture features, resulting in a ten-dimensional feature vector for each image. These feature vectors will be used in grading the texture uniformity of panels using texture grading schemes described in Section 5.4. In Section 5.2.1, we described an image normalization operation. This normal- ization can be achieved equivalently by dividing each texture feature by the AAD measure in the input image defined by (5.1). Formally, the normalized feature f ; corresponding to texture feature fj is then given by fizfj/foa i=17"',14a where f0 is the AAD measure of the input image. In the following sections, for convenience, we will refer to the normalized texture features as f1, f2, etc. Also, we will not use features f1 and f2 in the grading experiments. These features correspond to filters that respond to very slow intensity variations. Such variations, however, are very likely to result from variations in the lighting rather than from finish texture. 5.4 Grading Finish Texture Uniformity How can the above texture features be used to ‘grade’ the degree of uniformity of the finish texture? In this section, we propose two alternative grading schemes to achieve this goal. 5.4.1 Reference-Based Grading Our first grading scheme can be summarized as follows. Given a set of panels, we use a few panels with extreme appearances as ‘reference panels’. These reference panels are, in a sense, our training samples. Since finish samples with highly uniform or highly non-uniform finishes are easier to identify, these reference panels can be selected with very high confidence. In our experiments, we typically use the two panels with lowest visual scale values and the two panels with the highest visual scale values in each set. Table 5.3 lists the least- and most-uniform panels for the LBLUE and the MBLUE sets. Using the feature vectors corresponding to images from these 95 D D D Least-Uniform Cluster B a I: D D Most-Uniform Cluster do 0 D o O x 4 d1 9 _ V 7 O f o o 0 Figure 5.5: Illustration of reference-based grading in a two-dimensional .feature space. The mean feature vector f' of the panel to be graded is shown as ‘x’. four panels we construct a least-uniform and a most-uniform ‘reference cluster’. We then assign a texture uniformity grade to each panel based on the “distances” of its mean feature vector to these reference clusters. 
5.4 Grading Finish Texture Uniformity

How can the above texture features be used to 'grade' the degree of uniformity of the finish texture? In this section, we propose two alternative grading schemes to achieve this goal.

5.4.1 Reference-Based Grading

Our first grading scheme can be summarized as follows. Given a set of panels, we use a few panels with extreme appearances as 'reference panels'. These reference panels are, in a sense, our training samples. Since finish samples with highly uniform or highly non-uniform finishes are easier to identify, these reference panels can be selected with very high confidence. In our experiments, we typically use the two panels with the lowest visual scale values and the two panels with the highest visual scale values in each set. Table 5.3 lists the least- and most-uniform panels for the LBLUE and the MBLUE sets.

Table 5.3: Panels that were used as references when grading the finish texture uniformity of the LBLUE and the MBLUE sets. Each set contains 13 panels.

    Set     Least-Unif.   Most-Unif.
    LBLUE   2, 9          10, 13
    MBLUE   5, 9          10, 3

Using the feature vectors corresponding to images from these four panels, we construct a least-uniform and a most-uniform 'reference cluster'. We then assign a texture uniformity grade to each panel based on the "distances" of its mean feature vector to these reference clusters.

Figure 5.5: Illustration of reference-based grading in a two-dimensional feature space. The mean feature vector of the panel to be graded is shown as 'x', together with its distances d_0 and d_1 to the least-uniform and most-uniform clusters.

Formally, let f_i denote the feature vector for the i-th image from a panel. We represent each panel by its mean feature vector

    \bar{f} = \frac{1}{n} \sum_{i=1}^{n} f_i,    (5.3)

where n is the number of images taken from a panel. In our experiments n = 8. Let d_0 and d_1 be the distances between the mean feature vector \bar{f} and the least-uniform and most-uniform clusters, respectively. (See Figure 5.5.) The distance of a point from a cluster (of points) can be defined in several different ways. Here, we simply use the Euclidean distance between the point and the centroid (mean) of the cluster. (We also experimented with the Mahalanobis distance measure. Since the grading results were essentially the same, we only report the results based on the Euclidean distance.) We define the texture uniformity grade γ for the panel by the following ratio:

    \gamma = \frac{d_0}{d_0 + d_1}.    (5.4)

Note that γ lies between 0 and 1. A value of γ close to 1 indicates that the corresponding panel can be classified as uniform.

To get an idea of the discrimination provided by individual features, we show the box plots of the patterns in the reference clusters. These plots for the LBLUE and MBLUE sets are given in Figures 5.6 and 5.7, respectively. In a box plot, the horizontal line inside a box marks the location of the median. The box contains the middle half of the data. The extent of the whiskers reflects a confidence interval for the median, and outlying points (outliers) are plotted individually. One interesting observation to be made in these plots is the general "behavior" or "trend" in the feature values. Patterns from the most-uniform reference cluster have smaller feature values at lower frequencies (features f3 through f9) than the patterns from the least-uniform reference cluster. The opposite situation holds at higher frequencies (features f10 through f14). This observation is consistent with the physical interpretation of the frequency-selective filters: less uniform finish textures are richer in low frequency components (have larger spatial variations in intensity) than more uniform finish textures, and vice versa.

Figure 5.6: Box plot of reference patterns used in grading the texture uniformity of panels in the LBLUE set. There are 16 patterns in each cluster. The "a" and "b" suffixes indicate least-uniform and most-uniform clusters, respectively.

Figure 5.7: Box plot of reference patterns used in grading the texture uniformity of panels in the MBLUE set. There are 16 patterns in each cluster. The "a" and "b" suffixes indicate least-uniform and most-uniform clusters, respectively.

Feature Selection

Which subset of the texture features should we use when grading a given set of panels? Recall that there is a one-to-one correspondence between the texture features and the filters. Selecting a subset of features, therefore, is equivalent to selecting a subset of filters. Here we describe feature selection experiments that are based on maximizing the rank correlation between the visual scale and the texture uniformity grade given by (5.4). The rank correlation is given by

    r = \frac{\sum_{i=1}^{n} (R_i - \bar{R})(S_i - \bar{S})}{\sqrt{\sum_{i=1}^{n} (R_i - \bar{R})^2} \sqrt{\sum_{i=1}^{n} (S_i - \bar{S})^2}} = \frac{\sum_{i=1}^{n} R_i S_i - n\left(\frac{n+1}{2}\right)^2}{\frac{n(n^2 - 1)}{12}},    (5.5)

where R_i and S_i are, respectively, the rank of the texture uniformity grade γ_i (among the γ's) and the rank of the visual scale value v_i (among the v's), and n is the total number of panels. Note that unlike correlation, which measures the linear association, the rank correlation measures the monotone association between two sets of data.
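A minimal sketch of the grade of (5.4) and the rank correlation of (5.5) is given below, in Python with numpy and scipy. Cluster centroids stand in for the reference clusters, and the names are illustrative.

    import numpy as np
    from scipy.stats import rankdata

    def uniformity_grade(f_mean, least_centroid, most_centroid):
        d0 = np.linalg.norm(f_mean - least_centroid)   # distance to least-uniform cluster
        d1 = np.linalg.norm(f_mean - most_centroid)    # distance to most-uniform cluster
        return d0 / (d0 + d1)                          # eq. (5.4): gamma in [0, 1]

    def rank_correlation(grades, visual_scales):
        R, S = rankdata(grades), rankdata(visual_scales)
        R, S = R - R.mean(), S - S.mean()
        return (R @ S) / np.sqrt((R @ R) * (S @ S))    # eq. (5.5)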
Our feature selection procedure can be summarized as follows. Using a texture grading scheme, we assign a texture grade to each panel in a given set. We then compute the rank correlation between the texture grade and the visual scale value. We repeat this procedure for every subset of texture features. We then choose the feature subset that results in the highest rank correlation as the "best" feature subset; a sketch of this search is given after Table 5.5.

Starting with all 12 texture features (corresponding to filters with 2 through 64√2 cycles/image-width center frequencies) we performed exhaustive feature selection. The best feature subsets for the LBLUE and the MBLUE sets of panels, along with the corresponding rank correlations, are given in Tables 5.4 and 5.5, respectively. As expected, there were occasional ties between different feature subsets. These ties were resolved based on the direct correlation between the texture grade and the visual scale value. That is, we chose the feature subset with the higher correlation.

The highest rank correlation for the LBLUE set is 0.98, and is achieved by the feature subsets {f5, f14} and {f5, f12, f14}. The highest rank correlation for the MBLUE set, on the other hand, is 0.91 and is achieved by the feature subset {f3, f7, f10, f11}. Even though the LBLUE and MBLUE sets of panels have different colors, one would expect the best feature subsets for both sets of panels to be the same. By comparing the feature subsets of size 4 for both sets of panels, we looked for a common subset that could be used for grading both sets of panels. The feature subset {f3, f7, f9, f11} results in a rank correlation of 0.95 for the LBLUE set and 0.89 for the MBLUE set. Therefore, although the "best" feature subsets for the two sets of panels are not the same, there does exist a feature subset which results in acceptable performance for both sets.

Table 5.4: Results of reference-based grading of finish texture uniformity for the LBLUE set. This table shows the "best" feature subsets of size 1-7 and the corresponding rank correlations between texture grade and visual scale.

    Size   Best Subset        Rank Correlation
    1                         0.91
    2      {f5, f14}          0.98
    3      {f5, f12, f14}     0.98
    4                         0.97
    5                         0.97
    6                         0.96
    7                         0.96

Table 5.5: Results of reference-based grading of finish texture uniformity for the MBLUE set. This table shows the "best" feature subsets of size 1-7 and the corresponding rank correlations between texture grade and visual scale.
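The exhaustive search referred to above is small enough to state directly. The following is a minimal Python sketch; rank_correlation is from the previous sketch, and grade_panels, which applies the reference-based grading of Section 5.4.1 to a candidate feature subset and returns one grade per panel, is a hypothetical helper.

    from itertools import combinations

    def best_subset(feature_indices, visual_scales, grade_panels, max_size=7):
        best, best_r = None, -1.0
        for size in range(1, max_size + 1):
            for subset in combinations(feature_indices, size):
                r = rank_correlation(grade_panels(subset), visual_scales)
                if r > best_r:              # ties would be broken by direct correlation
                    best, best_r = subset, r
        return best, best_r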
In the regression model, we treat the visual scale for texture uniformity as the dependent variable, and the texture features computed from the filtered images as independent (or predictor) variables. Specifically, let v be the visual scale for a panel and f.- be the texture features associated with the panel. (Recall that texture fea- tures for a panel are obtained by averaging the texture features for all eight images corresponding to the panel.) Then the linear regression model is given by v=a+Ah+an+m+Afl+a we where 6 accounts for random measurement error or the effects of other variables not explicitly considered in the model. We estimate the regression coeflicients, 3,, using the method of least squares. Let the least square estimates of these coefficients be 80, 61, . . . , 8,. Then the predicted visual scale values are given by é=3o+31f1+32f2+~-+Brfr. (5-7) The quality of the fit can be measured by the coeflicient of determination (COD) which is given by sse 192:1— sstot ’ (5.8) where sse = 2;;102, - 13,-)2, sstot = zy=1(vj — t7)2, and n is the total number of observations. This quantity is an indicator of the proportion of the total variation in the v,- explained by predictor variables. In the process of deciding which subset of features explains the visual scale values better, we will compare regression models with different number of predictors. In order to be able to compare these models with one another, we use the adjusted coefficient of determination which is given by Reel—.ZiZZ/‘E’Jffy (5.9) where p is the total number of parameters (including flu) in the fitted model [31]. 102 Model Selection In the following experiments, we continue to use the texture features computed from images normalized by their AAD measure. Although examining the regression models corresponding to all possible subsets of features is computationally demanding, the required computations in this case were not prohibitive. The “best” regression model, and hence the “best” feature subset, is determined as follows. First, we determine the best regression model for a given size of the feature subset. The criterion for best model is to maximize the COD (R2). Among these feature subsets, the subset with the highest adjusted COD (1‘?2 ) is singled out as the best model. This model can adj then be used for grading (predicting) the degree of uniformity of finish texture for future samples. Tables 5.6 and 5.7 show the feature subsets of size 1 through 7 from the set { f3, . . . , f14} that give the best regression models for the ‘LBLUE and MBLUE sets, respectively. As seen in Table 5.6, for example, the adjusted COD first tends to increase as more variables are included, but it begins to flatten when more and more variables are used. Note that the number of samples used for estimating the regression coefficients is small (12 or 13). The estimated coefficients for regression models with a large number of independent variables is, therefore, not very reliable. Also, our reference-based grading scheme indicated that a feature subset of size 4 results in acceptable performance. We will, therefore, consider models with no more than 4 independent variables. 
Based on the above constraints, the best regression model for the LBLUE set is found to be

    \hat{v} = 13.859 - 1.5315 f_3 + 3.1389 f_4 - 2.1269 f_5 - 0.4014 f_8.    (5.10)

The best regression model for the MBLUE set, on the other hand, is found to be

    \hat{v} = 82.9490 - 3.3293 f_7 + 3.3342 f_8 - 1.4469 f_{10} - 0.9514 f_{11}.    (5.11)

Note that the best regression models for the LBLUE and the MBLUE sets are not the same. Looking for a common regression model that gives acceptable performance for both sets of finish samples, we found the feature subset {f3, f4, f6, f13}. The corresponding regression model has a COD of 96.93% for the LBLUE set and 88.60% for the MBLUE set. We may, therefore, use the same subset of features for grading both sets of panels.

Table 5.6: Results of regression-based grading of finish texture uniformity for the LBLUE set. This table shows selected variables (texture features) for regression models with 1-7 variables and corresponding coefficients of determination. (Illegible entries are marked with a dash; the size-4 subset is implied by (5.10).)

    Size   Selected Variables   R^2_adj (%)
    1      –                    83.27
    2      –                    89.91
    3      –                    95.77
    4      {f3, f4, f5, f8}     97.44
    5      –                    –
    6      –                    98.47
    7      –                    98.22

Table 5.7: Results of regression-based grading of finish texture uniformity for the MBLUE set. This table shows selected variables (texture features) for regression models with 1-7 variables and corresponding coefficients of determination. (Illegible entries are marked with a dash; the size-4 subset is implied by (5.11).)

    Size   Selected Variables   R^2_adj (%)
    1      –                    52.18
    2      –                    64.46
    3      –                    77.47
    4      {f7, f8, f10, f11}   89.99
    5      –                    94.71
    6      –                    –
    7      –                    96.88

Grading Examples

Since there are only a small number of panels in each set, we have to use all of them for parameter estimation. If we had more panels in each set, we could have used a subset of them to estimate the parameters of the regression model and used the rest to validate the model. An alternative strategy is to obtain a regression model using the samples in one set, and then use that model to predict the visual scale values for the other set. We have to remember, however, that since each set of panels was evaluated separately, the visual scale values for the two sets are not on the same scale. In fact, the range of visual scale values for the LBLUE set is wider than that of the MBLUE set (see the second columns in Tables 5.1 and 5.2). Therefore, the rank correlation between predicted and actual visual scale values is perhaps a more suitable figure of merit than the direct correlation between them.

In the following two grading examples, we used the feature subset {f3, f4, f6, f13}. Using the visual scale values for texture uniformity for panels in the LBLUE set, we first estimated the parameters of the regression model. We then obtained the predicted visual scales for panels in the MBLUE set using this model. The actual and predicted visual scale values are tabulated in Table 5.8. The correlation and rank correlation between predicted and actual scales are 0.87 and 0.88, respectively.

Table 5.8: Grading MBLUE set using regression model for LBLUE set. Correlation = 0.87, Rank Correlation = 0.88. (The per-panel actual and predicted values are not reliably legible in the source.)

Similarly, we used the visual scale values for panels in the MBLUE set to estimate the parameters of the regression model. We then obtained the predicted visual scale values for panels in the LBLUE set. The results are given in Table 5.9. The correlation and the rank correlation between predicted and actual scales are 0.76 and 0.74, respectively. These results indicate that our regression-based grading scheme is fairly robust.

Table 5.9: Grading LBLUE set using regression model for MBLUE set. Correlation = 0.76, Rank Correlation = 0.74.

    Panel No.   Actual Value   Predicted Value
    1           3.02            1.40
    2           0.21           -0.75
    3           3.62            1.78
    4           2.14           -0.44
    5           2.43            1.28
    6           1.32            0.80
    7           1.04            0.25
    8           3.37            0.38
    9           0.25            0.88
    10          4.80            2.56
    12          1.40            0.78
    13          4.28            3.11
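The cross-set experiment above is equally compact to express. The following sketch reuses the hypothetical fit_and_score helper from the earlier sketch; the array names and the column indices for {f3, f4, f6, f13} are illustrative, not the dissertation's.

    import numpy as np
    from scipy.stats import pearsonr, spearmanr

    # F_lblue, v_lblue, F_mblue, v_mblue: hypothetical feature matrices
    # (columns f3..f14) and visual scale vectors for the two panel sets.
    idx = [0, 1, 3, 10]                                   # columns holding f3, f4, f6, f13
    beta, _, _ = fit_and_score(F_lblue[:, idx], v_lblue)  # fit on the LBLUE set
    X = np.column_stack([np.ones(len(v_mblue)), F_mblue[:, idx]])
    pred = X @ beta                                       # predicted MBLUE visual scales
    print(pearsonr(pred, v_mblue)[0])                     # direct correlation
    print(spearmanr(pred, v_mblue)[0])                    # rank correlation (preferred here)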
5.4.3 Mottle, Flake-Size, and Blotchy Components

We now consider additional visual scale values that rank the appearance of the finish samples along other components. In this section we use the linear regression setting described above to obtain the best regression models explaining the mottle, flake-size, and blotchy components of the appearance of the metallic finish samples. Our criterion for the best regression model is also the same, i.e., to maximize R^2_{\mathrm{adj}}.

Tables 5.10 and 5.11 give the feature subsets of size 1 through 7 that result in the best regression models for the LBLUE and MBLUE sets, respectively. The best feature subsets for the LBLUE and the MBLUE sets are {f3, f4, f6, f9} and {f3, f4, f5, f13}, and they explain 97.50% and 85.49% of the variation in the visual scales for mottle appearance, respectively. Based on these experiments, the best regression model explaining the mottle component of panels in the LBLUE set is

    \hat{v} = -12.61 + 2.0233 f_3 - 2.9597 f_4 + 1.6562 f_6 + 0.5511 f_9.    (5.12)

The best regression model for the MBLUE set, on the other hand, is

    \hat{v} = -14.9079 + 4.713 f_3 - 13.042 f_4 + 9.861 f_5 + 2.309 f_{13}.    (5.13)

The feature subsets resulting in the best regression models explaining the ‘flake-size’ component of finish appearance of panels in the LBLUE and MBLUE sets are given in Tables 5.12 and 5.13, respectively. Similar results for the ‘blotchy’ component of finish appearance are given in Tables 5.14 and 5.15.

Table 5.10: “Best” feature subsets of size 1-7 from the set {f3, ..., f14} explaining the visual scales for the ‘mottle’ component of finish texture appearance for panels in the LBLUE set. (Only one COD column is legible; the size-4 subset is the one cited in the text.)

    Size   Best Subset        R^2_adj (%)
    1      –                  73.72
    2      –                  91.84
    3      –                  95.06
    4      {f3, f4, f6, f9}   96.07
    5      –                  98.15
    6      –                  99.13
    7      –                  99.62

Table 5.11: “Best” feature subsets of size 1-7 from the set {f3, ..., f14} explaining the visual scales for the ‘mottle’ component of finish texture appearance for panels in the MBLUE set.

    Size   Best Subset                        R^2 (%)   R^2_adj (%)
    1      {f7}                               47.92     43.19
    2      {f7, f14}                          69.26     63.12
    3      {f3, f4, f5}                       78.85     71.80
    4      {f3, f4, f5, f13}                  85.49     78.23
    5      {f8, f9, f12, f13, f14}            89.18     81.45
    6      {f3, f8, f9, f12, f13, f14}        91.76     83.52
    7      {f3, f8, f9, f11, f12, f13, f14}   94.70     87.29

Table 5.12: “Best” feature subsets of size 1-7 from the set {f3, ..., f14} explaining the visual scales for the ‘flake-size’ component of finish texture appearance for panels in the LBLUE set. (Only one COD column is legible.)

    Size   Best Subset   R^2_adj (%)
    1      –             40.63
    2      –             67.93
    3      –             69.40
    4      –             80.20
    5      –             91.69
    6      –             98.93
    7      –             99.00

Table 5.13: “Best” feature subsets of size 1-7 from the set {f3, ..., f14} explaining the visual scales for the ‘flake-size’ component of finish texture appearance for panels in the MBLUE set.

    Size   Best Subset                        R^2 (%)   R^2_adj (%)
    1      {f3}                               62.38     58.96
    2      {f3, f8}                           89.58     87.50
    3      {f3, f4, f8}                       93.34     91.13
    4      {f7, f8, f10, f12}                 94.09     91.14
    5      {f6, f7, f8, f10, f12}             96.07     93.27
    6      {f6, f7, f8, f10, f11, f12}        96.36     92.72
    7      {f4, f6, f7, f8, f10, f11, f12}    96.67     92.02
Table 5.14: “Best” feature subsets of size 1-7 from the set {f3, ..., f14} explaining the visual scales for the ‘blotchy’ component of finish texture appearance for panels in the LBLUE set. (Only one COD column is legible.)

    Size   Best Subset   R^2_adj (%)
    1      –             76.42
    2      –             87.95
    3      –             95.60
    4      –             96.62
    5      –             98.79
    6      –             99.21
    7      –             99.51

Table 5.15: “Best” feature subsets of size 1-7 from the set {f3, ..., f14} explaining the visual scales for the ‘blotchy’ component of finish texture appearance for panels in the MBLUE set.

    Size   Best Subset                        R^2 (%)   R^2_adj (%)
    1      {f7}                               34.52     28.57
    2      {f7, f14}                          70.87     65.04
    3      {f7, f12, f14}                     72.57     63.43
    4      {f7, f12, f13, f14}                75.28     62.92
    5      {f8, f10, f12, f13, f14}           83.93     72.45
    6      {f7, f8, f10, f12, f13, f14}       90.60     81.20
    7      {f3, f8, f9, f11, f12, f13, f14}   91.81     80.33

5.5 Summary

In this chapter, we addressed a practical problem where texture analysis is required. The problem involved automated visual inspection of the textural appearance of automotive metallic finishes. We addressed imaging and preprocessing requirements and demonstrated that a multi-channel filtering technique can be used to successfully characterize the finish texture. We also developed two alternative methods for grading the degree of uniformity of the finish texture. Our ‘texture grading’ experiments showed that there is a high correlation between our texture uniformity grade and the visual scaling of the finish samples by finish inspection experts.

Non-uniform illumination of the panels due to their specular nature, together with resolution requirements, forced us to acquire multiple images from small areas on the panel surface. Clearly, it would be more desirable to have a single image acquired from the entire panel surface, or at least a significant portion of it. When judging the appearance of a finish sample, human observers are more likely to base their judgment on simultaneous examination of the entire panel. Currently, the resolution of most 2-D sensors is limited to 1024 × 1024 pixels. For imaging larger areas, therefore, one will need to scan the panels with a 1-D sensor array (along with a linear light source).

The filtering and feature computation operations can be performed in parallel, regardless of the number of filters. Therefore, a fast, real-time implementation of our grading techniques is possible. Moreover, our grading experiments showed that only a small number of filters is sufficient. The results of our feature selection experiments indicate that using a small number of features is not only possible, but also leads to improved performance. Moreover, these results indicate that a common feature subset can give acceptable performance across different sets of metallic finish samples.

In our texture grading experiments, we represented each finish sample by the mean feature vector over all eight images from the panel. This representation assumes that the variation of texture features across images is negligible. However, when grading texture uniformity, the variation of texture features across the panel could itself be a good indicator of the degree of uniformity of the finish texture; that is, larger variations would indicate that the texture is less uniform. This approach to grading finish texture uniformity should be examined.

Chapter 6

Conclusions and Future Research

Texture analysis has been an active research area in computer vision for more than two decades, and has proved to be a very difficult problem. This difficulty largely stems from the diversity of natural and artificial textures, which makes a universal definition of texture impossible. Compared to other approaches, the multi-channel filtering approach to texture analysis is more general and applies to a larger class of textures. This generality is a direct consequence of its reliance on the basic attributes of frequency (size) and orientation, and of the inherent multi-resolution nature of the approach.

In this dissertation, we presented several multi-channel filtering techniques. Major contributions have been:
1. A detailed methodology for modeling the ‘channels’ by even-symmetric Gabor filters, and a systematic filter selection scheme based on an intuitive least-squares criterion.

2. A simple but general methodology for extracting texture features using a nonlinear transformation and local “energy” computation.

3. Incorporating spatial adjacency information in the region-based texture segmentation algorithm.

4. An edge-based texture segmentation technique based on combining the “evidence” for texture boundaries in different feature images.

5. Integrating the region- and edge-based texture segmentation algorithms and eliminating the need for knowing the “true” number of texture categories.

6. Application of a multi-channel filtering technique to automated visual inspection of automotive metallic finishes.

7. Reference-based and regression-based methodologies for grading the degree of uniformity of metallic finish texture.

We reported both unsupervised and supervised texture segmentation experiments. In the supervised segmentation experiments we used a feed-forward neural network, in addition to several other classifiers. The texture segmentation experiments showed that our texture features can discriminate among a large number of textures, including some artificially generated texture pairs with identical second- and third-order statistics.

One limitation of our definition of texture features is the lack of a criterion for deciding the optimal value of α, which controls the severity of the threshold-like nonlinear transformation in (3.10). In the texture segmentation experiments, we used a fixed empirical value. However, the optimal value of α is likely to be different for different channels, and for different images. One should consider using the statistical properties of the input image and the filtered images to estimate the optimal values [10, 92].
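For reference, the feature computation that α controls can be sketched as follows. This is our reconstruction, not the dissertation's code: a saturating tanh-style nonlinearity applied to a filtered ‘response image’, followed by averaging the magnitude over a small neighborhood. The value of α, the window size, and the unweighted averaging are illustrative assumptions.

    import numpy as np
    from scipy.ndimage import uniform_filter

    def energy_feature(response, alpha=0.25, win=9):
        # One 'feature image' from one filtered 'response image':
        # a threshold-like nonlinearity (cf. (3.10)), followed by the
        # local "energy", i.e. the mean magnitude in a win x win window.
        psi = np.tanh(alpha * response)           # saturating, blob-detecting stage
        return uniform_filter(np.abs(psi), size=win)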
Also, we assume that the different channels are independent of each other. However, there is psychophysical and physiological evidence for inhibitory interactions between different spatial-frequency channels [29]. Allowing inhibitory interactions among the channels has been shown to have the potential to reduce the effective dimensionality of the feature space [8].

Our texture segmentation techniques are only applicable to textured images; they are unable to discriminate between (nearly) uniform gray-level regions. However, in real-world images, both textured regions and regions with nearly uniform gray levels are often present simultaneously. An extension of the current techniques that allows handling such images would be highly desirable. A simple approach would be to add low-pass Gaussian filters with different cutoff frequencies to the original Gabor filter set. Note that these low-pass filters can be viewed as Gabor filters with zero center frequencies. We would like to point out, however, that in some applications it might be desirable to separate nearly-uniform (untextured) regions from textured regions. Figure 6.1(a) shows the top view of a scene with a notebook (textured cover) and two flat “tub blocks” with different colors, pink and blue. As seen in Figure 6.1(b), the algorithm successfully separates the textured region from the untextured regions. Such a property may also be useful for separating text from “non-text” in automated document analysis applications [34]. When viewed at an appropriate resolution, the text forms a distinct texture of its own, allowing it to be discriminated from photographs and other non-text items in the document image.

Figure 6.1: (a) A 256 × 512 image of a scene containing both textured and untextured objects. (b) Two-category segmentation obtained using a total of 16 Gabor filters and the pixel coordinates.

In its current form, the edge-based texture segmentation technique in Chapter 4 has certain limitations. The process of adding the magnitude responses of different channels enhances some texture boundaries more than others. This is particularly true when there are a large number of textures in the image. An adaptive hysteresis thresholding, rather than the current global thresholding method, may alleviate this problem. However, we believe that a different method of combining the magnitude responses should be investigated.

In Chapter 5, the regression-based method for grading the degree of uniformity of metallic finish texture is more rigorous than the reference-based method. This method, however, requires more training data. In the experiments, we used the finish samples from one set to estimate the parameters of the regression model and used samples from the second set to validate it. Further evaluation of the regression-based grading method using larger sets of finish samples is recommended.

In addition to the improvements and extensions suggested in the previous paragraphs, we have identified the following future research problems in texture analysis in general, and in the multi-channel filtering approach in particular.

• A multi-channel filtering approach for estimating the orientation of a textured surface patch. Almost all existing shape-from-texture methods require extracting isolated texture primitives, e.g., blobs. Using the existing, or a similar, definition of texture features, we should be able to measure texture gradients without the need for extracting isolated primitives.

• Considering the survival value of some perceptual processes for animals and human beings, it seems very likely that some processes, including texture perception, are biologically adapted to solve specific visual tasks that arise in the natural environment. The feasibility of encoding higher-level knowledge of the scene, for example the shape of regions or boundaries, in low-level texture processing must be explored.

• Color vision has become an active research area in computer vision. Integrating texture and color information, therefore, is another potential research area in texture analysis.

Appendix A

Generating Filter Functions

In this appendix we describe the implementation of the Gabor filters used by our texture segmentation algorithm. We perform the filtering operations through multiplication in the Fourier domain, rather than through convolution. Therefore, we implement each Gabor filter by sampling its Fourier transform. This direct implementation is faster than sampling the spatial domain representation of the filter and then computing its DFT. Furthermore, direct sampling in the Fourier domain avoids possible distortions due to aliasing.

In order to generate a filter function in a rectangular array, one must scale the Fourier transform of the 2-D function along one of the axes prior to sampling. This can be demonstrated as follows. Let G(u, v) be the DFT of a discrete realization of a 2-D function, g(x, y), in an N_r × N_c array, where N_r and N_c are the numbers of rows and columns, respectively.
That is,

    G(u, v) = \sum_x \sum_y g(x, y) \exp\left\{ -j 2\pi \left( \frac{ux}{N_c} + \frac{vy}{N_r} \right) \right\},    (A.1)

where u is in cycles/image-width and v is in cycles/image-height. We can rewrite (A.1) as follows:

    G(u, v) = \sum_x \sum_y g(x, y) \exp\left\{ -j 2\pi \, \frac{ux + v'y}{N_c} \right\},    (A.2)

where v' = (N_c/N_r) v. Here, both u and v' are in cycles/image-width. Thus, a discrete realization of an even-symmetric Gabor filter in an N_r × N_c array is obtained by uniformly sampling the following continuous function:

    H(u, v) = \exp\left\{ -\frac{1}{2} \left[ \frac{(u - u_0)^2}{\sigma_u^2} + \frac{((N_c/N_r) v)^2}{\sigma_v^2} \right] \right\} + \exp\left\{ -\frac{1}{2} \left[ \frac{(u + u_0)^2}{\sigma_u^2} + \frac{((N_c/N_r) v)^2}{\sigma_v^2} \right] \right\}.    (A.3)

Note that the center radial frequency of this Gabor filter is u_0 cycles/image-width and its center orientation is 0. A Gabor filter with center orientation θ_0 is obtained by a rigid rotation of (A.3) prior to sampling. Also, note that the choice of cycles/image-width as the unit for measuring frequency is arbitrary; one can use cycles/image-height as well. Similar scaling applies to the direct implementation of other filter functions in the spatial-frequency domain, including the isotropic frequency-selective filters used in Chapter 5.
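Sampling (A.3) on the DFT grid, with the rescaling of (A.2) and the rigid rotation folded in, takes only a few lines. The sketch below is ours (NumPy; the function and parameter names are not the dissertation's).

    import numpy as np

    def gabor_dft(nr, nc, u0, theta0, sigma_u, sigma_v):
        # Sample the even-symmetric Gabor filter of (A.3) directly on an
        # nr x nc DFT grid.  u0 is the center radial frequency in
        # cycles/image-width; theta0 is the center orientation in radians.
        u = np.fft.fftfreq(nc) * nc               # integer cycles/image-width
        v = np.fft.fftfreq(nr) * nc               # (Nc/Nr) rescaling of (A.2)
        V, U = np.meshgrid(v, u, indexing="ij")   # rows index v, columns index u
        Ur = U * np.cos(theta0) + V * np.sin(theta0)    # rigid rotation prior
        Vr = -U * np.sin(theta0) + V * np.cos(theta0)   # to sampling
        return (np.exp(-0.5 * (((Ur - u0) / sigma_u) ** 2 + (Vr / sigma_v) ** 2))
                + np.exp(-0.5 * (((Ur + u0) / sigma_u) ** 2 + (Vr / sigma_v) ** 2)))

    # Filtering is then multiplication in the Fourier domain, e.g.:
    #   H = gabor_dft(256, 256, 32.0, 0.0, 4.0, 4.0)
    #   response = np.real(np.fft.ifft2(np.fft.fft2(image) * H))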
Bibliography

[1] N. Ahuja and A. Rosenfeld. Mosaic models for textures. IEEE Trans. Pattern Anal. Machine Intell., 3(1):1-11, 1981.

[2] J. Beck, A. Sutter, and R. Ivry. Spatial frequency channels and perceptual grouping in texture segregation. Computer Vision, Graphics, Image Process., 37:299-325, 1987.

[3] J. Besag. On the statistical analysis of dirty pictures. J. Royal Stat. Soc., Series B, 48(3):259-302, 1986.

[4] D. Blostein and N. Ahuja. Shape from texture: Integrating texture-element extraction and surface estimation. IEEE Trans. Pattern Anal. Machine Intell., 11(12):1233-1251, 1989.

[5] A. C. Bovik. Properties of multichannel texture analysis filters. In Proc. IEEE Int. Conf. on Acoust., Speech, Signal Process., pages 2133-2136, Albuquerque, New Mexico, April 1990.

[6] A. C. Bovik, M. Clark, and W. S. Geisler. Multichannel texture analysis using localized spatial filters. IEEE Trans. Pattern Anal. Machine Intell., 12(1):55-73, 1990.

[7] P. Brodatz. Textures: A Photographic Album for Artists and Designers. Dover, New York, 1966.

[8] T. M. Caelli. An adaptive computational model for texture segmentation. IEEE Trans. Syst., Man, Cybern., 18(1):9-17, 1988.

[9] F. W. Campbell and J. G. Robson. Application of Fourier analysis to the visibility of gratings. J. Physiology, 197:551-566, 1968.

[10] J. Canny. A computational approach to edge detection. IEEE Trans. Pattern Anal. Machine Intell., 8(6):679-698, 1986.

[11] G. A. Carpenter and S. Grossberg. A massively parallel architecture for a self-organizing neural pattern recognition machine. Computer Vision, Graphics, Image Process., 37:54-115, 1987.

[12] G. A. Carpenter and S. Grossberg. A massively parallel architecture for a self-organizing neural pattern recognition machine. In S. Grossberg, editor, Neural Networks and Natural Intelligence, pages 251-315. MIT Press, Cambridge, MA, 1987.

[13] R. Chellappa. Two-dimensional discrete Gaussian Markov random field models for image processing. In L. N. Kanal and A. Rosenfeld, editors, Progress in Pattern Recognition 2, pages 79-112. Elsevier Science, 1985.

[14] R. Chellappa, S. Chatterjee, and R. Bagdazian. Texture synthesis and compression using Gaussian-Markov random field models. IEEE Trans. Syst., Man, Cybern., 15(2):298-303, 1985.

[15] R. T. Chin. Algorithms and techniques for automated visual inspection. In T. Y. Young and K. S. Fu, editors, Handbook of Pattern Recognition and Image Processing, pages 587-612. Academic Press, 1986.

[16] C. C. Chu and J. K. Aggarwal. The integration of region and edge-based segmentation. In Proc. International Conference on Computer Vision, pages 117-120, Osaka, Japan, December 1990.

[17] P. Cielo. Optical Techniques for Industrial Inspection. Academic Press, 1988.

[18] M. Clark and A. C. Bovik. Experiments in segmenting texton patterns using localized spatial filters. Pattern Recognition, 22(6):707-717, 1989.

[19] J. M. Coggins. A Framework for Texture Analysis Based on Spatial Filtering. PhD thesis, Dept. of Computer Science, Michigan State University, East Lansing, MI 48824-1027, 1982.

[20] J. M. Coggins and A. K. Jain. A spatial filtering approach to texture analysis. Pattern Recognition Letters, 3(3):195-203, 1985.

[21] J. M. Coggins and A. K. Jain. Surface orientation from texture. In Proc. IEEE Int. Conf. on Syst., Man, Cybern., pages 1617-1620, Atlanta, GA, 1986.

[22] Y. Cohen and M. S. Landy. The HIPS picture processing software. Technical report, Psychology Dept., New York University, 1982.

[23] R. W. Conners, C. W. McMillin, K. Lin, and R. E. Vasquez-Espinosa. Identifying and locating surface defects in wood: Part of an automated lumber processing system. IEEE Trans. Pattern Anal. Machine Intell., 5(6):573-583, 1983.

[24] G. R. Cross and A. K. Jain. Markov random field texture models. IEEE Trans. Pattern Anal. Machine Intell., 5(1):25-39, 1983.

[25] J. G. Daugman. Two-dimensional spectral analysis of cortical receptive field profiles. Vision Research, 20:847-856, 1980.

[26] J. G. Daugman. Uncertainty relation for resolution in space, spatial-frequency, and orientation optimized by two-dimensional visual cortical filters. J. Opt. Soc. Amer., 2(7):1160-1169, 1985.

[27] J. G. Daugman. Complete discrete 2-D Gabor transforms by neural networks for image analysis and compression. IEEE Trans. Acoust., Speech, Signal Process., 36(7):1169-1179, 1988.

[28] R. L. De Valois, D. G. Albrecht, and L. G. Thorell. Spatial-frequency selectivity of cells in macaque visual cortex. Vision Research, 22:545-559, 1982.

[29] R. L. De Valois and K. K. De Valois. Spatial Vision. Oxford University Press, 1988.

[30] P. A. Devijver and J. Kittler. Pattern Recognition: A Statistical Approach. Prentice-Hall, London, 1982.

[31] N. R. Draper and H. Smith. Applied Regression Analysis. John Wiley, New York, 2nd edition, 1981.

[32] R. C. Dubes. How many clusters are best? An experiment. Pattern Recognition, 20(6):645-663, 1987.

[33] O. D. Faugeras. Texture analysis and classification using a human visual model. In Proc. Int. Conf. on Pattern Recognition, pages 549-552, Tokyo, Japan, 1978.

[34] L. A. Fletcher and R. Kasturi. A robust algorithm for text string separation from mixed text/graphics images. IEEE Trans. Pattern Anal. Machine Intell., 10(6):910-918, 1988.

[35] K. Fukunaga and P. M. Narendra. A branch and bound algorithm for computing k-nearest neighbors. IEEE Trans. Comput., 24:750-753, 1975.

[36] D. Gabor. Theory of communication. J. Inst. Elect. Engr., 93:429-457, 1946.

[37] G. Gagliardi, G. F. Hatch, and N. Sarkar. Machine vision applications in the food industry. In Proc. SME Vision Conf., pages 6-40 through 6-54, Detroit, Michigan, 1985.

[38] A. P. Ginsburg. Visual information processing based on spatial filters constrained by biological data. Technical Report AMRL-TR-78-129, Air Force Aerospace Medical Research Laboratory, December 1978.
[39] N. H. Goddard, K. J. Lynne, T. Mintz, and L. Bukys. Rochester connectionist simulator. Technical Report TR 233 (revised), Dept. of Computer Science, University of Rochester, October 1989.

[40] S. Grossberg, editor. Neural Networks and Natural Intelligence. MIT Press, Cambridge, MA, 1988.

[41] J. F. Haddon and J. F. Boyce. Image segmentation by unifying region and boundary information. IEEE Trans. Pattern Anal. Machine Intell., 12(10):929-948, 1990.

[42] Y. Hara, H. Doi, K. Karasaki, and T. Iida. A system for PCB automated inspection using fluorescent light. IEEE Trans. Pattern Anal. Machine Intell., 10(1):69-78, 1988.

[43] R. M. Haralick. Statistical and structural approaches to texture. Proceedings of the IEEE, 67(5):786-804, 1979.

[44] P. E. Hart. The condensed nearest neighbor rule. IEEE Trans. Inform. Theory, 14:515-516, May 1968.

[45] S. Haykin. Communication Systems. John Wiley & Sons, 1978.

[46] R. Hoffman and A. K. Jain. Segmentation and classification of range images. IEEE Trans. Pattern Anal. Machine Intell., 9(5):608-620, 1987.

[47] J. Y. Hsiao and A. A. Sawchuk. Unsupervised textured image segmentation using feature smoothing and probabilistic relaxation techniques. Computer Vision, Graphics, Image Process., 48:1-21, 1989.

[48] A. K. Jain. Experiments in texture analysis using spatial filtering. In Proc. IEEE Workshop on Languages for Automation, pages 66-70, Spain, June 1985.

[49] A. K. Jain and R. C. Dubes. Algorithms for Clustering Data. Prentice-Hall, New Jersey, 1988.

[50] A. K. Jain, R. C. Dubes, and C.-C. Chen. Bootstrap techniques for error estimation. IEEE Trans. Pattern Anal. Machine Intell., 9(5):628-633, 1987.

[51] A. K. Jain and S. G. Nadabar. MRF model-based segmentation of range images. In Proc. International Conference on Computer Vision, pages 667-671, Osaka, Japan, December 1990.

[52] B. Julesz. Visual pattern discrimination. IRE Trans. Inform. Theory, 8(2):84-92, 1962.

[53] B. Julesz. Textons, the elements of texture perception, and their interactions. Nature, 290:91-97, 1981.

[54] B. Julesz. Texton gradients: The texton theory revisited. Biol. Cybern., 54:245-251, 1986.

[55] B. Julesz and J. R. Bergen. Textons, the fundamental elements in preattentive vision and perception of textures. Bell Syst. Tech. J., 62(6):1619-1645, 1983.

[56] B. Julesz, E. N. Gilbert, L. A. Shepp, and H. L. Frisch. Inability of humans to discriminate between visual textures that agree in second-order statistics: revisited. Perception, 2:391-405, 1973.

[57] B. Julesz, E. N. Gilbert, and J. D. Victor. Visual discrimination of textures with identical third-order statistics. Biol. Cybern., 31:137-140, 1978.

[58] R. L. Kashyap and A. Khotanzad. A model-based method for rotation invariant texture classification. IEEE Trans. Pattern Anal. Machine Intell., 8(4):472-481, 1987.

[59] J. M. Keller and S. Chen. Texture description and segmentation through fractal geometry. Computer Vision, Graphics, Image Process., 45:150-166, 1989.

[60] A. Khotanzad and J. Y. Chen. Unsupervised segmentation of textured images by edge detection in multidimensional features. IEEE Trans. Pattern Anal. Machine Intell., 11(4):414-421, 1989.

[61] K. I. Laws. Textured image segmentation. Technical Report USCIPI-940, Image Process. Inst., University of Southern California, 1980.

[62] M. D. Levine. Vision in Man and Machine. McGraw-Hill, 1985.

[63] R. P. Lippmann. An introduction to computing with neural nets. IEEE ASSP Magazine, pages 4-22, April 1987.
[64] R. P. Lippmann. Pattern classification using neural networks. IEEE Communications Magazine, pages 47-64, November 1989.

[65] J. Malik and P. Perona. A computational model of texture segmentation. In Proc. IEEE Computer Soc. Conf. on Computer Vision and Pattern Recognition, pages 326-332, San Diego, CA, 1989.

[66] J. Malik and P. Perona. Preattentive texture discrimination with early vision mechanisms. J. Opt. Soc. Amer. A, 7(5):923-932, 1990.

[67] S. G. Mallat. Multifrequency channel decomposition of images and wavelet models. IEEE Trans. Acoust., Speech, Signal Process., 37(12):2091-2110, 1989.

[68] S. G. Mallat. A theory for multiresolution signal decomposition: The wavelet representation. IEEE Trans. Pattern Anal. Machine Intell., 11(7):674-693, 1989.

[69] B. S. Manjunath, T. Simchony, and R. Chellappa. Stochastic and deterministic networks for texture segmentation. IEEE Trans. Acoust., Speech, Signal Process., 38(6):1039-1049, 1990.

[70] S. Marcelja. Mathematical description of the responses of simple cortical cells. J. Opt. Soc. Amer., 70:1297-1300, 1980.

[71] D. Marr. Vision. Freeman, San Francisco, 1982.

[72] D. L. Milgram. Region extraction using convergent evidence. Computer Graphics, Image Process., 11:1-12, 1979.

[73] G. W. Milligan and M. C. Cooper. A study of standardization of variables in cluster analysis. J. Classification, 5:181-204, 1988.

[74] M. Minsky and S. Papert. Perceptrons: An Introduction to Computational Geometry. MIT Press, 1969.

[75] T. Pavlidis and Y. Liow. Integrating region growing and edge detection. IEEE Trans. Pattern Anal. Machine Intell., 12(3):225-233, 1990.

[76] A. P. Pentland. Fractal-based description of natural scenes. IEEE Trans. Pattern Anal. Machine Intell., 6(6):661-674, 1984.

[77] A. Perry and D. G. Lowe. Segmentation of textured images. In Proc. IEEE Computer Soc. Conf. on Computer Vision and Pattern Recognition, pages 326-332, San Diego, CA, 1989.

[78] D. A. Pollen and S. F. Ronner. Visual cortical neurons as localized spatial frequency filters. IEEE Trans. Syst., Man, Cybern., 13(5):907-916, 1983.

[79] M. Porat and Y. Y. Zeevi. The generalized Gabor scheme of image representation in biological and machine vision. IEEE Trans. Pattern Anal. Machine Intell., 10(4):452-468, 1988.

[80] T. R. Reed and H. Wechsler. Segmentation of textured images and Gestalt organization using spatial/spatial-frequency representations. IEEE Trans. Pattern Anal. Machine Intell., 12(1):1-12, 1990.

[81] D. E. Rumelhart, G. E. Hinton, and R. J. Williams. Learning internal representations by error propagation. In D. E. Rumelhart and J. L. McClelland, editors, Parallel Distributed Processing: Explorations in the Microstructures of Cognition, volume 1, pages 318-362. MIT Press, Cambridge, MA, 1986.

[82] M. Sachs, J. Nachmias, and J. G. Robson. Spatial-frequency channels in human vision. J. Opt. Soc. Amer., 61:1176-1186, 1971.

[83] D. J. Sakrison. On the role of the observer and a distortion measure in image transmission. IEEE Trans. Communications, 25(11):1251-1267, 1977.

[84] L. H. Siew, R. M. Hodgson, and E. J. Wood. Texture measures for carpet wear assessment. IEEE Trans. Pattern Anal. Machine Intell., 10(1):92-105, 1988.

[85] J. Sklansky. Image segmentation and feature extraction. IEEE Trans. Syst., Man, Cybern., 8(4):237-247, 1978.

[86] H. Tamura, S. Mori, and T. Yamawaki. Textural features corresponding to visual perception. IEEE Trans. Syst., Man, Cybern., 8(6):460-473, 1978.
[87] T. N. Tan and A. G. Constantinides. Texture analysis based on a human visual model. In Proc. IEEE Int. Conf. on Acoust., Speech, Signal Process., pages 2091-2110, Albuquerque, New Mexico, April 1990.

[88] W. S. Torgerson. Theory and Methods of Scaling. Wiley, New York, 1958.

[89] M. Tuceryan and A. K. Jain. Texture segmentation using Voronoi polygons. IEEE Trans. Pattern Anal. Machine Intell., 12(2):211-216, 1990.

[90] M. R. Turner. Texture discrimination by Gabor functions. Biol. Cybern., 55:71-82, 1986.

[91] L. Van Gool, P. Dewaele, and A. Oosterlinck. Texture analysis anno 1983. Computer Vision, Graphics, Image Process., 29:336-357, 1985.

[92] H. Voorhees. Finding texture boundaries in images. Technical Report AI-TR-986, Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 1987.

[93] H. Voorhees and T. Poggio. Computing texture boundaries from images. Nature, 333(6171):364-367, 1988.

[94] S. R. Yhann and T. Y. Young. A multiresolution approach to texture segmentation using neural networks. In Proc. Int. Conf. on Pattern Recognition, volume 1, pages 513-517, Atlantic City, NJ, June 1990.

[95] R. A. Young. The Gaussian derivative theory of spatial vision: Analysis of cortical cell receptive field line-weighting profiles. Technical Report GMR-4920, General Motors Research Center, Warren, MI, 1985.