This is to certify that the dissertation entitled

MARKOV RANDOM FIELD CONTEXTUAL MODELS IN COMPUTER VISION

presented by

Sateesha Gopalakrishna Nadabar

has been accepted towards fulfillment of the requirements for the PhD degree in Computer Science.

Major professor

Date

MARKOV RANDOM FIELD CONTEXTUAL MODELS IN COMPUTER VISION

By

Sateesha Gopalakrishna Nadabar

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Computer Science Department

1992

ABSTRACT

MARKOV RANDOM FIELD CONTEXTUAL MODELS IN COMPUTER VISION

By

Sateesha Gopalakrishna Nadabar

Many low-level vision processes need to incorporate contextual information to counteract the effects of sensor noise and incomplete knowledge about the sensing process and the scene. Context represents a priori assumptions about the physical world such as continuity and smoothness. This research explores the use of Markov random field (MRF) models to specify the contextual information in low-level vision processes. Two important low-level processes considered here are: edge detection in range images and fusion of intensity and range images.

We propose an MRF model-based algorithm to detect edges in range images. The edge-based segmentation is integrated with a region-based segmentation scheme, resulting in a hybrid algorithm for surface segmentation. Results of edge detection and segmentation experiments presented on several real range images demonstrate the superior performance of the MRF model-based approach.

The MRF prior model used in our edge detection algorithm needs the specification of several parameters. The main difficulty in estimation of these parameters is that sufficient ground truth information is not easily available. We develop a new scheme for the estimation of MRF line process parameters which uses geometric CAD models of the 3D objects expected to be present in the scene. A canonical representation of clique potentials is used to reduce the number of model parameters. This parameter estimation scheme is further extended to estimate the parameters of the MRF prior model for the problem of edge detection in intensity images.
Edge detection results using real range and intensity images indicate that performance with estimated parameters is as good as with the "best" parameters selected using a trial-and-error procedure.

The thesis also explores the use of MRF models to capture contextual information in the problem of detection and labeling of edges by fusing information from registered intensity and range images. The key idea in this approach is that fusion of information from range and intensity images can be carried out by using a single physically meaningful edge label for each edge location. Experimental results on several real range-intensity image pairs are presented. The issue of computational requirements of the algorithm is also discussed, and the computational performance of two parallel computer implementations of the algorithm on a Connection Machine CM-2 is studied. The results of edge detection and fusion experiments demonstrate the effectiveness of Markov random fields as models of contextual information for low-level vision problems.

To my parents and my wife, Bhagya

ACKNOWLEDGMENTS

I wish to express my deep gratitude to my thesis advisor, Professor Anil Jain. Without his guidance, support, encouragement, and the occasional tough talks when my progress was slow, this work would not have been completed. He always provided just the right amount of criticism and technical help. My many thanks to him for patiently reading through many a half-baked draft over the last four years. For me, he represents the role model of a successful scientist, and it has been an honor having him as my thesis advisor. I would like to express my sincere thanks to Professors Richard Dubes, Mihran Tuceryan, and Connie Page for serving on my committee. Professors Dubes and Tuceryan have made many useful suggestions that have tremendously improved the thesis. I would also like to acknowledge the many fruitful discussions I had with Professor George Stockman. I thank Dr. Anbalagan of Innovision Corporation for helping me get a good understanding of the practical aspects of vision systems.

I would like to thank former members of the PRIP group at Michigan State University, Chaur-Chin Chen, Sei-Wang Chen, Patrick Flynn, Joseph Miller, and Farshid Farrokhnia, for their advice during my beginning years. Patrick Flynn, in particular, developed many general utilities which have contributed to improving the working environment in the PRIP laboratory.

I have been extremely fortunate in being blessed with good friends and colleagues Greg Lee, Narayan Raja, and Debby Trytten. All four of us took our comprehensive examinations at the same time as a Gang-of-Four. The last year of work would have been extremely tedious but for their support and company.

My special thanks to Chitra Dorai for proofreading the thesis. Many other PRIPpies have helped in making this work interesting and enjoyable. I would like to thank Qian Huang, Sushil Bhattacharjee, Sally Howden, Hans Dulimarta, Philippe Ohanian, Sharathchandra Pankanti, Marie-Pierre Dubuisson, Steve Walsh, Jian-Chang Mao, Timothy Newman, Philippe Ballard, Chen Yao, John Courtney, and Jinlong Chen.
I would also like to thank our lab manager John Lees for doing such a superb job of coordinating and maintaining the computing facilities. I appreciate the contribution of many of our student managers who have put in a lot of effort in providing a good software environment in the lab.

Finally, I am deeply indebted to my wife Bhagya Nadabar for her support and understanding during the preparation of this thesis. Not only did she put up with my long hours of stay at the lab, but she also contributed directly by proofreading this thesis.

Access to the Connection Machine parallel computer used for experiments in Chapter 7 of this thesis was provided by the Argonne National Laboratories. Financial support for this research was provided in part by the National Science Foundation grants IRI-8901513 and ECS-8603541 and by a Fellowship from Innovision Corporation.

TABLE OF CONTENTS

LIST OF TABLES

LIST OF FIGURES

1 Introduction
  1.1 Regularization and the Role of Constraints
  1.2 Mechanical and Probabilistic Regularization Approaches
  1.3 Object Recognition
  1.4 Range Image Acquisition
  1.5 Contributions of the Thesis
  1.6 Organization of the Thesis

2 Markov and Gibbs Random Fields
  2.1 Markov and Gibbs Random Fields
    2.1.1 Neighborhood System
    2.1.2 Cliques
    2.1.3 Markov Random Field (MRF)
    2.1.4 Gibbs Random Field (GRF)
    2.1.5 MRF-GRF Equivalence
  2.2 MRFs as Models of Spatial Context: Bayesian Formulation
  2.3 Estimators for a posteriori Labels
  2.4 Markov Random Fields on Edge Labels: Line Process
  2.5 Coupled Markov Random Fields
  2.6 Multivariate Markov Random Fields
  2.7 Summary

3 Survey: MRF Models in Computer Vision
  3.1 Textured Image Analysis
    3.1.1 Texture Classification
    3.1.2 Texture Synthesis
    3.1.3 Texture Segmentation
  3.2 Gray Level Image Analysis
    3.2.1 Image Segmentation
    3.2.2 Image Restoration
    3.2.3 Edge Detection
  3.3 Range Image Analysis
  3.4 Multi-Sensor Image Analysis
    3.4.1 Color Image Analysis
    3.4.2 Multispectral Image Analysis
    3.4.3 Fusion of Range and Intensity Images
    3.4.4 Integration of Early Vision Modules
  3.5 Miscellaneous Applications of MRF
  3.6 Related Approaches
  3.7 Summary

4 Edge Detection in Range Images
  4.1 Edge Detection as Bayesian Estimation
  4.2 Computation of Edge Likelihoods
    4.2.1 Jump Edge Statistic
    4.2.2 Crease Edge Statistic
    4.2.3 Difficulties with Gradients and Discontinuities
    4.2.4 Combining Jump and Crease Edge Likelihoods
  4.3 Edge Detection Algorithm
  4.4 Clique Potential Functions
  4.5 Experimental Results
  4.6 Hybrid Method for Segmentation of Range Images
  4.7 Evaluation of the Edge Detector
  4.8 Conclusions

5 Parameter Estimation of Line Process MRF
  5.1 Estimation of Line Process Parameters from Object Models
  5.2 Least Squares Formulation
  5.3 A Canonical MRF Representation: Normalized Potential Functions
  5.4 Experimental Results
  5.5 Edge Detection in Intensity Images
    5.5.1 Edge Detection Approach
    5.5.2 Estimation of Parameters from 2D Object Models
    5.5.3 Experimental Results
  5.6 Conclusions

6 Edge detection and labeling by fusion of intensity and range images
  6.1 Edge Labeling as Bayesian Estimation
  6.2 TPM and HCF Algorithms for Fusion
  6.3 Parameter Estimation of Coupled MRFs
  6.4 Experimental Results
  6.5 Conclusions

7 Implementation of Energy Minimization Algorithms on the Connection Machine
  7.1 The Connection Machine
  7.2 Parallel TPM Algorithm
  7.3 Edge Labeling Performance of Parallel TPM and Parallel HCF Algorithms
  7.4 Conclusions

8 Conclusions and Directions for Future Research
  8.1 Conclusions
  8.2 Directions for Future Research

A List of Parameters

BIBLIOGRAPHY

LIST OF TABLES

1.1 Some early vision processes and corresponding constraints.
4.1 Visual comparison of segmentation results.
5.1 Conditional probabilities under two representations.
5.2 Visual comparison of edge detection results.
7.1 Execution time of the parallel TPM algorithm in seconds.
A.1 The parameters used in Chapter 4.
A.2 The parameters used in Chapter 5.
A.3 The parameters used in Chapter 6.
A.4 The parameters used in Chapter 7.

LIST OF FIGURES

1.1 Two images of a scene at different noise levels.
1.2 Intensity values at different noise levels.
1.3 Block diagram of a typical vision system.
1.4 A typical (a) range image and (b) the corresponding intensity image.
1.5 Technical Arts 100X range image scanner.
1.6 Range image database.
2.1 A hierarchical ordering of neighbors for site t.
2.2 Cliques for the first-order neighborhood of a site t.
2.3 HCF site labeling algorithm for discrete labels.
2.4 Pixel and line (edge) sites.
2.5 Second-order neighbors of edge sites.
2.6 Neighborhoods of pixel and line sites.
2.7 Multivariate MRF: the label and measurement variables.
3.1 A classification of MRF application areas.
3.2 Several texture patterns from the Brodatz album.
3.3 Texture patterns generated using a second-order MRF.
4.1 Edge clique types for a second-order neighborhood.
4.2 Pixel numbering in the neighborhood of a vertical edge site.
4.3 Distribution of J under the alternative hypothesis.
4.4 Cliques and their ad hoc potential function values.
4.5 Edge detection results for the GRNBLK3 image.
4.6 Edge detection results for the HARRISCUP image.
4.7 Edge detection results for the COLUMN1/GRNBLK3 image.
4.8 Edge detection results for the AGPART2 image.
4.9 Edge detection results for the COLUMN2/ELL image.
4.10 Hybrid segmentation results for the GRNBLK3 image.
4.11 Hybrid segmentation results for the HARRISCUP image.
4.12 Hybrid segmentation results for the COLUMN1/GRNBLK3 image.
4.13 Hybrid segmentation results for the AGPART2 image.
4.14 Hybrid segmentation results for the COLUMN2/ELL image.
4.15 Segmentation results for the GRNBLK3 image.
4.16 Segmentation results for the HARRISCUP image.
4.17 Segmentation results for the COLUMN1/GRNBLK3 image.
4.18 Segmentation results for the AGPART2 image.
4.19 Segmentation results for the COLUMN2/ELL image.
4.20 Segmentation results for the BIGWYE4 image.
4.21 Segmentation results for the ADAPTER/CURVBLOCK image.
4.22 Segmentation results for the BALL2/PROPANE image.
4.23 Segmentation results for the BLOCK4/HALFSPH image.
4.24 Segmentation results for the CAP/BOX2INCH image.
4.25 Segmentation results for the COL2/BALL image.
5.1 (a) Synthetic range image and (b) corresponding edge map.
5.2 Neighborhood numbering scheme.
5.3 Neighborhood for a simple MRF.
5.4 Clique configurations for the simple MRF.
5.5 Clique configurations with non-zero canonical potentials.
5.6 Canonical clique potential values estimated from 3D CAD models.
5.7 Synthetic range images of objects used for estimating clique parameters.
5.8 Edge detection results for the GRNBLK3 image.
5.9 Edge detection results for the HARRISCUP image.
5.10 Edge detection results for the COLUMN1/GRNBLK3 image.
5.11 Edge detection results for the AGPART2 image.
5.12 Edge detection results for the COLUMN2/ELL image.
5.13 Edge detection results for the BIGWYE4 image.
5.14 Edge detection results for the ADAPTER/CURVBLOCK image.
5.15 Edge detection results for the BALL2/PROPANE image.
5.16 Edge detection results for the BLOCK4/HALFSPH image.
5.17 Edge detection results for the CAP/BOX2INCH image.
5.18 Edge detection results for the COL2/BALL image.
5.19 Edge map synthesized using 2D model of an industrial part.
5.20 2D objects in the database.
5.21 Canonical clique potentials estimated from 2D models.
5.22 Sampling of MRF line process using the Gibbs sampler.
5.23 Step edge detection results for the INDUSTRIAL PARTS image.
5.24 Step edge detection results for the SCENE image.
6.1 Relationship between different edge types and information sources.
6.2 Pixel and edge site neighborhoods.
6.3 Pixel cliques and the corresponding potential functions.
6.4 Ad hoc edge clique potential function values.
6.5 TPM algorithm for fusion.
6.6 HCF algorithm for fusion.
6.7 Neighborhood of a vertical edge site in the simplified MRF model.
6.8 Plot of Equation (6.10).
6.9 Edge labeling results using TPM (sigma_R = 0.003 inches).
6.10 Edge labeling results using TPM (sigma_R = 0.03 inches).
6.11 Edge labeling results for the BLOCKS image.
6.12 Edge labeling results for the JIG image.
6.13 Edge labeling results for the FUNNEL image.
6.14 Edge labeling results for the BLOCK4/COLUMN1 image.
6.15 Edge labeling results for the AGPART image.
7.1 A coding scheme for parallel updating of variables.
7.2 Combined neighborhood dependency of pixel and line variables.
7.3 Parallel TPM algorithm for edge labeling with fusion.
7.4 Speedup of parallel TPM algorithm over sequential algorithm.
7.5 HCF algorithm for edge labeling with fusion.
7.6 Speedup of the parallel HCF implementation over Sparcstation 2.
7.7 Edge labeling results on the JIG image.

CHAPTER 1

Introduction

Computer vision deals with machine processing of visual information. Visual sensory data are usually obtained in the form of two-dimensional images of the physical world and consist of measurements which depend on the imaging geometry, illumination, and structures present in the scene. The goal of any vision system is to recognize familiar structures in the system's environment and obtain concise symbolic descriptions of unfamiliar structures using visual sensory data. The task of a vision system, then, can be described as data reduction in order to extract meaningful descriptions from huge amounts of input visual data.

The task of reducing vast amounts of visual data to meaningful descriptions is not simple. The description obtained from the visual data should remain the same for certain changes in the scene or sensing environment [10]. This property is known as invariance of the description and is illustrated in Figures 1.1-1.2.
Figures 1.1(a) and (b) both show the same scene, but the noise level in the two images is different, corresponding, for example, to sensors with different sensitivity. Figure 1.2 shows the actual gray values in a small portion of the images in Figure 1.1. Notice that even though the human eye assigns the same interpretation to the scene in both cases, the actual gray values are quite different. This is because the data reduction mechanism in the human visual system is invariant to the sensor noise, within reasonable limits.

Figure 1.1. Two images of a scene at different noise levels.

Invariance is essential for a vision system since the visual system 'lives' in an environment in which sensory data could change, even though the structures in the world remain the same. For a fairly general computer vision system, invariance with respect to the following factors is required [10]:

1. Sensor noise
2. Optical distortion
3. Viewpoint
4. Perspective distortion
5. Variations in photometric conditions.

Figure 1.3 shows a block diagram of a computer vision system. The task of the early vision modules is to recover the physical or intrinsic properties at each image location (pixel) of the corresponding 3D world point being sensed, and it represents the first step in obtaining invariant descriptions.

    (a)  171 156 126  86  71        (b)  163 158 127 103  57
         173 160 129  86  65             173 157 124  96  82
         173 157 128  85  66             167 153 139  84  57
         175 159 126  87  68             183 155 129  97  64
         171 157 122  77  64             173 154 128  80  72

Figure 1.2. Intensity values at different noise levels for the same 5 x 5 portion of the two images in Figure 1.1.

[Figure 1.3. Block diagram of a typical vision system: sensory data feed the early visual processes, which produce intrinsic images, from which invariant descriptions and invariant features are derived for recognition of structure and learning; image predictions and feature predictions flow back down.]

That is to say, early vision modules attempt to invert the world-image mapping. However, the world-image mapping is a many-to-one mapping and is not invertible because, typically, there are many world configurations which result in the same sensory data. In addition to dealing with this non-invertible mapping problem, early vision processes need to deal effectively with the ubiquitous sensor noise. Therefore, early vision problems are ill-posed in the following sense [45, 84]. A problem is well-posed if (i) a unique solution exists, and (ii) the solution depends continuously on the data (stability against small perturbations). If either of these conditions is not satisfied, the problem is said to be ill-posed.

1.1 Regularization and the Role of Constraints

Both biological and computer vision systems take the same approach to solving these ill-posed problems: using constraints. The set of possible solutions is restricted by various constraints such that there is a unique stable solution to the ill-posed problem. The addition of these constraints converts an ill-posed problem into a well-posed one. Such methods have been called regularization methods [84]. Table 1.1 shows some of the early vision processes and the constraints applicable to these processes.
The type of constraint used depends on the environment in which the vision system operates and the type of information that it seeks in the environment. One can identify two classes of constraints [21]: natural constraints and artificial constraints. Natural constraints (also called generic constraints) arise from the specific environment and eliminate configurations which rarely occur in that environment. Artificial constraints incorporate restrictions arising from the use of higher-level knowledge. Some generic constraints that vision researchers have tried to incorporate in early vision modules are: (i) neighboring pixels are likely to have similar properties; (ii) close parallel edges are unlikely; (iii) isolated edge elements are improbable; (iv) physical surfaces are smooth.

Table 1.1. Some early vision processes and corresponding constraints.

    Early Vision Module     | Constraints applicable
    ------------------------|-------------------------------
    Edge detection          | Piecewise continuous edges
    Surface reconstruction  | Piecewise continuous surfaces
    Shape from shading      | Lambertian surfaces
    Structure from motion   | Rigidity

The constraints presented above can all be specified as the dependency of an attribute at a pixel in the image on the attributes in the pixel's spatial neighborhood. For example, a smoothness constraint on surfaces can be specified by restricting the surface depth value at a pixel to be close to the surface depth values at its neighbors. This type of neighborhood dependency is called contextual dependency (or contextual information), and the associated constraints are referred to as contextual or spatial constraints. This thesis concentrates on modeling contextual dependency for the purpose of detecting edges in range and intensity images.

1.2 Mechanical and Probabilistic Regularization Approaches

Two main approaches to regularization have been proposed in the computer vision literature: (i) the mechanical viewpoint [10, 94] and (ii) the probabilistic viewpoint [6, 20, 38, 74]. In the mechanical viewpoint, continuity or piecewise continuity constraints are used to reduce the class of solutions that are acceptable. The ill-posed problem is reformulated as an optimization problem by associating a cost with each possible solution. The cost of a solution is taken to be the sum of the various energy components in an analogous mechanical system (splines under tension), so the optimal solution corresponds to the minimum energy state of a mechanical system [10]. Because of the analogy between the cost function and the energy of a mechanical system, this approach is known as the energy function minimization approach. In 'standard regularization' methods, the energy function corresponds to the energies of generalized splines [84]. This means that the solution is too smooth to be faithful at locations where discontinuities exist in the image. However, changes in surface geometry, surface composition, and illumination all give rise to discontinuities in the intrinsic characteristics of the scene, and therefore it is important to retain the location of discontinuities in the solution.
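To make the energy function minimization viewpoint concrete, consider the following one-dimensional illustration (a generic example in the spirit of [10, 84], not a formula taken from later chapters). A membrane energy for reconstructing a signal u from data d is

    E(u) = \sum_t (u_t - d_t)^2 + \lambda \sum_t (u_{t+1} - u_t)^2,

and a weak-continuity variant introduces a binary line variable l_t that can break the smoothness term at a cost \alpha:

    E(u, l) = \sum_t (u_t - d_t)^2 + \lambda \sum_t (1 - l_t)(u_{t+1} - u_t)^2 + \alpha \sum_t l_t.

Minimizing the first energy smooths across discontinuities; in the second, setting l_t = 1 is advantageous exactly when the squared jump (u_{t+1} - u_t)^2 exceeds \alpha / \lambda, so large discontinuities are preserved rather than blurred.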
Blake and Zisserman [10] use 'weak continuity constraints', which can occasionally break, to incorporate discontinuities into the cost function. A similar approach is also taken by Terzopoulos [94], using the so-called continuity control functions.

In the probabilistic viewpoint, the structure in the image is modeled as a stochastic process, the a priori knowledge is modeled by a Markov random field (MRF), and statistical estimation techniques are used to obtain a solution to the ill-posed problem. A Markov random field associates a probability value with each possible solution to the ill-posed problem according to a Gibbs distribution, which is well known in statistical physics [38]. What is more important, a significant portion of the structures (regularities) likely to be present in the scene are locally specified as contextual dependencies in the form of statements such as "points that are close together in the image plane usually belong to the same surface". Such local structures can be specified by a locally dependent MRF.

Szeliski [92] has shown the relationship between the mechanical and probabilistic viewpoints for regularization. It is shown that by selecting an appropriate MRF model, the constraints specified by the mechanical viewpoint can be exactly specified using the probabilistic viewpoint. We prefer the probabilistic approach for the following reasons.

1. The mechanical viewpoint does not allow the specification of spatially nonhomogeneous interactions, which are useful for modeling top-down constraints. These constraints can be captured using a nonhomogeneous MRF model [64].

2. The mechanical viewpoint does not naturally allow the specification of mutual constraints in a multiple-source information fusion problem. MRF models can be easily extended to multivariate observations.

3. The probabilistic viewpoint has the advantage that random variation in the data can be incorporated into the model (e.g., use of statistical estimates of mean and covariance matrices in multispectral classification [103]).

4. The mechanical viewpoint can incorporate only Gaussian degradation (noise) processes. The probabilistic viewpoint allows modeling of complicated (other than Gaussian) image degradation processes.

Even though we take the probabilistic viewpoint here, we do make use of the mechanical viewpoint at times. Specifically, when smoothness constraints are to be specified, it is easier to take a mechanical viewpoint since these constraints are better specified as spline energies. It is, however, straightforward to convert the energies to probabilities [92, 36, 88].

Markov random fields (and regularization in general) constitute an intuitively appealing framework in which to cast early vision problems. However, there are two fundamental difficulties that need to be addressed before this approach can be widely used in computer vision: (i) there are no good methods to estimate parameters of the prior probability density which represents the contextual information, and (ii) search techniques for maximizing the a posteriori density are computationally very expensive.
An important advantage of the probabilistic approach, namely the ability to estimate parameters of the contextual model, has not been realized so far because of the difficulty in obtaining true realizations (ground truth) of the contextual model. This thesis proposes an engineering solution to the problem of obtaining the ground truth information. The thesis advocates the use of geometric models of "typical" scene configurations (scenes consisting of "typical" objects) to generate ground truth information. Specifically, we relate the geometric and probabilistic models by synthesizing images using the geometric model and treating the synthesized images as ground truth information for estimating the parameters of the probabilistic model.

The second difficulty in using MRF models, namely the computational burden, has been addressed by several researchers [10, 20, 90, 36, 79, 94]. Reasonable deterministic approximations to stochastic optimization algorithms have been suggested in order to reduce the computational time. These deterministic approximations still require several minutes of computation time on a sequential machine and therefore are slow for most vision applications. It is generally acknowledged that massively parallel computers are necessary to run these algorithms at speeds required for real-world vision tasks [38, 79]. This thesis implements two MRF-based algorithms on a massively parallel computer (Connection Machine CM-2) in order to study the improvement in computational performance of these algorithms over traditional single-CPU computers.

1.3 Object Recognition

Automatic recognition of objects is an important task which has received considerable attention in the computer vision literature [7]. The bulk of this interest is due to the importance of object recognition in the machine vision industry for automation of certain tasks that are either (i) too routine, (ii) too expensive, or (iii) too dangerous for humans to perform. Recognition involves identification of objects present in the scene and determining their position and orientation with respect to a reference coordinate system (pose estimation).

Most 3D object recognition systems depend on the availability of 3D shape (surface) measurements of the objects as input. Therefore, much of the effort in early vision processes is directed towards obtaining the 3D shape of the objects in the scene from one or more intensity images. Some of the well-known approaches among these are: (i) shape-from-stereo [89], (ii) shape-from-shading [42], (iii) shape-from-texture [11], and (iv) shape-from-contour [98]. With the exception of shape-from-stereo techniques, which have been used for outdoor navigation systems [89], none of the methods mentioned above has been used in any complete vision system, because of high computational requirements and restrictive model assumptions.
In recent years, there has been an increasing interest in an engineering approach to the 3D shape measurement problem, where direct depth measurements of 3D objects are obtained [59, 85]. The resulting depth map can be represented as a 2D image, known as a range image. In a range image, each pixel value represents the range or distance of the corresponding scene point from the camera. In contrast, in an intensity image, each pixel value represents the amount of light reflected from the corresponding scene point towards the camera. Figure 1.4 illustrates this difference between range and intensity images of a scene.

Figure 1.4. A typical (a) range image and (b) the corresponding intensity image.

The task of recognizing objects directly from either intensity or range measurements is extremely difficult because of the many factors mentioned in the beginning of this chapter (sensor noise, viewpoint changes, etc.) that affect these measurements. Therefore, most object recognition systems use more than one stage of processing, resulting in one or more intermediate representations. Segmentation plays a major role in obtaining these intermediate representations.

To illustrate the importance of segmentation, consider the images shown in Figures 1.1-1.2. If the pixel values in Figure 1.2(a) and (b) are compared directly, then it is difficult to determine that they are sensed from the same surface points; on the other hand, if the vision system uses the fact that there is a large gradient (or edge) between the third and fourth columns in the tables, then the similarity between the two images becomes clearer. Therefore, if the vision system relies on an intermediate representation such as edges in the image, then small changes in the measurements do not affect the performance of the vision system drastically. Several studies in the computer vision literature have recognized the importance of detecting edges in range images [57, 12]. We explore the use of Markov random field models to capture the contextual information in the problem of detecting edges in range images. Recently, there has been growing interest in using multiple sensors to increase the robustness of vision systems [21]. This thesis also examines the use of Markov random field models to fuse information from range and intensity images.

1.4 Range Image Acquisition

The range images used in this thesis were acquired using a Technical Arts 100X Scanner [93]. This scanner uses a structured light mechanism [52] to infer the depth (range) values of the surface points in the scene. A thin laser stripe illuminates surface points corresponding to a single row of pixels in the range image. A camera placed such that its optical axis is at approximately 45 degrees to the plane of the laser light senses the intensity values. Figure 1.5 shows the setup of the scanner at the Michigan State University PRIP laboratory.

Figure 1.5. Technical Arts 100X range image scanner.

The surface points that are illuminated by the laser light can be easily separated from those that are not, using a simple thresholding technique. The range values for these pixels are inferred from the displacement of the laser stripe in the intensity image. The range values for other rows in the range image are obtained by moving the object using a motorized stage and repeating the procedure outlined above for each row position. The acquisition time for a 240 x 240 range image is about 3 minutes.
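The triangulation idea behind this procedure can be sketched as follows. This is a minimal illustration of recovering one image row of range values from the stripe displacement, not the scanner's actual calibration procedure; the flat reference position, the function and parameter names, and the defaults are assumptions made for the example.

    import numpy as np

    def stripe_to_range(image, ref_row, threshold=128,
                        pixel_size=0.04, cam_angle_deg=45.0):
        # Recover one row of range values from a single laser-stripe image.
        # ref_row is the stripe's row (in pixels) when imaging a reference
        # plane at zero depth.
        n_cols = image.shape[1]
        depth = np.full(n_cols, np.nan)          # NaN marks missing data
        for c in range(n_cols):
            lit = np.nonzero(image[:, c] > threshold)[0]
            if lit.size == 0:
                continue    # point shadowed from the laser or the camera
            stripe_row = lit.mean()              # centroid of lit pixels
            displacement = (stripe_row - ref_row) * pixel_size
            # Simple triangulation: depth is proportional to the stripe's
            # displacement; with the camera at 45 degrees, tan(angle) = 1.
            depth[c] = displacement / np.tan(np.deg2rad(cam_angle_deg))
        return depth

Sweeping the object under the stripe with the motorized stage and stacking the recovered rows yields the full range image.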
The scanner has a spatial resolution of 0.04 inches and a depth resolution of 0.001 inches. We use the 11 range images shown in Figure 1.6 to test our edge detection algorithm.

Figure 1.6. Range image database: (a) GRNBLK3 image, (b) HARRISCUP image, (c) COLUMN1/GRNBLK3 image, (d) AGPART2 image, (e) COLUMN2/ELL image, (f) BIGWYE4 image.

Figure 1.6 (cont'd). Range image database: (g) ADAPTER/CURVBLOCK image, (h) BALL2/PROPANE image, (i) BLOCK4/HALFSPH image, (j) CAP/BOX2INCH image, (k) COLUMN2/BALL image.

The test images were selected such that they represent a reasonable mix of planar and curved surfaces. Following is a brief description of each image.

GRNBLK3: 122 x 226 image of a wooden block.

HARRISCUP: 106 x 164 image of a coffee cup.

COLUMN1/GRNBLK3: 194 x 212 image of a pair of wooden blocks.

AGPART2: 83 x 156 image of a rotationally symmetric industrial object.

COLUMN2/ELL: 240 x 240 image of another pair of wooden blocks.

BIGWYE4: 185 x 225 image of a Y-shaped object.

ADAPTER/CURVBLOCK: 197 x 187 image of a plastic adapter sitting on top of a wooden block.

BALL2/PROPANE: 215 x 190 image of a rubber ball placed next to a cylinder.

BLOCK4/HALFSPH: 129 x 171 image of a hemispherical section of a rubber ball placed next to a rhombus-shaped wooden block.

CAP/BOX2INCH: 128 x 146 image of a plastic cap placed on top of a 2-inch cardboard box.

COLUMN2/BALL: 217 x 197 image of another wooden block placed next to a rubber ball.

The images are not all of the same size because, in some of the images, background pixels were deleted to reduce the amount of unnecessary processing. Some of the images in Figure 1.6 have missing data, because some points in the scene are not simultaneously visible from the direction of the laser light and the CCD camera. All the images in Figure 1.6, except the COLUMN2/ELL image, were obtained from the MSU range image database [34].

1.5 Contributions of the Thesis

This thesis explores the use of Markov random field models to specify the contextual information in low-level vision processes. Two important low-level processes considered here are: edge detection in range images and fusion of intensity and range images. The thesis proposes an MRF model-based algorithm to detect edges in range images. The thesis also explores the use of MRF models to capture contextual information in the problem of detection and labeling of edges by fusing information from registered intensity and range images.

The main contributions of this thesis, in order of importance, are:

1. Development of a new scheme for the estimation of parameters of an MRF edge process by relating geometric and probabilistic models of the domain (Section 5.1);

2. Use of a canonical potential representation to reduce the number of MRF model parameters (Section 5.3);

3. Application of an MRF model for edge detection in range images (Chapter 4);

4. Application of MRF models to obtain labeled physical discontinuities by fusing range and intensity images (Section 6.1);
5. Study of computational aspects of two parallel algorithms for the sensor fusion problem on the Connection Machine parallel computer (Chapter 7).

1.6 Organization of the Thesis

The thesis is organized as follows. Chapter 2 gives some notations and definitions used throughout the rest of the thesis. Chapter 3 is a review of existing literature on Markov random fields and two related research topics: relaxation labeling and anisotropic diffusion. Chapter 4 presents the range image edge detection algorithm and justifies the need for a parameter estimation scheme. This chapter also describes a method for combining edge-based and region-based segmentations to overcome the problem of fragmented boundaries.

Chapter 5 presents the scheme for estimation of parameters of a Markov random field defined on edge sites. The estimated parameters are used with the algorithm in Chapter 4 for range image edge detection. The performance of the edge detection scheme with estimated MRF model parameters is compared with that of empirically chosen parameters. This chapter also extends the parameter estimation method to the problem of edge detection in intensity images.

Chapter 6 presents an algorithm for fusion of range and intensity images for the purpose of edge detection and labeling. The algorithm uses Markov random fields to incorporate contextual information as well as a framework to fuse information from the two images.

Chapter 7 presents details of the implementation of the fusion algorithm presented in Chapter 6 on a Connection Machine parallel computer.

Finally, Chapter 8 gives the conclusions drawn from the research in the thesis and outlines some directions for future research.

CHAPTER 2

Markov and Gibbs Random Fields

This chapter introduces some notations and definitions used throughout this thesis. Section 2.1 presents some basic concepts of Gibbs and Markov random fields. Section 2.2 develops the Bayesian formulation for labeling problems in which the spatial prior information is represented using an MRF. Section 2.3 describes estimators for the a posteriori labels. Section 2.4 gives notation for MRFs defined on the edge lattice instead of the familiar pixel lattice. Sections 2.5 and 2.6 present the notation for coupled MRFs and multivariate MRFs.

2.1 Markov and Gibbs Random Fields

Let X = {X_1, X_2, ..., X_M} be the M-tuple random vector representing the colors or labels at the M sites of an image lattice, S. We will represent realizations of random variables by the corresponding lower case symbols. For example, we represent a realization of the random vector X by a vector x. A subset of the random variables X defined on sites in B ⊂ S is denoted by X_B. Although the image is usually a 2-dimensional array, a linear ordering of the pixels (sites) of the image is assumed for notational convenience. Each X_t takes a value from the set of labels, Λ. The range of labels could be either discrete or continuous: for discrete variables Λ = {1, ..., G}, and for continuous variables Λ = R, the set of real numbers. In order to avoid duplication, we give definitions only for the discrete case.
We use P(.) to denote probabilities of named events. For a random variable X, the density function will be denoted by f_X, and the conditional density with respect to a random variable Y is denoted by f_{X|Y}. Let the set of all possible labelings be Ω.

2.1.1 Neighborhood System

A neighborhood system, N, on a lattice, S, is a collection of subsets, N_t, of S, where N_t, the neighborhood of site t, satisfies the following two criteria.

1. The site t is not a neighbor of itself: t ∉ N_t, ∀t.

2. If s is a neighbor of t, then t is a neighbor of s: if s ∈ N_t then t ∈ N_s, ∀(s, t).

Some commonly used neighborhood systems in image processing are shown as a hierarchy in Figure 2.1. The number at each pixel represents the order of the neighborhood system in which it first appears as a neighbor of t. Notice that this hierarchy is not the only possible hierarchy. For example, one can create another hierarchy by naming the neighbors according to their city-block distance from the site of interest.

Figure 2.1. A hierarchical ordering of neighbors for site t.

Neighbors of sites near the boundary of the lattice need to be defined separately. Two different definitions are possible depending on whether the boundaries are considered to be periodic or free boundaries. In the periodic boundary model, sites near the boundary of the lattice have as many neighbors as interior sites. In the free boundary model, these sites have only the sites inside the lattice boundary as their neighbors, so they have fewer neighbors than interior sites. We use the free boundary model for the edge detection problem in Chapter 4 and a periodic boundary model for the fusion problem in Chapter 6. The choice of the boundary model was determined by implementation considerations.

2.1.2 Cliques

Consider the graph defined by {S, N}. Here, the sites form the vertices of the graph, and an edge is placed between two vertices if the corresponding sites are neighbors of each other. A clique is a subset c ⊂ S in which every pair of sites are neighbors of each other. That is, the subgraph formed by the sites in a clique is a completely connected graph. Let C denote the set of all cliques in {S, N}. For example, for the first- and second-order neighborhoods in Figure 2.1, the number of cliques in C are 5 and 21, respectively. Figure 2.2 shows the five cliques for the first-order neighborhood of a site t.

Figure 2.2. Cliques for the first-order neighborhood of a site t.
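As a concrete illustration of these definitions, the following sketch builds the first- and second-order neighborhoods on a free-boundary lattice (an illustrative helper, not code from the thesis; the function name and conventions are assumptions of the example).

    def neighbors(t, shape, order=1):
        # Return the neighbors of site t = (r, c) on a free-boundary
        # lattice. order=1 gives the 4 horizontal/vertical neighbors;
        # order=2 adds the 4 diagonal neighbors. Sites outside the
        # lattice are simply dropped, so boundary sites have fewer
        # neighbors (the free boundary model).
        r, c = t
        rows, cols = shape
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
        if order >= 2:
            offsets += [(-1, -1), (-1, 1), (1, -1), (1, 1)]
        return [(r + dr, c + dc) for dr, dc in offsets
                if 0 <= r + dr < rows and 0 <= c + dc < cols]

For an interior site, neighbors((5, 5), (10, 10), order=2) returns eight sites; for the corner site (0, 0) it returns only three. One can also check that the symmetry criterion holds: s is in neighbors(t, shape) exactly when t is in neighbors(s, shape).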
2.1.3 Markov Random Field (MRF)

A Markov random field is a probability distribution on X which satisfies the following properties.

1. Positivity: P(x) > 0 for all x ∈ Ω.

2. Markovianity: P(x_t | x_{S \setminus \{t\}}) = P(x_t | x_{N_t}), ∀t, where \setminus is the set difference operator.

The positivity property states that every configuration in Ω has a nonzero probability of occurrence. The second property, Markovianity, is an intuitive extension from Markov chain theory and states that the distribution of values (or colors) at a site t depends only on the values at the neighbors of site t and not on the values at the remaining sites. The word neighbor is used here to denote any site in N_t, and not necessarily a spatial neighbor of the site t, although in image processing it makes more sense to use spatial neighbors. For example, Geman et al. [37] have proposed a model with both spatial and random neighbors.

2.1.4 Gibbs Random Field (GRF)

While a Markov random field specifies the local properties of the image in terms of conditional probabilities P(x_t | x_{N_t}), a Gibbs random field specifies the global properties of the image directly.

A Gibbs random field on a lattice S, for a given neighborhood system N, is a joint distribution which is specified by the following probability measure on the sample space Ω for the lattice:

    P(x) = e^{-U(x)} / Z,                                   (2.1)

where

    U(x) = \sum_{c \in C} V_c(x),                           (2.2)

    Z = \sum_{x \in \Omega} e^{-U(x)},                      (2.3)

and V_c(x) is the potential function (alternatively, clique function) associated with the clique c. The quantity Z is a normalizing constant known as the partition function. The summation in the definition of Z should be replaced by an integral when x is continuous. The computation of Z is extremely tedious, except in trivial cases, due to the extremely large cardinality of the set Ω. The power of the GRF models lies in the fact that by controlling the relatively few functions V_c(.), the form of the joint distribution of X can be completely specified.

2.1.5 MRF-GRF Equivalence

One of the major difficulties with using MRFs is to verify whether a given conditional distribution is indeed valid, that is, satisfies the definition of an MRF [5]. This validity problem makes it very difficult to specify an MRF in terms of conditional distributions. The Hammersley-Clifford theorem, stated below, allows the specification of an MRF in terms of clique potentials [5].

Theorem 2.1: For a given neighborhood system, a unique MRF exists for every GRF and vice versa.

A typical use of the theorem is to model the image globally by a GRF using the associated clique potentials, and then use the local dependence property of the MRF for computational advantage, since only a small number of variables in the neighborhood of a site need to be used to compute the locally dependent conditional distribution at that site [38].
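The following sketch makes this computational use concrete for a simple homogeneous pairwise model (an illustration only; the Potts-style potential that contributes -beta when the two labels of a pair clique agree, and the hypothetical neighbors() helper above, are assumptions of the example, not the models used later in the thesis).

    import math

    def local_conditional(x, t, labels, shape, beta=1.0):
        # P(x_t = l | x_{N_t}) for a pairwise MRF with Potts potentials.
        # Only the cliques containing site t are needed, so the partition
        # function over all of Omega is never computed: this is the
        # computational advantage given by the MRF-GRF equivalence.
        energies = []
        for l in labels:
            # Sum of pair-clique potentials involving site t: -beta per
            # neighbor whose current label agrees with l.
            u = sum(-beta for s in neighbors(t, shape) if x[s] == l)
            energies.append(u)
        z_t = sum(math.exp(-u) for u in energies)   # local normalizer Z_t
        return [math.exp(-u) / z_t for u in energies]

Here x is a dictionary (or array) of current labels indexed by site, and the returned list gives the conditional probability of each label in labels at site t.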
2.2 MRFs as Models of Spatial Context: Bayesian Formulation

Markov random field models have been successfully used to represent contextual information in many 'site' labeling problems [6, 20, 22, 29, 30, 38, 74]. A site labeling problem involves classification of each site (pixel, edge element, region) into a certain number of classes based on an observed value (or vector) at each site. For example, in image segmentation problems, pixels are labeled based on observed value(s) at each pixel, where the number of labels corresponds to the number of "true" segments or regions. We assume that the desired labeling is the "true" image. The goal of the site labeling problem, then, is to recover the true image from the noisy observed image. Contextual information plays an important role here because in many site labeling problems, the true label of a site is compatible in some sense with the labels of the neighboring sites. Context represents our a priori assumptions about the physical world such as continuity and smoothness. Markov random fields are appropriate models of context because they can be used to specify the spatial dependency or spatial prior. We present below the procedure for deriving an optimal labeling in the Bayesian framework by modeling the a priori knowledge about site labels as a Markov random field.

Let X = {X_1, X_2, ..., X_M} be the M-tuple random vector representing the 'true' labels at each of the M sites. The vector Y = {Y_1, Y_2, ..., Y_M} represents the observations (e.g., gray values) at the M sites. Thus we have two types of information about the site labels. One is in the form of measurements Y, and the other is in the form of spatial dependencies among the set of labels X, represented by the prior probability distribution P(x). These two types of information can be combined optimally using Bayes' rule. The a posteriori probability of X given Y is given by

    P(x|y) = \frac{f_{Y|X}(y|x) P(x)}{f_Y(y)}.              (2.4)

Assumption 1: Prior probability distribution

Contextual information is incorporated through a Markov random field model of the statistical dependence among the labels on neighboring sites in X, which is equivalent to the Gibbs process [5]. The potential functions V_c(.) of the GRF model (see Equation (2.2)) encode our a priori knowledge about the spatial dependence of labels at neighboring sites. For site labeling problems, natural constraints such as smoothness and continuity are enforced using V_c(.). We emphasize that the random field model is used here only to incorporate spatial context into the site labeling problem and is not expected to be an accurate model of the true image.

Assumption 2: Class conditional density

For computational reasons, it is assumed that the observations y_t are conditionally independent of each other [6]. Therefore, the density for Y, given the true labeling, is

    f_{Y|X}(y|x) = \prod_{t=1}^{M} f(y_t|x_t).              (2.5)
neighbors of pi Where 2! is a . where the sur 2.3 Es The site 1 , (1 H‘ 5’, estimate t ”to St termS ( ._ . ‘Mdklmum a 24 process, we refer the reader to [38]. Under assumptions (1) and (2), the a posteriori probability distribution for the site labels X, given the observations Y = y also has the form of a Gibbs random field P(xly) = e-U‘X'yt/Zy, (2.6) where Z), is a normalizing constant, and the corresponding energy function is A! U(X|Y) = Zl-1n(f(ytlri))l+ Z V.(x). (2-7) i=1 CEC The local characteristics of a Markov random field can be derived from the Gibbs random field. Let XArt be a (vector) random variable representing the labels of neighbors of pixel t. The conditional probability of Xt can be written as follows [5]: P(aixwi = e‘U'W’Wszt, (2.8) where Z: is a normalizing constant, and Utll‘thNtaYl = ‘lnfffytlxtll + Z chxA/tla (2-9) czt€c where the summation is over all the cliques c to which the site t belongs. 2.3 Estimators for a posteriori Labels The site labeling problem can now be stated as follows: given the observation vector y, estimate the label vector x. Contextual information is represented by the right- most terms (ZCEC Vc(x) and Zonal/Ax“) ) in Eqs. (2.7) and (2.9). The MAP (Maximum a posteriori) estimate is the vector 5‘: which maximizes P(xly) With re- man a: the 11. cornput Sew labeinrs ofan er Segnien' ofthe ; also t};t each an. Carlo n Pesterio llarg,h(, uPdated algorit hi Ellen [n ('1 (.‘Ol’npn Preperm, Chou COIIfidt 71‘ and C0,“; 25 spect to x. Maximizing this a posteriori probability mass function which is a function of M variables is a formidable task since for a 256x256 image, M = 65,536. Ge- man and Geman [38] have used simulated annealing to obtain an approximation to the MAP estimate in the context of image restoration, but their procedure is also computationally expensive. Several alternative estimates have been suggested for obtaining the a posteriori labeling. Marroquin et al. [74] have suggested that minimizing the expected value of an error functional is a better criterion for some problems. For example, for the segmentation problem an appropriate criterion is to minimize the expected value of the percentage of pixels misclassified. For this criterion, the optimal labeling is also the labeling that maximizes the a posteriori marginal probability (MPM) at each site. They propose a method of computing the MPM estimate using a Monte Carlo method. W’hen rt is continuous, the optimal estimate is the mean of the posterior marginal distribution at each site and is called the Thresholded Posterior Marginals (TPM) estimate [74]. Besag [6] proposed a deterministic iterative scheme called Iterated Conditional Modes (ICM), in which the labeling of a site is iteratively updated so as to maximize the local conditional probabilities P(szva). This algorithm finds a solution that corresponds to a local minimum of the energy function given in Eq. ( 2.7). Two important advantages of the ICM algorithm are: (i) it is a computationally attractive scheme, and (ii) it avoids the undesirable large scale properties of the MRF prior (see [6] for details). Chou et al. [20] have proposed another estimation algorithm known as the Highest Confidence First (HCF) algorithm. The HCF algorithm, like ICM, is deterministic and computationally attractive. Because of these advantages we use the HCF algo- rithm in this thesis. The HCF algorithm is similar to the ICM algorithm except for the order in which the sites are visited. 
Additionally, HCF starts by initially assigning a NULL (non-committal) label to each site. A site commits to a label only if it gains enough confidence in its decision. Cliques with NULL labels contribute zero potential. The HCF algorithm for discrete labels is summarized in Figure 2.3.

HCF Algorithm for Discrete Labels

1. Initialize all sites to the NULL label.
2. Compute a stability measure for each site t as follows.
   - If site t has a non-NULL label l_t, then stability = U_t(g, x_{N_t}, y) - U_t(l_t, x_{N_t}, y).
   - If site t has the NULL label, then stability = U_t(g, x_{N_t}, y) - U_t(i, x_{N_t}, y),
   where
       g = argmin_{l in Λ} U_t(l, x_{N_t}, y),
       i = argmin_{l in Λ, l ≠ g} U_t(l, x_{N_t}, y),
   and Λ is the set of labels.
3. Repeat
   (a) Select the site with minimum stability.
   (b) Change its label to the one which corresponds to the lowest energy.
   (c) Recompute the stability for the site and its neighbors.
   until (stability ≥ 0) for all sites.

Figure 2.3. HCF site labeling algorithm for discrete labels.

Figure 2.4. Pixel and line (edge) sites. (Line sites lie on the dual lattice, between horizontally and vertically adjacent pixel sites.)

Figure 2.5. Second-order neighbors of edge sites: (a) neighborhood of a horizontal edge site; (b) neighborhood of a vertical edge site.

2.4 Markov Random Fields on Edge Labels: Line Process

Geman and Geman [38] introduced the notion of line variables (sites) to augment the pixel variables (sites) in order to model spatial discontinuities in the image. The introduction of this notion initiated a great deal of research in the area of MRF modeling for context [74, 10, 87, 20, 81]. The line variables are placed on a dual lattice (vertical and horizontal sites between the pixel sites) as shown in Figure 2.4. The stochastic process defined by the line variables is usually referred to as a line process or edge process.

Consider the image lattice shown in Figure 2.4. An edge site in the lattice indicates whether an edge exists between the two pixels it separates. Let 𝓛 be the lattice consisting of only the edge sites, and let L = {L_1, L_2, ..., L_K} be the K-tuple random vector representing the labels at the K sites of the edge lattice 𝓛. Note that while a pixel process is measurable (observable), a line process is not. It is common to use indirect measurements for edge sites in terms of measurements at pixels in the neighborhood of the edge site [19, 58]. We denote the indirect measurements at the K edge sites by the random vector W = {W_1, W_2, ..., W_K}. The neighborhood that we use for the edge process to specify edge constraints in Chapters 4 and 5 is shown in Figure 2.5.

The line process is very useful for edge detection because it represents natural assumptions about the structure of physical edges, such as continuity and smoothness [58]. The expressions for the a posteriori probability distribution of the edge labels and the corresponding energy function can be derived analogously to those in Section 2.2.
The a posteriori energy function is given by

\[
U(\mathbf{l} \mid \mathbf{w}) = \sum_{t=1}^{K} \bigl[ -\ln f(w_t \mid l_t) \bigr] + \sum_{c \in C} V_c(\mathbf{l}). \tag{2.10}
\]

The functions V_c(l) represent the contextual information in the edge labels. The edge cliques and the constraints they impose are discussed in detail in Chapter 4.

2.5 Coupled Markov Random Fields

A Markov random field defined on the coupled lattice of pixel and line variables shown in Figure 2.4 is known as a coupled Markov random field [38]. A commonly used neighborhood system for defining coupled Markov random fields is shown in Figure 2.6 [38, 74, 19].

Figure 2.6. Neighborhoods of pixel and line (edge) sites for (a) a pixel site, (b) a horizontal edge site, and (c) a vertical edge site.

Coupled Markov random fields help specify the pixel and edge constraints simultaneously. The addition of the line process helps to specify the following constraints: (i) the values of the line variables can be used to avoid undesirable smoothing across discontinuities, and (ii) the cliques of the line process can specify constraints on the structure of the discontinuities.

The lattices of pixel sites and line sites will be denoted by S and 𝓛, respectively. We will denote the coupled process by a pair (X, L), where X represents the pixel process and L represents the line process. A point in the sample (solution) space is represented by (x, l). Usually, only the pixel process is measurable (observable). The line process is not observable and only helps to specify the constraints on the solution space. We denote the measurements at the pixel sites by a vector y. The expression for the a posteriori energy function for the coupled process is similar in form to Equation (2.7) and is given by

\[
U((\mathbf{x}, \mathbf{l}) \mid \mathbf{y}) = \sum_{t \in S} \bigl[ -\ln f(y_t \mid x_t) \bigr] + \sum_{c \in C'} V_c(\mathbf{x}, \mathbf{l}). \tag{2.11}
\]

For the coupled process, the cliques in the set C' contain both pixel and line sites. In the expression above, the line process only helps in specifying contextual constraints, because measurements y are present only at the pixel sites.
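To illustrate how the line process enters Eq. (2.11), here is a minimal sketch under assumptions of our own choosing (a first-order pixel neighborhood, Gaussian noise, quadratic smoothness, and binary line variables, none of which is prescribed by the text): an active line variable switches off the smoothness penalty across a discontinuity, at a fixed cost that discourages spurious edges.

```python
import numpy as np

def coupled_energy(x, lines_h, lines_v, y, sigma=1.0, alpha=1.0, gamma=2.0):
    """Toy U((x,l)|y) for a coupled MRF on a 2-D grid.
    Data term : Gaussian -ln f(y_t|x_t), up to a constant.
    Prior term: quadratic smoothness between 4-neighbors, suppressed
    wherever the separating line site is on; each active line site
    pays a penalty gamma. lines_h[i,j] sits between x[i,j] and
    x[i,j+1]; lines_v[i,j] sits between x[i,j] and x[i+1,j]."""
    energy = np.sum((x - y) ** 2) / (2.0 * sigma ** 2)
    # horizontal pixel pairs, gated by the edge site between them
    energy += np.sum((1 - lines_h) * alpha * (x[:, 1:] - x[:, :-1]) ** 2)
    # vertical pixel pairs
    energy += np.sum((1 - lines_v) * alpha * (x[1:, :] - x[:-1, :]) ** 2)
    # cost of turning line variables on
    energy += gamma * (np.sum(lines_h) + np.sum(lines_v))
    return energy
```

This is the essential mechanism behind constraint (i) above: where a line variable is 1, no smoothing is charged across that pair of pixels.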
2.6 Multivariate Markov Random Fields

When multiple data sources (sensors) are used to measure the scene properties, we will have a multivariate observation at each site. The corresponding pixel label process could be either multivariate or univariate, depending on the labeling problem. For example, in multispectral image classification problems the pixel labels are univariate, corresponding to the different categories, whereas in the problem of restoring range and intensity images the pixel labels are multivariate, with one label each for the restored range and intensity values at the corresponding site. Figure 2.7 shows the relationship between the label and measurement variables for the problem of fusing range and intensity images.

Figure 2.7. Multivariate MRF: the label and measurement variables. (Each pixel site carries an intensity label and a range label, with the corresponding intensity and range measurements; edge sites carry the labels of the coupled line process. The labels are denoted (X, L) and the measurements Y.)

A Markov random field in which each site has more than one label is known as a multivariate Markov random field [71]. The definitions and the estimation methods described in Sections 2.1-2.3 extend easily to the multivariate case. We will only mention here our notation for such MRFs. The set of variables on which the MRF is defined will still be represented by (X, L), as in the case of the simple coupled process in Section 2.5. For the purposes of this thesis, it is sufficient to use only a single (univariate) line process L. We will denote the kth measurement from all sources by Y^k, and the kth label at all locations by X^k. We defer a detailed derivation of the a posteriori probability distribution to Chapter 6.

2.7 Summary

In this chapter, we first presented definitions of univariate Markov and Gibbs random fields and gave our notation for coupled and multivariate MRFs. We also reviewed the Bayesian formulation for combining sensory measurements with a spatial contextual prior distribution and described some commonly used estimators for the site labeling problem. The HCF algorithm, which is used in Chapter 4 for edge detection, was also reviewed.

CHAPTER 3

Survey: MRF Models in Computer Vision

The computer vision literature contains several problems for which Markov random field models have been used to obtain a solution. These problems can, by and large, be grouped into two categories: (i) the MRF is used to model contextual information to enhance decision making algorithms, and (ii) the MRF is used to describe the image using a few parameters so that the image can be modeled and synthesized. In this thesis, we are mainly concerned with modeling contextual information. Another useful classification, which we will use here to describe significant work in the field, is based on the image domain in which the MRF has been put to use [60]: textured images, gray level images, range images, and multi-sensor images. A pictorial depiction of this classification is shown in Figure 3.1.

Figure 3.1. A classification of MRF application areas:
- Textured images: texture classification, texture synthesis, texture segmentation.
- Gray level images: image restoration, image segmentation, edge detection.
- Range images: image restoration, edge detection.
- Multi-sensor images: color images, multispectral images, range and intensity images.
- Miscellaneous: image interpretation, speckle image segmentation.

3.1 Textured Image Analysis

Texture analysis is an important and very difficult problem. The fundamental difficulty in texture modeling is that no one model of texture formation can account for the large variability of texture types. Figure 3.2 shows some texture patterns from the Brodatz album [14] to illustrate the variety of texture patterns encountered in the real world.

Figure 3.2. Several texture patterns from the Brodatz album [14]: (a) reptile skin, (b) expanded mica, (c) pressed calf leather, and (d) water.

Approaches to texture analysis can broadly be classified as feature-based, structural, and model-based [3].
Feature-based approaches extract numerical attributes or features from a textured image and use statistical pattern classification techniques to classify or segment the texture regions [46, 61]. Structural approaches view a textured image as generated by a set of rules or a formal grammar which specifies how texture primitives are arranged [3]. Structural approaches are suitable only for textures with a high degree of regularity. Model-based approaches provide a mathematical model to capture the statistical dependence among the pixels in a local neighborhood [25]. Unlike feature-based approaches, which are descriptive only, model-based approaches can both describe and generate textures. MRF models have been extensively used in the following texture analysis tasks.

3.1.1 Texture Classification

Texture classification deals with the problem of classifying an image consisting of a single texture into one of several possible categories of textures. As is typical in any classification problem, this involves the selection of a feature vector and a classification rule. A textured image can be modeled by an MRF, and the estimated model parameters are then used as features for classification [29]. Textural features have been derived from both continuous MRFs [23] and discrete MRFs [17]. Some of the other texture features used in the literature are gray level co-occurrence features [46], fractal features [65], and Gabor filter features [61].

3.1.2 Texture Synthesis

The goal in texture synthesis is to generate (synthesize) an image with desired textural properties using only a few parameters. Texture synthesis is useful in computer graphics and animation. MRF models have been suggested for the synthesis of textured images [47, 25, 27, 17]. Figure 3.3 shows examples of textures generated using second-order Derin-Elliott discrete MRF models [27].

Figure 3.3. Texture patterns generated using a second-order discrete MRF model (4 gray levels).

The idea of using MRF models for texture synthesis is appealing because the problem of synthesizing a texture translates to the problem of sampling an MRF distribution, which is fairly straightforward. However, the usefulness of MRF models for real-world texture synthesis problems such as virtual reality and data compression has not yet been demonstrated.

3.1.3 Texture Segmentation

In texture segmentation, the given image is assumed to contain several homogeneous regions with different textures. The problem here is to separate and classify pixels into different regions; it is more difficult than texture classification. MRF models have been used in two different capacities to attack this problem: (i) to model the spatial distribution of region labels, and (ii) to model the distribution of gray levels within a textured region. Derin and Elliott [27] and Manjunath et al. [70] use a hierarchical approach in which a top-level MRF incorporates spatial continuity of the segmentation labels and a separate MRF models the gray level distribution of each textured region. Bouman and Liu [13] developed a multi-resolution segmentation algorithm based on the same approach. Cohen and Cooper [22] partitioned the image into many regions, and each region was tested for homogeneity using estimated model parameters. They report very good segmentation results on several natural textures using this approach.
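Since Section 3.1.2 notes that synthesizing a texture reduces to sampling an MRF distribution, the following is a minimal Gibbs-sampler sketch for a generic discrete MRF. The Potts-like pair potential here is our own illustrative choice, not the Derin-Elliott model: each sweep resamples every pixel from its local conditional distribution, Eq. (2.8) with no data term.

```python
import numpy as np

rng = np.random.default_rng(0)

def gibbs_texture(shape=(64, 64), levels=4, beta=1.0, sweeps=100):
    """Sample a texture from a simple 4-neighbor MRF in which
    V_c = -beta when the two pixels of a pair clique agree, 0 otherwise."""
    h, w = shape
    img = rng.integers(0, levels, size=shape)
    for _ in range(sweeps):
        for i in range(h):
            for j in range(w):
                nbrs = [img[i2, j2]
                        for i2, j2 in ((i - 1, j), (i + 1, j),
                                       (i, j - 1), (i, j + 1))
                        if 0 <= i2 < h and 0 <= j2 < w]
                # local energy of each candidate label given the neighbors
                energies = np.array([-beta * sum(n == l for n in nbrs)
                                     for l in range(levels)], dtype=float)
                p = np.exp(-energies)
                img[i, j] = rng.choice(levels, p=p / p.sum())
    return img
```

Larger beta values produce larger homogeneous patches; different clique potentials (e.g., direction-dependent ones) yield oriented textures like those in Figure 3.3.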
3.2 Gray Level Image Analysis

Three of the most important tasks in the analysis of gray level images (without texture) are restoration, edge detection, and segmentation. All these problems are instances of the general pixel labeling problem. MRF models are used to enhance the performance of these algorithms by incorporating contextual information into the labeling process.

3.2.1 Image Segmentation

The problem of image segmentation is to partition an image into homogeneous regions. The homogeneity property could involve the pixel gray levels, texture, or some other property. Here, we will restrict the discussion to partitioning an image based on homogeneity of gray levels. This problem has been cast as a pixel labeling problem, where the desired labeling is the true region map. The segmentation problem then is to obtain an estimate of the true region map from the observed image. Contextual information is incorporated in the prior term of the Bayesian formulation introduced in Section 2.2 (see Equation (2.7)) and is modeled by a Markov random field. Various estimates of the true labeling (MAP, MPM) have been defined using the a posteriori probability distribution of the labels [38, 74]. MRF model-based segmentation algorithms perform well on regions of uniform intensity but fail when the intensity in a region has a gradient, as is often the case in real images [30].

3.2.2 Image Restoration

Image restoration methods attempt to recover the "true" image from an observed noisy or degraded image. A mathematical model of the degradation phenomenon is assumed to be available. The contextual information is present in the form of spatial dependency among gray levels: nearby pixels tend to have similar gray levels. Markov random fields are appropriate models for this contextual information. Image restoration is another instance of the pixel labeling problem. One significant difference between image segmentation and image restoration is that, in restoration, the number of labels in the recovered image is equal to or greater than the number of gray levels in the corrupted image, whereas in segmentation, there are fewer labels than gray levels. A more fundamental difference is that segmentation labels are symbolic, whereas labels in a restoration problem are numeric values corresponding to the gray levels in the image.

Discrete MRFs have been used for restoration by Geman and Geman [38], Besag [6], and Marroquin [74]. Geman and Geman used a hierarchical model consisting of the usual intensity process indexed by pixels and an unobservable line process indexed by imaginary edge sites between pixels, as discussed in Section 2.5. Jeng and Woods [62] have extended this work to continuous MRF models. Continuous MRFs are more appropriate for restoration because the undegraded pixel values are usually continuous in the real world. A popular class of continuous MRFs is the compound Gauss-Markov random field model used by Jeng and Woods as well as by Simchony et al. [88]. This model has the advantage that sampling the conditional distribution is straightforward.
Some impressive image restoration results have been obtained with MRF models on images of man-made objects [38, 88, 10, 74]. However, they do not have any apparent advantage over methods such as Wiener filtering [40] for natural scenes, because of the less significant role played by edges in natural scenes.

3.2.3 Edge Detection

MRF models can be used for edge detection in one of two ways: (i) as a by-product of image restoration, or (ii) as a direct labeling problem. In both cases, the edge sites are defined to lie between two adjacent pixels, as shown in Figure 2.4. Daily [26] has argued in favor of hexagonal grids for restoration because of the apparent reduction in distortion for edges of arbitrary orientations.

When edge labeling is obtained as a by-product of image restoration, the model usually consists of a hierarchical or coupled MRF with an observable pixel process and an unobservable line process [38]. The labeling problem is posed as the joint estimation of the pixel labels and the edge labels. When edge detection is posed as a direct problem, the prior distribution of the edge labels is modeled by a Markov random field. Since there is no direct observation at the edge sites, support for the edge labels is derived from the intensity (or any other image attribute whose discontinuity we seek) values in the neighborhood of the edge site and expressed in the form of a likelihood of that label. Geman and Geman [38] obtained edges as a by-product of restoration using a hierarchical MRF model. They experimented with two edge labels {edge, no edge} and four edge labels {straight edge, left diagonal, right diagonal, no edge} on some simple but noisy images and obtained impressive results. Simchony et al. [88] used a simple line process with only singleton cliques and a Gaussian MRF model for the pixel process for edge detection, and showed that this is equivalent to Blake's [10] weak membrane formulation.

Chou et al. [20] posed edge detection as a direct labeling problem and used a step edge model to compute the likelihoods of edge labels (f(w_t | edge) and f(w_t | no-edge) in Equation (2.10)) at each line site. The same model was also used by Swain et al. [90]. The direct labeling problem is easier, since the search space is much smaller than for the restoration problem, which reduces the computational burden. On the other hand, this approach has the disadvantage that, because the pixel process is not included, the model may not specify the constraints accurately.

3.3 Range Image Analysis

Range images require processing techniques different from those for intensity image analysis [59]. This is primarily due to the physical interpretation of the measurements. For example, range image segmentation techniques aim at separating different surfaces from one another, unlike the case of intensity images where, by and large, segmentation is based on uniformity of the pixel values, with the hope that they correspond to uniformity in some physical property of the real world. Most range image operations are dictated by the need to preserve the properties of geometric surfaces, which include the accurate specification of their boundaries. For many applications, surfaces are assumed to have zeroth-order (continuity of depth values) and first-order (continuity of surface orientation) continuities.
Discontinuities of these two types are known as jump and crease edges, respectively. Range image edge detection, therefore, necessarily involves the detection of both jump and crease edges. Range image restoration should ideally take both these discontinuity types into account, so that the surfaces are not smoothed across these edges.

MRF models have not been applied widely to range image analysis problems. They have been used only for sparse and dense depth restoration problems with jump edges [74, 20], although the model is sufficiently general to deal with crease edges as well. Bilbro and Snyder [9] have used an MRF model with a large neighborhood for range image restoration. The model does not incorporate discontinuities directly, although the large neighborhood and the clique potential functions used are claimed to preserve the edges during restoration.

3.4 Multi-Sensor Image Analysis

Multi-sensor image analysis techniques deal with the fusion of information from different sources. Data fusion allows better decision making by relying more on sensory information and less on a priori assumptions, which may or may not be accurate. By incorporating diverse sources of sensory information, data fusion reduces the uncertainty in the decisions.

A variety of methods exist for achieving data fusion. These methods depend on the nature of the information sources, the relation between these sources, and the desired output. For example, the type of processing required to combine two independent intensity images of the same scene to obtain a segmentation into uniform intensity regions is much different from that required for combining thermal and visual images of a scene to obtain a segmentation in terms of uniform heat flux regions [82]. Clark and Yuille [21] give a good survey of various information fusion methods.

Markov random field models can model the contextual information for image data fusion. A site is assumed to have a vector of observations, as shown in Figure 2.7. MRFs can be easily generalized to handle vector observations. In the following subsections we describe some specific fusion problems which have been solved using MRF models.

3.4.1 Color Image Analysis

Color images are formed by sensing the visible electromagnetic spectrum at three primary wavelengths corresponding to red, green, and blue (650, 530, and 410 nanometers). Each of these three images consists of the responses of filters whose characteristics are designed to be close to those of the corresponding photoreceptors (cones) in human eyes [86].

The key idea in color image analysis is to use the correlation between different color planes (or component images) to aid in decision making. It is well known that humans use both color and intensity information to discriminate objects more effectively than would be possible with intensity alone [86]. Wright [100] uses a simple correlation model which penalizes configurations in which the labels in the three images at the same edge site are different. This model does not consider the correlations that exist among the red-green-blue components at a single pixel. Nevertheless, very good restoration results were obtained by Wright [100] on synthetic images. One can also take the correlation between different sensor values at a pixel into account, in addition to those at the discontinuities. Daily [26] incorporates pixel correlations implicitly by using a color difference measure that purports to enhance discriminability.
A multivariate MRF model with a coupled univariate line process (see Section 2.6) is used to model the joint prior distribution of pixel and edge site labels. Smoothing of adjacent pixel values is suspended in all three color planes when an edge is present at an edge site. Daily [26] obtained very good restoration and edge detection results using this model on natural images.

3.4.2 Multispectral Image Analysis

Multispectral images differ from color images in that the range of wavelengths sensed is broader than the range of visible light wavelengths, and the specific frequencies usually do not correspond to the red, green, and blue channels used in color images. The goal of multispectral imaging is the segmentation and classification of pixels. It is a very useful imaging technique for remote sensing applications. Early methods for multispectral image segmentation/classification ignored the spatial dependency of segmentation labels and simply used the different spectral observations at each pixel to classify the pixel. Switzer [91] incorporated some spatial dependency, but the relaxation method used ignores the pixel observations once initial classifications are obtained independently at each pixel, and from then on uses only the spatial information of the labels. Zhang et al. [103] used a multivariate MRF model for segmenting multispectral images. They used a simple model without a coupled line process, where the correlation between the different values at a single pixel was accounted for by assuming a multivariate normal distribution for the noise. The mean vector and covariance matrix for each class were estimated using training samples. Note that for many remote sensing applications, the additional constraints that can be enforced by using a line process may be of little use because of the lack of structure in the edges (this is not true for the image of a city or a farm, however). The images used in Zhang et al. had no apparent structure in the edges. They obtained a 5-10% improvement in classification accuracy over a non-contextual classifier on four real images.

3.4.3 Fusion of Range and Intensity Images

Combining range or depth measurements and intensity measurements of a scene can be useful in two ways: (i) it can provide a robust solution, and (ii) it can provide information which cannot be derived from either image alone. Robustness is provided by redundancy. It is very likely that an actual discontinuity in the physical world produces discontinuities in both the range and intensity images, so the effect of noise can be minimized by using both images simultaneously in image analysis. A range image provides only depth measurements of the scene and may not be sufficient to differentiate, for example, a Pepsi can from a Coca-Cola can, since the depth measurements are the same for both objects. An intensity image provides additional features, such as the printed characters on the objects, which enable accurate recognition of the objects. An intensity image is sensitive to ambient light, shadows, and highlights. Therefore, a robust and accurate recognition system can be achieved by combining the two types of information.

Chou [19] used a coupled MRF model for combining intensity and (possibly sparse) depth observations to reconstruct depth. Chou used intensity edges as partial support for the existence of a depth discontinuity at the corresponding location in the depth map.
This is achieved by deviating from the usual coupled MRF approach, in which line sites are assumed to be hidden with no direct observation: Chou derived the likelihood of edge labels for each site from the intensity image as the support for edge labels in the coupled MRF corresponding to the depth image. This innovative use of the edge likelihoods allowed working with only a single coupled MRF model, rather than the two coupled models (one each for the intensity and depth images) or the multivariate coupled MRF model that would have been necessary otherwise. The integration of intensity and range data has been shown to produce good surface reconstruction even when the range data is sparse. The Bayesian formalism performs better in combining different sources of information than methods that use ad hoc rules to combine the individual segmentations [43].

3.4.4 Integration of Early Vision Modules

Understanding a complex information processing task such as vision calls for a modular approach [73]. This is partly because there is psychological evidence that biological vision also is somewhat modular [96], owing perhaps to the evolution of different perceptual capabilities at different times, and partly because it seems impossible to understand the complex process of vision as a whole. Therefore, the bulk of computer vision research has concentrated on specialized modules (shape, texture, etc.) that provide a certain capability of the vision system. It is obvious that somewhere along the information flow path, information from these different modules must be combined for a more complete interpretation of the scene [97].

Gamble et al. [35] experimented with integrating results from stereo, motion, color, and (intensity) edge detection modules. Markov random fields model the mutual constraints between the different modules. A separate MRF lattice for each module is coupled to the intensity image at the location of discontinuities. The coupling is in the form of a term in the energy function of each MRF which is proportional to the brightness change in the intensity image at that location. The idea is that a discontinuity in any module usually results in large intensity changes. The coupling term seems to have been chosen heuristically. Results are presented for only two images, and they seem reasonable. Gamble et al. also propose a more general framework which labels the discontinuities in terms of their physical origin, but no experimental results are presented, and not enough detail is given to judge the feasibility of this approach.

3.5 Miscellaneous Applications of MRF

Some Markov random field applications do not come under any of the above categories. Notable exceptions are: (i) modeling of complex data in speckle images [28]; (ii) image interpretation [76]; and (iii) segmentation and recognition of tinker toy descriptions [24].

Derin et al. [28] used a hierarchical MRF model, similar to those used for segmentation of gray level and textured images, to segment a speckled image. Speckle noise is observed in synthetic aperture radar (SAR) and laser images and is characterized by reflections from a rough surface illuminated by coherent light. The received signal is represented by two components, an in-phase component and a quadrature component, which together form the complex amplitude data at any pixel. The intensity image corresponding to this complex data is the squared magnitude at each pixel, and this is the data that is commonly used for image analysis.
The intensity image in the presence of speckle noise exhibits a granular appearance, hence the name speckle images. Derin et al. show that the phase part of the complex data, contrary to its visual appearance, carries useful information for segmentation purposes. The top level in their hierarchical approach models the spatial continuity of regions. The speckle within a region is modeled by a bivariate Markov random field. The joint distribution of the complex amplitude data in a region is shown to be a second-order bivariate Gaussian Markov random field. They demonstrated the performance of their method on many synthetic images as well as a few real SAR images. The segmentation results are very good for all the test images, and the use of complex data results in significantly better results than when only amplitude data is used.

Modestino and Zhang [76] used an MRF defined on an adjacency graph of segmented regions for the problem of image interpretation. Image interpretation is concerned with assigning interpretation labels, I = {I_1, ..., I_N}, to each of the N disjoint regions found in a segmented image. Each I_i takes one of the possible interpretation labels {L_1, ..., L_M}. The MRF model is justified here because the interpretation attached to a region depends largely on the interpretations of the regions that are adjacent to it, and less so on regions that are not adjacent to it. Modestino and Zhang obtain the MAP estimate of the interpretation labels using simulated annealing. The interpretation results are reasonable, but results on only two images have been reported.

3.6 Related Approaches

In addition to the regularization approach mentioned in Chapter 1, various other paradigms have been proposed in the literature to incorporate contextual dependency. Two important ones are relaxation labeling [54] and anisotropic diffusion [83].

Relaxation labeling is an iterative technique for propagating local constraints on the labels of neighboring sites. Associated with each site t are weights p_t(l) in [0, 1], one for each possible label l. Contextual knowledge is represented by compatibility functions r_{ts}(l_1, l_2), which quantify the compatibility of label l_1 at site t with label l_2 at a neighboring site s. The weights for the labels at each site are updated in a locally parallel fashion, based on the current weights and the compatibility functions, according to the following equation [101]:

\[
p_t^{(K+1)}(l) = \frac{p_t^{(K)}(l)\,\bigl[1 + q_t^{(K)}(l)\bigr]}{\sum_{l'} p_t^{(K)}(l')\,\bigl[1 + q_t^{(K)}(l')\bigr]},
\qquad
q_t^{(K)}(l) = \sum_{s \in N_t} \sum_{l'} r_{ts}(l, l')\, p_s^{(K)}(l'),
\]

where K is the iteration number and q_t^{(K)}(l) is the net support that the current weights at the neighbors of t lend to label l. The iterations are stopped when the weight for some label is 1 (or is close to 1) at all sites. The sites are labeled with the label for which the associated weight is the highest among all the labels at that site.

There are two notable drawbacks to this procedure. First, the weights are difficult to interpret, since they have no semantics attached to them (even if they have one initially, it will be lost after a few iterations). Secondly, after the initial assignment of weights, the measurements at each site are ignored, and so it is likely that later iterations lead the solution farther away from the observations. In other words, the two forms of evidence are not combined consistently, and there is a possibility of ignoring one form of evidence. Notice that Markov random field contextual models, when used in a Bayesian framework, deal effectively with this problem by using the Bayes' rule to combine evidence.
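A minimal sketch of one parallel sweep of the update rule above (the array layout and the assumption that the compatibilities keep 1 + q positive are ours, purely for illustration):

```python
import numpy as np

def relaxation_sweep(p, r, neighbors):
    """One parallel relaxation labeling update.
    p[t, l]        : current weight of label l at site t
    r[t, s, l, l2] : compatibility of label l at site t with l2 at site s
    neighbors[t]   : list of neighbor site indices of site t
    Assumes compatibilities are scaled so 1 + q stays positive."""
    p_new = np.empty_like(p)
    for t in range(p.shape[0]):
        # support q_t(l): compatibility-weighted sum over neighbor labels
        q = sum(r[t, s] @ p[s] for s in neighbors[t])
        unnorm = p[t] * (1.0 + q)
        p_new[t] = unnorm / unnorm.sum()
    return p_new
```

Note that the observations enter only through the initial p; subsequent sweeps use the weights alone, which is exactly the second drawback discussed above.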
Anisotropic diffusion is a paradigm used for computing a multiscale representation [99] of images. It is achieved by computing a series of images indexed by a scale parameter, t, using the anisotropic diffusion equation [83]

\[
\frac{\partial}{\partial t} I(x, y) = \operatorname{div}\bigl(c(x, y, t)\, \nabla I(x, y)\bigr) = c(x, y, t)\, \Delta I(x, y) + \nabla c \cdot \nabla I(x, y),
\]

where div is the divergence operator, ∇ and Δ are the gradient and Laplacian operators, c(x, y, t) is the conduction coefficient, and I_t is the image attribute at scale t. An important requirement for multiscale representations is that the edges and their locations at small scale (high resolution) should be preserved at larger scales (low resolution). This is equivalent to enforcing smoothness constraints across the image except at the locations of discontinuities. Perona and Malik let the conduction coefficient c(x, y, t) depend on the brightness gradient at each location, so diffusion (Gaussian smoothing) is discouraged between pixels that have large brightness gradients between them, by decreasing the conduction coefficient between the two pixels. They have obtained good edge detection results using this method. They show that anisotropic diffusion also minimizes an energy function and, therefore, is a special case of the MRF model. The limitation of this method is that it is applicable to a very restricted class of contextual models [83].

3.7 Summary

In this chapter we looked at two categories of computer vision problems in which MRF models have been applied. In the first category, MRF models are used to mathematically describe image textures using only a few parameters. Only limited success has been achieved in modeling image textures using MRF models. The issues in texture modeling using MRFs are: (i) specification of the model in terms of neighborhood size and clique parameters, (ii) robust estimation of the parameters, and (iii) testing the goodness-of-fit between the data and the model. Several studies which use MRF parameters for texture classification report good classification results on a limited class of textures. However, for texture synthesis and texture segmentation problems, it is not clear whether MRF models result in superior performance compared to related approaches.

MRF models have also been used to model spatial a priori information, or contextual information, in order to enhance the performance of decision making algorithms in a wide range of image analysis problems such as restoration, segmentation, edge detection, and multi-sensor fusion. Performance results reported using MRF model-based methods for image segmentation and edge detection range from reasonable on real images to impressive on synthetic images. The relationships between MRF models and two other paradigms that incorporate contextual information, namely relaxation labeling and anisotropic diffusion, were briefly examined, and the generality of the MRF model-based approach was discussed.

CHAPTER 4

Edge Detection in Range Images

There are two common approaches for segmenting range images into surface patches: region-based [50, 8] and edge-based [31, 75]. Region-based segmentation schemes group "similar" pixels to form surface patches. Edge-based segmentation schemes utilize the dissimilarity in the properties of adjacent surfaces to compute the surface boundary map of the image; closed boundaries then define a surface patch. Jump and crease edges have been widely used for segmenting range images. A jump edge in a range image is caused by a discontinuity in depth values, whereas a crease edge is formed by a discontinuity of the surface normals. Crease edges are more difficult to detect than jump edges [32].
Region-based schemes assume that regions or surfaces in range images can be globally approximated by an analytic function [8]. Edge-based schemes need only to approximate the surfaces near the discontinuities. Region-based methods guarantee closed boundaries for regions, a feature that is missing in edge-based methods. Edge-based schemes have the advantage of relying only on local information to separate surfaces and, therefore, need not approximate the entire surface by a global analytic function. By suitably combining edge-based and region-based approaches, one could realize the advantages of both methods [102, 58].

In this chapter we develop an MRF model-based approach to detect jump and crease edges in range images. The edge detection algorithm has been combined with Hoffman and Jain's [50] region-based scheme, resulting in a hybrid algorithm for segmentation. Our segmentation scheme is compared with the Hoffman-Jain segmentation scheme using several range images.

4.1 Edge Detection as Bayesian Estimation

We view the problem of edge detection in range images as a site labeling problem. This approach differs from edge relaxation methods [3] in the way contextual information is combined with measurements. In the site labeling approach, these two types of information are combined consistently using the Bayes' rule, whereas in edge relaxation methods, the data values are ignored after an initial labeling is obtained, and therefore the final edge labeling may not be consistent with the data. Our approach to edge detection in range images is similar to the one used by Chou et al. [20] for edge detection in intensity images. Each edge site can be labeled as either EDGE or NON-EDGE.

An important difference between site labeling problems that deal with pixel sites and those that deal with edge sites is that there is no direct measurement or observation available at an edge site, since the edge sites are defined to lie between two adjacent pixels. We circumvent this problem by using indirect measurements in the form of edge strengths computed from the range values at pixels in the neighborhood of the edge site. As mentioned in Section 2.4, we let w = {w_1, w_2, w_3, ..., w_K} represent the indirect measurements and l = {l_1, l_2, l_3, ..., l_K} represent the edge labels at the K edge sites. The neighborhood shown in Figure 2.5 is used here to define the a priori distribution. Figure 4.1 shows all possible clique types for this neighborhood. To completely specify the a posteriori distribution (see Section 2.2), we need the conditional distribution f(w_t | l_t) and the clique potential functions V_c(.). For the case of binary edge labels {EDGE, NON-EDGE}, we only need the likelihood ratio f(w_t | EDGE)/f(w_t | NON-EDGE) at each line site and the clique potential functions V_c(.) [20].

Figure 4.1. Edge clique types for a second-order neighborhood.

4.2 Computation of Edge Likelihoods

The edge strengths are derived from the range values observed at pixels in the neighborhood of the edge site. We define a statistic which indicates the edge strength at an edge site and derive its distribution. Obviously, two such statistics will be required, one each for jump edges and crease edges.

4.2.1 Jump Edge Statistic

We propose the absolute value of the difference in the average depth values on the two sides of an edge site as a statistic for testing the hypothesis that there is a jump edge at that site.
Figure 4.2. Pixel numbering in the neighborhood of a vertical edge site. A similar numbering scheme is used for horizontal edge sites. (Pixels 1 through N/2 form region R1 on one side of the edge site; pixels N/2 + 1 through N form region R2 on the other side.)

If the N depth values in the neighborhood of an edge site t are numbered as in Figure 4.2, then the jump edge statistic J_t at edge site t is given by

\[
J_t = \left| \frac{2}{N} \left( \sum_{i=1}^{N/2} d_i - \sum_{i=N/2+1}^{N} d_i \right) \right|, \tag{4.1}
\]

where d_i is the depth value at pixel i (see Figure 4.2) and |.| is the absolute value operator. A very large value of N reduces the localization accuracy, whereas a very small value increases the sensitivity to noise. We used two pixels on either side of an edge site to compute the jump edge statistic, so N = 4. The null hypothesis is that no jump edge exists at a site, and the alternative hypothesis states that a jump edge exists at that site.

1. Distribution under the null hypothesis: Assuming independence of the depth measurements on the two sides of an edge site, the distribution of the statistic J under the null hypothesis, f_{J_0}, is a t-distribution [78] with N/2 degrees of freedom, modified such that negative values are not allowed. For computational reasons, we have approximated the t-distribution by a normal distribution. The density of J can then be computed using the corresponding z statistic for a standard normal distribution, given by

\[
z = \frac{J_t}{\sqrt{(4/N)\,\mathrm{variance}(J_t)}}. \tag{4.2}
\]

The variance of J_t can be estimated either from a sample range map from the sensor, or from the given range image itself.

2. Distribution under the alternative hypothesis: The computation of the distribution of the statistic J under the alternative hypothesis (f_{J_A}) is complicated by the composite nature of the alternative hypothesis. Based on intuition about how physical objects are arranged, we assume that any depth difference greater than a specified minimum depth difference (taken here to be the standard deviation of J, sd(J)) is equally likely. Figure 4.3 shows the assumed distribution of J given that a jump edge exists at the edge site. We have scaled the depth values to lie between 0 and 255, so the value of max J is 255 for the jump edge statistic.

Figure 4.3. Distribution of J under the alternative hypothesis. (The density f(J | EDGE) rises to a knee at sd(J) and remains flat out to max J.)

The jump edge likelihood ratio L_J(t) at edge site t can then be computed as

\[
L_J(t) = \frac{f_{J_A}(j_t)}{f_{J_0}(j_t)}, \tag{4.3}
\]

where j_t is the observed value of the statistic J_t.

4.2.2 Crease Edge Statistic

The change in direction of the surface normal determines the likelihood of a crease edge at an edge site. Surface normals were computed by using only four of the eight neighbors closest in depth value to the center pixel, along with the center pixel, in order to reduce the undesirable effect of using pixels that do not belong to a single surface. We define the crease edge statistic C_t at edge site t, using the pixel numbering scheme given in Figure 4.2, as

\[
C_t = \left\| \frac{2}{N} \left( \sum_{i=1}^{N/2} \mathbf{n}_i - \sum_{i=N/2+1}^{N} \mathbf{n}_i \right) \right\|, \tag{4.4}
\]

where n_i is the unit vector in the direction of the surface normal at pixel i and ||.|| denotes the magnitude operator. The value N = 2 was used to compute the crease edge likelihoods because larger values of N tend to encourage the detection of spurious crease edges inside homogeneous curved surfaces.

1. Distribution under the null hypothesis: As in the case of the jump edge statistic, we approximate the distribution of C under the null hypothesis, f_{C_0}, by a normal distribution. This approximation is necessary only for computational simplicity.
2. Distribution under the alternative hypothesis: Intuition suggests that the distribution of C under the alternative hypothesis, f_{C_A}, is similar in form to that of f_{J_A}(J); only the maximum and knee values in Figure 4.3 are different and are set to max C and sd(C), respectively, where sd(C) is the standard deviation of the statistic C. The value of max C is 2.0 for the crease edge statistic. The crease edge likelihood ratio L_C(t) at edge site t can then be computed, similar to L_J(t) in Eq. (4.3), as

\[
L_C(t) = \frac{f_{C_A}(c_t)}{f_{C_0}(c_t)}, \tag{4.5}
\]

where c_t is the observed value of the statistic C_t.

4.2.3 Difficulties with Gradients and Discontinuities

The jump and crease edge statistics described above have been defined under the assumption that, in the neighborhood of an edge site (R1 and R2 in Figure 4.2), the observations are i.i.d. Gaussian. This assumption can be violated in two ways: (i) there could be discontinuities within R1 and R2, as in the case of an edge site close to a boundary between two surfaces; (ii) the true depth values (or surface normal directions) may not be constant in R1 and R2 (e.g., slanted surfaces for jump edges and curved surfaces for crease edges). These violations lead to undesirable effects such as classifying a thick strip of edge sites near a boundary as edges, or identifying edge sites within regions with high depth or surface normal gradients as edges. Similar problems are encountered in regularization schemes which use weak constraints [10].

We incorporate two modifications into each of the two statistics so that the modified statistics can be used in a hypothesis test derived under the i.i.d. Gaussian assumption. We first replace the statistics J and C by their deviates from local averages computed in a 1 x 5 (or 5 x 1 for vertical edge sites) window of edge sites parallel to the center site. For edge sites away from a boundary, the local average and the individual values are close, so the modified statistic takes small values at these sites. For edge sites on a boundary, the individual values are higher than the local averages, and the modified statistics take larger values. The second modification performs nonmaxima suppression in the same window by suppressing the statistic value at the center site if it is neither the largest nor the next largest value in the window. The second largest value in the window is not suppressed, to account for the parallel edges that are very often formed near corners.

4.2.4 Combining Jump and Crease Edge Likelihoods

At each edge site we have two likelihood ratios, one for the jump edge statistic and the other for the crease edge statistic. How should we combine this information to obtain the edge labeling? At one extreme, one could have two separate labeling processes, one each for jump and crease edges, which make use of the corresponding likelihood ratios only. This approach does not account for the interaction between the two processes and makes it difficult to enforce continuity constraints. At the other extreme, one could use the two likelihood ratios as components of an observation vector, specify appropriate cliques and potential functions, and use an MRF-based labeling framework. However, the specification of cliques and potential functions is a difficult problem which is further compounded by a vector of observed values. As a compromise, we have allowed limited interaction between the jump and crease edge observations. At each edge site, we replace the two likelihood ratios by their maximum, allowing us to work with Markov random fields with scalar observations.
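A minimal sketch of the per-site observation developed in Sections 4.2.1-4.2.4 follows. The Gaussian null density matches the approximation above; the flat alternative density is a crude stand-in for the ramp-and-plateau shape of Figure 4.3, and the default standard deviations are the values estimated in Section 4.5. Everything else is illustrative rather than the thesis's implementation.

```python
import math

def likelihood_ratio(value, sd, max_value):
    """L(t) = f_A(value) / f_0(value): zero-mean Gaussian null density,
    alternative approximated as uniform between the knee sd and max_value
    (near zero below the knee)."""
    f0 = math.exp(-0.5 * (value / sd) ** 2) / (sd * math.sqrt(2 * math.pi))
    fA = 1.0 / max_value if value >= sd else 1e-6
    return fA / max(f0, 1e-12)

def edge_likelihood_ratio(jump_stat, crease_stat, sd_j=1.05, sd_c=0.05):
    """Combine jump and crease evidence as in Section 4.2.4:
    keep the single stronger likelihood ratio at each edge site."""
    L_j = likelihood_ratio(jump_stat, sd_j, max_value=255.0)
    L_c = likelihood_ratio(crease_stat, sd_c, max_value=2.0)
    return max(L_j, L_c)
```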
4.3 Edge Detection Algorithm

Our edge detection algorithm can be summarized as follows.

1. For each edge site:
   (a) Compute the jump edge likelihood ratio, L_J.
   (b) Compute the crease edge likelihood ratio, L_C.
   (c) Set the edge likelihood ratio L_E = max(L_J, L_C).

2. Use the HCF algorithm [20] given in Figure 2.3 to obtain the desired edge labeling.

The parameters to be specified in this algorithm are: the neighborhood size, the potential function values, and the standard deviations of the jump and crease edge statistics.

Figure 4.4. Cliques and their potential function values determined using an ad hoc method. (Legend: solid segment = edge site labeled EDGE; open segment = edge site labeled NON-EDGE.)

4.4 Clique Potential Functions

The cliques of interest depend on the size of the neighborhood, while the potential function values depend on the labeling problem at hand. Clique potentials are chosen based on the spatial structure or constraints that a clique symbolizes. For example, if both edge sites of clique (b) in Figure 4.1 are labeled EDGE, the clique potential should be assigned a small positive or a negative value, since this configuration represents continuity of edges. If only one of the two sites is labeled EDGE, a high positive potential value should be assigned to discourage the formation of such configurations, since they indicate a break in the boundary. In other words, if the labels in the clique do not agree with the a priori constraints of continuity and smoothness, then that labeling is discouraged by assigning a high potential value to that clique. However, these considerations only provide some guidelines for setting the initial values of the parameters.

We used the potential values used by Chou et al. [20] as the initial values of the parameters. The actual values of the parameters were arrived at by manually altering these parameters based on the performance of the edge detection algorithm on the GRNBLK3 and HARRISCUP images. The following procedure was used to select the parameters. At each step of the procedure, only one parameter value was changed. For the selected parameter, a new value is assigned based on the edge detection errors (broken edges, parallel edges, missing edges, spurious edges) on the test images. This procedure was repeated over all the parameters until there was no visible change in the edge detection result. This procedure results in a locally "best" set of parameter values. Following Chou et al. [20], we set the potential values to zero for those clique configurations which either do not symbolize any important spatial constraint or whose associated constraints are already incorporated by a larger clique configuration.

While determining these clique parameters empirically, it was observed that the edge detection results are most sensitive to the parameters of cliques with single edge sites and least sensitive to the parameters of cliques with four edge sites. Chou et al. [20] explain this behavior by noting that the single edge site cliques control the first-order statistics, while cliques of larger size control higher-order statistics. It is known that higher-order statistics are less important than first-order statistics in characterizing images [63]. The cliques used in our experiments are given in Figure 4.1, and the configurations with non-zero potentials are shown in Figure 4.4.
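Putting the likelihood ratios and clique potentials together, the local energy that HCF minimizes at an edge site (Eq. (2.9) specialized to binary edge labels) can be sketched as follows; clique_potentials is a hypothetical lookup keyed by the label configuration of each clique containing the site, and the data-term rescaling is a standard simplification, not a quotation from the text.

```python
import math

EDGE, NON_EDGE = 1, 0

def local_edge_energy(label, site, edge_ratio, cliques_of,
                      clique_potentials, labels):
    """U_t(l_t, l_{N_t}, w) = -ln f(w_t|l_t) + sum of clique potentials.
    With binary labels only the likelihood *ratio* matters, so we may
    charge -ln L_E(t) to the EDGE label and 0 to NON-EDGE: adding the
    constant ln f(w_t|NON-EDGE) to both labels leaves the argmin
    unchanged."""
    data_term = -math.log(edge_ratio[site]) if label == EDGE else 0.0
    prior_term = 0.0
    for clique in cliques_of[site]:
        config = tuple(label if s == site else labels[s] for s in clique)
        prior_term += clique_potentials.get(config, 0.0)
    return data_term + prior_term
```

This local energy is exactly what the stability measure in Figure 2.3 compares across candidate labels.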
4.5 Experimental Results

Edge detection experiments were performed using images from the range image database described in Section 1.4. We report results on five different range images. The standard deviations of the jump and crease edge statistics (sd(J), sd(C)) were estimated from a homogeneous area of an arbitrary test image (HARRISCUP). The estimated standard deviations were 1.05 for the jump edge statistic and 0.05 for the crease edge statistic. The same cliques and potential function values were used for all five images. The edge detection results for these five images are presented in Figures 4.5-4.9. Each figure shows the following details.

- The original range image in part (a).
- The range image as a pseudo-intensity image in part (b). The pseudo-intensity image highlights the edges and surfaces in the original image. The pseudo-intensity value at each pixel is proportional to the difference in angle between the surface normal at that pixel and the viewing direction.
- The edge detection result using the algorithm of Section 4.3, with the clique parameters from Figure 4.4, in part (c).

Figure 4.5(a) shows the GRNBLK3 image. This is a fairly simple image to segment, and as shown in Figure 4.5(c), the algorithm detects all the crease and jump edges present in the image. Figure 4.6 shows the results for the HARRISCUP image. Our algorithm detects some spurious edges near the bottom of the cup, but all the other edges in the scene are detected correctly. Figure 4.7 shows the results for the COLUMN1/GRNBLK3 image. Again, all significant edges (edges of length more than a few pixels) have been found correctly. The crease edge near the top corner is broken in the edge map, indicating the need for postprocessing when surface segmentation is desired. Figure 4.8 shows the results for the AGPART2 image, and Figure 4.9 shows the results for the COLUMN2/ELL image. All the jump and crease edges have been detected correctly by the edge detection algorithm.

Figure 4.5. Edge detection results for the GRNBLK3 image: (a) range image; (b) pseudo-intensity image; (c) edges detected using ad hoc clique potentials.

Figure 4.6. Edge detection results for the HARRISCUP image: (a) range image; (b) pseudo-intensity image; (c) edges detected using ad hoc clique potentials.

Figure 4.7. Edge detection results for the COLUMN1/GRNBLK3 image: (a) range image; (b) pseudo-intensity image; (c) edges detected using ad hoc clique potentials.

Figure 4.8. Edge detection results for the AGPART2 image: (a) range image; (b) pseudo-intensity image; (c) edges detected using ad hoc clique potentials.

Figure 4.9. Edge detection results for the COLUMN2/ELL image: (a) range image; (b) pseudo-intensity image; (c) edges detected using ad hoc clique potentials.

4.6 Hybrid Method for Segmentation of Range Images

Our edge detection method, like all other edge-based segmentation methods, does not always result in closed boundaries. Closed boundaries are needed to extract surface patches. An obvious solution is to apply some heuristics to convert the edge map with broken boundaries into closed contours which represent surface boundaries [31]. However, this reduces the robustness of the segmentation method significantly, because small spurious edges may be treated as boundaries if excessive relaxation (continuation) is used to close the broken edges.
To overcome this problem, we have developed a hybrid segmentation method which uses our MRF-based edge finding algorithm for discontinuity detection and a region-based method for continuity detection. The outputs of the two algorithms are integrated using a simple approach.

We have used the clustering-based range image segmentation method of Hoffman and Jain [50]. Their method uses six features per pixel in a squared-error clustering algorithm: the 3 pixel coordinate values and the 3 components of the surface normal at the pixel. The pixel coordinates incorporate spatial information, and the surface normals provide surface continuity and smoothness information. The clustering algorithm is unable to identify the correct number of surfaces; in other words, there is no one-to-one correspondence between the clusters and the physical object surfaces present in the image. The Hoffman and Jain method oversegments the image by specifying a desired number of clusters which is larger than the expected number of surfaces present in the scene. A merging step then merges adjacent surface patches or clusters which are of the same surface type and which are not separated by a jump or crease edge. This algorithm gives reasonable segmentation results in most cases, but on some images it fails to merge surface patches belonging to the same physical surface [50].

Our hybrid segmentation algorithm works as follows. The Hoffman-Jain region-based segmentation algorithm provides an initial segmentation (only the initial clustering step is used). The oversegmented solution produced at this stage has the desirable characteristic that the resulting clusters or surface patches are almost always contained in a single physical surface in the scene. Therefore, nearly all true edges are preserved in the oversegmented solution to within a couple of pixels' accuracy. The boundary segments in the oversegmented solution are validated using evidence from the edges found by the MRF-based edge detection algorithm.

Consider the boundary between an arbitrary pair of adjacent surface patches, p1 and p2, detected by the clustering algorithm. Let N_T be the total number of edge sites along this boundary. Refer to the edge sites on this boundary as candidate sites. For each candidate site, we verify whether a corresponding edge site in the edge-based segmentation is marked as an EDGE. In order to allow for localization disparities between the two segmentations, the matching is relaxed to include the two nearest sites parallel to the current site in the edge-based segmentation. Consider a candidate site i on the boundary of interest. We mark candidate site i as having a match in the edge-based segmentation if any one of the three corresponding edge sites mentioned above has been labeled EDGE. Let N_M be the total number of candidate sites that have a match in the edge-based segmentation. The surface patches p1 and p2 are merged if

\[
\frac{N_M}{N_T} < t,
\]

where t is a positive threshold, 0 < t < 1. We tried several values of the threshold t and found that the segmentation results were the same over a wide range of values (t in (0.40, 0.80)). We used t = 0.60 for all the results reported in this chapter.
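The merge decision just described is a simple ratio test; a minimal sketch follows, with the boundary matching reduced to precomputed triples (a data layout of our own choosing, not specified in the text).

```python
def should_merge(boundary_sites, edge_sites, t=0.60):
    """boundary_sites: list of (site, parallel_1, parallel_2) triples along
    the shared boundary of two adjacent surface patches; edge_sites: set
    of sites labeled EDGE by the MRF edge detector. Merge the patches
    when the matched fraction N_M / N_T falls below the threshold t."""
    n_total = len(boundary_sites)
    n_match = sum(any(s in edge_sites for s in triple)
                  for triple in boundary_sites)
    return n_match / n_total < t
```

Intuitively, a boundary that is well supported by detected edges (a high match fraction) is kept, while a boundary with little edge support is dissolved by merging the two patches.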
Comparing the edge-based and region-based segmentations for these images, we see that: (i) surface patches do not cross over jump or crease edges, and the number of surface patches matches the number of visible surfaces on the object in the scene; and (ii) broken edges in the edge-based segmentation are connected in the final segmentations. For example, the broken crease edge in the wedge of the polyhedron in Figure 4.12(b) is connected in Figure 4.12(d). The final segmentations for all the test images show that all but very small surface patches have been grouped correctly and their shape retained; this is an important trait for a segmentation algorithm meant to be used as a module in an object recognition system.

4.7 Evaluation of the Edge Detector

The performance of our MRF-based edge detector is evaluated by comparing the performance of the hybrid segmentation algorithm with the Hoffman-Jain region-based segmentation algorithm. This is reasonable because the edges detected by the edge detection algorithm determine the location of the boundaries between surfaces in the final surface segmentation.

The two segmentation algorithms were compared based on their performance on 11 range images. The final segmentations for the two algorithms are shown in Figures 4.15-4.25. We compared the two segmentations by visually checking whether or not the segmented surfaces match the physical surfaces. Visual assessment is justified here because of the lack of a widely accepted numerical measure of performance. The author's visual assessment of the two segmentations is shown in Table 4.1. It is seen from the table that the hybrid algorithm outperforms the Hoffman-Jain segmentation method on 3 out of the 11 test images, and the two algorithms are comparable in performance on 7 scenes. The Hoffman-Jain algorithm gives better segmentation only on the BIGWYE4 image.

Table 4.1. Visual comparison of segmentation results on the 11 test scenes (GRNBLK3, HARRISCUP, COLUMN1/GRNBLK3, AGPART2, COLUMN2/ELL, BIGWYE4, ADAPTER/CURVBLOCK, BALL2/PROPANE, BLOCK4/HALFSPH, CAP/BOX2INCH, COL2/BALL). The assessments in column 2 are as follows. ">": the hybrid algorithm gives better segmentation results than the Hoffman-Jain algorithm. "≈": the segmentations produced by the two algorithms are comparable. "<": the Hoffman-Jain algorithm gives better segmentation results than the hybrid algorithm.

Figure 4.10. Hybrid segmentation results for the GRNBLK3 image: (a) range image; (b) edge-based segmentation; (c) initial region-based segmentation; (d) result of combining region-based and edge-based segmentations.

Figure 4.11. Hybrid segmentation results for the HARRISCUP image: (a) range image; (b) edge-based segmentation; (c) initial region-based segmentation; (d) result of combining region-based and edge-based segmentations.

Figure 4.12. Hybrid segmentation results for the COLUMN1/GRNBLK3 image: (a) range image; (b) edge-based segmentation; (c) initial region-based segmentation; (d) result of combining region-based and edge-based segmentations.

Figure 4.13. Hybrid segmentation results for the AGPART2 image: (a) range image; (b) edge-based segmentation; (c) initial region-based segmentation; (d) result of combining region-based and edge-based segmentations.

Figure 4.14. Hybrid segmentation results for the COLUMN2/ELL image: (a) range image; (b) edge-based segmentation; (c) initial region-based segmentation; (d) result of combining region-based and edge-based segmentations.
Further, the Hoffman-Jain segmentation algorithm requires on average three times the computational time required by the hybrid algorithm. Since the initial clustering step is common to both algorithms, this savings in computational time is evidently due to the efficiency of the MRF model-based edge detector in comparison to the merging step used in the Hoffman-Jain algorithm.

Figure 4.15. Segmentation results for the GRNBLK3 image using (a) the Hoffman-Jain algorithm, (b) the hybrid algorithm.

Figure 4.16. Segmentation results for the HARRISCUP image using (a) the Hoffman-Jain algorithm, (b) the hybrid algorithm.

Figure 4.17. Segmentation results for the COLUMN1/GRNBLK3 image using (a) the Hoffman-Jain algorithm, (b) the hybrid algorithm.

Figure 4.18. Segmentation results for the AGPART2 image using (a) the Hoffman-Jain algorithm, (b) the hybrid algorithm.

Figure 4.19. Segmentation results for the COLUMN2/ELL image using (a) the Hoffman-Jain algorithm, (b) the hybrid algorithm.

Figure 4.20. Segmentation results for the BIGWYE4 image using (a) the Hoffman-Jain algorithm, (b) the hybrid algorithm.

Figure 4.21. Segmentation results for the ADAPTER/CURVBLOCK image using (a) the Hoffman-Jain algorithm, (b) the hybrid algorithm.

Figure 4.22. Segmentation results for the BALL2/PROPANE image using (a) the Hoffman-Jain algorithm, (b) the hybrid algorithm.

Figure 4.23. Segmentation results for the BLOCK4/HALFSPH image using (a) the Hoffman-Jain algorithm, (b) the hybrid algorithm.

Figure 4.24. Segmentation results for the CAP/BOX2INCH image using (a) the Hoffman-Jain algorithm, (b) the hybrid algorithm.

Figure 4.25. Segmentation results for the COL2/BALL image using (a) the Hoffman-Jain algorithm, (b) the hybrid algorithm.

4.8 Conclusions

This chapter developed a new algorithm for detection of jump and crease edges in range images. The algorithm uses a Markov random field model to quantify contextual information in edge labeling. Edge detection experiments using several real range images indicate that the edges detected by the algorithm indeed correspond to the jump and crease edges in the input range images. It is, in general, difficult to choose potential function values which optimize performance in the sense of being closest to the human grouping mechanism. The quantification of various prior constraints and Gestalt principles in the form of clique potentials is a formidable problem. Nevertheless, as we have shown here, for specific problems it is possible to utilize contextual information and other constraints in the form of MRF models.

The edges detected by the edge detection algorithm are not always closed contours. Therefore, the edge-based segmentation was integrated with a region-based segmentation using a simple technique, resulting in a hybrid segmentation algorithm. The results of experiments using the hybrid segmentation algorithm showed that the surface patches detected by the algorithm match the physical surfaces in the scene. Further, comparison of the hybrid segmentation algorithm with the Hoffman-Jain region-based segmentation algorithm showed that the hybrid algorithm gives visually better surface segmentations than the Hoffman-Jain algorithm in some cases. Also, the hybrid algorithm is computationally more efficient than the Hoffman-Jain algorithm.
Since the location of the edges in the hybrid segmentation is determined based on the MRF model-based edge detector output, the superior performance of the hybrid algorithm is interpreted as evidence for the usefulness of MRF contextual models for the problem of edge detection in range images.

CHAPTER 5

Parameter Estimation of Line Process MRF

An MRF prior model is specified in terms of clique parameters. One important advantage of the MRF models of context vis-à-vis other regularization methods [84, 10] is that the parameters of the MRF model can be determined from knowledge of the local conditional distribution of image labels. However, in most of the previous work using MRF models, the parameters of the prior model have been determined in an ad hoc fashion (the "best" parameters are found using a trial and error method), similar to the method used in Chapter 4. While the parameters specified using these methods seem to give reasonable results for some images and some image analysis tasks, it is not clear how close they are to the "best" set of parameters in terms of Gestalt principles of grouping. Another significant limitation in specifying these parameters empirically is that, as the neighborhood size and consequently the number of parameters increases, the parameter space to be explored becomes unwieldy. Therefore, a systematic method for specification or estimation of these parameters is of significant interest.

Several schemes have been proposed in the computer vision literature to estimate the parameters of an MRF [5, 27]. For the case of MRFs defined on pixel sites, these schemes have been applied with considerable success, particularly when MRF models are used to model image texture. For MRFs defined on edge sites, no parameter estimation technique is available. Silverman et al. [87] have achieved limited success with an alternative approach, where an analytical model is used to compute the penalty for edges.

In this chapter, we propose a new method to estimate MRF model parameters when these models are used to find object boundaries in range or intensity images. We assume that mechanical (geometric) models of the 2D or 3D objects which are expected to be present in the scene are available. The models of the objects are used to generate multiple images of the objects from different viewpoints. The frequency of local edge configurations estimated from these synthesized edge maps is then used to estimate the parameters of the MRF. This estimation scheme has been applied to detect edges in range and intensity images and its performance is demonstrated on several images.

5.1 Estimation of Line Process Parameters from Object Models

We face the following difficulties in estimating model parameters for MRFs defined on edge variables.

1. Line variables are unobservable and, therefore, the true line process configurations are seldom available.

2. Even when such a true edge map is available, estimating parameters of an MRF from a single labeling is difficult because some edge configurations occur very infrequently.

3. Techniques of simultaneous parameter estimation and labeling [67] are also not suitable here because the local joint probabilities estimated from the given, possibly incorrect, labeling are very unreliable.

Because of these difficulties, the most commonly used method of specifying the parameters of the line process has been trial and error. We propose a method to overcome these difficulties.
We are primarily interested in site labeling problems with the intent of generating a surface (or edge) segmentation for a model-based object recognition system [34, 57]. We assume that a geometric model is available for every object expected in the scene. Further, it is assumed that all objects in the database are equally likely to appear in the scene and that all viewpoints are equally likely. These two statements represent the prior knowledge about the scene.

Our approach is to take a random sample (a set of edge maps) from the population of edge maps derived from all possible viewpoints of the given object models. It is important to note that a random sample here means an edge map from a randomly chosen viewpoint of a randomly chosen object model. We emphasize that the advantage of using the object models to synthesize the desired edge maps, instead of using an estimated edge map as in simultaneous parameter estimation and labeling schemes, is the availability of "true" labels at the edge sites. For example, Figure 5.1(b) shows a synthetic edge map indicating the exact location of edges, obtained from the CAD model of the object shown in Figure 5.1(a). By generating images corresponding to several random viewpoints of object models, we essentially obtain samples of edge maps which can be used for estimation of the MRF model parameters.

Figure 5.1. (a) Synthetic range image of an object and (b) corresponding edge map obtained from the CAD model of the object.

Figure 5.2. Neighborhood numbering scheme for (a) horizontal edge site, (b) vertical edge site.

5.2 Least Squares Formulation

For a second-order neighborhood, let \(l_{N_t}\) represent the set of labels assigned to the neighbors of edge site t. The neighbors of both horizontal and vertical sites will be treated in the same way by using the neighborhood numbering scheme shown in Figure 5.2. Since we are interested in edge detection, each site can be assigned one of two possible labels: {EDGE, NON-EDGE}. Let \(\Theta = [\theta_1, \ldots, \theta_p]^T\) be the parameter vector that defines the clique function. The number of parameters p depends on the particular parameterization used to define the clique potentials. For example, for an isotropic and homogeneous model with the second-order neighborhood shown in Figure 5.2, the number of parameters is 35 when all possible clique configurations are used as parameters. When the parameters are specified in an ad hoc fashion, however, a portion of the possible clique configurations are assigned zero potentials in order to reduce the number of free parameters of the model. In Section 5.3, we show that there exists a parameterization with only 8 parameters for a binary process with 8 neighbors.

The following formulation is based on the treatment by Derin and Elliott [27]. Let \(U(l_t, l_{N_t}, \Theta)\) be the sum of the potential functions for all cliques to which the edge site t belongs:

\[ U(l_t, l_{N_t}, \Theta) = \sum_{c:\, t \in c} V_c(l) = \Phi^T(l_t, l_{N_t})\,\Theta = \sum_{i=1}^{p} n_i \theta_i, \tag{5.1} \]

where \(\Phi(l_t, l_{N_t}) = [n_1, \ldots, n_p]^T\) and \(n_i\) represents the number of clique configurations of type i in the neighborhood of site t. The local conditional probability at site t can be written as

\[ P(l_t \mid l_{N_t}) = \frac{e^{-U(l_t,\, l_{N_t},\, \Theta)}}{Z_t(l_{N_t}, \Theta)}, \tag{5.2} \]

where \(Z_t(l_{N_t}, \Theta) = \sum_{l_t} e^{-U(l_t,\, l_{N_t},\, \Theta)}\). Substituting \(l_t = \text{EDGE}\) and \(l_t = \text{NON-EDGE}\) in Eq. (5.2) and combining the two resulting equations, we get

\[ e^{\,U(\text{EDGE},\, l_{N_t},\, \Theta) - U(\text{NON-EDGE},\, l_{N_t},\, \Theta)} = \frac{P(\text{NON-EDGE}, l_{N_t})}{P(\text{EDGE}, l_{N_t})}. \]
Taking logarithms on both sides and substituting for \(U(l_t, l_{N_t}, \Theta)\) from Eq. (5.1), we have

\[ \left( \Phi(\text{EDGE}, l_{N_t}) - \Phi(\text{NON-EDGE}, l_{N_t}) \right)^T \Theta = \log \frac{P(\text{NON-EDGE}, l_{N_t})}{P(\text{EDGE}, l_{N_t})}. \tag{5.3} \]

For the second-order neighborhood under consideration, there are 256 (\(2^8\)) possible values for the vector \(l_{N_t}\). The ratio \(P(\text{NON-EDGE}, l_{N_t}) / P(\text{EDGE}, l_{N_t})\) can be estimated from the edge maps synthesized from object models by counting the number of local edge configurations of type (NON-EDGE, \(l_{N_t}\)) and dividing by the number of local edge configurations of type (EDGE, \(l_{N_t}\)). To avoid the ill-conditioning of Eq. (5.3) due to very small estimated values of \(P(\text{NON-EDGE}, l_{N_t})\) and \(P(\text{EDGE}, l_{N_t})\), we modify Eq. (5.3) to

\[ \left( \Phi(\text{EDGE}, l_{N_t}) - \Phi(\text{NON-EDGE}, l_{N_t}) \right)^T \Theta = \log \frac{P(\text{NON-EDGE}, l_{N_t}) + \epsilon}{P(\text{EDGE}, l_{N_t}) + \epsilon}, \tag{5.4} \]

where \(\epsilon\) is a small positive number. By substituting each value of \(l_{N_t}\) in Eq. (5.4), we obtain 256 (the number of possible neighborhood configurations) equations in p unknowns. This overdetermined linear system of equations can be solved by the method of least squares. This parameter estimation can be carried out off-line, so there is no computational overhead at run time.

5.3 A Canonical MRF Representation: Normalized Potential Functions

For a specific Markov or Gibbs random field, there could be many equivalent potential functions which specify the same distribution [66]. So an important question to ask is: "Is there an equivalent MRF representation, for a given neighborhood, with the minimal number of free parameters?" It turns out that, for the case of binary processes, such a representation exists, and the associated potential functions are called the normalized potential functions, which are defined as follows [41].

Figure 5.3. Neighborhood for a simple MRF used to demonstrate the existence of a unique canonical representation.

Figure 5.4. Clique configurations for the MRF with the neighborhood shown in Figure 5.3 and the corresponding potentials under the enumerative and canonical representations: the pair cliques (0,0), (1,0), and (1,1) and the singleton cliques (0) and (1) have enumerative potentials a, b, c, d, and e, respectively, and canonical potentials 0, 0, f, 0, and g (a configuration has non-zero canonical potential only when all of its sites are labeled 1).

Definition: A potential function \(V = \{V_A : A \subseteq \mathcal{L}\}\) is a normalized potential if \(V_A(l) = 0\) whenever \(l_t = \text{NON-EDGE}\) for some \(t \in A\).

The choice of the label NON-EDGE in the above definition was arbitrary. For a given choice of the label in the definition, it is known that a normalized potential is unique for every MRF [41]. Because of their uniqueness property, normalized potential functions are also known as canonical potentials. We give below a simple example to demonstrate the existence of the canonical potential representation.

Consider a binary MRF pixel process with pixel labels {0, 1} defined using the neighborhood shown in Figure 5.3. Figure 5.4 lists all possible clique configurations for this neighborhood under the isotropic assumption, along with their potentials under the enumerative and canonical parameterizations. Table 5.1 shows the conditional distributions corresponding to the two parameterizations.

Table 5.1. Conditional probabilities under the two representations, for neighborhood configurations xyz (y is the center site, x and z its neighbors).

\[
\begin{array}{llll}
xyz & P(y \mid xz) & \text{Enumerative} & \text{Canonical} \\
000 & P(0 \mid 00) & \dfrac{e^{-(2a+d)}}{e^{-(2a+d)} + e^{-(2b+e)}} & \dfrac{1}{1 + e^{-g}} \\
001 & P(0 \mid 01) & \dfrac{e^{-(a+b+d)}}{e^{-(a+b+d)} + e^{-(b+c+e)}} & \dfrac{1}{1 + e^{-(g+f)}} \\
100 & P(0 \mid 10) & \dfrac{e^{-(a+b+d)}}{e^{-(a+b+d)} + e^{-(b+c+e)}} & \dfrac{1}{1 + e^{-(g+f)}} \\
101 & P(0 \mid 11) & \dfrac{e^{-(2b+d)}}{e^{-(2b+d)} + e^{-(2c+e)}} & \dfrac{1}{1 + e^{-(g+2f)}} \\
010 & P(1 \mid 00) & \dfrac{e^{-(2b+e)}}{e^{-(2a+d)} + e^{-(2b+e)}} & \dfrac{e^{-g}}{1 + e^{-g}} \\
011 & P(1 \mid 01) & \dfrac{e^{-(b+c+e)}}{e^{-(a+b+d)} + e^{-(b+c+e)}} & \dfrac{e^{-(g+f)}}{1 + e^{-(g+f)}} \\
110 & P(1 \mid 10) & \dfrac{e^{-(b+c+e)}}{e^{-(a+b+d)} + e^{-(b+c+e)}} & \dfrac{e^{-(g+f)}}{1 + e^{-(g+f)}} \\
111 & P(1 \mid 11) & \dfrac{e^{-(2c+e)}}{e^{-(2b+d)} + e^{-(2c+e)}} & \dfrac{e^{-(g+2f)}}{1 + e^{-(g+2f)}} \\
\end{array}
\]
By equating the two expressions for conditional probabilities corresponding to the same neighborhood configuration (this can be done because, the two conditional distributions represent the same MRF), it is straightforward to see that the parameters of the unique canonical representation for this MRF are given by f = a—‘Zb—d g = 2(b—a)+e—d This example shows that enumerative representation with 5 parameters (a, b, c, d, e) is equivalent to the canonical representation with only 2 parameters (f, g). The clique configurations with non—zero canonical potentials for the neighborhood of Figure 5.2 are shown in Figure 5.5. As a consequence of the above definition, the potentials for all configurations in which at least one of the site labels is NON-EDGE is set to zero. Therefore, only eight parameters are needed to represent the MRF defined on the second-order neighborhood corresponding to the eight configurations shown in Figure 5.5. The number of parameters that need to be estimated is thus reduced considerably (from 35 to 8). 5.4 Experimental Results The proposed method for estimating clique potentials was tested using a model database of 10 3D objects [33]. The object models used in our experiments consist of mechanical CAD models of 3D objects built using the GEOMOD solid modeling system [56]. Several of these objects were used by Flynn and Jain in their 3D object recognition system called BONSAI [33]. Figure 5.7 shows one range image for each of these ten objects. For parameter estimation, the number of objects and the number of 91 - : Edge site with label EDGE Figure 5.5. Clique configurations with non-zero canonical potentials. _ _ _ _— - 21.6 -19.46 49,34 1.33 l _ _ I 28.5 0.99 -41.28 -2.95 _ : Edge site with label EDGE Figure 5.6. Canonical clique potential values estimated from 3D CAD models by the least squares method. 92 Figure 5.7. Synthetic range images of objects used for estimating clique parameters; only one view per object is shown here. views (images) of each object should be sufficiently large so that the frequency of local edge configurations computed from these images are consistent with the probability with which these configurations appear in a range image from an arbitrary viewpoint. We generated 100 edge maps corresponding to 100 random views for each of the 10 model objects and computed the local conditional probabilities of edge labels using a straightforward histogramming technique. The parameters of the canonical MRF representation can be estimated using the least squares formulation by setting the number of parameters p = 8. The estimated values of these 8 parameters from the local characteristics obtained from CAD models by setting 6 = 1/256 in Eq. (5.4) are shown in Figure 5.6. The parameter estimates are sensitive to the value of 6 used. However, changing the value of 6 does not have the same effect on the edge detection results. Presumably, the constraints specified by the parameter values estimated with different 6 values do 93 not change drastically, even though the numerical values of the individual parameters change considerably. Notice that the estimated value for the same clique configuration is drastically different from that selected in an ad-hoc fashion (compare the values in Figures 4.4 and 5.6). We must point out that these two sets of parameter values are not directly comparable, because the number of parameters in the two representations are different. 
Edge detection experiments were performed using our database of range images shown in Figure 1.6. Here we report results on all 11 images in our database. The algorithm presented in Section 4.3 was used for edge detection. The same cliques and potential function values were used for all the images. The edge detection results for these images are presented in Figures 5.8-5.18. In all these figures, the following details are shown.

• The original range image is shown in part (a).

• The range image is shown as a pseudo-intensity image in part (b).

• The edge detection result using the algorithm of Section 4.3 with the estimated clique parameters given in Figure 5.6 is shown in part (c).

• The edge detection result using the algorithm of Section 4.3 with the clique parameters given in Figure 4.4 is shown in part (d).

Figure 5.8 shows the results for the GRNBLK3 image. With estimated clique potentials, the crease edge near the top of the wooden block is broken, but this crease edge is correctly detected using the ad hoc clique potentials. Also, one of the jump edges is not detected cleanly with the estimated clique potentials.

Figure 5.8. Edge detection results for the GRNBLK3 image: (a) range image; (b) pseudo-intensity image; (c) edges detected using estimated clique potentials; (d) edges detected using ad hoc clique potentials.

Figure 5.9. Edge detection results for the HARRISCUP image: (a) range image; (b) pseudo-intensity image; (c) edges detected using estimated clique potentials; (d) edges detected using ad hoc clique potentials.

Figure 5.10. Edge detection results for the COLUMN1/GRNBLK3 image: (a) range image; (b) pseudo-intensity image; (c) edges detected using estimated clique potentials; (d) edges detected using ad hoc clique potentials.

Figure 5.11. Edge detection results for the AGPART2 image: (a) range image; (b) pseudo-intensity image; (c) edges detected using estimated clique potentials; (d) edges detected using ad hoc clique potentials.

Figure 5.12. Edge detection results for the COLUMN2/ELL image: (a) range image; (b) pseudo-intensity image; (c) edges detected using estimated clique potentials; (d) edges detected using ad hoc clique potentials.

Figure 5.13. Edge detection results for the BIGWYE4 image: (a) range image; (b) pseudo-intensity image; (c) edges detected using estimated clique potentials; (d) edges detected using ad hoc clique potentials.

Figure 5.14. Edge detection results for the ADAPTER/CURVBLOCK image: (a) range image; (b) pseudo-intensity image; (c) edges detected using estimated clique potentials; (d) edges detected using ad hoc clique potentials.

Figure 5.15. Edge detection results for the BALL2/PROPANE image: (a) range image; (b) pseudo-intensity image; (c) edges detected using estimated clique potentials; (d) edges detected using ad hoc clique potentials.

Figure 5.16. Edge detection results for the BLOCK4/HALFSPH image: (a) range image; (b) pseudo-intensity image; (c) edges detected using estimated clique potentials; (d) edges detected using ad hoc clique potentials.

Figure 5.17. Edge detection results for the CAP/BOX2INCH image: (a) range image; (b) pseudo-intensity image; (c) edges detected using estimated clique potentials; (d) edges detected using ad hoc clique potentials.
Figure 5.18. Edge detection results for the COL2/BALL image: (a) range image; (b) pseudo-intensity image; (c) edges detected using estimated clique potentials; (d) edges detected using ad hoc clique potentials.

Figure 5.9 shows the results for the HARRISCUP image. The crease edge near the bottom of the cup (top of the figure) is detected correctly from the estimated clique potentials. With ad hoc potentials, however, several spurious edges are detected near the crease edge.

Figure 5.10 shows the results for the COLUMN1/GRNBLK3 image. All physical edges in the scene are detected in both cases. However, the crease edge on the head of the COLUMN1 object is wrongly detected as two parallel edges when the ad hoc clique potentials are used, but is detected correctly when the estimated clique potentials are used.

Figure 5.11 shows the results for the AGPART2 image. In this case, both estimated and ad hoc clique potentials result in some errors in the edge detection. With estimated clique potentials, no spurious parallel edges are detected, but the crease edge corresponding to the outer ring of the object is not connected. On the other hand, with ad hoc clique potentials, the edges are all connected, but some spurious parallel edges are detected.

Figure 5.12 shows the results for the COLUMN2/ELL image. Here again, the MRF-based edge detection algorithm makes different mistakes with the two sets of clique potentials. With estimated clique potentials, the crease edge on the ELL is broken at several places. On the other hand, with the ad hoc parameters, the crease edge near the bottom of COLUMN2 is not detected very cleanly.

Figure 5.13 shows the results for the BIGWYE4 image. The crease edge near the intersection of the two cylinders is not detected completely with estimated clique potentials. With ad hoc parameters, this crease edge is detected properly.

Figure 5.14 shows the results for the ADAPTER/CURVBLOCK image. The spurious edges detected on the sides of the ADAPTER are due to the "holes" (pixels where range data is not available) in the image. These edges can be eliminated by appropriate preprocessing, such as median filtering of the input range image. All other jump and crease edges in the scene are detected by the algorithm with both estimated and ad hoc clique potentials.

Figure 5.15 shows the results for the BALL2/PROPANE image. All the significant jump and crease edges are detected using both sets of clique potentials. With the estimated clique potentials, however, the algorithm detects a spurious parallel edge close to the left side of the ball. Notice that both objects were not included in the database of objects used to estimate the MRF model parameters.

Figure 5.16 shows the results for the BLOCK4/HALFSPH image. The HALFSPH object was not used for estimating the MRF parameters. All jump and crease edges on the BLOCK4 object are detected correctly in both cases. On the HALFSPH object, while the jump edges are detected accurately in both cases, only the estimated clique potentials result in correct detection of the crease edge.

Figure 5.17 shows the results for the CAP/BOX2INCH image. Notice again that the BOX2INCH object was not used for estimating the MRF parameters. The two sets of clique potentials result in good edge detection for the CAP object, but fail to connect two different crease edges on the edges of the BOX2INCH object.

Figure 5.18 shows the results for the COL2/BALL image.
As in the case of the ADAPTER/CURVBLOCK image, the spurious edges detected here are due to the holes in the image. The edge detection results with estimated and ad hoc clique potentials are comparable for this image.

It is seen that nearly all the jump and crease edges have been detected by the algorithm with both estimated and ad hoc clique potentials. It is also seen that edges detected using the estimated normalized clique potentials typically contain fewer spurious edges than those detected with ad hoc clique potentials. Table 5.2 summarizes the author's visual assessment of the edge detection results in Figures 5.8-5.18. The table indicates that the edge detection performance with estimated parameters is as good as the performance with ad hoc clique potentials. Also, from the edge detection results on images containing objects not used for estimating the MRF model parameters, it is clear that not all objects need to be used for estimating the parameters. The MRF model defined using the second-order neighborhood is capable of incorporating edge continuity and smoothness information only, and these constraints are very well represented by a few "typical" 3D objects.

Table 5.2. Visual comparison of edge detection results on the 11 test scenes (GRNBLK3, HARRISCUP, COLUMN1/GRNBLK3, AGPART2, COLUMN2/ELL, BIGWYE4, ADAPTER/CURVBLOCK, BALL2/PROPANE, BLOCK4/HALFSPH, CAP/BOX2INCH, COL2/BALL). The assessments in column 2 are as follows. ">": estimated clique potentials result in better overall edge detection than ad hoc clique potentials. "≈": edge detection results with the two sets of clique potentials are nearly the same. "<": ad hoc clique potentials result in better overall edge detection than estimated clique potentials.

5.5 Edge Detection in Intensity Images

The parameter estimation technique introduced in Section 5.1 is applicable to a variety of MRF models. This section illustrates this point by using the parameter estimation scheme to detect edges in intensity images. We assume here that edge detection is performed as a processing step in a 2D object recognition system and that 2D models of the objects expected in the scene are available.

5.5.1 Edge Detection Approach

The edge detection approach proposed for intensity images is very similar to the approach used for range images. Here we are interested only in step edges in the intensity images. The problem of detection of step edges in intensity images is essentially the same as the problem of detecting jump edges in range images. Therefore, the statistic that we used for jump edges in range images can be used to detect step edges in intensity images as well. The number of pixels used in the computation of 'step edgeness' is the same as the number of pixels used for computing the 'jump edgeness' in Chapter 4. Specifically, we use N = 4 in Equation (4.1). The algorithm for intensity edge detection is essentially the same as in Section 4.3, except that the likelihood ratio \(L_E\) now involves only a single term.

5.5.2 Estimation of Parameters from 2D Object Models

The parameters of the MRF model that capture the relative frequency of edge configurations in intensity images can also be determined from the 2D models of the objects expected in the scene. We use a boundary representation of the 2D objects in the model database, because the location of edges in the model is important for estimation of the clique parameters. This representation has been used by Ayache and Faugeras [2] in their 2D object recognition system, called HYPER. In this representation scheme, the object boundaries are approximated by polygons. Each model description consists of a set of line segments from the approximating polygon with the following attributes: \((x, y, \alpha, l)\). Here x and y are the coordinates of the midpoint of the line segment, \(\alpha\) is the angle of the line segment with respect to the positive x-axis, and l is the length of the segment.

The parameter estimation scheme involves synthesizing different edge maps from the polygonal models of the 2D objects in the database. The edge map of an object corresponding to a given viewpoint can be synthesized by performing a two-dimensional rotation of the model.

Figure 5.19. Edge map synthesized using the 2D model of an industrial part.
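A sketch of this synthesis step is given below. It is only an illustration of the idea under stated assumptions: the model is taken as a list of HYPER-style segments \((x, y, \alpha, l)\), the rotation is about the image center, and the rasterization is a naive point-sampling of each rotated segment; none of the names correspond to the actual implementation.

```python
import numpy as np

def synthesize_edge_map(segments, angle, size=128):
    """Rasterize the boundary of a polygonal 2D model at a given rotation.

    segments: list of (x, y, alpha, length) tuples, one per polygon side.
    angle: rotation in radians, simulating a random viewpoint.
    """
    edge_map = np.zeros((size, size), dtype=np.uint8)
    c, s = np.cos(angle), np.sin(angle)
    ctr = size / 2.0
    for (x, y, alpha, length) in segments:
        # Endpoints of the segment before rotation.
        dx, dy = 0.5 * length * np.cos(alpha), 0.5 * length * np.sin(alpha)
        pts = np.array([[x - dx, y - dy], [x + dx, y + dy]])
        # Rotate both endpoints about the image center.
        pts = (pts - ctr) @ np.array([[c, -s], [s, c]]).T + ctr
        # Mark pixels along the rotated segment.
        n = int(np.hypot(*(pts[1] - pts[0]))) + 1
        for t in np.linspace(0.0, 1.0, n):
            px, py = (1 - t) * pts[0] + t * pts[1]
            if 0 <= int(px) < size and 0 <= int(py) < size:
                edge_map[int(py), int(px)] = 1
    return edge_map
```

Generating, say, 100 views of a model then amounts to calling this function with angles drawn uniformly from \([0, 2\pi)\).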
In this represen- tation scheme, the object boundaries are approximated by polygons. Each model description consists of a set of line segments from the approximating polygon with the following attributes: (as, y, a, I). Here a: and y are the coordinates of the midpoint of the line segment, a is the angle of the line segment with respect to positive x-axis, and I is the length of the segment. The parameter estimation scheme involves synthesizing different edge maps from the polygonal models of the 2D objects in the database. The edge map of an ob- 109 Figure 5.19. Edge map synthesized using 2D model of an industrial part. ject corresponding to a given vieWpoint can be synthesized by performing a two- dimensional rotation of the models. Figure 5.19 shows one such edge map from a random vieWpoint generated from the model of a “T” in our database. We empha- size that the advantage of using object models to generate samples of edge maps instead of obtaining edge maps corresponding to real images is that, a large number of vieWpoints can be simulated by changing only the orientation of the model. 5.5.3 Experimental Results The experiments were performed using a database of 10 2D objects shown in F ig- ure 5.20. The number of edge maps used in the estimation of parameters should be sufficiently large so that nearly all neighborhood edge configurations appear in this sample set. For each object in the database, 100 different edge images were gener- ated corresponding to 100 different rotations of the object model. The 8 canonical clique potential values estimated using the proposed method are given in Figure 5.21. Comparing these parameter values with the values given in Figure 5.6, it is seen that some of the clique potential values are quite different in the two cases. This reflects the nature of edge configurations in the range images of 3D objects and in- tensity images of 2D objects. We used the Gibbs sampler [38] to generate samples 110 Figure 5.20. 2D objects in the database. -2o.a7 ~2-14 : Edge site with label EDGE Figure 5.21. Canonical clique potentials estimated from 2D models using the least squares method. 111 (edge maps) from the MRF models of the line process using three sets of parameters. Figures 5.22(a), (b), and (c) show edge maps generated after 100 iterations of the Gibbs sampler using the ad hoe clique potentials (Figure 4.4), the clique potentials estimated using 3D models (Figure 5.6), and the clique potentials estimated using 2D models (Figure 5.21), respectively. A careful observation of these three figures reveals that the Gibbs prior distributions represented by the estimated parameters (Figures 5.22(b) and (c)) have fewer edges than the Gibbs prior represented by the clique potentials specified in an ad hoe fashion (Figure 5.22(a)). This implies that the ad hoe clique potentials encourage formation of more edges than are found in real images. Further, parameters estimated using 2D models result in more abrupt endings of edges or boundaries than with 3D models. This behavior can be explained because the edge maps generated from 2D polygonal models are only an approxima- tion to smooth object boundaries as can be seen from the gaps in the boundaries in Figure 5.19. Figure 5.23 shows the results of edge detection on a 256 x 256 image of some industrial parts and Figure 5.24 shows the results of edge detection on a 128 x 128 image (obtained using an Odetics sensor) of a scene in a manufacturing plant (this image was provided to us by Odetics Corporation). 
Figure 5.22. Sampling of the MRF line process using the Gibbs sampler: edges generated using (a) ad hoc parameters, (b) parameters estimated from the 3D CAD models, (c) parameters estimated from 2D object models.

Figure 5.23. Step edge detection results for the INDUSTRIAL PARTS image: (a) the input image; (b) edges detected using clique parameters estimated from 2D object models; (c) edges detected using clique parameters specified ad hoc; (d) edges detected using the Canny edge detector (with hysteresis threshold values 0.9 and 0.5 and width of the Gaussian σ = 1 pixel).

Figure 5.24. Step edge detection results for the SCENE image: (a) the input image; (b) edges detected using clique parameters estimated from 2D object models; (c) edges detected using parameters specified ad hoc; (d) edges detected using the Canny edge detector (with hysteresis threshold values 0.9 and 0.5 and width of the Gaussian σ = 1 pixel).

5.6 Conclusions

Estimation of the parameters of an MRF contextual model has been a formidable problem. We have presented a new off-line scheme for estimating the parameters of MRF (GRF) prior models when they are used to find edges in intensity or range images. The idea of an equivalent representation of an MRF using canonical clique potentials is used to reduce the number of unknown clique potential parameters to be estimated. Experimental results presented on several range and intensity images indicate that this parameter estimation method results in edge detection performance as good as that obtained from parameters specified by a trial and error method. The significance of the result is that it provides a systematic method for specification of MRF parameters instead of an ad hoc method. The edge detection results also indicate that it is sufficient to use only "typical" objects to estimate the line process parameters.

CHAPTER 6

Edge Detection and Labeling by Fusion of Intensity and Range Images

Fusion of information from different sensors is essential for robustness in several applications of computer vision [21] (e.g., navigation, inspection, target recognition).
Fusion of range (depth) and intensity images can provide information about the physical origin of edges, a useful intrinsic property of the scene. Range images provide evidence for jump edges (depth discontinuities) and crease edges (surface normal discontinuities). An intensity image, in addition, provides evidence for discontinuities in albedo (the reflective property of a surface). Therefore, combining the two sources of information can help locate most of the important discontinuities in the image and classify them according to their physical origin.

Fusion approaches differ in the formalism used to combine information from different sources. The approaches reported in the fusion literature fall into three categories [44]: (i) the Bayesian approach, (ii) the Dempster-Shafer approach, and (iii) heuristic methods. The Bayesian approach uses Bayesian statistics to incorporate sensor uncertainty and a priori information [77]. When the variables in the system are spatially arranged, such as in a vision system, a Markov random field models the a priori information. The Dempster-Shafer approach [53] incorporates uncertainty using the Dempster-Shafer theory of evidence. In this theory, probabilities are assigned to subsets of a frame of discernment, as opposed to Bayesian statistics, where probabilities are assigned to individual events. While the literature is inconclusive on the relative benefits of the Bayesian and Dempster-Shafer theories [15], it is known that the Bayesian approach has a significant computational advantage over the Dempster-Shafer approach [4]. While the Bayesian and Dempster-Shafer fusion approaches employ a principled treatment of uncertainty, heuristic approaches use mechanisms such as decision rules [51, 18], histogram processing [43], and pixel-by-pixel logical AND operation [39] to fuse information from different sensors. Fusion algorithms that use the heuristic approach lack robustness because of the ad hoc rules built into the system. The Bayesian approach with an MRF prior is preferable to the other two approaches because (i) it incorporates sensor uncertainty, and (ii) it is computationally more efficient than the Dempster-Shafer theory.

Fusion approaches also differ in the level of the information processing hierarchy at which the integration of information takes place (see Figure 1.3). Fusion at higher levels of the hierarchy is preferable when the images are not registered accurately [1, 69]. On the other hand, fusion at lower levels helps in the detection of features that are useful at higher levels. Therefore, we believe that low-level fusion approaches are preferable when the images are registered accurately.

In this chapter, we develop a scheme for detection and labeling of edges from a pair of registered intensity and range images in the Bayesian framework. We use a coupled Markov random field to (i) model the a priori knowledge about surface smoothness and discontinuity locations, (ii) enforce correlations between different sources of information, and (iii) enforce constraints on edge structure in the solution. Following Gamble et al. [35], we label edge sites according to the physical origin of the image discontinuities by choosing a label from the set \(\Lambda\) given by

\[ \Lambda = \{\text{JUMP}, \text{CREASE}, \text{INTENSITY}, \text{NOEDGE}\}. \]

Figure 6.1 shows the relationship between different edge types and information sources.
We represent albedo and shadow discontinuities by a single label (INTENSITY), because it is not possible to discriminate between these two types of discontinuities using only one pair of range and intensity images.

Our approach to fusion is different from related work on low-level fusion reported in the computer vision literature. Chou [19] uses the intensity image as partial evidence of an edge in the depth data. The physical relationship between jump and crease edges and the edges in the intensity image is not utilized. Gamble et al. [35] propose a multiple lattice solution where an edge is detected in each image. Their system operates in a loop where physical labels determined from these individual lattices feed back information to integrate the different sources. Their architecture is incompletely specified and no examples are given. The system by Wright [100] uses limited integration similar to Chou's method, where the only constraint is that edges in different images should occur at the same location. Our approach makes use of the physical relationship between different edge types to achieve the desired integration.

6.1 Edge Labeling as Bayesian Estimation

In this section, we derive expressions for the a posteriori probability distribution of edge and pixel labels, and the corresponding energy function for the fusion problem. These expressions are derived analogous to the treatment in Section 2.2.

For the fusion problem, each site of the pixel lattice S has two measurement variables and two label variables corresponding to the two input (range and intensity) images. Since we are interested in line labeling, we use a coupled multivariate MRF model, as shown in Figure 2.7. Note that there is no direct measurement at the edge sites and, therefore, the line labels have to be determined based on neighboring pixel labels. Let the measurements (observations) at each pixel site i be denoted by the vector \(Y_i = (Y_i^R, Y_i^I)\), where \(Y_i^R\) and \(Y_i^I\) represent the range and intensity measurements, respectively. Similarly, there are two labels (the restored range and intensity values) for each pixel site i, denoted as \(X_i = (X_i^R, X_i^I)\). We denote the label at a line site j by a scalar \(l_j\). The range set of each \(l_j\) is \(\Lambda = \{\text{JUMP}, \text{CREASE}, \text{INTENSITY}, \text{NOEDGE}\}\).

Figure 6.1. Relationship between different edge types and information sources.
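For concreteness, the coupled process just defined can be held in three arrays: a pair of continuous labels per pixel site and one discrete label per edge site, with the edge sites interleaved between adjacent pixels. The layout below is one possible arrangement for illustration only, not the thesis implementation; all names are ours.

```python
import numpy as np

LABELS = ("JUMP", "CREASE", "INTENSITY", "NOEDGE")

class CoupledLattice:
    """Pixel and line processes for an H x W registered image pair."""

    def __init__(self, y_range, y_intensity):
        h, w = y_range.shape
        # Pixel sites: X_i = (X_i^R, X_i^I), initialized to the data.
        self.x = np.stack([y_range, y_intensity], axis=-1).astype(float)
        # Edge site between vertically adjacent pixels (horizontal edge).
        self.l_h = np.full((h - 1, w), LABELS.index("NOEDGE"))
        # Edge site between horizontally adjacent pixels (vertical edge).
        self.l_v = np.full((h, w - 1), LABELS.index("NOEDGE"))
```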
The size of the neighborhood desired for the coupled MRF process (X, L) depends on the order of the surface continuity enforced. Since we need to recover crease edges (discontinuities in the first derivative), the neighborhood used should be large enough to involve cliques that could be used to compute second derivatives of pixel values in a local neighborhood. The neighborhood system that we use for this problem is given in Figure 6.2.

Figure 6.2. Pixel and edge site neighborhoods: (a) neighborhood of a pixel, (b) neighborhood of a horizontal edge site, and (c) neighborhood of a vertical edge site.

The following three assumptions on the degradation process are necessary for theoretical tractability and to make the computation of the a posteriori probability distribution feasible.

Assumption 1: The intensity and the range observations are conditionally independent:

\[ P(\mathbf{y} \mid \mathbf{x}) = P(\mathbf{y}^R \mid \mathbf{x}^R)\, P(\mathbf{y}^I \mid \mathbf{x}^I). \tag{6.1} \]

This assumption is reasonable as long as the noise processes corrupting the intensity and range images are independent.

Assumption 2: The measurements (both range and intensity) at each site are conditionally independent:

\[ P(\mathbf{y}^R \mid \mathbf{x}^R) = \prod_{i} P(y_i^R \mid x_i^R), \tag{6.2} \]

\[ P(\mathbf{y}^I \mid \mathbf{x}^I) = \prod_{i} P(y_i^I \mid x_i^I). \tag{6.3} \]

Assumption 3: The measurement noise in both range and intensity images is i.i.d. Gaussian with mean zero. This leads to

\[ P(y_i^R \mid x_i^R) = \frac{1}{\sqrt{2\pi\sigma_R^2}}\; e^{-\frac{(y_i^R - x_i^R)^2}{2\sigma_R^2}}, \tag{6.4} \]

\[ P(y_i^I \mid x_i^I) = \frac{1}{\sqrt{2\pi\sigma_I^2}}\; e^{-\frac{(y_i^I - x_i^I)^2}{2\sigma_I^2}}. \tag{6.5} \]

Here, \(\sigma_R^2\) and \(\sigma_I^2\) are the variances of the noise in the range and intensity images, respectively. Under these assumptions, the a posteriori probability distribution of the site labels is given by [80]:

\[ P((\mathbf{x}, \mathbf{l}) \mid \mathbf{y}) = \frac{1}{Z_Y}\; e^{-U((\mathbf{x}, \mathbf{l}) \mid \mathbf{y})}, \tag{6.6} \]

where

\[ U((\mathbf{x}, \mathbf{l}) \mid \mathbf{y}) = \sum_{i \in S} \left[ \frac{(y_i^R - x_i^R)^2}{2\sigma_R^2} + \frac{(y_i^I - x_i^I)^2}{2\sigma_I^2} \right] + \sum_{c \in C} V_c(\mathbf{x}, \mathbf{l}). \tag{6.7} \]

We denote the local conditional energy at a site t with respect to a configuration (x, l) by \(U_t(g)\); it is the a posteriori energy of the configuration obtained by modifying the configuration (x, l) such that the label at site t is changed to g. Note that the symbol g denotes both pixel and edge labels, depending on whether t is a pixel site or an edge site. So g is a scalar when t is an edge site, and g is a vector of length two when \(t \in S\).

The potential functions \(V_c\) in Equation (6.7) control the joint probability structure based on the clique potential values. Perhaps the most important step in modeling the contextual information using a Markov random field is the specification of the clique potentials or penalties which determine the corresponding Gibbs distribution. We are interested here in enforcing zeroth-order (zeroth derivative) and first-order (first derivative) continuity in uniform regions of surfaces in the range image and zeroth-order continuity in the intensity image. Additionally, we would like to enforce structural constraints on the local configurations of the edges by specifying penalties for edge configurations that are not frequently observed. The pixel cliques used and the corresponding potential functions are given in Figure 6.3. The parameters \(a_1\), \(a_2\), and \(a_3\) control the amount of smoothing performed on the pixel values.

The potential values assigned to the edge cliques using trial and error experimentation are given in Figure 6.4. The singleton edge clique is assigned four different potential (penalty) values corresponding to the four possible edge labels. This is essential, since if all the edge labels were penalized equally, then jump edge labels would always be preferred. Notice that edge cliques of size two encourage continuity among labels of the same type by reducing the energy of such configurations. The integration of range and intensity values is performed by a combination of the pixel cliques given in Figure 6.3 and the singleton edge cliques.

Figure 6.3. Pixel cliques and the corresponding potential functions. For the two-pixel cliques (a) and (b), with pixel labels \(f_1, f_2\) and the edge label \(l\) between them, the potential is

\[ a_1 (f_1^I - f_2^I)^2 (1 - \delta(l, \text{JUMP}))(1 - \delta(l, \text{CREASE}))(1 - \delta(l, \text{INTENSITY})) + a_2 (f_1^R - f_2^R)^2 (1 - \delta(l, \text{JUMP})); \]

for the three-pixel cliques (c) and (d), with pixel labels \(f_1, f_2, f_3\) and intervening edge labels \(l_1, l_2\), the potential is

\[ a_3 (f_1^R - 2 f_2^R + f_3^R)^2 (1 - \delta(l_1, \text{JUMP}))(1 - \delta(l_1, \text{CREASE}))(1 - \delta(l_2, \text{JUMP}))(1 - \delta(l_2, \text{CREASE})), \]

where \(\delta(i, j) = 1\) if \(i = j\), and 0 otherwise.

Figure 6.4. Ad hoc edge clique potential function values.
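The interaction between the pixel and line processes in Figure 6.3 reduces to a few lines once the labels are encoded as integers. The sketch below shows the data term of Eq. (6.7) and the two-pixel clique potential; the three-pixel (second-difference) clique is omitted for brevity, and the encoding is an assumption of this illustration rather than the thesis code.

```python
import numpy as np

JUMP, CREASE, INTENSITY, NOEDGE = range(4)

def pair_clique_potential(x1, x2, l, a1, a2):
    """Two-pixel clique of Figure 6.3; x1, x2 are (range, intensity) pairs.

    Any edge label switches off the intensity smoothness penalty, but only
    a JUMP switches off the range penalty, so an INTENSITY edge still
    leaves the range surface constrained to be continuous.
    """
    intensity_term = a1 * (x1[1] - x2[1]) ** 2 * (l == NOEDGE)
    range_term = a2 * (x1[0] - x2[0]) ** 2 * (l != JUMP)
    return intensity_term + range_term

def data_term(y, x, sigma_r, sigma_i):
    """Noise (data) term of the a posteriori energy of Eq. (6.7)."""
    return ((y[..., 0] - x[..., 0]) ** 2).sum() / (2.0 * sigma_r ** 2) + \
           ((y[..., 1] - x[..., 1]) ** 2).sum() / (2.0 * sigma_i ** 2)
```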
The model parameters \(\beta_I\), \(\beta_C\), \(\beta_J\), \(a_1\), \(a_2\), and \(a_3\) can be expressed in terms of 6 parameters which are easier to specify (because these parameters do not depend on \(\sigma_R\) and \(\sigma_I\)): h, \(k_C\), \(k_J\), \(\lambda_I\), \(\lambda_C\), and \(\lambda_J\), by a suitable modification of the corresponding expressions used by Blake and Zisserman [10] for the weak membrane and weak plate models. Here h is the sensitivity for detecting jump edges, and \(\lambda_k\) is the scale (in number of pixels) for edge type k, which measures the amount of spatial information desired to counteract noise. The various model parameters are then computed as follows:

\[ \beta_I = \frac{h^2 \lambda_I}{4\sigma_I^2}, \qquad \beta_C = k_C\, \beta_I, \qquad \beta_J = k_J\, \beta_I, \]

\[ a_1 = \frac{\lambda_I^2}{2\sigma_I^2}, \qquad a_2 = \frac{\lambda_J^2}{2\sigma_R^2}, \qquad a_3 = \frac{\lambda_C^4}{2\sigma_R^2}. \]

The fusion problem can now be stated as follows: given the observation vector y (with range and intensity components), estimate the label vector (x, l) such that U((x, l) | y) is minimized.

6.2 TPM and HCF Algorithms for Fusion

The SA, TPM, and ICM algorithms for estimation of a posteriori labels described in Section 2.3 can all be extended to work for coupled MRF models. The Thresholded Posterior Mean (TPM) algorithm developed by Marroquin et al. [74] has been used by Gamble et al. [35] for their multisensor fusion problem. In this chapter, we use the TPM algorithm only to compare the convergence rates with and without fusion. The TPM algorithm is stated in Figure 6.5. While the TPM algorithm is computationally less expensive than the simulated annealing (SA) algorithm, it still takes a prohibitively large number of iterations (> 200).

TPM Algorithm for Fusion

1. Initialize x = y. Initialize l by randomly choosing the label at each edge site t, \(l_t \in \{\text{JUMP}, \text{CREASE}, \text{INTENSITY}, \text{NOEDGE}\}\).

2. (a) For each image \(a \in \{R, I\}\), where R stands for the range image and I stands for the intensity image: for each pixel site t, update \(x_t^a\) by sampling from the conditional distribution

\[ P(x_t^a \mid \mathbf{x}^a_{S \setminus \{t\}}, \mathbf{l}, y_t^a) = e^{-U(x_t^a \mid \mathbf{x}^a_{S \setminus \{t\}},\, \mathbf{l},\, y_t^a)} / Z_t^a; \]

\(x_t^a\) can be sampled from a Gaussian distribution [62].

(b) For each line site t, update \(l_t\) by sampling from the conditional distribution

\[ P(l_t \mid \mathbf{l}_{E \setminus \{t\}}, \mathbf{x}) = e^{-U(l_t \mid \mathbf{l}_{E \setminus \{t\}},\, \mathbf{x})} / Z_t. \]

3. Repeat (2) n times, saving realizations \((\mathbf{x}, \mathbf{l})^{(k+1)}\) through \((\mathbf{x}, \mathbf{l})^{(n)}\).

4. For each line site t, choose the label g that maximizes the estimated marginal posterior probability of g at site t (the relative frequency of g among the saved realizations).

5. For each pixel site t, choose labels \(g_1\) and \(g_2\) such that \(g_1\) is the mean of the saved values of \(x_t^R\) and \(g_2\) is the mean of the saved values of \(x_t^I\).

Figure 6.5. TPM algorithm for fusion.

Therefore, for our experiments we chose to use Chou's HCF algorithm [19] for coupled MRF optimization. This algorithm works for continuous pixel variables as well. Therefore, it is suitable for optimization of the fusion energy function, which contains continuous-valued pixel variables. The algorithm uses a NULL label to represent the initial labeling at a site, similar to the discrete-label case given in Figure 2.3. The HCF algorithm for fusion is summarized in Figure 6.6.

Notice that the stability of a site is non-negative only when it is in a minimal energy state with respect to its neighbors. Therefore, the HCF algorithm is guaranteed to reach a local minimum of the energy function.

It has been noted by Blake and Zisserman [10] that energy functions involving second-order terms (an energy term consisting of second derivatives) converge extremely slowly. Because of this problem, previous approaches have avoided using the second-order energy terms for surface reconstruction. We believe that the task of finding minimal energy configurations in the fusion problem is made easier because of the cues available from the intensity image. Since surface orientation discontinuities almost always result in large intensity discontinuities, and because of the relative ease with which intensity discontinuities can be found, we believe that by fusing the intensity and range images, crease edges can be detected with much less computational effort than if only range images were used.
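The control structure of the HCF algorithm summarized in Figure 6.6 is a loop over a stability ordering. The skeleton below is an illustrative reduction, not the thesis implementation: the coupled model's local energies and candidate label sets are assumed to be supplied by the caller, and the pixel-site commitment penalty \(\alpha\) is folded away.

```python
def hcf(sites, labels, local_energy, NULL=None):
    """Skeleton of the Highest Confidence First commit loop (Figure 6.6).

    labels[t]: candidate labels at site t; local_energy(t, g, state):
    local conditional a posteriori energy U_t(g) under `state`.
    """
    state = {t: NULL for t in sites}

    def stability(t):
        e = sorted(local_energy(t, g, state) for g in labels[t])
        if state[t] is NULL:
            return -(e[1] - e[0])    # uncommitted sites: always <= 0
        # Committed sites: <= 0, and 0 exactly at a local energy minimum.
        return e[0] - local_energy(t, state[t], state)

    while True:
        t = min(sites, key=stability)
        if stability(t) >= 0:        # every site is locally minimal
            return state
        # Commit the least stable site to its minimum-energy label.
        state[t] = min(labels[t], key=lambda g: local_energy(t, g, state))
```

Selecting the globally least stable site first is what distinguishes HCF from a raster-order ICM sweep: sites with strong local evidence commit early and then constrain their more ambiguous neighbors.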
6.3 Parameter Estimation of Coupled MRFs

The MRF model used for the fusion problem is a coupled MRF model consisting of both pixel and edge sites. To our knowledge, there have been no attempts to estimate the parameters of a coupled MRF. In the following, we describe how we can extend our parameter estimation scheme for the line process (Chapter 5) to estimate the parameters of a coupled MRF. We consider a less general coupled MRF model than the model described in Section 6.1, for simplicity of presentation and ease of testing the parameter estimation scheme. Specifically, we consider the case when only range measurements are present. Further, edge cliques of size greater than one are not considered. Therefore, the only parameters considered in the simplified model are \(a_2\), \(a_3\), \(\beta_C\), and \(\beta_J\). Also, since intensity measurements are not present, there are only three possible edge labels: {JUMP, CREASE, NOEDGE}. In the interest of compactness of the equations to be presented, we represent the JUMP, CREASE, and NOEDGE labels by J, C, and N, respectively.

HCF Algorithm for Fusion

1. Initialize all edge and pixel sites to NULL \(\notin \Lambda\).

2. Compute a stability measure \(G_t(\mathbf{x}, \mathbf{l})\) for each site t as follows [19].

• If site t is a pixel site, then

\[ G_t(\mathbf{x}, \mathbf{l}) = \begin{cases} \Delta U_t(\mathbf{x}_{min}, \mathbf{x}_{min} + \alpha) & \text{if } \mathbf{x}_t = \text{NULL} \\ \Delta U_t(\mathbf{x}_{min}, \mathbf{x}_t) & \text{otherwise.} \end{cases} \]

Notice that \(\mathbf{x}_{min}\), \(\mathbf{x}_t\), and \(\alpha\) are all vectors of length two, with a range label component and an intensity label component. Here \(\mathbf{x}_{min} = \arg\min_{(r,i)} U_t((r, i))\), where r and i are the range and intensity labels, respectively, and \(\alpha\) determines the penalty for committing to a label. The purpose of \(\alpha\) is to discourage pixel sites from committing to any label until they have enough contextual information from the labels of their neighbors. We set \(\alpha = (\sigma_R/2, \sigma_I/2)\) for all the experiments.

• If site t is an edge site, then

\[ G_t(\mathbf{x}, \mathbf{l}) = \begin{cases} -\min_{l \in \Lambda,\, l \neq l_{min}} \Delta U_t(l, l_{min}) & \text{if } l_t = \text{NULL} \\ \Delta U_t(l_{min}, l_t) & \text{otherwise,} \end{cases} \]

where \(l_{min} = \arg\min_{l} U_t(l)\). The function \(\Delta U_t(A, B)\) represents the energy difference \(U_t(A) - U_t(B)\), where A and B are possible labels at site t.

3. Repeat
(a) Select the site with minimum stability.
(b) Change its label to the one which corresponds to the lowest energy.
(c) Recompute the stability for the site and its neighbors.
until stability ≥ 0 for all sites.

Figure 6.6. HCF algorithm for fusion.

Figure 6.7. Neighborhood of a vertical edge site in the simplified MRF model.

Consider the neighborhood of an edge site for this simplified MRF model, shown in Figure 6.7. Let a configuration of the variables in Figure 6.7 (from left to right) be represented by \((x_1, l_1, x_2, l, x_3, l_2, x_4)\). Since the clique potentials in Figure 6.3 depend on the first and second differences in pixel values rather than on the actual pixel values, we represent the configuration of the neighbors of the edge site of interest by the 5-tuple \((D_l, l_1, d, l_2, D_r)\), where

\[ D_l = |x_1 - 2x_2 + x_3|, \qquad d = |x_2 - x_3|, \qquad D_r = |x_2 - 2x_3 + x_4|. \]

The equations relating the conditional probabilities and the parameters can be derived analogous to Section 5.1:

\[ U(l = J, (D_l, l_1, d, l_2, D_r)) - U(l = N, (D_l, l_1, d, l_2, D_r)) = \log \frac{P(l = N \mid (D_l, l_1, d, l_2, D_r))}{P(l = J \mid (D_l, l_1, d, l_2, D_r))}, \]
13.)) = 1°g P(I = no), 11 «112 D )) 130 P(l = N|(D1, ll, (1,12, D,)) an: c,(Dz,zl,d.12.D.)) — W = N ,(Du'ndv 12.1).» = 10g Pa = CKDI I. (112 D )) Using only configurations in which 11 = 12 = N, and multiplying both sides of the two equations above by -1, we have P(l = J|(D,,N,d,N,D,.) P(l = N|(D,,N,d,N,D,.) r) ) (a2d2+as(Df+Df))—(3J) = log (68) P(l = C|(D,,N,d,N,D P(l = N|(D,,N,d,N,D,. ) ) (a3(D?+DE))—(Bc)= = log ; (6.9) The values D1, d and D, are continuous, and therefore, the number of configu- rations of type (D1, N,d, N, D.) is innumerable. For each such configuration, Equa- tions (6.8) and (6.9) give two linear equations relating the parameters and the con- ditional probabilities. Since there are four parameters, (a2, 03, fig, and 3]), we need at least two different neighborhood configurations to estimate the parameters. By using more than two neighborhood configurations, a least squares estimation scheme similar to that in Section 5.1 can be used to estimate the parameters. The parameter estimation scheme was tested using synthetic range images ob- tained from the CAD models of 10 3D objects used in Chapter 5. The estimated values of the parameters were found to be extremely sensitive to the choice of neigh- borhood configurations. This dependence of the parameters on the choice of neighborhood configurations can be understood by studying Equation (6.9). This equation can be rewritten in the form y 2 ma: + c as follows. (1 = N|(D1,N,d,N,D,) (1 = C|(D;,N,d,N,D,) log i i = (-—(13)(D,2 + D?) + 50 (6.10) Equation (6.10) is the equation of a straight line with slope = —a3 and ordinate intercept :2 flc. Figure 6.8 shows a plot of the estimated values of the left hand log P(I:N[(DI,N»vavD'D P(lzCI(Di.N,d»N,Drll 131 0.0 0.2 0.4 0-5 (D,2 + D?) Figure 6.8. Plot of Equation (6.10). 132 side (LHS) of Equation (6.10) against the corresponding values of (D,2 + D3). It is clear from the figure that the estimated values of the LHS do not follow the expected straight line relationship. Because of the large deviation of the plot from a linear form, the estimated parameters also exhibit extreme sensitivity to the choice of the neighborhood configurations. It is not clear why the estimated values of the LHS of Equation (6.10) deviate so much from the expected linear relationship. One probable reason is the inadequate accuracy of the CAD model for representing surfaces. The CAD models used represent curved surfaces using planar approximations, and therefore, the synthesized range image may not be accurate enough to estimate the second-order differences. A study of performance of the estimation scheme using a more accurate representation of surfaces is necessary to get an understanding of the reason for the failure of this parameter estimation method. 6.4 Experimental Results The labeling scheme presented in Section 6.1 was tested on both synthetic and real images. Synthetic data were used to study the effect of signal to noise ratio on the detectability of crease edges, and to compare the convergence rates with and Without fusion. The TPM algorithm was used for synthetic images, because of the ability to specify the number of iterations for the algorithm. Synthetic data were obtained from the CAD models of 3D objects. The real images were obtained by registering range data from the Technical Arts Scanner with the intensity image sensed using a separate CCD camera. The two images were registered using a camera calibration algorithm [68]. 
The parameter estimation scheme was tested using synthetic range images obtained from the CAD models of the 10 3D objects used in Chapter 5. The estimated values of the parameters were found to be extremely sensitive to the choice of neighborhood configurations.

This dependence of the parameters on the choice of neighborhood configurations can be understood by studying Equation (6.9). This equation can be rewritten in the form y = mx + c as follows:

    log [ P(l = N | (D_l, N, d, N, D_r)) / P(l = C | (D_l, N, d, N, D_r)) ]
        = (−α_3)(D_l² + D_r²) + β_C.   (6.10)

Equation (6.10) is the equation of a straight line with slope −α_3 and ordinate intercept β_C.

Figure 6.8. Plot of Equation (6.10): estimated values of log [ P(l = N | (D_l, N, d, N, D_r)) / P(l = C | (D_l, N, d, N, D_r)) ] against (D_l² + D_r²).

Figure 6.8 shows a plot of the estimated values of the left-hand side (LHS) of Equation (6.10) against the corresponding values of (D_l² + D_r²). It is clear from the figure that the estimated values of the LHS do not follow the expected straight-line relationship. Because of the large deviation of the plot from a linear form, the estimated parameters also exhibit extreme sensitivity to the choice of the neighborhood configurations.

It is not clear why the estimated values of the LHS of Equation (6.10) deviate so much from the expected linear relationship. One probable reason is the inadequate accuracy of the CAD models in representing surfaces. The CAD models used represent curved surfaces using planar approximations, and therefore, the synthesized range image may not be accurate enough to estimate the second-order differences. A study of the performance of the estimation scheme using a more accurate representation of surfaces is necessary to understand the reason for the failure of this parameter estimation method.

6.4 Experimental Results

The labeling scheme presented in Section 6.1 was tested on both synthetic and real images. Synthetic data were used to study the effect of the signal-to-noise ratio on the detectability of crease edges, and to compare the convergence rates with and without fusion. The TPM algorithm was used for the synthetic images because of the ability to specify the number of iterations for the algorithm. Synthetic data were obtained from the CAD models of 3D objects. The real images were obtained by registering range data from the Technical Arts scanner with the intensity image sensed using a separate CCD camera. The two images were registered using a camera calibration algorithm [68]. The parameters of the coupled MRF process were specified by a trial and error method, since our parameter estimation scheme did not provide reliable estimates.

Figure 6.9. Edge labeling results using the TPM algorithm for the synthetic BLOCK4 image: (a) range image (σ_R = 0.003 inches); (b) intensity image (σ_I = 10 gray levels); (c) edge labeling after 100 iterations using only the range image; (d) edge labeling after 100 iterations using fusion; (e) edge labeling after 1000 iterations using only the range image; (f) edge labeling after 1000 iterations using fusion. Color codes for edge labels are: red — JUMP edge, violet — CREASE edge, blue — INTENSITY edge.

Figure 6.10. Edge labeling results using the TPM algorithm for the synthetic BLOCK4 image: (a) range image (σ_R = 0.03 inches); (b) intensity image (σ_I = 10 gray levels); (c) edge labeling after 100 iterations using only the range image; (d) edge labeling after 100 iterations using fusion; (e) edge labeling after 1000 iterations using only the range image; (f) edge labeling after 1000 iterations using fusion. Color codes for edge labels are: red — JUMP edge, violet — CREASE edge, blue — INTENSITY edge.

Figures 6.9(a) and (b) show a 128 x 128 synthetic image pair for the BLOCK4 object. The intensity image was synthesized such that there is a square-shaped albedo edge (INTENSITY) on one of the surfaces. The range and intensity images are corrupted by independent additive Gaussian noise of standard deviations σ_R = 0.003 inches and σ_I = 10 gray levels. We compared the output of the TPM algorithm with and without fusion, after n = 100 and n = 1000 iterations. The values of the parameters used are: h = 0.007, λ_J = 0.03, λ_C = 1.3, λ_I = 4.0, k_C = 1.3, k_I = 2.2. Figure 6.9(c) shows the output of the TPM algorithm after 100 iterations, using only the range image (no fusion). Figure 6.9(d) shows the corresponding output with fusion. It is seen that using both range and intensity images, not only does the quality of crease edge detection improve, but the square-shaped mark in the intensity image is detected as well. Figures 6.9(e) and (f) show the improvement in the edge labeling results after 1000 iterations. A comparison of these two edge maps shows that when only the range image is used, the algorithm fails to converge to the correct solution even after 1000 iterations. When the range and intensity images are fused, the TPM algorithm results in a near perfect labeling after 1000 iterations.

The above experiment was repeated by increasing the level of noise in the range image to σ_R = 0.03 inches. This experiment compares the sensitivity to noise with and without fusion. The values of the parameters used for this experiment are: h = 0.07, λ_J = 0.2, λ_C = 1.3, λ_I = 4.0, k_C = 1.2, k_I = 2.2. The edge labeling results shown in Figures 6.10(c) and (e) indicate that the TPM algorithm without fusion fails to detect all the crease edges even after 1000 iterations. With fusion, however, the algorithm detects all the jump, crease, and intensity edges accurately after just 100 iterations (Figure 6.10(d)).

The run time of the TPM algorithm is 811.2 seconds for 100 iterations, and 8033.4 seconds for 1000 iterations, for a 128 x 128 image. This computational speed is too slow for most applications. Therefore, for the rest of the experiments reported in this chapter, the HCF algorithm was used because of its computational efficiency.
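For reference, the corruption step used to produce the synthetic pairs above amounts to adding independent Gaussian noise at the stated levels; a minimal sketch, with placeholder clean images standing in for the CAD-synthesized data, is:

```python
import numpy as np

rng = np.random.default_rng(0)
range_clean = np.zeros((128, 128))            # placeholder CAD-synthesized range image
intensity_clean = np.full((128, 128), 128.0)  # placeholder intensity image

sigma_R, sigma_I = 0.003, 10.0                # noise levels of Figure 6.9
range_noisy = range_clean + rng.normal(0.0, sigma_R, size=range_clean.shape)
intensity_noisy = np.clip(
    intensity_clean + rng.normal(0.0, sigma_I, size=intensity_clean.shape), 0, 255)
```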
Figure 6.11. Edge labeling results for the BLOCKS image: (a) range image; (b) intensity image; (c) detected jump edges; (d) detected crease edges; (e) detected intensity edges; (f) edges detected using the Canny edge detector (with hysteresis threshold values 0.9 and 0.5 and width of the Gaussian σ = 1 pixel).

Figure 6.12. Edge labeling results for the JIG image: (a) range image; (b) intensity image; (c) detected jump edges; (d) detected crease edges; (e) detected intensity edges; (f) edges detected using the Canny edge detector (with hysteresis threshold values 0.9 and 0.5 and width of the Gaussian σ = 1 pixel).

Figure 6.13. Edge labeling results for the FUNNEL image: (a) range image; (b) intensity image; (c) detected jump edges; (d) detected crease edges; (e) detected intensity edges; (f) edges detected using the Canny edge detector (with hysteresis threshold values 0.9 and 0.5 and width of the Gaussian σ = 1 pixel).

Figure 6.14. Edge labeling results for the BLOCK4/COLUMN1 image: (a) range image; (b) intensity image; (c) detected jump edges; (d) detected crease edges; (e) detected intensity edges; (f) edges detected using the Canny edge detector (with hysteresis threshold values 0.9 and 0.5 and width of the Gaussian σ = 1 pixel).

Figure 6.15. Edge labeling results for the AGPART image: (a) range image; (b) intensity image; (c) detected jump edges; (d) detected crease edges; (e) detected intensity edges; (f) edges detected using the Canny edge detector (with hysteresis threshold values 0.9 and 0.5 and width of the Gaussian σ = 1 pixel).

Figure 6.11 shows the results of the labeling algorithm on a real 256 x 256 image pair. The scene contains a rectangular wooden block with two albedo edges, resting on another wooden block with a triangular extension. Figure 6.11(a) shows the range image of the scene and Figure 6.11(b) shows the corresponding intensity image. The output of our edge labeling algorithm was subjected to a postprocessing stage to remove intensity edges in the vicinity of jump and crease edges; these spurious intensity edges are an artifact of the registration error between the range and intensity images. The output of the labeling algorithm after postprocessing is shown in Figures 6.11(c)-(e), where the figures individually show the JUMP, CREASE, and INTENSITY edges, respectively. The values of the MRF model parameters selected empirically are as follows: h = 0.10, λ_J = 0.2, λ_C = 1.0, λ_I = 3.0, k_C = 1.1, k_I = 1.5. The noise model parameters were estimated from a homogeneous area in the images; the estimated values are σ_R = 0.03 and σ_I = 3.0. It is seen that all the edges in the image have been detected and labeled quite accurately. In Figure 6.11(f) we show the edges detected by the Canny edge detector [16] on the range and intensity images. First, the Canny edge detector was applied separately to the range and intensity images to detect step edges in the two images. Then the two edge maps were overlaid on each other. Although the Canny edge detector parameters were not optimized to obtain the best edge map, this comparison shows that a simple application of an edge detector to the two images followed by a combination of the resulting edge maps does not yield good results, thus stressing the importance of consistently integrating the information in the range and intensity images.
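The thesis does not spell out the postprocessing rule beyond removing intensity edges near jump and crease edges; a plausible minimal version, with a hypothetical suppression radius, is sketched below.

```python
import numpy as np

def suppress_spurious_intensity_edges(intensity_edges, jump_edges,
                                      crease_edges, radius=2):
    # Remove INTENSITY marks within `radius` pixels of a JUMP or CREASE edge;
    # such marks are typically artifacts of the range/intensity registration
    # error.  Inputs are boolean edge maps of identical shape; the value of
    # `radius` is hypothetical.  np.roll wraps at the image border, which is
    # adequate for a sketch.
    strong = jump_edges | crease_edges
    dilated = np.zeros_like(strong)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            dilated |= np.roll(np.roll(strong, dy, axis=0), dx, axis=1)
    return intensity_edges & ~dilated
```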
Figure 6.12 shows the results on the 128 x 128 images of a calibration jig. The values of the parameters used for this image pair are h = 0.10, λ_J = 0.2, λ_C = 1.0, λ_I = 1.5, k_C = 1.1, k_I = 1.5, σ_R = 0.03, σ_I = 5.0 (the same as for the BLOCKS image except for the values of λ_I and σ_I). The output of the labeling algorithm after postprocessing is shown in Figures 6.12(c)-(e). It is seen that both the jump and the crease edges in the image have been labeled quite accurately. There are a few spurious intensity edges, which can be removed by further postprocessing steps such as connected component analysis.

Figure 6.13 shows the results on an image pair of a funnel, of size 240 x 240. The values of the parameters used for this image are h = 0.11, λ_J = 0.2, λ_C = 1.0, λ_I = 1.5, k_C = 1.02, k_I = 1.5, σ_R = 0.03, σ_I = 5.0. The output of the labeling algorithm after postprocessing is shown in Figures 6.13(c)-(e). Both the jump and the crease edges have been labeled accurately. It is also seen from Figure 6.13(f) that simple fusion methods fail to give sensible results.

Figure 6.14 shows the results on the images of the BLOCK4/COLUMN1 objects, of size 128 x 128. The values of the parameters used for this image pair are h = 0.04, λ_J = 0.1, λ_C = 0.9, λ_I = 1.4, k_C = 1.2, k_I = 1.5, σ_R = 0.03, σ_I = 3.0. The output of the labeling algorithm after postprocessing is shown in Figures 6.14(c)-(e). The crease edges in the scene are only partially detected. The intensity edges corresponding to the shadow on the COLUMN1 object and the albedo edge on the BLOCK4 object have not only been detected reasonably well but have also been labeled accurately. However, several spurious intensity edges have been detected close to jump and crease edges due to the registration error between the intensity and range images, indicating the sensitivity of the algorithm to registration errors.

Figure 6.15 shows the results on the 100 x 180 images of the AGPART object. The values of the parameters used for this image pair are h = 0.05, λ_J = 0.1, λ_C = 1.1, λ_I = 1.5, k_C = 1.4, k_I = 1.5, σ_R = 0.03, σ_I = 3.0. While the jump edges have been labeled accurately, the crease edges in the scene are only partially detected. This is evidently due to the fact that if the parameters are chosen such that there is less penalty for crease edge labels, then high-gradient points near the limbs also become crease edge candidates.

The current implementation of the HCF algorithm runs on a Sparc-2 workstation. The CPU time taken to find edges in a 128 x 128 image is about 100 seconds. In contrast, the Canny edge detector takes only about 6 seconds of CPU time. The computational speed of the HCF algorithm is slow for most applications. A run-time analysis of the HCF algorithm shows that the algorithm makes about 20 visits to each site on average. This large number of visits is explained by the propagation of spatial information required for accurate computation of the second-order differences. In defense of the HCF algorithm, it should be mentioned that all published energy minimization algorithms, whether deterministic or stochastic, are computationally expensive [10, 88]. One solution to this problem is to use massively parallel computers. We have implemented parallel versions of the TPM and HCF algorithms on a massively parallel computer. The details of these implementations are presented in Chapter 7.
6.5 Conclusions

This chapter has presented an energy minimization approach for the detection and labeling of edges by fusing registered intensity and range images. The chosen set of labels for the edges (JUMP, CREASE, INTENSITY, NOEDGE) classifies edges based on the physical properties of the surfaces in the scene. A Bayesian formulation is used to combine the range and intensity values at the locations of discontinuities, and the a priori knowledge about the labels is modeled by a Markov random field. Results of the algorithm are presented on one synthetic and several real image pairs. The experiments using the synthetic image demonstrated that convergence of the energy minimization procedure is considerably faster (fewer iterations) with fusion than without. On real images, the edges detected and labeled using the MRF algorithm (HCF) match the physical edges in the scene. However, different model parameter values had to be used for different images. Specification of proper parameter values is a significant drawback of the algorithm. Also, the algorithm is sensitive to registration error between the range and intensity images. Another limitation of this algorithm, which is shared by all energy minimization algorithms, is the high computational cost. A parallel computer implementation of this algorithm is necessary to achieve the level of computational performance required for many real-time vision tasks.

This study does not completely determine the usefulness of MRF models of context for the fusion problem, but it shows that when the model parameters are chosen appropriately, MRF models result in good edge labeling results, in the sense that the detected edges and their labels match the physical discontinuities. Further research on the estimation of the MRF parameters for a coupled process is necessary to determine whether or not MRF models are useful models of context for the fusion problem.

CHAPTER 7

Implementation of Energy Minimization Algorithms on the Connection Machine

A significant amount of the computational effort in early vision applications involves repeated local operations on individual pixels in an image. These operations can be expressed as an energy function with interaction terms that are local in nature. Regularization solutions to early vision problems involve the minimization of this multivariate energy function. The number of variables over which this minimization needs to be performed is usually of the order of the size of the image and is typically very large. For example, for a 256 x 256 image, the number of variables in the energy function for the fusion problem in Chapter 6 is 196,608. Therefore, the number of possible solutions is extremely large, and an exhaustive search of the solution space to find the optimal solution is impossible for all practical purposes. One way to reduce the required computation is to use suboptimal methods to obtain an approximate solution. Even then, the computational task is overwhelming for a serial implementation.

It is widely recognized that parallel computer implementation of energy minimization algorithms is essential before these techniques can be useful in a real-time vision system [38, 79, 35]. Fortunately, these algorithms are suitable for massively parallel computation because they involve only local computations. Therefore, each processing element needs to interact with only a small number of neighboring processors to carry out the computation. The computations at each processor are identical, so the algorithms are well suited for implementation on a SIMD (Single Instruction Multiple Data) parallel computer [55].
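The figure of 196,608 variables follows directly from the structure of the coupled model: each pixel contributes one pixel site and, up to boundary effects, one vertical and one horizontal line site, so that

    65,536 pixel sites + 2 x 65,536 line sites = 3 x 256² = 196,608 variables.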
This chapter describes a study of the computational performance of two parallel algorithms on the Connection Machine [48] for the minimization of the energy function (Equation (6.7)) for the fusion problem presented in Chapter 6. The two parallel fusion algorithms considered are: (1) the parallel Thresholded Posterior Mean (TPM) algorithm and (2) a parallel variant of the HCF algorithm. The parallel TPM is a stochastic optimization algorithm. The parallel HCF algorithm, on the other hand, is a deterministic algorithm and is expected to be faster than the parallel TPM algorithm. The computational performance of the parallel algorithms is characterized by two values: (i) the run time on the parallel computer and (ii) the speedup over a sequential computer implementation, which is given by

    Speedup = (Run time on the sequential computer) / (Run time on the parallel computer).   (7.1)

The remainder of the chapter is organized as follows. Section 7.1 provides a brief introduction to one of the most popular SIMD machines, the Connection Machine parallel computer. In Section 7.2, we describe the parallel TPM algorithm. The results of an implementation of the TPM algorithm on a 16,384 (16K) processor Connection Machine (CM-2) are also presented. Section 7.3 describes the parallel HCF algorithm and the results of its implementation on a CM-2. Section 7.4 presents a comparison of the labeling performance of the sequential and parallel versions of the TPM and HCF algorithms. Finally, in Section 7.5 we give some concluding remarks.

7.1 The Connection Machine

The Connection Machine is a massively parallel SIMD computer with up to 65,536 (64K) 1-bit processing elements [48]. The Connection Machine CM-1, which was the first implementation of the Connection Machine concept, had up to 512 bytes of local memory in each processing element (PE) and had a peak performance rating of 2000 MIPS with 64K processing elements. On the CM-2, the next generation machine, each PE has up to 8K bytes of local memory. Each PE in a Connection Machine can access the local memory of other PEs using a general interprocessor communication network. An optional Floating Point Accelerator (FPA) enhancement can speed up floating point operations by a factor of up to 20 [95]. A full configuration (64K) CM-2 with the FPA enhancement has a peak performance rating of 2500 MIPS and 5000 single-precision MFLOPS [95]. The experimental results reported in this chapter are for a 16K node CM-2 with the FPA enhancement. For this configuration, the CM-2 has a peak performance rating of about 1250 MFLOPS [95].

The programming style best suited to the Connection Machine is the "data parallel" style. That is, individual data items are allocated to individual processors using the notion of parallel variables. A two-dimensional image, which is typically represented as a 2D array on sequential machines, can be represented as a parallel variable on the Connection Machine. Each pixel in the image is stored in an associated PE; for efficient use of the Connection Machine, the bulk of the processing involving this pixel should be carried out in the PE in which it is stored. If the image size is greater than the number of PEs available, each physical PE operates on more than one pixel in what is known as the "virtual processing" mode [95].
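As a loose modern analogue of this data parallel style (NumPy standing in for C* parallel variables; on the CM-2 each array element would live in its own PE), a local operation is expressed once and applied to every pixel simultaneously:

```python
import numpy as np

img = np.random.default_rng(1).random((256, 256))  # the image as one "parallel variable"

# One SIMD-style step: every pixel is averaged with its four neighbors at
# once, instead of looping over pixels as a sequential program would.
north, south = np.roll(img, -1, axis=0), np.roll(img, 1, axis=0)
west,  east  = np.roll(img, -1, axis=1), np.roll(img, 1, axis=1)
smoothed = (img + north + south + east + west) / 5.0
```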
The Connection Machine supports modified versions of common programming languages, such as C*, *LISP, CM FORTRAN, and C/PARIS. Our implementation uses C*, a parallel version of the C language. This language was selected because of the ease of programming rather than for computational efficiency considerations. The C* programming language does not provide much low-level control over the capabilities of the CM-2. Therefore, even though the development of a program on a CM-2 is fairly straightforward, optimizing the code for computational performance is extremely tedious. Also, software maintenance of an optimized program is difficult, because optimized programs are typically extremely unreadable.

7.2 Parallel TPM Algorithm

The TPM algorithm for the estimation of image labels given in Section 6.1 is inherently parallel, since the computations performed are essentially local in nature. Specifically, the algorithm exhibits data parallelism [49] and is suitable for implementation on the Connection Machine. A fundamental requirement for an algorithm to be called a data parallel algorithm is that the computations on different data elements should be identical. Consider the computations at each pixel in the sequential TPM algorithm for fusion presented in Section 6.2, which include (i) fetching the values of the same number of neighboring variables (pixels, edge sites), (ii) constructing the local distribution of pixel labels based on these variables, and (iii) sampling a value from this distribution. These computations are identical at the machine instruction level and, therefore, qualify as data parallel.

At a cursory level, it appears that we can perform the computations at each pixel simultaneously. However, as pointed out by Murray et al. [79], this is incorrect, since this approach does not take into account the local dependency of the pixel labels on their neighbors (see Figure 6.2). Note that the computations in an SIMD machine are synchronous, and therefore, new pixel values are computed based on the old values of the neighbors. This means that the new pixel values may not be consistent with the current values of their neighbors (in a probabilistic sense). Murray et al. give the following example to illustrate the point: "Consider pixels arranged alternately white and black in a chessboard pattern. Under synchronous updating, the MRF potentials will force every black to white, and vice versa, and similarly at the next cycle, causing the pixel pattern to oscillate."

The local dependency involved can also be understood by viewing the computations as the evolution of a Markov chain whose number of states equals the number of possible labelings of the image. A necessary condition for the convergence of a Markov chain to its equilibrium state is that the Markov chain should be reversible [74]. This implies that if two pixels depend on each other, then they should not be updated simultaneously; otherwise, "the reversibility of the resulting chain will be destroyed, so it will no longer be possible to guarantee the convergence" [74].

From the foregoing discussion it is clear that on an MIMD parallel computer, where the individual processing elements operate asynchronously, one could still update all pixels in parallel. However, for synchronous SIMD machines, an alternative scheme needs to be used. A widely used approach is to divide the set of variables into subsets such that the variables within a subset can be updated simultaneously [79, 72]. Such a scheme is called a coding scheme and was originally used by Besag [5] for parameter estimation in Markov random fields. The coding scheme we have used for the fusion problem is given in Figure 7.1.
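A standard construction of such a coding scheme, sketched below for a 4-colorable dependency like that of Figure 7.2, tiles the grid with a 2 x 2 pattern. Whether Figure 7.1 uses exactly this tiling cannot be recovered from the text, but the principle is the same: sites in one coding set are never neighbors, so each set can be updated synchronously, and the sets are visited one after another.

```python
import numpy as np

H, W, K = 8, 8, 4
rows, cols = np.indices((H, W))
coding_set = 2 * (rows % 2) + (cols % 2)     # a 2 x 2 coloring, values 0..3

values = np.zeros((H, W))
for k in range(K):
    mask = (coding_set == k)
    # All sites in coding set k may be updated synchronously: under the
    # 2 x 2 coloring no two of them are neighbors in the dependency graph.
    values[mask] += 1.0                      # stand-in for the real update
```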
We have chosen to assign one vertical line variable and one horizontal line variable, along with one pixel variable, to each processing element. This arrangement of data elements is necessary to force the data elements to be homogeneous across all the processing elements. Notice that this arrangement does not upset the data parallel nature of the algorithm, since the computations at all elements belonging to a coding set would still be identical.

Figure 7.1. A coding scheme for parallel updating of variables. PEs belonging to different coding sets are shown in different shades.

Figure 7.2. Combined neighborhood dependency of pixel and line variables, centered on the processor of interest.

The speedup achievable over a single processing element, using this scheme, is N/K, where K is the Chromatic Number of the neighborhood connectivity graph associated with the energy function, and N is the number of processing elements. It is easy to see that for the neighborhood dependency given in Figure 7.2, the Chromatic Number is 4.

The parallel TPM algorithm for the fusion problem is stated in Figure 7.3.

Parallel TPM Algorithm for Edge Labeling with Fusion

1. Load the images Y^R and Y^I onto the PEs.

2. Initialize in parallel for all PEs:
   C_p = Coding set(p)
   x̂_p^R = y_p^R
   x̂_p^I = y_p^I
   l̂_p^V = a label chosen randomly from {JUMP, CREASE, INTENSITY, NOEDGE}
   l̂_p^H = a label chosen randomly from {JUMP, CREASE, INTENSITY, NOEDGE}
   where the subscript refers to the variables assigned to the processor p and the superscripts V and H refer to the vertical and horizontal line variables at that processor, respectively. The symbols x̂ and l̂ refer to estimated values.

3. For k = 1, ..., 4
   Do in parallel
     if (C_p = k), update x̂_p^R, x̂_p^I, l̂_p^V, l̂_p^H.

4. Repeat (3) n times, saving the realizations (x, l)^(k+1) through (x, l)^(n).

5. Do in parallel
   Choose labels g^I, g^R, g^V, and g^H such that
   (a) g^I is the mean of the saved x̂_p^I values and g^R is the mean of the saved x̂_p^R values;
   (b) g^V maximizes P_p(l^V | y);
   (c) g^H maximizes P_p(l^H | y).

Figure 7.3. Parallel TPM algorithm for edge labeling with fusion.

The values of n and k are chosen empirically. We found that k = 100 and n = 200 gave reasonable results. We present below the results obtained from an implementation of the parallel TPM algorithm on a 16K CM-2. Apart from an obvious interest in determining the actual run times and the speedup achievable for varying image sizes, the experiments were performed to test the following hypothesis.

o Straightforward vs. fine-tuned coding: The individual processing elements in a CM-2 are single-bit processors. That means that a 32-bit addition, for example, needs to be performed by 32 one-bit additions. Similarly, a 32-bit multiplication takes on the order of 32 x 32 steps on the Connection Machine. More important, the time taken for an operation depends on the number of bits in the data. This is in sharp contrast to traditional CPUs, where, for example, a 32-bit addition takes as much time as a 16-bit addition because the addition is carried out in parallel. Operations that usually take O(1) time on a sequential long-word machine take O(k) or O(k²) steps on the Connection Machine, where k is the number of bits in the data. Therefore, it is not surprising that a straightforward conversion from sequential code to parallel code often performs poorly on the Connection Machine. In other words, the order of the performance gain achieved by optimizing the implementation on a Connection Machine is comparable to the order of the performance gain achieved by changes at the algorithmic level in a sequential implementation.
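To put rough numbers on this bit-serial cost model: a k-bit addition costs on the order of k one-bit steps and a k-bit multiplication on the order of k² steps, so representing a quantity in 8 bits instead of 32 saves about a factor of

    32 / 8 = 4 on additions, and (32 / 8)² = 16 on multiplications.

This arithmetic is what makes the "smallest sized storage space" optimization mentioned below worthwhile.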
Table 7.1 shows the execution times for images of three different sizes, for both the sequential and the parallel versions of the TPM algorithm. The corresponding speedup over the sequential implementation is shown in Figure 7.4. The execution times reported in Table 7.1 do not include the time for image loading. The times reported are for 200 iterations of updating. The sequential version was timed on a Sun Sparcstation 2.

Table 7.1. Execution time of the parallel TPM algorithm in seconds.

    Program      |               Image Size
    Version      | 128 x 128 | 256 x 256 | 512 x 512
    -------------+-----------+-----------+----------
    Sequential   |   1613.1  |   6371.8  |  25568.1
    Naive        |    231.5  |    644.4  |   1823.7
    No arrays    |     87.4  |    233.5  |    642.4
    Newnorm      |     55.7  |    155.3  |    424.4

There are three parallel versions for which results are given in Table 7.1. The version labeled Naive is the straightforward implementation of the parallel program, in which the scalar variables in the sequential program were replaced by parallel variables. The version labeled No arrays replaces each element of a parallel array by an individual variable. This optimization was performed because the Connection Machine PE does not have the capability for indirect addressing, and therefore parallel array references are very inefficient. The second optimization was motivated by the observation that the Connection Machine library subroutine to generate random numbers in each PE, prand, is relatively slow (2 ms/call). The method of generating a standard normal variable used in the previous two versions requires 12 uniform random variables (12 calls to prand). In order to minimize the number of calls to prand, an alternative method of generating standard normal variables, which requires, on average, only one uniform random variable, is used (the additional computation required for this method usually costs more on a sequential machine, but is small relative to the time for prand on the Connection Machine). This version is labeled, for lack of a better label, as Newnorm in Table 7.1 and Figure 7.4. This version also includes other optimizations, such as using the smallest sized storage space for all the variables. As can be seen from the table, both optimizations result in a substantial reduction in computation time.

Figure 7.4. Speedup of the parallel TPM algorithm over the sequential algorithm (Sun Sparcstation 2) for different image sizes. (Plot of speedup versus image size for three versions: the fully optimized version, the version with arrays replaced by individual variables, and the naive implementation.)

Figure 7.4 indicates that the speedup achieved increases with increasing image size. This is because when there are more pixels than the number of physical PEs, the floating point accelerator unit, which operates in a pipelined mode, is always full, thus increasing its throughput. The value of the maximum speedup achieved is 60. This speedup is reasonable because the parallel TPM algorithm updates variables belonging to different coding sets sequentially.
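The two normal-variate generators contrasted above can be sketched as follows. The 12-uniform version is the classical central-limit approximation; one-uniform-per-variate behavior is obtained here by inversion of the normal CDF, although whether the thesis implementation used inversion or a different one-uniform method is not stated.

```python
import random
from statistics import NormalDist

def normal_clt(rng=random):
    # Sum of 12 U(0,1) variates has mean 6 and variance 1, so this costs
    # 12 uniform deviates (12 prand calls on the CM-2) per normal deviate.
    return sum(rng.random() for _ in range(12)) - 6.0

_STD_NORMAL = NormalDist()

def normal_by_inversion(rng=random):
    # Exactly one uniform deviate per normal deviate, at the price of an
    # inverse-CDF evaluation.  (rng.random() can in principle return 0.0,
    # which inv_cdf rejects; that corner case is ignored in this sketch.)
    return _STD_NORMAL.inv_cdf(rng.random())
```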
7.3 Parallel HCF Algorithm

We see that the parallel TPM algorithm takes approximately a minute of computing time on a CM-2 Connection Machine for a 128 x 128 image. This computational performance is quite slow for most vision applications. For this reason, we consider another parallel algorithm for optimizing the energy function (Equation (6.7)). This algorithm, called the parallel HCF algorithm, is a parallel variant of the sequential HCF algorithm presented in Section 6.1. The parallel HCF algorithm differs from the sequential HCF algorithm in the order in which the sites are visited. In the sequential HCF algorithm, only the site with the minimum stability value can be visited next. In the parallel HCF algorithm, all sites that have minimum stability compared to their neighbors can be updated simultaneously. Swain et al. [90] have proved the convergence of the parallel HCF algorithm. This algorithm is suitable for data parallel implementation, since many sites can be visited simultaneously. In order to force the computations at each data element to be identical, we group a pixel site with its right vertical and bottom horizontal line sites. The parallel HCF algorithm is stated in Figure 7.5. The coding scheme given in Figure 7.1 is also used for this algorithm, to prevent variables that are directly dependent on each other from being updated simultaneously. The purpose of the rank value of a site is to arbitrate when there are ties in the stability values of the neighbors.

Parallel HCF Algorithm for Edge Labeling with Fusion

1. Load the images Y^R and Y^I onto the PEs.

2. Initialize in parallel for all PEs:
   x̂_p^R = y_p^R
   x̂_p^I = y_p^I
   l^V = NULL
   l^H = NULL
   C_p = Coding set(p)
   R_p = C_p * 3
   R_p^V = C_p * 3 + 1
   R_p^H = C_p * 3 + 2
   where the subscript refers to the variables assigned to the processor p and the superscripts V and H refer to the vertical and horizontal line variables at that processor, respectively. R_p, R_p^V, and R_p^H are the rank values of the pixel site, the vertical edge site, and the horizontal edge site at processor p, respectively.

3. For i = 1, ..., niter
   Do in parallel
     Update the stability values G_p, G_p^V, G_p^H.
     For each site s ∈ {pixel at p, horizontal edge site at p, vertical edge site at p} do
       Make local copies of the neighbors of s.
       If (s has the minimum stability among its neighbors) OR (s has a minimal stability and the minimum rank among its neighbors), then
         update the site label at s.

Figure 7.5. Parallel HCF algorithm for edge labeling with fusion.

The updating step in the parallel HCF algorithm is deterministic, unlike in the parallel TPM algorithm. The parallel HCF algorithm converges in about 50 iterations (niter = 50). This is a significant saving in computation time compared to the number of iterations required for the TPM algorithm. For a 128 x 128 image, the parallel HCF algorithm requires 3.05 seconds for 50 iterations on a 16K CM-2. This computational speed, while not sufficient for video rate processing, is quite fast compared to sequential implementations.

Figure 7.6. Speedup of the parallel HCF implementation on CM-2 over the Sparcstation 2 implementation for different image sizes. (Plot of speedup versus image size for 128 x 128, 256 x 256, and 512 x 512 images.)

Figure 7.6 shows the speedup achieved using the CM-2 implementation over a Sun Sparcstation 2 implementation of the parallel HCF algorithm. Notice that the speedups for the parallel HCF algorithm are higher than those achieved for the parallel TPM algorithm. This is probably because the parallel HCF algorithm does not use the expensive random number generator calls that are required by the parallel TPM algorithm.
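The visiting rule of Figure 7.5 (update every site whose stability is a local minimum, with a fixed rank breaking ties) can be expressed in a data parallel way. The sketch below computes an "updatable" mask over a toy grid of stability values; the arrays are hypothetical, and wrap-around neighbors stand in for the real border handling.

```python
import numpy as np

rng = np.random.default_rng(2)
G = rng.normal(size=(8, 8))                  # current stability values
rank = np.arange(G.size).reshape(G.shape)    # fixed, distinct site ranks

def neighbor_views(a):
    # The four 4-neighbor shifts of an array (wrap-around for brevity).
    return [np.roll(a, s, axis=ax) for ax in (0, 1) for s in (1, -1)]

is_local_min = np.ones_like(G, dtype=bool)
for g, r in zip(neighbor_views(G), neighbor_views(rank)):
    # Minimum stability among neighbors, with rank deciding exact ties.
    is_local_min &= (G < g) | ((G == g) & (rank < r))

updatable = is_local_min & (G < 0)           # only unstable sites change label
```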
7.4 Edge Labeling Performance of the Parallel TPM and Parallel HCF Algorithms

The correctness of the implementation of the parallel algorithms presented in this chapter was verified by running these algorithms on the JIG image pair. It should be emphasized here that the two parallel algorithms are not identical to their sequential counterparts, because of the change in the order of updating. Figure 7.7 shows the edge labeling results on this image for the (i) sequential TPM, (ii) sequential HCF, (iii) parallel TPM, and (iv) parallel HCF algorithms. The sequential and the parallel TPM algorithms were run for 200 iterations each (n = 200, k = 100). The parallel HCF algorithm was run for 50 iterations (niter = 50).

Figure 7.7. Edge labeling results on the JIG image with: (a) the sequential TPM algorithm; (b) the sequential HCF algorithm; (c) the parallel TPM algorithm; (d) the parallel HCF algorithm. Color codes for edge labels are: red — JUMP edge, violet — CREASE edge, blue — INTENSITY edge.

It is seen from the figure that the edge labeling results of all four algorithms are comparable. The computation times used by the different algorithms to process this image are 1613 seconds, 100 seconds, 55.7 seconds, and 3.05 seconds, respectively, for the sequential TPM, sequential HCF, parallel TPM (on the CM-2), and parallel HCF (on the CM-2) algorithms. Notice that the parallel HCF algorithm consumes the least amount of computation time (on a parallel computer) among the four algorithms, with comparable edge labeling performance.

7.5 Conclusions

This chapter has studied the computational performance of energy function minimization algorithms for the fusion problem on the CM-2. It was found that these computations can be carried out significantly faster on the CM-2 than on the Sun Sparcstation 2, a state-of-the-art sequential workstation. It was also demonstrated that a straightforward conversion of sequential code to parallel code does not give a good speedup. Testing the implementation of a parallel version of the HCF algorithm showed that energy minimization for the sensor fusion problem can be computed in about 3 seconds on a CM-2 for a pair of 128 x 128 images. Although this processing time is significantly higher than the video rate (33 milliseconds), it is a significant step towards achieving that goal.

CHAPTER 8

Conclusions and Directions for Future Research

This chapter summarizes the results of the research reported in this thesis. Several directions for future research in the area of Markov random field contextual models are also outlined.

8.1 Conclusions

This thesis has demonstrated that Markov random fields are effective models of contextual information for edge detection in range images and for the fusion of range and intensity images.

Chapter 4 presented a new MRF model-based algorithm for the detection of jump and crease edges in range images. Experiments using several real range images indicate that the edges detected by this edge detection algorithm indeed correspond to the actual jump and crease edges in the input range images. However, like all other edge detection algorithms, the edges detected by this algorithm do not always form closed contours. To avoid this problem, a hybrid algorithm which combines the output of our range image edge detector with the output of a region-based segmentation scheme has been developed. This hybrid algorithm has been compared with a well-known range image segmentation scheme developed by Hoffman and Jain [50]. The results of the comparison indicate that the hybrid algorithm results in better surface segmentations (surface segmentations that have fewer undersegmentation or oversegmentation errors) than the Hoffman-Jain algorithm.
Since the location of the edges in the hybrid segmentation is determined by the MRF model-based edge detector, we interpret the superior performance of the hybrid algorithm as evidence for the usefulness of the MRF contextual model for edge detection in range images.

A key limitation of MRF models has been the difficulty of estimating the model parameters. Chapter 5 proposed a solution to this problem, which uses geometric CAD models of representative objects expected to be in the scene to derive ground truth information for the estimation of the MRF line process parameters. A comparison of the performance (of the edge detection algorithm presented in Chapter 4) using estimated and empirically specified MRF parameters indicates that the edge detection quality with estimated parameters is as good as the edge detection quality with empirically specified MRF parameters. This is important, because it enables us to use a systematic method for the specification of the MRF parameters instead of a trial and error method. The experiments also indicated that it is sufficient to use only "typical" objects to estimate the parameters of the MRF model.

Chapter 6 presented a new algorithm for the fusion of range and intensity images. Experimental results of the fusion algorithm on real fused images, with empirically specified parameters, indicate that the edges detected and labeled using this fusion algorithm match the physical edges in the scene. However, different model parameter values had to be specified for each image. This is a major shortcoming of this algorithm. The algorithm was also found to be sensitive to registration errors in the input fused image. Another limitation of this algorithm is its heavy computational requirements. Experimental results using parallel computer implementations of the fusion algorithm indicated that the computational time can be significantly reduced for this algorithm on a massively parallel computer. This study does not completely answer the question "are MRF models useful for modeling contextual information in the fusion problem?". However, it shows that when the model parameters are chosen appropriately, MRF models result in good edge labeling results (i.e., the detected edges and their labels match the physical discontinuities). Further study of the estimation of the model parameters for a coupled MRF process is necessary to determine the utility of MRF models for fusion.

8.2 Directions for Future Research

Some future research directions that extend the work in this thesis are listed below.

1. Large neighborhood for the line process. An interesting direction for research is the use of a larger neighborhood to define the line process. A larger neighborhood can capture the curvature information in edges, and so it may be possible to impose stronger smoothness constraints. Our parameter estimation scheme in Chapter 5 extends easily to larger neighborhoods. It is also possible to reduce the number of parameters to be estimated to a manageable extent using the canonical representation of the MRF prior distribution.

2. Multiresolution algorithms. Multiresolution optimization techniques can both increase the speed and improve the performance of the algorithms [94]. A multiresolution implementation of the fusion algorithm, for example, has the potential to overcome registration errors in the input images.

3. More accurate CAD models. The parameter estimation scheme given in Section 6.3 was tested using CAD models in which the curved surfaces were represented using planar approximations.
So, an obvious direction for research is to test the estimation scheme using object models that represent curved surfaces more accurately.

4. Real-time optimization. The algorithms currently available for the optimization of MRF energy functions are too slow for real-time applications. New, faster optimization algorithms are necessary to achieve this goal.

APPENDICES

APPENDIX A

List of Parameters

The edge detection and fusion algorithms developed in this thesis require several parameters to be specified. This appendix provides a list of the parameter values to make the work in this thesis reproducible. Tables A.1 - A.4 enumerate the parameters used for the experiments in Chapters 4 - 7, respectively.

Table A.1. The parameters of the edge detection and hybrid segmentation algorithms in Chapter 4.

    Parameters of the edge detection algorithm
      Ad hoc MRF parameters                            see Figure 4.4
      Parameters related to edge strength statistics   c_m = 1.05, c_0 = 0.05
      Window width for local averaging                 5 pixels
    Parameters of the hybrid segmentation algorithm
      Threshold t                                      0.60

Table A.2. The parameter values used for the experiments in Chapter 5.

    Parameter estimation from 3D CAD models
      ε in Equation (5.4)                 1/256
      Estimated MRF parameters            see Figure 5.6
      Size of the synthesized edge map    150 x 150
    Parameter estimation from 2D models
      ε in Equation (5.4)                 1/256
      Estimated MRF parameters            see Figure 5.21
      Size of the synthesized edge map    150 x 150
      Canny edge detector parameters
        σ                                 1 pixel
        mask size                         9 pixels
        low threshold                     0.5
        high threshold                    0.9

Table A.3. The parameters of the fusion algorithm in Chapter 6.

    Edge clique parameters: see Figure 6.4

    Other parameters:
    Image                            | σ_R   | σ_I | h     | λ_J  | λ_C | λ_I | k_C  | k_I
    ---------------------------------+-------+-----+-------+------+-----+-----+------+----
    Synthetic BLOCK4 (Figure 6.9)    | 0.003 | 10  | 0.007 | 0.03 | 1.3 | 4.0 | 1.3  | 2.2
    Synthetic BLOCK4 (Figure 6.10)   | 0.03  | 10  | 0.07  | 0.2  | 1.3 | 4.0 | 1.2  | 2.2
    BLOCKS image                     | 0.03  | 3   | 0.1   | 0.2  | 1.0 | 3.0 | 1.1  | 1.5
    JIG image                        | 0.03  | 5   | 0.1   | 0.2  | 1.0 | 1.5 | 1.1  | 1.5
    FUNNEL image                     | 0.03  | 5   | 0.11  | 0.2  | 1.0 | 1.5 | 1.02 | 1.5
    BLOCK4/COLUMN1 image             | 0.03  | 3   | 0.04  | 0.1  | 0.9 | 1.4 | 1.2  | 1.5
    AGPART image                     | 0.03  | 3   | 0.05  | 0.1  | 1.1 | 1.5 | 1.4  | 1.5

Table A.4. The parameters of the parallel TPM and HCF algorithms used in Chapter 7.

    Parameters of the parallel TPM algorithm
      Total number of iterations, n                  200
      Number of iterations to reach equilibrium, k   100
    Parameters of the parallel HCF algorithm
      Number of iterations, niter                    50

BIBLIOGRAPHY

[1] K. Andress and A. Kak, "Evidence Accumulation and Flow of Control in a Hierarchical Spatial Reasoning System," AI Magazine, vol. 9, pp. 75-94, 1988.

[2] N. Ayache and O. Faugeras, "HYPER: A New Approach for the Recognition and Positioning of Two-Dimensional Objects," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 8, pp. 44-54, 1986.

[3] D. Ballard and C. Brown, Computer Vision. Englewood Cliffs, NJ: Prentice-Hall, 1982.

[4] J. Barnett, "Computational Methods for a Mathematical Theory of Evidence," Proc. Seventh International Joint Conference on Artificial Intelligence, pp. 868-875, 1981.

[5] J. Besag, "Spatial Interaction and the Statistical Analysis of Lattice Systems," Journal of the Royal Statistical Society, Ser. B, vol. 36, pp. 192-236, 1974.

[6] J. Besag, "On the Statistical Analysis of Dirty Pictures," Journal of the Royal Statistical Society, Ser. B, vol. 48, pp. 259-302, 1986.

[7] P. Besl and R. Jain, "Three-Dimensional Object Recognition," Computing Surveys, vol. 17, pp. 75-145, 1985.

[8] P. Besl and R. Jain, "Segmentation Through Variable-Order Surface Fitting," IEEE Trans.
on Pattern Analysis and Machine Intelligence, vol. 10, pp. 167-192, 1988.

[9] G. Bilbro and W. Snyder, "Range Image Restoration Using Mean Field Annealing," Tech. rep. NETR 8819, Center for Communications and Signal Processing, North Carolina State University, 1988.

[10] A. Blake and A. Zisserman, Visual Reconstruction. Cambridge, MA: MIT Press, 1987.

[11] D. Blostein and N. Ahuja, "Shape from Texture: Integrating Texture-Element Extraction and Surface Estimation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 11, pp. 1233-1251, 1989.

[12] R. C. Bolles and P. Horaud, "3DPO: A Three-Dimensional Part Orientation System," International Journal of Robotics Research, vol. 5, pp. 3-26, 1986.

[13] C. Bouman and B. Liu, "Multiple Resolution Segmentation of Textured Images," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 13, pp. 99-113, 1991.

[14] P. Brodatz, Textures: A Photographic Album for Artists and Designers. New York: Dover, 1966.

[15] D. Buede, "Shafer-Dempster and Bayesian Reasoning: A Response to 'Shafer-Dempster Reasoning with Applications to Multisensor Target Identification Systems'," IEEE Trans. on Systems, Man and Cybernetics, vol. 18, pp. 1009-1011, 1989.

[16] J. Canny, "A Computational Approach to Edge Detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 8, pp. 679-698, 1986.

[17] C. Chen and R. Dubes, "Experiments in Fitting Discrete Markov Random Fields to Textures," in Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Diego, pp. 298-303, 1989.

[18] S. W. Chen and A. K. Jain, "Object Extraction from Laser Radar Imagery," Pattern Recognition, vol. 24, pp. 587-600, 1991.

[19] P. Chou and C. Brown, "The Theory and Practice of Bayesian Image Labeling," International Journal of Computer Vision, vol. 4, pp. 185-210, 1990.

[20] P. Chou, C. Brown, and R. Raman, "A Confidence-Based Approach to the Labeling Problem," in Proc. of IEEE Workshop on Computer Vision, Miami Beach, FL, pp. 51-56, 1987.

[21] J. Clark and A. Yuille, Data Fusion for Sensory Information Processing Systems. Boston, MA: Kluwer, 1990.

[22] F. Cohen and D. Cooper, "Simple Parallel Hierarchical and Relaxation Algorithms for Segmenting Non-Causal Markovian Random Fields," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 9, pp. 195-219, 1987.
Jain, “Random Field Models in Image Analysis,” Journal of Applied Statistics, vol. 16, pp. 131—164, 1989. R. C. Dubes, A. K. Jain, S. G. Nadabar, and C. C. Chen, “MRF Model-Based Algorithms for Image Segmentation,” in Proc. Tenth International Conference on Pattern Recognition, Atlantic City, pp. 808—814, 1990. T. Fan, G. Medioni and R. Nevatia, “Segmented Descriptions of 3—D Surfaces,” IEEE Journal of Robotics and Automation, vol. 3, pp. 527-538, 1987. P. Flynn and A. Jain, “On Reliable Curvature Estimation,” Proc. IEEE Com- puter Society Conference on Computer Vision and Pattern Recognition, San Diego, pp. 110—116, June 1989. P. Flynn and A. Jain, “BONSAI: 3D Object Recognition Using Constrained Search,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, pp. 1066-1075, 1991. P. J. Flynn, CAD-Based Computer Vision: Modeling and Recognition Strategies. PhD thesis, Department of Computer Science, Michigan State University, 1990. 7 'I-b“ 34-..-5 170 [35] E. B. Gamble, D. Geiger, T. Poggio, and D. Weinshall, “Integration of Vision Modules and Labeling of Surface Discontinuities,” IEEE Trans. on Systems, Man and Cybernetics, vol. 19, pp. 1576—1581, 1989. [36] D. Geiger and A. Girosi, “A Common Framework for Image Segmentation,” Proc. Tenth International Conference on Pattern Recognition, Atlantic City, pp. 502— 507, 1990. [37] D. Geman, S. Geman, C. Graffigne, and P. Dong, “Boundary Detection by Con- strained Optimization,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 12, pp. 609—628, 1990. [38] S. Geman and D. Geman, “Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images,” IEEE Transactions on Pattern Analysis and IVIachine Intelligence, vol. 6, pp. 721—741, 1984. [39] B. Gil, A. Mitiche and J. K. Aggarwal, “Experiments in Combining Intensity and Range Edge Maps,” Computer Graphics and Image Processing, vol. 21, pp. 395— 411,1983. [40] R. Gonzalez and P. Wintz, Digital Image Processing, Reading, MA: Addison- Wesley, 1987. [41] D. Griffeath, “Introduction to Random Fields,” in Denumerable Markov Chains, Kemeny, Knapp and Snell (eds.), New York: Springer-Verlag, 1976. [42] B. Horn and M. Brooks (eds.), Shape from Shading. Cambridge, MA: MIT Press, 1989. [43] J. Hackett and M. Shah, “Fusion of Intensity and Range for Segmentation,” Tech. rep. 14, University of Central Florida, 1988. [44] J. Hackett and M. Shah, “Multi-Sensor Fusion,” Tech. rep. 16, University of Central Florida, 1989. [45] J. Hadamard, Lectures on Cauchy’s Problem in Linear Partial Differential Equa- tions. New Haven, CT: Yale University Press, 1923. [46] R. Haralick, K. Shanmugam and l. Dinstein, “Textural features for image classi- fication,” IEEE Transactions on Systems, Man and Cybernetics, vol. 3, pp. 610- 621,1973. 171 [47] M. Hassner and J. Sklansky, “The Use of Markov Random Fields as Models of Texture,” Computer Graphics and Image Processing, vol. 12, pp. 357—370, 1980. [48] D. Ilillis, The Connection Illachine. Cambridge, MA: MIT Press, 1985. [49] W. Hillis and G. Steele, “Data Parallel Algorithms,” Communications of the ACM, vol. 29, pp. 1170—1183, 1986. R. Hoffman and A. Jain, “Segmentation and Classification of Range Images,” IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 9, pp. 608—620, 1987. G. 1111 and G. Stockman, “3—D Scene Analysis via Fusion of Light Striped Image and Intensity Image,” In Proc. of the IEEE Workshop on Spatial Reasoning and Multi-Sensor Fusion, St. Charles, IL, pp. 138—147, 1987. G. Hu and G. 
Stockman, "3-D Surface Solution Using Structured Light and Constraint Propagation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 11, pp. 390-402, 1989.

[53] T. Huntsburger and S. Jayaramamurthy, "A Framework for Multi-Sensor Fusion in the Presence of Uncertainty," in Proc. of the IEEE Workshop on Spatial Reasoning and Multi-Sensor Fusion, St. Charles, IL, pp. 345-350, 1987.

[54] R. Hummel and S. Zucker, "On the Foundations of Relaxation Labeling Processes," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 5, pp. 267-287, 1983.

[55] K. Hwang and F. Briggs, Computer Architecture and Parallel Processing. New York: McGraw-Hill, 1985.

[56] S. D. R. Corporation, I-DEAS User's Guide. Milford, Ohio.

[57] A. K. Jain and R. Hoffman, "Evidence-Based Recognition of 3-D Objects," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 10, pp. 783-802, 1988.

[58] A. K. Jain and S. G. Nadabar, "MRF Model-Based Segmentation of Range Images," in Proc. Third IEEE International Conference on Computer Vision, Osaka, pp. 667-671, 1990.

[59] R. Jain and A. Jain, Analysis and Interpretation of Range Images. New York: Springer-Verlag, 1990.

[60] A. K. Jain and S. G. Nadabar, "Markov Random Field Applications in Image Analysis," in Data Analysis in Astronomy IV, V. Di Gesu et al. (eds.), pp. 39-50, Plenum Press, New York, 1992.

[61] A. K. Jain and F. Farrokhnia, "Unsupervised Texture Segmentation Using Gabor Filters," Pattern Recognition, vol. 24, pp. 1167-1186, 1991.

[62] F. Jeng and J. Woods, "Simulated Annealing in Compound Gaussian Random Fields," IEEE Transactions on Information Theory, vol. 36, pp. 94-107, 1990.

[63] B. Julesz, "Textons, the Elements of Texture Perception, and Their Interactions," Nature, vol. 290, pp. 91-97, 1981.

[64] N. Karssemeijer, "A Relaxation Method for Image Segmentation Using a Spatially Dependent Stochastic Model," Pattern Recognition Letters, vol. 11, pp. 13-23, 1990.

[65] J. Keller, S. Chen, and R. Crownover, "Texture Description and Segmentation Through Fractal Geometry," Computer Vision, Graphics and Image Processing, vol. 45, pp. 150-166, 1989.

[66] R. Kindermann and J. Snell, Markov Random Fields and Their Applications. American Mathematical Society, volume I, 1980.

[67] S. Lakshmanan and H. Derin, "Simultaneous Parameter Estimation and Segmentation of Gibbs Random Fields Using Simulated Annealing," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 11, pp. 799-813, 1989.

[68] G. C. Lee and G. C. Stockman, "Obtaining Registered Range and Intensity Images Using the Technical Arts White Scanner," Tech. rep., PRIP Lab., Department of Computer Science, Michigan State University, 1991.

[69] G. C. Lee, Reconstruction of Line Drawing Graphs Using Fused Range and Intensity Imagery. PhD thesis, Department of Computer Science, Michigan State University, 1992.

[70] B. Manjunath, T. Simchony, and R. Chellappa, "Stochastic and Deterministic Networks for Texture Segmentation," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 38, pp. 1039-1049, 1990.

[71] K. Mardia, "Multi-Dimensional Multivariate Gaussian Markov Random Fields with Application to Image Processing," Journal of Multivariate Analysis, vol. 24, pp. 265-284, 1988.

[72] A. Margalit, "A Parallel Algorithm to Generate a Markov Random Field Image on a SIMD Hypercube Machine," Pattern Recognition Letters, vol. 9, pp. 263-278, 1989.

[73] D. Marr, Vision. San Francisco, CA: Freeman, 1982.

[74] J. Marroquin, S. Mitter, and T.
Poggio, "Probabilistic Solution of Ill-Posed Problems in Computational Vision," Journal of the American Statistical Association, vol. 82, pp. 76-89, 1987.

[75] A. Mitiche and J. Aggarwal, "Detection of Edges Using Range Information," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 5, pp. 174-178, 1983.

[76] J. Modestino and J. Zhang, "A Markov Random Field Model-Based Approach to Image Interpretation," in Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Diego, pp. 458-465, 1989.

[77] H. P. Moravec, "Sensor Fusion in Certainty Grids for Mobile Robots," AI Magazine, vol. 9, pp. 61-74, 1988.

[78] A. Mood, F. Graybill and D. Boes, Introduction to the Theory of Statistics. New York: McGraw-Hill, 1974.

[79] D. Murray, A. Kashko, and H. Buxton, "A Parallel Approach to the Picture Restoration Algorithm of Geman and Geman on an SIMD Machine," Image and Vision Computing, vol. 4, pp. 133-142, 1986.

[80] S. G. Nadabar and A. K. Jain, "Edge Detection and Labeling by Fusion of Intensity and Range Images," in Proc. of SPIE Conference on Applications of AI: Machine Vision and Robotics, Orlando, vol. 1708, pp. 108-119, 1992.

[81] S. G. Nadabar and A. K. Jain, "Parameter Estimation in MRF Line Process Models," in Proc. IEEE Conference on Computer Vision and Pattern Recognition, Urbana, IL, pp. 528-533, June 1992.

[82] N. Nandhakumar and J. K. Aggarwal, "Multisensor Integration - Experiments in Integrating Thermal and Visual Sensors," in Proc. First IEEE International Conference on Computer Vision, London, pp. 83-92, 1987.

[83] P. Perona and J. Malik, "Scale Space and Edge Detection Using Anisotropic Diffusion," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 12, pp. 629-639, 1990.

[84] T. Poggio, V. Torre, and C. Koch, "Computational Vision and Regularization Theory," Nature, vol. 317, pp. 314-319, 1985.

[85] N. S. Raja, Obtaining Generic Parts from Range Data Using a Multi-View Representation. PhD thesis, Department of Computer Science, Michigan State University, 1992.

[86] R. Sekuler and R. Blake, Perception. New York: McGraw-Hill, 1988.

[87] B. Silverman, C. Jennison, J. Stander, and T. Brown, "The Specification of Edge Penalties for Regular and Irregular Pixel Images," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 12, pp. 1017-1024, 1990.

[88] T. Simchony, R. Chellappa, and Z. Lichtenstein, "Pyramid Implementation of Optimal-Step Conjugate-Search Algorithms for Some Low-Level Vision Problems," IEEE Trans. Systems, Man, and Cybernetics, vol. 19, pp. 1408-278, 1989.

[89] M. Stephens, R. Blissett, D. Charnley, E. Sparks and J. Pike, "Outdoor Vehicle Navigation Using Passive 3D Vision," in Proc. IEEE Conference on Computer Vision and Pattern Recognition, San Diego, pp. 556-562, 1989.

[90] M. Swain, L. Wixson, and P. Chou, "Efficient Parallel Estimation for Markov Random Fields," in Uncertainty in Artificial Intelligence, M. Henrion, R. Shachter, L. Kanal, and J. Lemmer (eds.), vol. 5, pp. 407-419, Elsevier Publishers B.V., 1990.

[91] P. Switzer, "Some Spatial Statistics for the Interpretation of Satellite Data," Bulletin of the International Statistical Institute, vol. 50, pp. 962-972, 1983.

[92] R. Szeliski, "Regularization Uses Fractal Priors," in Sixth National Conference on Artificial Intelligence, Seattle, WA, pp. 749-754, 1987.

[93] Technical Arts Corporation, 100X 3-Dimensional Scanner: User's Manual and Application Programming Guide. Redmond, Washington.

[94] D.
Terzopoulos, "Regularization of Inverse Visual Problems Involving Discontinuities," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 8, pp. 413-424, 1986.

[95] Thinking Machines Corporation, "The Connection Machine System Overview, Model CM-2 Technical Summary," Tech. rep. HA87-4, 1987.

[96] A. Treisman, "Features and Objects," The Quarterly Journal of Experimental Psychology, vol. 40A, pp. 201-237, 1988.

[97] D. A. Trytten, The Construction of Labeled Line Drawings from Intensity Images. PhD thesis, Department of Computer Science, Michigan State University, 1992.

[98] R. Vaillant and O. Faugeras, "Using Occluding Contours for Recovering Shape Properties of Objects," in Proc. IEEE Workshop on Interpretation of 3D Scenes, Austin, pp. 26-32, 1989.

[99] A. Witkin, "Scale-Space Filtering," in Proc. Ninth International Joint Conference on Artificial Intelligence, Karlsruhe, Germany, pp. 1019-1022, 1983.

[100] W. Wright, "A Markov Random Field Approach to Data Fusion and Colour Segmentation," Image and Vision Computing, vol. 7, pp. 144-150, 1989.

[101] M. Levine, Vision in Man and Machine. New York: McGraw-Hill, 1985.

[102] N. Yokoya and M. Levine, "Range Image Segmentation Based on Differential Geometry: A Hybrid Approach," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 11, pp. 643-649, 1989.

[103] M. Zhang, R. Haralick, and J. Campbell, "Multispectral Image Context Classification Using Stochastic Relaxation," IEEE Trans. Systems, Man, and Cybernetics, vol. 20, pp. 128-140, 1990.