THE CONSTRUCTION OF LABELED LINE DRAWINGS FROM INTENSITY IMAGES

By

Deborah Anne Trytten

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Computer Science Department

1992

ABSTRACT

THE CONSTRUCTION OF LABELED LINE DRAWINGS FROM INTENSITY IMAGES

By Deborah Anne Trytten

The Marr paradigm for object recognition has been widely used in computational vision [72]. This paradigm emphasizes the data driven reconstruction of 3D shape from intensity images. There are a large number of paradigms that can perform this reconstruction in a limited sense, but no paradigm or group of paradigms has yet been shown to perform this reconstruction in a general setting. A shortcoming of the Marr paradigm, as demonstrated by Lowe [67], is the failure to include grouping processes that do not directly reconstruct shape. These grouping processes are collectively called perceptual organization. Lowe proposed the addition of four paths to the Marr paradigm to rectify this shortcoming. Two of Lowe's paths are used in this work. The first is the collection of tokens in the image plane into perceptual groupings. The second is 3D inference from these groupings.

The collection of tokens in the image plane into perceptual groupings is done using an integration framework implemented as a blackboard system. Four modules were used as knowledge sources: weak membrane edge detection, curvilinear grouping, proximity grouping, and curvilinear line labeling. An initial representation of the image data is built using the first three knowledge sources. This representation is analyzed using a modified curvilinear line labeling algorithm developed in this thesis that uses figure-ground separation to constrain legal line labelings more closely than the Malik algorithm [69]. This modified line labeling algorithm can diagnose problems with the initial representation. Errors in the representation can be fixed using a set of heuristics that were created to repair common mistakes. If no irreparable errors are found in the representation, then the modified line labeling algorithm produces a 3D interpretation of the data in the input image. The 3D interpretation is created without explicitly recovering 3D shape, and is therefore similar to the 3D inference processes proposed by Lowe.

Copyright © by Deborah Anne Trytten 1992

To my Parents
ACKNOWLEDGMENTS

This dissertation represents not an individual effort, but an effort on the part of a diverse group of people. If it weren't for the help and support of these people, both this work and my life would have been greatly diminished. I gratefully acknowledge the contributions of the following people.

My first acknowledgment must go to my thesis advisor, Dr. Mihran Tuceryan, who has had the wisdom of Solomon and the patience of Job. I became interested in computational vision when taking his course, CPS 822, in the winter of my first year as a Master's degree student. The day he invited me to work for him was a turning point in my academic career. He will always have my highest admiration, gratitude and respect.

I would also like to acknowledge the contributions of my committee members. My committee consisted of Dr. Anil Jain, Dr. George Stockman and Dr. Connie Page. Dr. Jain is the director of the Pattern Recognition and Image Processing group (PRIP) of Michigan State University that I was a member of for five years. If I were to thank Dr. Jain for each of the benefits I received as a member of the PRIP group, this thesis would have a second volume. His hard work, dedication, and intelligence have created an environment for research that approaches perfection. Dr. Stockman also made a sizable contribution to the success of this endeavor. He was the man who asked the tough questions. Although I may have been less appreciative at the time when these questions were asked, my thesis was dramatically improved in finding the answers and I owe him my thanks. The final member of my committee, Dr. Connie Page, was the ultimate proof that a woman can thrive in a scientific discipline within the academic world.

I would also like to acknowledge the contribution of three other faculty members: Dr. Richard Dubes and Dr. John Forsythe from the Department of Computer Science at Michigan State University, and Dr. Bernard Dubuisson, from the Departement de Genie Informatique of the Universite de Technologie de Compiegne, in Compiegne, France. Although he was not a member of my committee, Dr. Dubes made valuable contributions to both the content and presentation of my research. His continuous questioning of the evaluation of research has forced many graduate students, myself included, to look more critically at our work and the work of others. Dr. Forsythe made a contribution to this work at a very early stage. He single-handedly stopped me from leaving school with only my M.S. when a job opportunity came along in 1988. I have never regretted staying to finish my dissertation, and would not have had it not been for his timely encouragement. I worked with Dr. Dubuisson during a three month stay in France. This was another turning point in my career, when I first came to see myself as a researcher instead of as a student.
During my three months in France, he not only offered me an interesting project to work on, but also treated me as if I were a member of his family. I have only my thanks to repay the generosity of Dr. Dubuisson and his wonderful family.

My family, Richard Wesley Trytten, Marilyn McCall Trytten, Melissa Trytten Zakhary, and James Zakhary, have played a large role in the success of this endeavor. They have picked up my spirits when they were falling, and celebrated my victories. They were tolerant when work kept me from my familial responsibilities and available when I needed family to separate myself from work. Without their love and support this project would have been unthinkable. Although it is a cliche, my family has always given me both roots and wings, and lots of other neat stuff too. My only regret in finishing this thesis is that it has led me 1200 miles from the people I hold most dear. I would particularly like to thank my parents for their Herculean efforts to help me maintain my composure during the final frantic weeks of thesis preparation. During this period they made themselves available at every juncture to provide whatever help was needed without question or pause, whether this help consisted of listening to yet another tearful collect phone call from an exhausted graduate student, babysitting dogs, or driving to Lansing to do my household chores. The first three words written in this thesis were the dedication "To my parents". At the time, it seemed like a grand gesture. Now it seems insufficient.

I would also like to acknowledge the contribution of my late grandparents. My paternal grandfather, John M. Trytten, was the family academic and educator. My long standing interest in education was sparked by his love of his career. My maternal grandmother, Mary Wave McCall, had the foresight to look at a group of young grandchildren and see that college tuition would be needed two decades later. Although they did not live to see the completion of this dissertation, or even the start of it, their support and inspiration are still appreciated; they are still missed.

My life as a graduate student was greatly enriched by the Pattern Recognition and Image Processing Group (PRIP). There were many PRIP researchers who came before me who influenced my work. Chaur Chin Chen, Sei Wang Chen, Ywhyng Lee Hoffman, Joe Miller, Farshid Farrokhnia, and Patrick Flynn all offered me advice and guidance early in my career. Greg Lee, Narayan S. Raja, Satisha Nadabar, and I all took our comprehensive examinations together and thus formed the Gang of Four. We all started our dissertation work within the same year, and ended it within the same week. The last year of work would have been much harder and less rewarding without such wonderful friends and colleagues.

I would also like to acknowledge the next generation of PRIP researchers. They have listened patiently to hours of my talks and advice. They often asked interesting questions or suggested productive avenues for further pursuit.
I would like to especially acknowledge Sally Howden who has been my best friend through most of my time at Michigan State University. Although I'm destined to earn somebody's eternal wrath by committing a sin of omission, I will fearlessly list their names: Jinlong Chen, John Courtney, Chittra Dorai, Marie-Pierre Dubuisson, Hans Dulimarta, Qian Huang, Tim Newman, Philippe Ohanian, Philippe Ballard, Sushil Bhattacharjee, Steve Walsh, Sharathcha Pankanti, Jian-Chang Mao, and Yao Chen.

The PRIP system managers provided a wonderful working environment. I acknowledge the help of David Marks, Frank Northrup, John Lees and the many student managers who have been employed by the Department of Computer Science during my stay here. I would like to thank Honda Shing for creating the typesetting style that was used in this thesis.

I would also like to thank the following researchers who supplied software which was used in the implementation of the ideas within this dissertation. Pat Flynn provided a variety of X windows utilities and did much of the early work of making the PRIP group such a nice place to do research. Ari Gross provided the code for the SYMAN symmetry analyzer. Frederic Leymarie provided the code for the grassfire skeletonization. Bilge Gunsel wrote the original weak membrane edge detector using GNC. Greg Lee wrote the code for line labeling.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES

1 Introduction
  1.1 Background
    1.1.1 The Marr Paradigm for Object Recognition
    1.1.2 Perceptual Organization
    1.1.3 Line Labeling
  1.2 Problem Statement
  1.3 Organization of the Thesis

2 Integration: Methodologies and Modules
  2.1 Integration Frameworks
    2.1.1 Optimization
    2.1.2 Uniform Integration Frameworks
    2.1.3 Non-Uniform Integration Strategies
    2.1.4 Selecting an Integration Framework
  2.2 Selection of Modules for Integration
    2.2.1 Evaluating Proposed Modules
    2.2.2 Edge Detection
    2.2.3 Continuity
    2.2.4 Symmetry
  2.3 Summary of Choices

3 Obtaining an Initial Line Drawing
  3.1 The Representation of Piecewise Continuous Objects in a Blackboard
  3.2 The Continuity Module
  3.3 Creating the Initial Blackboard
    3.3.1 Finding Regions
    3.3.2 Creating the First Figure Hypothesis
  3.4 Examples of Processing
  3.5 Summary and Conclusions

4 The Revised Line Labeling Module
  4.1 Detection of Line Drawing Problems
  4.2 Common Problems with Line Drawings
  4.3 Summary and Conclusions

5 Updating Line Labeling
  5.1 Correcting Errors in Line Labeling
    5.1.1 Mislabeled Vertices
    5.1.2 Missing Arcs
    5.1.3 Adding Phantom Vertices
    5.1.4 Other Line Labeling Improvements
    5.1.5 Control of Strategies
    5.1.6 Experimental Results and Conclusions
  5.2 Other Useful Modules
    5.2.1 Symmetry
    5.2.2 Shape from Shading
  5.3 Computational Complexity and Timings

6 Conclusions and Future Work
  6.1 Contributions of the Thesis
  6.2 Future Work

A Parameters
B Test Bed of Images
BIBLIOGRAPHY

LIST OF TABLES

2.1 The ASCII version of the missing edge.
3.1 The mapping between modules and blackboard units.
3.2 A ramp edge from Figure 3.7.
3.3 Tangent and curvature properties of junctions.
3.4 Initial processing of the test bed of images.
4.1 The initial labels for Figure 4.4.
4.2 The number of legal labelings for outside arcs.
4.3 The diagnosis of problems in the test bed of images.
4.4 A comparison between the faults in the images from Table 3.4, and the problems that were diagnosed using the floating object heuristic.
5.1 The knowledge sources used to create the line drawing and possible errors made.
5.2 The repair knowledge sources used on the test bed of images.
5.3 A comparison between diagnosed problems and repaired problems in line labelings.
5.4 The result of the line labeling process. A "Yes" in the initial column means that a proper line labeling was found from the initial figure hypothesis. A "Yes" in the final column means that a correct line labeling was found after correction. An "A" in the final column means that the line labeling was correct except for a single error. A "B" means that a missing arc wasn't replaced. A "No" in a column means that no correct line labeling was found.
5.5 The computational complexity of the algorithms used in this thesis.
5.6 Computational complexity of repair knowledge sources.
5.7 The run times for the sample images from Appendix B in minutes.
A.1 The parameters for the continuity grouping algorithm of Section 3.2.
A.2 The parameters from Section 3.3.1.
A.3 The parameter values used in Section 3.3.2.
B.1 Characterization of the images.
B.2 Characterization of the junctions.
B.3 Characterization of the arcs.
B.4 Characterization of the surfaces.
LIST OF FIGURES

1.1 The Marr paradigm.
1.2 A 2.5 D sketch of a cube.
1.3 A simple generalized cylinder.
1.4 A two level hierarchical representation of a hand.
1.5 The Gestalt rules of organization.
1.6 The effect of background density on linear grouping.
1.7 Lowe's additions to Marr's object recognition scheme [67].
1.8 Mohan and Nevatia's non-uniform integrated system for perceptual organization from [75].
1.9 Perceptual organization from the top down.
1.10 A synthetic line drawing of a simple scene.
1.11 Two legal line labelings for a cylinder.
1.12 An accidental and non-accidental view of a cube.
1.13 An example of phantom vertices.
1.14 An overview of the POLL system described in this thesis. Repair knowledge sources marked with an asterisk have not been implemented.
1.15 Some objects that are legal in the piecewise smooth object domain.
1.16 An object that isn't piecewise smooth.
2.1 An intensity image with an invisible edge.
2.2 Position of Table 2.1 in the image.
2.3 The Hearsay-II blackboard system.
2.4 A tree and a telephone pole.
2.5 Lowe's bicycle.
2.6 A shortcoming of Canny edge detection.
2.7 (a) An intensity image. (b) The output of running GNC on this image.
2.8 Hough spaces do not match human perception.
2.9 Lowe's criterion for grouping proximate edge pixels.
2.10 Skew symmetry is not unique without terminations.
2.11 An example of heuristic symmetry.
2.12 Cooperation between SYMAN and the Grassfire Transform.
3.1 An overview of the initial processing.
3.2 The algorithm for grouping based on continuity.
3.3 A Voronoi neighborhood of point P.
3.4 A synthetic continuity example.
3.5 A continuity example.
3.6 Inputs for the initial processing.
3.7 The intensity values in the area of the image inside the square are shown in Table 3.2.
3.8 Thick edges produced by GNC.
3.9 A GNC segmentation failure.
3.10 The thinned edges.
3.11 The connected components.
3.12 A thinned image with noise.
3.13 Initial vertex and curve units for the synthetic image.
3.14 Initial vertex and curve units for the real image.
3.15 An example of perfect processing.
3.16 An image with two incomplete edges.
3.17 An image with mislabeled junctions.
3.18 An object as seen from a non-general viewpoint.
3.19 The numbered curves and vertices obtained from the input in Figure 3.18(a).
3.20 A legal line labeling for Figure 3.18 that is counterintuitive.
3.21 A curved object.
3.22 An image that was oversegmented.
4.1 An object that is not piecewise smooth.
4.2 Rotating an image changes convexity.
4.3 Four interpretations of a simple cube.
4.4 A cube with labeled junctions and curves.
4.5 An image where the floating object heuristic fails.
4.6 Malik's junction catalog. The "?" label indicates that this line could have any one of the six legal line labels.
4.7 Eight legal line labelings for an outside T junction.
4.8 The outside junction catalog.
4.9 The revised line labeling algorithm.
4.10 The junctions and curves found from processing Figure 2.1.
4.11 (a) The input image. (b) The curves and vertices.
4.12 (a) The input image. (b) The curve endpoints near vertex v44 are so close that they are grouped into a single vertex.
4.13 (a) The input image. (b) Vertices v24 and v28 should have been grouped into a single vertex. Vertices v3 and v7 also should have been grouped.
4.14 An L junction and a curvature-L junction that are similar.
4.15 Paired mistakes cannot be diagnosed in cylinders.
4.16 Curvature-L junctions and three tangent junctions are problematic.
4.17 (a) The input image. (b) The curves and vertices.
4.18 (a) The input image. (b) The result of processing. The modified line labeling paradigm failed at vertex v54.
4.19 (a) Shows a legal labeling for a detected corner. (b) Another legal labeling without a corner.
5.1 An image with a mislabeled vertex.
5.2 The line labels for the object in Figure 5.1.
5.3 (a) The input image. (b) The labeled arcs and junctions.
5.4 Three hypotheses for the missing edge.
5.5 The arc labels for the triangular prism.
5.6 (a) The input image. (b) The initial output.
5.7 Three hypotheses for the missing edge.
5.8 The correct line labels for Figure 5.7(c).
5.9 (a) The input image. (b) The lines and curves.
5.10 An additional phantom vertex.
5.11 The line labeling found from the object in Figure 5.9.
5.12 (a) The input image. (b) The labeled lines and curves.
5.13 Correcting improperly grouped vertices.
5.14 A skew rectangle.
5.15 Line labeling and symmetry can cooperate.
5.16 Occlusion and symmetry.
B.1 Images 1 to 6.
B.2 Images 7 to 12.
B.3 Images 13 to 18.
B.4 Images 19 to 22.

CHAPTER 1

Introduction

Object recognition is one of the most important and frequently researched topics in computational vision, due to its wide applicability to important computational vision applications. Applications of object recognition include a variety of robotic tasks such as bin picking, where single objects are selected from a jumble of objects, which is a common task for a robot to perform in manufacturing. If a robot has no visual system, the objects that it tries to grasp must be accurately placed since it would not be able to see and analyze deviations in position. A visual system can increase the usefulness of a robot, but only if the visual system is both flexible and robust. Flexibility allows a robotic vision system to adapt to changes in its environment, for example when a product line changes. Robustness means that small changes in the environment, such as spurious lighting, should make at most small changes in the performance of the visual system. Creating flexible and robust vision systems is a difficult task.

One approach to object recognition is to match a representation of an object that was derived from image data to 3D descriptions of objects stored in a model database. To perform object recognition in this fashion, some representation of the image data must be made. If the representation of the image data is 2D, either the 3D models will have to be projected to 2D for matching, or a direct 2D to 3D matching will have to be done as in [66]. Another alternative is to reconstruct 3D information from the 2D image and match this 3D information to the 3D object models. Reconstructing the 3D information from the image data is a good choice since the image can provide evidence and constraints for the reconstruction. A paradigm for performing 3D object recognition by reconstructing shape from intensity images, called the Marr paradigm, which used the human visual system as a model, was discussed in [72, 9]. This paradigm consists of three steps: finding simple features in the image plane (primal sketch), extracting the 3D shape of visible surfaces (2.5 dimensional sketch [72] or intrinsic image [9]), and using these surfaces to instantiate 3D primitives which are then matched to stored object models.

In an intensity image, the light received by the camera is a function of several factors such as the shape and reflective properties of the surfaces, the number, type, and relative direction of light sources, and the viewing direction.
These factors are highly interrelated, making the direct recovery of the 3D shape of the imaged objects difficult. There is a broad literature on techniques such as shape from shading [35, 38, 49, 87], shape from texture [15, 16, 22, 56], shape from motion [5], shape from stereo [14, 60], and many other similar processes, collectively referred to as "shape from X", that can be used to reconstruct shape. While these algorithms are gaining mathematical rigor [5], they still lack the robustness that is necessary to be useful in a general environment. If 3D shape cannot be reliably reconstructed from the 2D images, object recognition cannot proceed using the Marr paradigm.

The difficulty of reconstructing shape from 2D images has led computational vision researchers to look for other ways to obtain 3D information from 2D images. The human visual system offers some insight into this process as well [97]. The human visual system can organize and interpret images even when 3D shape cannot be reconstructed. To do this, the salient features of the image, called tokens, need to be identified, organized, and interpreted. The capability of the human visual system to group tokens together in a meaningful way, without any prior knowledge of the contents of the image, is called perceptual organization. Perceptual organization collects tokens in the image plane together into significant groupings. These groupings can be recursively grouped, or subdivided as the scene is further organized. Cooperating and competing groupings are explored until a globally good organization of the image is found. Groupings of tokens can be used to infer 3D properties that could not have been reconstructed on the basis of shape information alone. As an example, suppose that a pair of parallel lines were found in the image plane. Shape from X paradigms would have little chance of finding a reconstruction from this sparse data. If the lines can be grouped, however, a 3D inference can be made: the lines are parallel in 3D. The nature of this inference will be explained in Section 1.1.2.

Perceptual organization is a broad term for a variety of processes with a common theme, instead of being one specific process. To explore the general usefulness and importance of perceptual organization in the object recognition problem, these processes need to be combined. This combination could be done informally using ad hoc methods, or using a more structured approach. If informal methods are used, the resulting system may be nearly impossible to understand, analyze, and improve, particularly as it becomes large enough to be useful. An alternative is to impose structure on the combination of perceptual organization processes using an integration framework. An integration framework is a method for permitting multiple sources of information to work cooperatively and competitively on the solution of a single problem in a structured and orderly way. The integration framework that will be used in this work is the blackboard system [32].
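To make the blackboard idea concrete before it is developed in Chapter 2, the following sketch shows the opportunistic control loop that most blackboard architectures share. It is a minimal illustration only, not the implementation built in this thesis; all class and method names here are hypothetical.

    # A minimal blackboard control loop (an illustrative sketch; the
    # class and method names are hypothetical, not those used in POLL).

    class Blackboard:
        """Global database holding the input, partial solutions, and control data."""
        def __init__(self, image):
            self.hypotheses = {"image": image}

        def post(self, key, value):
            # Results are left on the blackboard for other knowledge sources.
            self.hypotheses[key] = value

    class KnowledgeSource:
        """One independent specialist, e.g. edge detection or curvilinear grouping."""
        def can_contribute(self, blackboard):
            raise NotImplementedError   # inspect the current blackboard configuration

        def contribute(self, blackboard):
            raise NotImplementedError   # perform the specialized processing

    def run(blackboard, knowledge_sources):
        # Opportunistic control: fire any source that sees a chance to improve
        # the solution, and stop when no source finds an opportunity.
        progress = True
        while progress:
            progress = False
            for ks in knowledge_sources:
                if ks.can_contribute(blackboard):
                    ks.contribute(blackboard)
                    progress = True

The knowledge sources of the POLL system, described in Section 1.2, follow this pattern, with a control knowledge source deciding which repair strategies to fire.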
This thesis will describe the development of the POLL system (Perceptual Organization for Line Labeling), to find labeled line drawings from single monochromatic intensity images using perceptual organization. A line drawing is a map of the depth and orientation discontinuities in the scene. A line drawing is said to be labeled if it has an interpretation of the 3D topology attached. Labeled line drawings are a terse, but useful description of the image data. The 2.5 D sketch, proposed to be used as an intermediate representation in 3D reconstruction for object recognition, is little more than a labeled line drawing with surface normal information added.

The goal of achieving complete, correct line drawings in all cases is far too ambitious for a single thesis. The polyhedral case has been analyzed in [105]. This thesis will develop a rich representation of the image data that can both be found from low level data and be used by a general purpose intermediate level process to analyze, interpret, and improve the representation. This representation will be stored in a global database called the blackboard. A useful feature of a blackboard database is that it gives an archive of information and interpretations that have been derived from the scene even if a correct labeled line drawing is not found. This incomplete information may be useful in some circumstances. A robotic end effector, for example, may only need a few surfaces to determine a contact point for grasping.

Section 1.1 will review the current literature in three areas that are important to understanding the purpose and scope of the current research: object recognition, perceptual organization, and line labeling. The statement of the problem solved in this thesis will follow in Section 1.2. The final section, Section 1.3, will give the organization of the thesis.

1.1 Background

Using perceptual organization and inference to obtain 3D information from single intensity images was proposed by Lowe [67]. Lowe added paths to the Marr paradigm for object recognition to create a more general and robust object recognition system. The POLL system described in this thesis will use two of these additional paths to obtain some 3D information from single intensity images without performing shape reconstruction. This 3D information consists of relationships between the imaged surfaces, for example whether a discontinuity in intensity represents a depth discontinuity in 3D or an orientation discontinuity. This thesis can be viewed as additional evidence, beyond that given by Lowe himself, that the paths added by Lowe to the Marr paradigm are useful. Section 1.1.1 will review the Marr paradigm.
Lowe's additions to the paradigm, and perceptual organization, will be discussed in Section 1.1.2. Lowe demonstrated the usefulness of 3D interpretation by describing rules that can take 2D groupings and produce 3D information. POLL obtains 3D information from 2D groupings in the image plane via labeling line drawings. Line labeling has a long history in computational vision, which will be reviewed in Section 1.1.3.

1.1.1 The Marr Paradigm for Object Recognition

Discerning the identity and location of known objects in the world on the basis of sensor data is called object recognition. Most computational object recognition systems fall into one of two schools of thought: the school that seeks an engineering approach to the problem, and the school that tries to model and understand human vision. Recent literature reviews which emphasize the engineering approach to model based object recognition can be found in [10, 28, 104].

Modeling the human visual system seems like a natural place for computational vision to start, since the human visual system is the inspiration for computational vision. The human visual system is a large and complex system that is only partially understood at even the earliest levels of processing. If human vision were well understood, modeling it would be easier. Marr proposed a theory for how the human visual system performs object recognition in [72]. A similar theory was proposed by Barrow and Tennenbaum [9]. This theory divides the processing into three levels which move from the image centered primal sketch, through the scene centered 2.5 D sketch, and finally to object centered volumetric primitives (generalized cylinders). All processing is bottom up, and proceeds without feedback from higher levels. A schematic of the process, taken from Lowe [67], is shown in Figure 1.1.

Figure 1.1. A paradigm for object recognition inspired by the human visual system [67].

The first intermediate representation in the Marr paradigm is the image centered primal sketch. It has features such as edges, blobs, discontinuities, virtual lines, groupings, curvilinear organizations and terminations that can be directly found from the intensity image. Groupings are unique among the features in the primal sketch in that they are difficult to find, particularly relative to the other elements in the representation. Although there is a vast literature on clustering (see [55] for a good bibliography), the general grouping problem is difficult. More recent approaches to this problem, in particular that by McCafferty [73], will be discussed in Section 1.1.2.

The second intermediate representation in this paradigm is a viewer centered representation called the 2.5 D sketch, or the intrinsic image in [9]. The features in this representation are the surface normals, depth discontinuities, orientation discontinuities,
and rough depth estimates for visible surfaces in the image. This representation implicitly has the scene segmented into surfaces, since the depth and orientation discontinuities are present. Viewer centered representations are 3D and have their coordinate axis located with respect to the camera coordinate system. Only surfaces that are visible from the camera viewpoint are represented, separating it from the higher level object centered 3D representation which is independent of viewpoint.

A simple 2.5 D sketch of a cube is shown in Figure 1.2. The short lines and circles represent the normal vectors to the surfaces of the cube. The dotted lines represent surface orientation discontinuities. The solid lines represent surface depth discontinuities. The approximate relative depth values that are a part of the 2.5 D sketch are not represented, but would be necessary for a true 2.5 D sketch.

Figure 1.2. A graphical representation of a 2.5 D sketch of a cube. The solid lines represent discontinuities in depth and the dotted lines represent discontinuities in surface orientation. A circle represents a vector sticking out of the page. Each vector has a rough depth associated with it that is not shown.

Many processes can be used to obtain the information necessary for the 2.5 D sketch. Shape from texture, shape from stereo, shape from shading, shape from contour, and shape from motion (collectively referred to as "shape from X") are all used by the human visual system and have been modeled for use in machine vision systems. How the outputs from these disparate processes are combined into a single consistent 2.5 D sketch, and the nature of the interactions between these processes, are open questions in both the understanding of human vision and in computational vision, although they are beginning to be addressed [5].

The 2.5 D sketch is the last intermediate representation before the 3D object centered representation. There are many possible 3D representations of shape, which have been reviewed in [10, 28]. Some other important representations of 3D shape are geons [11, 94], constructive solid geometry, superquadrics [86], and wings [27]. The latter three are used primarily with range data. The 3D volumetric primitive used in Marr's work [72] is the generalized cylinder (GC), also called the generalized cone. GCs represent volumes in a compact, generative form. The GC has an axis or spine, a 2D cross section which is consistent at every point along the spine, and an expansion function that determines the size that the cross section will have at each position on the axis. Figure 1.3 shows a typical GC, with a circular cross section, straight axis and nonlinear expansion function. GCs can be used as either a hierarchical or non-hierarchical description. In the non-hierarchical method, as in [81, 21], a single scale of description is used. In the hierarchical method, as suggested in [72], multiple scales are used. Figure 1.4 shows a human hand represented at two levels within a hierarchy.

Object recognition is done in the Marr paradigm by matching the hierarchical generalized cylinders instantiated from the 2.5 D sketch to stored object models. The Marr paradigm has been widely discussed, extended, and criticized, and remains the single most influential model of the human visual system.
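Before leaving the Marr paradigm, the generative nature of the GC representation can be illustrated with a short sketch: a surface point is produced by sweeping a scaled cross section along the spine. The fragment below is a simplified illustration rather than an implementation from the literature; it assumes a straight spine along the z axis and a circular cross section, as in Figure 1.3.

    import math

    def gc_surface_point(s, theta, length, radius, expansion):
        """Return a point on a generalized cylinder whose spine is the z axis.

        s         -- normalized position along the spine, in [0, 1]
        theta     -- angle around the circular cross section, in radians
        length    -- length of the spine
        radius    -- cross-section radius before scaling
        expansion -- the expansion function, mapping s to a scale factor
        """
        r = radius * expansion(s)        # cross section scaled at this spine position
        return (r * math.cos(theta),     # x
                r * math.sin(theta),     # y
                s * length)              # z: position along the straight spine

    # A nonlinear expansion function like the one suggested by Figure 1.3:
    flare = lambda s: 1.0 + s * s
    point = gc_surface_point(0.5, math.pi / 4, 10.0, 2.0, flare)

Sweeping s and theta over a grid of values traces out the whole surface; a hierarchical description simply attaches finer GCs to this one, as in Figure 1.4.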
The 2.5 D sketch, in particular, is a goal that is being approached by integration efforts like those of Aloimonos [3], and this thesis.

Figure 1.3. A simple generalized cylinder with a circular cross section, straight axis and nonlinear expansion function.

Figure 1.4. On the left is a low resolution representation of a human hand using two generalized cylinders. On the right the same hand is represented at higher resolution.

1.1.2 Perceptual Organization

The Marr paradigm has had enormous influence on the field of computational vision. But human vision does not consist of the reconstruction of 3D shape alone. Human vision can also organize a scene into meaningful groupings without any explicit shape information being present. An appreciation for this aspect of the human visual system dates back to the Gestalt school of psychology [114], which identified the first principles of what is now known as perceptual organization. These principles include proximity, similarity, good continuity, closure, symmetry, and figure-ground separation. Proximity groups tokens together based on their spatial location, as in Figure 1.5(a). Good continuity, as shown in Figure 1.5(d), collects the curves such that the curves on the left hand side are seen as two continuous intersecting lines, rather than two separate objects that are coincidentally touching, as on the right. Gestalt grouping was performed computationally by McCafferty in [73], as will be discussed in Section 2.1.2.

Definition 1.1 Perceptual organization: the basic capability of the human visual system to derive relevant groupings and structures from an image without prior knowledge of its contents [67].

In computational vision, the importance of perceptual organization was brought to the forefront by Witkin and Tenenbaum [115], who analyzed the importance of structure, meaning spatiotemporal coherence or regularity, in visual perception. They argued that when regular structure is observed, it is significant because it would have been unlikely to have arisen accidentally. For example, a pair of collinear line segments should be grouped into a single longer line segment since it would have been very unlikely that they would be aligned if they did not share a common causality. Lowe sharpened this idea in [67] when he proposed that the significance of a grouping of tokens is proportional to how unlikely the relationship between the tokens was to have arisen by accident, called non-accidentalness. In Figure 1.6, the same three collinear line segments appear on both the right and left hand sides. When the line segments are isolated, as on the right, the collinearity is perceived immediately. When they are in a more cluttered setting on the left, it requires attention to perceive the collinearity. The increased density of lines makes collinearity more accidental, and therefore less significant.

Lowe proposed auxiliary paths to be added to the Marr paradigm for object recognition, described in Section 1.1.1, as seen in Figure 1.7.
These additional paths give the visual system a way to use non-accidental properties to contribute to the 2.5 D sketch and to perform object recognition. The contribution to the construction of the 2.5 D sketch occurs through another application of non-accidentalness called 3D inference.

Figure 1.5. The Gestalt rules of organization. (a) Tokens are grouped on the basis of proximity. (b) Tokens are grouped into pairs of dark and light circles using similarity. (c) Curves are grouped into closed figures. (d) Continuity dictates that these curves be interpreted as two crossing lines, instead of two touching shapes. (e) Symmetry groups these curves. (f) Figure ground separation causes the abstract figure to be seen in front of the rectangular background.

Figure 1.6. In both the right and left figures, there are three collinear line segments. In the figure on the right, the line segments are seen in isolation. In the figure on the left, there are additional noise line segments. It requires attention to find the collinear segments in the left hand image.

3D inference uses the assumption that the camera was not in a position that led to any special relationships in the image plane. As an example, suppose that there are a pair of parallel lines in the image plane. These lines could be the projection of a pair of lines that are not parallel in space, but that would require positioning the camera in a very singular position. Since it is unlikely that the camera was held in such a position, it is likely that parallel lines in the image plane are parallel in 3D. This is closely related to the general viewpoint hypothesis to be discussed in Section 1.1.3.

There are numerous examples of special purpose systems that use perceptual organization principles. Tuceryan used perceptual organization to group dot patterns using a relaxation framework [112]. Rosin and West used perceptual grouping to detect and verify surfaces of revolution [100]. A curvilinear grouping algorithm based on perceptual organization was done in [111].

Two more general perceptual organization systems are Sarkar and Boyer's system [102], and Mohan and Nevatia's system [75]. These systems are similar in that they both group edge elements hierarchically into larger and more perceptually significant structures using non-accidentalness criteria. Sarkar and Boyer use a uniform framework for perceptual organization whereas Mohan and Nevatia use a non-uniform

Figure 1.7. Lowe's additions to Marr's object recognition scheme [67].
framework shown in Figure 1.8. An earlier version of the Mohan and Nevatia system, using lines instead of curves, in the domain of aerial imagery was shown in [74].

Figure 1.8. Mohan and Nevatia's non-uniform integrated system for perceptual organization, from [75].

The distinction between uniform frameworks and non-uniform frameworks is that uniform frameworks use the same processing at every level in the hierarchy of perceptually significant structures. A uniform framework, like that of Sarkar and Boyer, permits easier extension because the basic grouping processing does not change as the tokens that are grouped become larger and more perceptually significant. Uniform systems also have the advantage of being easier to implement and understand. Non-uniform frameworks can use different types of processing at each level of the hierarchy of perceptually significant structures. Non-uniform systems are usually more robust, since they can take advantage of special features and processing that is uniquely adapted to each level in the hierarchy of perceptually significant events. This distinction between uniform and non-uniform systems will be reinforced when integration frameworks are discussed in Chapter 2.

Sarkar and Boyer's strategy for perceptual organization uses a relational graph data structure. Tokens, namely pixels, lines, joined straight lines, anti-parallel lines (APARs), and joined anti-parallel lines (PAPARs), are grouped using the same strategy at all levels of the perceptual hierarchy, making the system uniform. Each token is represented by an attributed node in the graph. The attributes describe the properties of the token. For a straight line, the attributes are an identification number, the endpoints, the orientation, and the contrast across the edge. Compatible tokens are joined by an attributed arc, where the attributes in the linear case are proximity, angular compatibility, and continuity. Compatibility is determined by comparing a subset of token attributes to tolerances, using binning in the space of attributes. All selected attributes must be within tolerances for the tokens to be grouped.

All of the examples of perceptual organization discussed so far have been bottom-up, or data driven. Perceptual organization can also work top down, as in Figure 1.9. The figure contains a noisy image of a dalmatian dog. Without the high level information that a dalmatian is in the scene, it is nearly impossible to collect the spots together into meaningful groupings. Once the dog has been identified, the groupings of spots can be organized properly.

Figure 1.9. This image is hard to interpret without the top down information that it contains the picture of a dalmatian dog. Once the contents of the image are known, the perceptual groupings that identify the dog can be found [72].

Perceptual organization has the potential to produce flexible systems since it is not tied to any particular model of objects in the world. There are two important open questions in perceptual organization. The first question is which processes should be used to perform perceptual organization, and the second question is how the combination of these processes should be done. This thesis will speak to both questions.
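Before turning to line labeling, the compatibility test in Sarkar and Boyer's grouping, described above, can be made concrete. In the sketch below the attribute names and tolerance values are invented for illustration; only the structure of the test, a conjunction of per-attribute tolerance checks that must all pass before an attributed arc joins two tokens, follows their description.

    import math

    class LineToken:
        """A hypothetical straight-line token with the attributes named above."""
        def __init__(self, ident, p0, p1, contrast):
            self.ident = ident
            self.p0, self.p1 = p0, p1
            self.contrast = contrast
            self.orientation = math.atan2(p1[1] - p0[1], p1[0] - p0[0])

    def compatible(a, b, max_gap=5.0, max_angle=0.1):
        """Every selected attribute must fall within its tolerance before an
        attributed arc joining the two tokens is added to the graph."""
        gap = math.dist(a.p1, b.p0)                    # proximity of the endpoints
        dtheta = abs(a.orientation - b.orientation)    # angular compatibility
        return gap <= max_gap and dtheta <= max_angle  # conjunction of the tests

    a = LineToken(1, (0, 0), (10, 0), contrast=40)
    b = LineToken(2, (12, 0.5), (20, 0.6), contrast=38)
    print(compatible(a, b))   # True: the lines are close and nearly collinear

In their system the comparisons are organized by binning in the space of attributes; the explicit thresholds above play the same role in this simplified form.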
1.1.3 Line Labeling

In Lowe's object recognition paradigm, there is a path for 3D inference from perceptual groupings. Lowe's demonstrations of the use of this path are in the form of inference rules where particular groupings of 2D tokens give a 3D interpretation. There is another type of 3D inference that has been used in computational vision for many years: labeling a line drawing. A line drawing is a map of the orientation and depth discontinuities in a scene, which are two important aspects of the 2.5 D sketch. Figure 1.10 shows a line drawing of a simple scene containing a sphere, a cube and a triangular prism. Lines a, b, e, g, and h are depth discontinuities. Lines c and d are orientation discontinuities. Line f could be either an orientation discontinuity or a depth discontinuity, depending on whether the back face of the triangular prism is touching the front face of the cube.

Figure 1.10. A synthetic line drawing of a simple scene.

Line drawings are labeled with symbols that describe the local geometry. The assignment of labels is constrained by an exhaustive list of combinations of labels that can be assigned within a prescribed domain of objects, called a junction catalogue. There are three basic classes of edges (variously named by different authors): limb edges, jump edges, and orientation edges. A limb edge is a depth discontinuity that is formed when the line of sight is tangent to the imaged surface and the surface occludes itself with respect to the viewer. A limb edge is therefore not a physical edge on the object, where two distinct surfaces meet, but is an artifact of viewpoint. Limb edges are labeled with a doubled arrowhead (written here as <<- or ->>), with the surface closest to the viewer being on the right hand side of the arrow as it is followed from tail to head. Limb edges do not occur on polyhedral objects. A jump edge is an edge which represents a discontinuity in depth that is not a limb edge. Jump edges are labeled -> or <-, using the same directional convention as a limb edge. Orientation discontinuity edges are formed when two surfaces with different tangent planes meet. The imaged surfaces must be physically adjacent in 3D, and both visible in the image. At a convex edge, labeled +, the visible angle between the surfaces is greater than 180°; otherwise the edge is labeled concave, -.

Figure 1.11 shows two labelings for a right circular cylinder with a planar top face. The interpretation on the left is a cylinder floating in air (making the bottom edge a depth discontinuity). The interpretation shown on the right is a cylinder resting on a table (making the bottom edge an orientation discontinuity). In [79] the mathematical properties of line drawings were explored.

Figure 1.11. A cylinder and two line labelings. The distinction between these two labelings is only the bottom edge. The interpretation on the left is a cylinder floating in air. The interpretation on the right is a cylinder resting on a table.

Figure 1.12.
Any deviation from this viewpoint will cause another side of the cube to become visible and change the topology of the line drawing. (b) A cube from a non-accidental vieWpoint. There is an 6 ball of viewpoints around the current position where the topology of the line drawing remains consistent Junction labeling was originally done in the trihedral world [51, 29], and later extended to general polyhedra [68], shadow edges and surface markings were added in [113], and origami objects were used in [58]. To automatically label the edges of a polyhedral line drawing and obtain a physical interpretation, an exhaustive search of the junction catalog for the object domain of interest is done. In the polyhedral world, each edge is further constrained to have a unique label along its entire length. Line labeling usually requires that the imaged objects be viewed from a general viewpoint. An object is said to be viewed from a general viewpoint when that overall structure of the line drawing of the object is consistent over all viewpoints in a small sphere of viewpoints surrounding the current viewpoint, called an 6 ball of vieWpoints. The metric properties of the line drawing will vary over this 6 ball, but the general topology must remain the same. Figure 1.12(a) shows a non-general view of a cube. A general view of the same cube is shown in Figure 3.18(b). A junction catalog for piecewise smooth objects was derived by Malik in [69]. Piecewise smooth objects differ from polyhedra in that the surfaces are not restricted Figure 1.13. This ‘ inside upper curved edge. and back to edge and a part of to be planar. but at shadows and marki definition of this dc in the piecewise sm labels along their l1 Vertex. called a ph shows an object it edge, (shown by a lhe effects of pha established, There are a assumed to be per the curvilinear “.0 19 Figure 1.13. This horseshoe cylinder has two phantom vertices. Both are on the inside upper curved edge. Here the edge changes from being a jump edge to a convex edge, and back to a jump edge with no real vertices intervening. The occluded limb edge and a part of the bottom edge are shown with dotted lines. to be planar, but are only restricted to be C3. Origami objects, wire frame objects, shadows and markings are not permitted in Malik’s object domain. A more precise definition of this domain will be given in Section 1.2. Unlike the polyhedral domain, in the piecewise smooth object domain curved edges do not necessarily have identical labels along their length. This labeling change in the middle of a curve creates a false vertex, called a phantom vertex on each curved edge in the line drawing. Figure 1.13 shows an object with two phantom vertices. Phantom vertices occur when a limb edge, (shown by a dotted line on the right hand side), is occluded by the object itself. The effects of phantom vertices on the combinatorics of line labeling have not been established. There are a number of difficulties with the line labeling paradigm. The input is assumed to be perfect in that no edges are missing, and there are no spurious edges. In the curvilinear world, changes in tangent direction must be distinguished from changes in curvature, which is a difficult process. Even with richer range images, accurately finding curvature is a challenging problem [36]. The interpretations produced by line labeling are not guaranteed to be unique, although they are usually limited to a small number of labelings. Finding polyhedral line drawings from imperfect range data was done in [62]. 
In the polyhedral world, line labeling and some object completion was done on a group of 9 blocks of known size and shape in [34]. General polyhedral line labeling was successfully performed, with the help of some manual editing, on intensity images [105]. In the polyhedral world all edges must be straight lines, and all surfaces must be planar. This makes it possible to use linear equations to constrain the solution of the problem. A similar constraint could be used with a more complex parameterization, such as quadric surfaces. These surfaces are still not sufficiently general, as there are many objects, like an hourglass, that cannot be represented with quadric surfaces. The basic theorems that control the existence and uniqueness of solutions to systems of linear equations do not extend into more general parametric representations like quadric surfaces, so the probability of extending this approach to surfaces that are as general as the C³ surfaces is remote.

Line labeling for curved-surface bodies was done by Chakravarty in [25]. 3D object recognition using curved objects was done in [26]. Chakravarty detects and classifies lines and then extrapolates the junctions. To perform the matching, the perfect line labelings from object models are analyzed to determine which imperfect line labelings are likely to arise from the given object model. Chakravarty's approach can only produce a perfect line labeling after matching to the object model database.

1.2 Problem Statement

Improving the robustness and flexibility of computational vision has been approached from many different perspectives. Perceptual organization is promising because of its generality, which encourages flexibility. Perceptual organization is not a single process, but rather a group of processes that cooperate and compete to find a good interpretation of the imaged data. Controlling this cooperation and competition is an important research issue, called integration. Previous efforts in perceptual organization have used a non-uniform but fixed structure, as in Mohan and Nevatia, and the uniform structure of Sarkar and Boyer. Both of these systems are data driven, which does not allow information to flow from the top down to improve the perceptual organizations, unlike the system described in this thesis.

This thesis will describe an integrated system for perceptual organization and 3D interpretation that obtains labeled line drawings from single intensity images. The system uses two of the paths added to the Marr paradigm for object recognition by Lowe: perceptual organization and 3D inference.

This thesis will describe the construction of the POLL system (Perceptual Organization and Line Labeling) that can find labeled line drawings from single intensity images. The input into the system will be a single intensity image of an object, or objects, in an object domain which will be precisely described shortly. The output from the system will be a line drawing labeled with a 3D interpretation, and a database of intermediate representations and partial results that may be useful for further processing.
Finding labeled line drawings is a very difficult problem, and there will be some images which are not correctly processed. For these images, it is desirable to obtain intermediate information that could be used by other processing to work towards the goal.

The integration framework to be used is the blackboard system [32]. A blackboard system, which will be further explained in Section 2.1.3, uses a common database called the blackboard to hold input, partial solutions, and control data. These data objects are processed by independent knowledge sources, each of which specializes in one type of processing. These knowledge sources evaluate the current blackboard configuration, look for opportunities to contribute to improving the solutions, and perform their specialized processing. The results of this processing are left on the blackboard for other knowledge sources to opportunistically use.

Figure 1.14 shows the overall structure of the system. The knowledge sources are divided into four groups. The first group are the perceptual knowledge sources. These knowledge sources bootstrap the initial line drawing using perceptual organization. They are also available to do specialized work for other knowledge sources, and make other contributions to the blackboard database. The diagnostic and interpretive knowledge source is a modified line labeling paradigm that can examine a line drawing and locate inconsistencies. If there are no inconsistencies, a 3D interpretation of the line drawing will be found. The information produced by the diagnostic and interpretive knowledge source is available for the control knowledge source. This knowledge source selects which of the available repair strategies, if any, is suitable for fixing problems with the line labeling. The repair knowledge sources are called by the control knowledge source to fix the line labeling, and in turn call the perceptual knowledge sources to reevaluate previous results in light of the more global diagnostic information that is available. This system is integrated because the repair knowledge sources can return results to the perceptual knowledge sources for recomputation.

Figure 1.14. An overview of the POLL system described in this thesis. Repair knowledge sources marked with an asterisk have not been implemented.

The perceptual knowledge sources used in this system are weak membrane edge detection [13], an algorithm that performs edge grouping on the basis of continuity [111], a module that analyzes vertex proximity, and a module that can determine the type of line drawing junctions. Section 2.2 will explain the selection of these particular modules. A modified line labeling algorithm will be used as the diagnostic and interpretive knowledge source.

The domain of objects that will be used in this thesis is a subset of the piecewise smooth objects used in [69]. Using the notation of Malik, let S_i be a bounded C³ coordinate patch in R³, such that the boundary for S_i consists of a finite number of piecewise smooth simple closed curves. Let B_ε(P) be an ε-ball about point P. Let B_ε(P; M) ≡ B_ε(P) ∩ M be the ε-neighborhood of P relative to M, where M ⊂ R³.

Definition 1.2 A C³ piecewise smooth surface in R³ is a subset M ⊂ R³ for which
there exists a collection of surface elements S = (S_1, S_2, ..., S_n) with each of the S_i, i = 1...n, a subset of M. For every point P ∈ M, exactly one of the following three conditions is true [69]:

1. There exists an S_i ∈ S such that B_ε(P; M) = B_ε(P; S_i) for some ε > 0.

2. There exist two surface elements S_i and S_j in S such that for some ε > 0, B_ε(P; M) = B_ε(P; S_i) ∪ B_ε(P; S_j). The tangent planes to S_i and S_j are distinct at P.

3. There are m ≥ 3 surface elements S_{i_1}, S_{i_2}, ..., S_{i_m} such that for some ε > 0, B_ε(P; M) = B_ε(P; S_{i_1}) ∪ B_ε(P; S_{i_2}) ∪ ... ∪ B_ε(P; S_{i_m}). For any pair of these surface elements, the tangent planes are distinct at P.

An additional restriction is added in this thesis to help control the complexity of the object domain: the number of surfaces that can meet at a vertex in a domain object is limited to three. Some examples of objects that are in the domain are shown in Figure 1.15. Examples of objects that are not in this domain are wireframe objects, origami objects, and a single nappe of a right circular cone. Wireframe objects are not legal because there are no surface elements. Origami objects are not legal because some of the edge points are adjacent to only one surface instead of the required two surfaces. A single nappe right circular cone is not legal because it is neither smooth at the apex, so the apex cannot satisfy condition 1, nor is the apex adjacent to two distinct surfaces. A bent stove pipe, as in Figure 1.16, is also not within this domain since there is a point along the edge joining the two parts where the tangent planes of the two surfaces are equal.

Figure 1.15. Some objects that are legal in the piecewise smooth object domain.

Figure 1.16. An object that is not in the piecewise smooth object domain since the tangent planes to the surfaces are equal at some point along the edge.

There are a group of other assumptions that must be made for the line labeling algorithm to be applicable. The projection model for the image must be orthographic. Orthographic projection plus a linear scale factor, called weak perspective, was shown to be an approximation to perspective projection in [52]. The objects must be seen in a general viewpoint. Shadows, texture and markings are not allowed on the objects.
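The projection assumption is simple to state computationally. The following minimal Python sketch (the function name and test points are invented here for illustration; this is not code from [52]) implements weak perspective as orthographic projection followed by one global scale factor:

    import numpy as np

    # Weak perspective: drop depth, then apply a single scale for the
    # whole object, s = f / (average depth). Illustrative sketch only.
    def weak_perspective(points_3d, focal_length):
        pts = np.asarray(points_3d, dtype=float)
        scale = focal_length / pts[:, 2].mean()
        return scale * pts[:, :2]

    print(weak_perspective([[1.0, 2.0, 10.0], [2.0, 1.0, 11.0]], 5.0))

Because one scale replaces the per-point division by depth, the mapping is affine: straight edges stay straight and parallel edges stay parallel, matching the behavior the line labeling machinery expects from orthographic projection.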
1.3 Organization of the Thesis

The organization of the thesis is as follows. Chapter 2 will place this research in perspective in the context of the current vision work. A list of desirable properties for modules will be given and discussed. The choice of the integration framework and an evaluation of the modules available in the literature will be done. Chapter 3 will explain how the initial analysis of image information is performed with the perceptual knowledge sources. Chapter 4 will give the method proposed to discern images that have been properly analyzed by the initial processing from those that have not, using the diagnostic and interpretive knowledge source. Once inconsistencies in the initial representation of the image have been found, Chapter 5 shows how some of these defects can be corrected using the control and repair knowledge sources. The final chapter, Chapter 6, will list the contributions of the thesis and suggest related topics for further research.

CHAPTER 2

Integration: Methodologies and Modules

Visual perceptual organization is not a single cohesive process, but rather a diverse collection of processes working towards the goal of organizing images into meaningful groupings. Finding good groupings in images is an inherently difficult search problem. The number of possible subsets of image tokens is enormous, and there are myriad criteria for what makes a good grouping. For these perceptual processes to be able to cooperate and compete to find meaningful relationships within the image, the processes need to be integrated. Integrated processes can use each other's constraints and expertise to guide the search for good groupings.

An example of the need for integration is shown in Figure 2.1. This simple polyhedral scene cannot currently be labeled using curvilinear line labeling [69], or even the simpler polyhedral line labeling [113]; Sugihara's line labeling system would be able to process this image only if the missing edge were added by hand [105, pg. 178]. The imaged object is within both the domain of polyhedral objects and the domain of piecewise smooth objects, but the front corner edge was not found by the edge detector. It is unlikely that any edge detection scheme could recover this edge, since edge detection relies upon spatial variation in intensities and there is no spatial variation in intensity values across the front edge of the cube. This can be seen in Table 2.1, where the raw intensity values of the front portion of the rectangular prism are shown. Line labeling modules require perfect input and cannot recover from edge detection mistakes. And edge detection modules cannot give perfect edges. Therefore, even an unavoidable edge detection failure, like that in Figure 2.1, will result in a failure of line labeling. This serial failure of modules is one of the problems that integrated systems hope to ameliorate.

Figure 2.1. An intensity image of a simple square block. (a) Notice that there is almost no variation in intensity across the front corner of the block. This makes it difficult for edge detection to find this edge without finding other noise edges. (b) Canny edge detection was run on the image in (a). An edge parallel to the top and bottom of the block was found, but the front corner is missing. This extraneous edge occurs because the side of the block is curved.
Figure 2.2. The input image from Figure 2.1 with a highlighted rectangle showing where the intensity values in Table 2.1 occur in the image.

Table 2.1. The intensity values from the top front corner of the rectangular prism shown in Figure 2.1. The numbers in the 190s are from the top of the prism. The numbers in the 130s are the sides of the prism. There is no variation in intensity across the front edge of the prism.

124 121 117 113 125 154 178 189 192 193 195 194 196 197
124 122 117 114 114 132 163 183 191 193 194 195 195 196
125 122 120 117 113 119 143 170 186 191 193 194 195 195
128 125 122 119 116 113 121 151 179 192 195 193 193 189
130 127 125 123 120 115 116 136 166 183 185 180 170 160
132 129 128 126 121 118 115 123 142 157 154 143 132 126
134 132 131 129 125 121 119 116 121 124 120 114 114 116
134 133 132 129 126 122 120 119 119 117 117 120 122 123
135 135 133 131 128 126 124 125 126 126 126 125 127 127
134 135 134 132 129 129 129 126 128 128 127 127 128 129
138 137 135 134 133 133 132 131 132 132 131 130 130 130
139 139 137 136 135 134 134 132 132 132 132 132 132 131
139 139 137 134 135 136 136 136 135 134 135 134 134 135
141 140 138 137 137 138 139 138 139 136 136 136 136 136
141 141 140 137 138 140 140 138 138 138 138 138 141 141

Integrating the perceptual organization processes into a single, general computational vision system is an ambitious goal. If the processes are combined in an informal fashion, the resulting system may be difficult to understand and expand. Therefore, a more formal method for integrating processes, called an integration framework, is needed. Integration frameworks have been used within the shape from X paradigms, as is suggested in the Marr object recognition paradigm, where the 2.5 D sketch integrates processes such as stereopsis, shape from motion, shape from shading, color, texture, and edge detection, to help reconstruct 3D shape from input image(s). An integrated system for Gestalt grouping has also been created [73].

In order to integrate perceptual modules, there are two questions that must be addressed. The first is which integration framework will be used, and the second is which perceptual modules should be integrated in this framework. Section 2.1 will discuss the integration frameworks that are used in computational vision, and select the blackboard framework as the model for integration for perceptual organization to be used in this thesis. The perceptual modules to be integrated will be chosen in Section 2.2.

2.1 Integration Frameworks

Integration has become an important research topic in computational vision, as the quantity of recent literature on the subject attests [4, 31, 41, 76, 77, 62, 73, 3].
Just as the recent interest in perceptual organization can be interpreted as a measure of the dissatisfaction of the vision community with the weaknesses of 3D surface reconstruction techniques, interest in integration can be interpreted as a measure of the dissatisfaction of the vision community with the lack of robustness of the individual modules and paradigms that are currently available. The primary goal of integration is improving the robustness and flexibility of computational vision algorithms. It is important that systems that are created from a collection of modules be stronger than a serial combination of the individual modules. An integrated system that is only as robust as its weakest module is of little value.

Integration techniques can be divided into two categories according to the uniformity of treatment of the modules. Uniform integration schemes treat every module being integrated in an equivalent fashion. This is similar to Sarkar and Boyer's uniform perceptual organization system [102], where the same grouping mechanism was used at every level in a hierarchy of perceptual events. Uniformity has an attractive simplicity, and can be easy to control. These virtues are paid for by a loss of some power, since special attributes of individual modules cannot be incorporated without sacrificing the uniformity. Non-uniform integration strategies use methods that are unique to the specific modules being integrated. This can be advantageous since it allows unique attributes of the modules to be exploited, which may produce a more robust system.

A commonality between both uniform and non-uniform integration frameworks is that optimization is frequently used. There are a variety of techniques for performing optimization that will be discussed in Section 2.1.1. Uniform integration strategies will then be discussed in Section 2.1.2. Non-uniform integration strategies will be discussed in Section 2.1.3.

2.1.1 Optimization

Many integration strategies use some form of optimization. Optimization takes a heuristic criterion function that evaluates the goodness or desirability of a configuration and then searches the space of all possible configurations for the configuration that maximizes (or minimizes) this function. An example of a system that uses optimization is the contour tracing paradigm of [7, pg. 136]. This paradigm uses two criteria for the goodness of an edge: the strength of the edge (from a local edge detection method, such as the Sobel operator), and the smoothness of the edge. The best set of smooth edges, as defined by this heuristic criterion function, is found using dynamic programming. Dynamic programming is only rarely applicable in optimization problems, since it requires the criterion function to be of a specific, non-general form, but when it is applicable it is guaranteed to find the best solution.
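As a concrete illustration of this style of optimization, the following Python sketch (a simplification invented here, not the exact formulation of [7]) traces a single contour across an edge-strength image by dynamic programming, choosing one row per column so as to trade edge strength against smoothness:

    import numpy as np

    # Hypothetical contour tracing by dynamic programming: minimize
    # (negative edge strength) + smoothness_weight * (row jump)^2,
    # one row chosen per column.
    def trace_contour(edge_strength, smoothness_weight=1.0):
        rows, cols = edge_strength.shape
        cost = np.full((rows, cols), np.inf)
        back = np.zeros((rows, cols), dtype=int)
        cost[:, 0] = -edge_strength[:, 0]         # reward strong edges
        for c in range(1, cols):
            for r in range(rows):
                jumps = smoothness_weight * (np.arange(rows) - r) ** 2
                total = cost[:, c - 1] + jumps
                back[r, c] = int(np.argmin(total))
                cost[r, c] = total[back[r, c]] - edge_strength[r, c]
        path = [int(np.argmin(cost[:, -1]))]      # best row in last column
        for c in range(cols - 1, 0, -1):
            path.append(back[path[-1], c])        # follow the backpointers
        return path[::-1]

    strength = np.array([[0.1, 0.2, 0.1],
                         [0.9, 0.8, 0.9],
                         [0.1, 0.1, 0.2]])
    print(trace_contour(strength))                # [1, 1, 1]: the strong row

Because the best cost of each column depends only on the previous column, the optimum over all row sequences is found exactly; this decomposability is the specific, non-general form of criterion function that dynamic programming requires.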
A common optimization method in computational vision is energy minimization. Energy minimization is distinguished from other optimization methods by the presence of a physical model that represents the goodness of a configuration. Gibbs distributions and Markov Random Fields (MRF) define an energy function which is used for image reconstruction. The physical model used is a statistical mechanics system with an energy function that determines its Gibbs distribution. The equivalence between Gibbs distributions and MRFs is used to change the more easily understood Gibbs distribution into a more mathematically tractable MRF. Energy minimization is performed using the stochastic simulated annealing method to find the maximum a posteriori (MAP) estimate of the image contents given the degraded observations [42]. Optimization can also be done using energy minimization with Partial Differential Equations (PDE) and the Euler-Lagrange equations. An example of energy minimization using PDEs is the curvilinear grouping paradigm to be discussed in Section 3.2.

When energy minimization problems are being solved, it is often desirable that the solution found be smooth. Regularization theory is the rigorous approach to smoothness as a constraint. A mathematical problem is said to be well-posed when its solution exists, is unique, and depends continuously on the initial data. Any deviation from these conditions causes a problem to be ill-posed. Many of the classic vision problems are ill-posed, including surface reconstruction, shape from X, and edge detection. Regularization works by limiting the choice of solutions to a class of functions that are maximally smooth. There are two principal methods for regularization [89]. Assume that one is trying to solve the problem of finding z from the data y, and that the problem is ill-posed. Let A be a linear operator, and || · || represent a norm.

    Az = y

In the first method, one chooses the solution which satisfies:

    min ||Pz||   subject to   ||Az − y|| ≤ ε

Here P is a linear stabilizing functional. The solution z is found such that Az is within ε of the desired data y and z has maximal smoothness, by selecting the stabilized function of minimum norm. In the second method one minimizes:

    ||Az − y||² + λ||Pz||²

The parameter λ determines the relative importance of faithfulness to data and smoothness. Large λ values favor smoothness.

Although the classic regularization theory is mathematically rigorous, it doesn't model the physical world well in all respects. The stabilizer is applied in such problems to all image areas equally, including areas of discontinuity (this is avoided in shape from shading by assuming that the contour of the isolated object is known a priori). In [107], Terzopoulos offers an improvement on standard regularization which uses nonquadratic stabilizing functions with continuity control functions that are discontinuous. In this method, two binary variables are set to turn off regularization at points of depth or orientation discontinuity. This improves the usefulness of regularization by preventing it from smoothing over discontinuities.
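The second method is easy to state computationally. The sketch below (the names, the toy data, and the choice of a first-difference matrix for the stabilizing functional P are illustrative assumptions, not taken from [89]) minimizes ||Az − y||² + λ||Pz||² in closed form through the normal equations (AᵀA + λPᵀP)z = Aᵀy:

    import numpy as np

    # Illustrative regularization sketch: the minimizer of
    # ||Az - y||^2 + lam * ||Pz||^2 satisfies
    # (A^T A + lam * P^T P) z = A^T y.
    def regularize(A, P, y, lam):
        lhs = A.T @ A + lam * (P.T @ P)
        return np.linalg.solve(lhs, A.T @ y)

    A = np.eye(5)                              # identity "imaging" operator
    P = np.diff(np.eye(5), axis=0)             # first-difference stabilizer
    y = np.array([0.0, 1.0, 0.9, 5.0, 1.1])    # noisy data with one outlier
    print(regularize(A, P, y, lam=10.0))       # larger lam gives smoother z

Raising λ pulls the solution toward smoothness at the expense of fidelity to y, exactly the trade-off stated above; note that the quadratic stabilizer smooths across the jump in this toy data just as indiscriminately as it smooths the noise, which is the shortcoming that Terzopoulos's continuity control variables address.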
Discontinuities are salient in images, as can be seen by the inclusion of both depth and orientation discontinuities in the 2.5 D sketch of Marr, and as such should not be smoothed. Optimization minimizes a functional over the space of all possible solutions. This space is often prohibitively large, so a wide variety of techniques for finding interpre- tations that minimize complex functionals have been developed. If the functional is of the appropriate form, dynamic programming can be used to find the best solution. lithe functional i: also called hill clil descent may get tr plex functionals a: Deed lo llSCd. Th ods and stochastit is shown in Sectio fitting a weak me; cry minimization applicable. but can ninistic algorithm nitima instead, 5 good glOba] minim Optimization “ as well as (“.0 of t 21 2 Unlfo lute grallOD frame tb _ e nonuniform in I non-uniform lute .ln example of a ‘ 34 If the functional is convex over the space of all interpretations, then gradient descent (also called hill climbing) can be used. If the energy functional is not convex, gradient descent may get trapped in a local minima and never reach the global minimum. Com- plex functionals are rarely convex, so more sophisticated minimization methods often need to used. These methods can be divided into two classes, deterministic meth- ods and stochastic methods. An example of a deterministic minimization method is shown in Section 2.2.2, with the Graduated Non Convexity (GNC) algorithm for fitting a weak membrane to data. Simulated annealing is a common stochastic en- ergy minimization method [42]. Deterministic algorithms like GNC are not generally applicable, but can work well when they are applicable as in [12]. More general deter- ministic algorithms like gradient descent may not find global minima, but find local minima instead. Stochastic methods are often computationally expensive, but find good global minima. Optimization will be used in both uniform and non-uniform integration strategies, as well as two of the modules that are integrated. 2.1.2 Uniform Integration Frameworks Integration frameworks can be divided into two groups, the uniform frameworks and the non-uniform frameworks. Uniform integration frameworks are distinguished from non-uniform integration strategies by the similarity of treatment of the modules. An example of a uniform integration strategy used for sensor fusion is Moravec’s certainty grid from [76]. A certainty grid is a 2D lattice that represents the likelihood that a given square on the floor of a robotic workspace is empty. Multiple range sensor measurements are obtained from a diverse set of sensors on a robot. These sensor measurements are integrated using Bayesian reasoning to iteratively update the likelihood that each square is empty based on the latest sensor readings. The uniformity of this strategy comes from using the same Bayesian updating rule for all of the different t}‘ Gibbs distribt work. Gamble. C as texture. motio is similar to a lin tion discontinuit} edge] or shadow. module from colc of a line process. homogeneous reg MRF. Aloimonos pr Aloimonos SUgge: lives of the data disContinuous re! Conlinuititm. Sin: la“ the Continu them than that t the palafllttters . 1. MCCafieny I] l” in I73]. rat (Tet-lie a Single e “”0“ shown in th‘35e terms has Sir ‘ amt. The mi] 8 35 of the different types of sensors. Gibbs distributions, and MRFs can also be used as a uniform integration frame- work. Gamble, et. al. 
Gibbs distributions and MRFs can also be used as a uniform integration framework. Gamble et al. use edge detection in combination with other modules such as texture, motion, stereo and color [41] to create a good edge map. This edge map is similar to a line drawing, with each edge labeled as a mark, specularity, orientation discontinuity (convex or concave edge), occlusion discontinuity (jump or limb edge) or shadow. The MRF model first combines edge detection with another single module from color, texture, motion, and stereo. Each of these MRF models consists of a line process, which models the edges, and a continuous process, which models homogeneous regions. These four MRF modules are then combined into a single MRF.

Aloimonos proposed a uniform integration framework in [3]. In this framework, Aloimonos suggests that in addition to modeling the data and constraints, the derivatives of the data and constraints should also be modeled. He calls this framework discontinuous regularization. Fitting the derivatives has the effect of preserving discontinuities. Since the discontinuities in an image are in some sense more important than the continuities, this idea has merit. This result produces a more uniform treatment than that of Terzopoulos, but has many parameters. A learning algorithm for the parameters is provided.

McCafferty uses energy minimization to perform integrated perceptual organization in [73]. Taking his cue from the Gestalt school of psychology [114], he tries to create a single energy functional that captures some of the Gestalt rules of organization shown in Figure 1.5. This functional includes terms for proximity, variance of proximity, similarity, closure, continuity, and the variance of continuity. Each of these terms has an arbitrary weight which represents the importance of this constraint. The minimization of the energy functional is done by simulated annealing. He then shows that a single set of weights is inadequate for the task at hand and argues that these weights should be set differently for different perceptual tasks. In the examples given, which show some interesting and unique results for perceptual organization, the weights vary wildly. In one example they are all equal, in others they differ by as much as five orders of magnitude.

2.1.3 Non-Uniform Integration Strategies

Any integration strategy that does not exhibit the uniformity of the integration strategies discussed in Section 2.1.2 will be classified as non-uniform. This type of integration is widely used for integrating pairs of processes, such as shape from shading and line labeling [70], or shape from texture and motion in [3]. There is also a more general integration framework for non-uniform integration, called a blackboard system [32].

An example of non-uniform integration is shown in [84], where a shortcoming of the quadtree region splitting and merging algorithm is overcome by combining edge detection with region grouping. The method is laden with heuristics and parameters, and solves a problem that is unique to quadtree segmentation. This problem doesn't arise with other region segmentation paradigms, and therefore the heuristics are not generally useful.
In [1], focus, vergence and stereo were integrated. In [16] the detection of texture elements (texels) is combined with extracting the parameters of the planar surface. In this method, texels are selected from a pool of candidate texels using exhaustive search to find a subset of texels that provides a unique planar interpretation. Both of these methods show improved results over unintegrated methods.

Another example of non-uniform integration is Malik and Maydan's integrated shape from shading and line labeling in [70], where line labeling was used to create some of the boundary conditions that are necessary for shape from shading. The regularization scheme of shape from shading is modified to include an additional term at non-limb edges. Both line labeling and shape from shading have numerous strict assumptions that must be satisfied. For example, line labeling requires a perfect line drawing as input, and has the general viewpoint assumption, as discussed in Section 1.2. The necessary conditions for shape from shading include complete boundary information, normal directions at the boundaries, a known reflectance map, and lighting restrictions. This work goes against one of the general principles of integration in that its necessary conditions are the union of the necessary conditions of shape from shading and line labeling. In fact, the necessary conditions are so strong that only synthetic data are used in the examples.

In [3], three different non-uniform methods were presented: shape from shading and motion, shape from texture and motion, and shape and 3-D motion from contour and stereo. In these integrated modules, careful attention is paid to mathematical proofs of uniqueness, existence, and stability. These modules are part of a strategy for systematically examining combinations of modules in hopes of being able to generalize the insights into a uniform framework, like those discussed in Section 2.1.2. It should be noted that some of the most difficult subproblems in these areas are not solved. The correspondence problem, for example, involves matching pixels or features in pairs of images, either images in a time sequence for motion, or images taken from different vantage points for stereopsis. In both the shape and 3-D motion from contour and stereo, and the shape from shading and motion, it is assumed that the correspondence problem has been solved. This problem is difficult. In shape from texture and motion, the textured surface is assumed to be planar.

When the number of processes to be integrated becomes large, creating a non-uniform system is more difficult. Typically many people need to be working on separate parts of the system simultaneously. The modules need to communicate, but if they do so directly then propagating changes, such as a data structure change in a single process, may require alterations to many other processes. If this integration
is handled informally, chaos can result. A useful structure for integrating diverse processes which work on a common problem is the blackboard system [32]. A blackboard system has three fundamental parts: the global database of data and partial solutions called the blackboard, the knowledge sources that are independent modules that use the data currently on the blackboard to advance the solution of the problem, and a control structure that decides which knowledge sources will process what blackboard data at a given time. An overview of a typical blackboard system showing the interconnections of these modules is displayed in Figure 2.3 [33].

Figure 2.3. An overview of the Hearsay-II system for speech understanding.

The blackboard structure was originally designed for the speech understanding task. A wide variety of other applications of blackboard systems exist, including: ocean surveillance, where a group of hydrophones concealed under the water are used to detect, localize and ascertain the type of ocean vessel within its range [82]; the computer vision problem of object recognition in [31]; the analysis of complex aerial photographs in [78]; deriving the 3D conformations of proteins in solution in [48]; and the planning of military missions using an autonomous vehicle in [85].

The blackboard is a global database that contains input data, partial solutions, alternatives, final solutions, and control data. It is organized into levels. Each level contains hypotheses of specific data objects at a fixed granularity. These objects grow in generality as the level number decreases. For the speech understanding task, the levels used in [33] were parameter, segment, syllable, word, word sequence, phrase, and database interface. The control data includes relationships among the hypotheses, including relationships like OR, PART-OF, and AND. The credibility or certainty associated with each hypothesis is also part of the control data. Another type of control data is a set of scheduling parameters. These parameters are maintained in the focus of control database.

The knowledge sources are independent modules that take the current state of the blackboard and seek to improve it using their specialized knowledge. Knowledge sources consist of two parts: a precondition and an action. The precondition specifies the conditions under which the knowledge source is useful. If the preconditions are met for a particular blackboard object and knowledge source, the knowledge source is said to have been triggered. For a knowledge source to be triggered, it must refer to a specific blackboard object. The action is the processing that the knowledge source can perform to improve the status of the current solution. A knowledge source has a private data structure which cannot be accessed by any other knowledge source.
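These three parts are easy to see in miniature. The Python sketch below is an invented toy, not Hearsay-II and not the POLL implementation: the blackboard is a dictionary of leveled hypotheses, each knowledge source is a precondition/action pair, and the control structure is a deliberately naive scheduler:

    # Hypothetical blackboard skeleton: blackboard, knowledge sources,
    # and a control loop, with all names invented for illustration.
    class KnowledgeSource:
        def __init__(self, name, precondition, action):
            self.name = name
            self.precondition = precondition   # blackboard -> bool
            self.action = action               # blackboard -> None

    class Blackboard:
        def __init__(self):
            self.levels = {}                   # level name -> hypotheses

        def post(self, level, hypothesis):
            self.levels.setdefault(level, []).append(hypothesis)

    def control_loop(blackboard, sources, max_cycles=100):
        # Naive control: fire the first triggered source each cycle and
        # stop when nothing triggers or the cycle budget runs out.
        for _ in range(max_cycles):
            triggered = [ks for ks in sources if ks.precondition(blackboard)]
            if not triggered:
                return
            triggered[0].action(blackboard)

    bb = Blackboard()
    edges = KnowledgeSource(
        "edges",
        lambda b: "edges" not in b.levels,
        lambda b: b.post("edges", [(0, 0), (1, 1)]))
    group = KnowledgeSource(
        "grouping",
        lambda b: "edges" in b.levels and "curves" not in b.levels,
        lambda b: b.post("curves", b.levels["edges"][0]))
    control_loop(bb, [edges, group])
    print(bb.levels)   # both levels filled, each source fired once

A real scheduler replaces the first-triggered-wins rule with the queues, monitor, and focus of control database of Figure 2.3; what the structure protects is the independence of the knowledge sources, which communicate only through posted hypotheses.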
A group of related knowledge sources can be combined into a module that may have a private blackboard that can be shared only among these processes. This organization is more sophisticated than that shown in Figure 2.3 and engenders some special control problems.

There are several knowledge sources that must exist in any application. One is a knowledge source that determines if the processing should be terminated. This knowledge source should check both for successful completion and for the exhaustion of resources (both memory and CPU time). The strategy knowledge source is also important. This knowledge source examines the blackboard structure as a whole and helps to determine which of several predefined strategies is most beneficial given the current configuration. This knowledge source does not have information about other knowledge sources, since this would violate independence. Therefore, the strategy knowledge source doesn't perform scheduling. It can adjust the parameters that determine which levels of the blackboard should be emphasized and the parameters which select a portion of the input for more intensive processing. These parameters are stored in the focus of control database.

Blackboard systems have previously been used for image processing and computer vision. The UMass Schema system is a blackboard system for computational vision described in [31]. This system performs object recognition in a highly textured outdoor environment. The system blackboard consists of object hypotheses, conflicts and back pointers which store information about which objects have been hypothesized in any given part of the spatial plane. The object hypotheses are called schema instantiations. Schemas are high level modules designed from basic knowledge sources to control the search for a particular object. Within the house schema, there are schemas for detecting foliage, grass, roofs, walls, shutters, windows, telephone wires, sky, road line and road shoulders. The knowledge sources that are available fall into five groups: feature based classification, perceptual organization and grouping, geometric model matching, relations (spatial, geometric, hypothesis space) and low level processes. Within a schema there are internal hypotheses, strategies, and a local blackboard. The internal hypotheses consist of image events (tokens) and endorsements that summarize the positive and negative evidence for the hypothesis. The strategies include simple control programs, control over the invocation of knowledge sources, and control over the invocation of lower level schemas.

Figure 2.4. Both a tree and a telephone pole satisfy the generic description of a tree as a vertical axis with transverse branches.
Within every schema there is an object hypothesis maintenance strategy (OHM) which updates the global blackboard. This work differs from a general blackboard system as previously described in two ways. Instead of using the global blackboard exclusively, the schemas have their own local blackboards. Control is distributed to the schemas instead of being focused on one uniform strategy.

Although the knowledge sources in this implementation retain general knowledge, the schemas themselves are sharply focused. While this strategy is useful, it may not be sufficiently general. The analogue of the focused image understanding schemas in a speech understanding framework might be to have the system look for a specific set of sentences rather than exemplars of a grammar with a restricted dictionary. Undoubtedly the recognition will be better, but whether the loss in generality is justified by the increased performance is debatable. The other difficulty with this type of focusing is that objects not within the domain may be classified as belonging to the domain. An example is shown in Figure 2.4. If a tree is defined as a vertical axis with transverse branches attached, then both objects in Figure 2.4 are trees.

Another image understanding application that has used a blackboard is described in [78]. The domain is the interpretation of complex, low altitude aerial photographs. The goal is to segment the image into the following categories: crop field, bare soil, forest, grassland, road, river, car and house. The blackboard stores the current segmentation; basic properties of the regions (which are the smallest blackboard entities); objects (which are composed of groups of regions); spatial relationships between regions and objects (both spatial relationships and dependencies, and contradictions); and the global imaging parameters. Each object detection subsystem (knowledge source) returns symbolic evaluations of regions to the global blackboard. These symbolic evaluations are used by object recognition knowledge sources to recognize objects. Scheduling is performed by sequentially hypothesizing that each region is one of each of the eight basic types and reevaluating conflicts. The difference between this system and an ordinary blackboard system is in the control scheme. There are no waiting queues, scheduler, or blackboard monitor.

Blackboard systems have become an important framework for integrating knowledge into a system which permits subsystems to both compete and cooperate to find globally good solutions. Image processing is a natural domain for blackboard systems because of the large amount of data, the need for multiple processes to interpret the data, and the flexible control mechanism. Work is being done on specialized architectures for blackboards, concurrency and parallelism, and real time blackboard systems [54]. This bodes well for the future of blackboard systems in computer vision.

2.1.4 Selecting an Integration Framework

Blackboard systems have significant advantages over other integration frameworks for the research done in this thesis. Both discontinuous regularization and Markov random fields minimize complex energy functionals. As the number of processes integrated grows, the energy functionals may become highly non-convex, which may make energy minimization difficult or impossible. It is also unclear how a process like line labeling could be incorporated into these other frameworks, although [70] may provide a method, subject to earlier caveats.
Blackboard system modules can be altered without undue hardship since knowledge source independence is maintained. This is important since there may well be significant improvements in some of the modules in the future. This type of flexibility permits a system to evolve as the understanding of computational vision increases.

Another advantage of blackboard systems over the integration frameworks discussed above is that the blackboard has an independent control module that can strategically explore the space of solutions. In the other integration methods, no such control structure exists. Bringing these control issues to the forefront offers an opportunity to examine the interaction among the various modules. Methods without elaborate control structures cannot provide this type of introspection into their processing. Using a blackboard for an initial implementation of an integrated system can lead to an improved understanding of successful control structures and make creating a less flexible but more efficient system easier. These advantages indicate that a blackboard system is a reasonable choice for this integrated system.

2.2 Selection of Modules for Integration

Now that the blackboard framework has been selected, the choice of both general module types and specific paradigms must be justified. First the general module types will be selected. Although the list of potential modules is vast, the main modules that were considered for use in this thesis were shape from shading, texture, motion, stereo, edge detection, line labeling, symmetry, color, continuity, and shape from contour. For this thesis, the processing will use a single grayscale intensity image. This eliminates modules such as motion, stereo, and color. It should be noted that these modules could be included in an enhanced version of this system.

There are three other criteria for modules: relevance, compatibility, and minimality. Relevance means that a module must provide some part of the 2.5 Dimensional sketch or provide some information that can be used to get part of that sketch. Compatibility between modules is also important. Modules need to be able to use either raw input, or the imperfect results of other modules. Minimality is a practical consideration. It is desirable to use the smallest number of modules to accomplish this goal initially. Since the blackboard system allows for future expansion, other modules can be integrated later as the system evolves.

Within the candidate list there are two modules that unequivocally must be used. The first is edge detection and the second is line labeling. Edge detection is necessary since it takes the raw intensity data and provides the initial image tokens for analysis. Line labeling is unique since it provides the 3D interpretation of the line drawing. Therefore both of these modules are highly relevant. The continuity module (called co-curvilinearity in [75]) was selected on the basis of compatibility and relevance. Compatibility is established since this module uses as input the output of edge detection. Continuity is relevant because it provides a bridge between the raw edges and a line drawing. The choice of additional modules becomes more nebulous at this point. The final module selected was symmetry. Symmetry is relevant since it can use either the results of edge detection or continuity.
Symmetry can provide evaluations of the goodness of regions that will be helpful in discovering and correcting both edge detection and continuity problems. Shape from shading or contour could also have been used. In fact, either would be a good choice for the next module. Texture could also have been used, although texture segmentation is a very difficult problem even in artificial situations. The final choice of type of modules is: edge detection, continuity, and line labeling. The continuity algorithm, which will be discussed in Section 3.2, is a contribution of this thesis.

2.2.1 Evaluating Proposed Modules

With the general type of the modules fixed, there are many choices for the specific paradigms. The criteria discussed in Section 2.2 apply to the paradigms as well. The blackboard paradigm makes it possible to implement and test alternate paradigms within each module, but this is a time consuming procedure, so paradigm decisions should not be taken lightly. There are eight criteria for paradigm selection.

Input Availability: The input to the paradigm must be available from either the input image or another module.

Output Usefulness: The output of the paradigm must either be useful to another module or be part of the 2.5 Dimensional sketch.

Real Data: The paradigm must be able to work with data that is not synthetic.

Robustness: The paradigm should result in only small changes in output if the input is changed slightly.

Efficiency: The paradigm must be able to run in a reasonable length of time on current hardware.

Potentially Parallel: It is preferable that all paradigms be potentially massively parallel. Since work is being done on blackboard systems which use parallelism, this will make it possible to implement this system in parallel later.

No Detailed Object Models: The paradigm may use only general constraints (such as smoothness), and may not use specific object models.

Compatible with Object Domain: The paradigm must be able to work in the domain of objects discussed in Section 1.2.

For each module type a paradigm needs to be selected that meets as many of these criteria as possible. For many of the modules, there will be no paradigm that meets all of these criteria, therefore compromises must be made. These compromises will be discussed as they are encountered. Line labeling was discussed in Section 1.1.3. Section 2.2.2 will discuss the choice of edge detection algorithm. Section 2.2.3 will examine continuity, although the final choice of continuity module will not be made until Section 3.2. Symmetry will be discussed in Section 2.2.4.

2.2.2 Edge Detection

Boundaries of imaged objects are psychologically important, can help initiate the processing of other modules, and are useful for object recognition. The usefulness of boundary contours for other vision modules can be seen by examining the number of other modules that use boundary information to initiate processing. Line labeling, shape from shading, and symmetry detection all rely on object boundaries being reliably available. Lowe and Biederman show the usefulness of the boundary contours of an imaged object to the recognition problem. Lowe uses the boundary, in combination with 2D to 3D inference, to contribute to the 2.5 D sketch. He also uses the boundary to bootstrap object recognition, as the bicycle in Figure 2.5 shows. Biederman's geons are instantiated using boundary based information.
Many of the figures shown in this thesis contain only boundary information, and yet still provide ample information for interpretation.

Edge detection is one of the earliest problems addressed in the vision literature (see [7], [63], or any other introductory text for simple edge detection methods). Torre and Poggio discuss the optimality of edge detection in [109]. It was shown that several edge detection schemes are near optimal. This has an ominous conclusion: edge detection may not be able to do much better than what it is already capable of, as was shown in Figure 2.1 and Figure 2.2. There has been much discussion of performing edge detection at multiple scales (i.e. different parameter values) [72, 109, 88, 24]. If edge detection as it currently exists is in fact near optimal, then the extraction of complete boundary information from images must be attacked using other means.

Figure 2.5. This broken line drawing of a bicycle is very difficult to recognize in (a). With the addition of one segment, recognition is much easier in (b) since the front wheel of the bicycle can now be grouped into a complete boundary [67].

One of the most commonly used recent edge detectors is the Canny edge detector [24]. A recursive version of Canny's edge detector was reported in [30]. Although Canny edge detection has been widely used, it was not selected for this work because of two shortcomings. The first shortcoming with Canny edge detection appears at corners of regions. Vertices are very important to line labeling, and an edge detector that does not give good vertices is not helpful. A second shortcoming of Canny edge detection is in the thresholding scheme. The thresholds are set using a global histogram of the image. This method causes problems when images have very different numbers of real edges. For an example, see Figure 2.6. Since it is preferable for this thesis that the edge detection perform well with a single set of parameters, Canny edge detection was eliminated from consideration. It should be noted that if the parameters were set independently for each image, Canny edge detection would perform very well. This problem with edge detection was discussed in [83].

The method of edge detection that was selected for this work is fitting a weak membrane model using the Graduated Non-Convexity (GNC) method of Blake and Zisserman [13]. This method has two advantages over Canny edge detection. It has only two parameters instead of four, which greatly simplifies setting parameter values. Secondly, the proportion of edges in the image is not used in the computations. The images shown in Figure 2.6 can both be segmented with the same set of parameters. The weak membrane model is fit to the image data using an energy minimization technique, GNC.

Figure 2.6. Canny edge detection is run with the same parameters on two images that are similar except for the number of real edges. (a) An image with very few edges. (b) An image with many more edges. (c) The output of the Canny edge detector run on (a). (d) The output of the Canny edge detector run on (b). There is no single set of parameters that will give good edges on both of these images. The parameters used in both of these images were σ = 2, masksize = 17, lower threshold = .60, higher threshold = .75. Many other combinations of parameters were tested.
It has a one dimensional equivalent, called the weak string, that will be used for illustrative purposes. The energy functional in the weak string is made up of three terms. Let u_i be the weak string solution to the functional. Let x_i be the initial data. Let l_i be a binary function such that l_i = 1 corresponds to a break in the string. The energy functional E for the weak string is defined to be:

    E = Σ_i (u_i − x_i)² + λ Σ_i (1 − l_i)(u_i − u_{i+1})² + α Σ_i l_i        (2.1)

The first term penalizes the solution u_i for being far away from the data x_i. The second term penalizes u_i for being far from its neighbor. This is a finite element approximation of the first derivative. This second term contributes to the energy only when the string is unbroken. The final term is a penalty for breaks in the string. These three energy penalties together encourage the weak string to be simultaneously close to the data, smooth, and unbroken. The parameters are set to balance smoothness with breaks. If λ is large, the string will be heavily penalized for not being smooth, and will tend to break often. If α is large then the string will be broken less frequently, but will not be as smooth. The two dimensional counterpart of a break in the string is a tear in the weak membrane. The set of breaks in the weak membrane correspond to the edges of the detected regions.

The energy functional needs to be minimized both with respect to u_i and with respect to l_i. This problem is not convex. A problem that is convex has a single global energy minimum that can be reached by successively taking a step in a direction that decreases the energy. When a problem is not convex, it has a set of local energy minima that can be mistaken for the global minimum and confound energy minimization. In [13], it is proved that the energy functional is convex with respect to u_i if l_i is fixed. The energy functional is not convex with respect to l_i. To avoid this problem, GNC creates a functional that is similar to the original functional except that it is everywhere beneath the original functional, and is convex. This functional will be minimized with respect to l_i. GNC finds the function with minimal energy from this convex functional and uses this solution as the starting point for the second functional. This second functional is more similar to the original than the first functional, and lies between the original functional and the first approximation. The second functional is not convex, but is nearly so. GNC then finds the function with minimal energy from this nearly convex functional and uses this function as the starting point for a third functional, which will be a bit closer to the original functional and a bit less convex. This process is repeated, with the similar functional approaching the original functional and becoming increasingly less convex with each iteration. When the similar functional is suitably close to the original, the original functional is solved for u_i. This procedure has been shown to converge to the global minimum energy for the weak string, the weak membrane, the weak rod, and the weak plate models if the discontinuities are much further separated than the scale parameter λ. GNC was compared to the stochastic simulated annealing method on the weak string problem in [12], and was found to have been closer to the optimal solution and more computationally efficient. GNC is not applicable in many instances where simulated annealing would be.
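Equation 2.1 can be transcribed directly, which makes the role of each term easy to check. The sketch below (the data and parameter values are invented for illustration) evaluates the weak string energy for an unbroken and a broken fit to step-shaped data:

    import numpy as np

    # Direct transcription of Equation 2.1; l has one entry per interval
    # between neighboring samples, with l_i = 1 marking a break.
    def weak_string_energy(u, x, l, lam, alpha):
        data_term = np.sum((u - x) ** 2)                  # closeness to data
        smooth_term = lam * np.sum((1 - l) * (u[:-1] - u[1:]) ** 2)
        break_term = alpha * np.sum(l)                    # cost per break
        return data_term + smooth_term + break_term

    x = np.array([0.0, 0.1, 0.0, 5.0, 5.1, 5.0])  # a step in the data
    u = x.copy()                                  # fit the data exactly
    unbroken = np.zeros(5)
    broken = np.array([0, 0, 1, 0, 0])            # break at the step
    print(weak_string_energy(u, x, unbroken, lam=1.0, alpha=2.0))  # 25.04
    print(weak_string_energy(u, x, broken, lam=1.0, alpha=2.0))    # 2.04

With these values, breaking the string at the step costs far less than paying the smoothness penalty across it, so the minimum energy configuration places a break exactly where the data jump; this is how breaks in the weak membrane come to mark intensity edges.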
One advantage of selecting GNC for the edge detection module is that it tends to find thick but complete boundaries. An example can be seen in Figure 2.7. Another advantage is that GNC has only two parameters, both of which have a physical interpretation. These parameters can be changed into another pair of parameters which correspond to scale and contrast sensitivity. The contrast sensitivity represents the minimum difference in average grey level that will produce a break in the membrane. The parameters were set initially, after experimentation, and were not altered during the course of processing all of the images shown in this thesis.

Figure 2.7. (a) An intensity image. (b) The output of running GNC on this image.

A shortcoming of GNC is efficiency. Although it could be parallelized, its serial implementation is slow. On a 240 row by 320 column image in the test set of images, Canny edge detection took 10.69 seconds; GNC run on the same image (under similar system load) took 215.12 seconds. Since it was not the intention of this research to create a real time system, this compromise can be made.

2.2.3 Continuity

Edge detection alone cannot always provide perceptually meaningful lines and curves. The purpose of a continuity module is to take the edges that are detected and organize them into more perceptually meaningful structures, as was discussed in Section 1.1.2. This can be done by grouping neighboring image tokens, as perceptual organization would suggest. It could also be done by fitting a high level model to the data. If a high level model could be properly selected, it could produce globally good solutions instead of locally good solutions. This would be propitious, but no known model is sufficiently general to fit all types of data.

Much of the previous research in this area has been dedicated to detecting straight lines instead of curves [23, 105, 50]. The simplicity of straight lines makes them an attractive model, particularly since fitting techniques for straight lines are simple and well understood [93]. As long as the object domain is polyhedral, lines are adequate. Linear models are not adequate for piecewise smooth objects.

An example of global organization is the Generalized Hough Transform (GHT). The GHT [50, 6, 53] groups edge fragments into parameterized objects. Hough transforms were initially used for detecting straight lines in images. In this method, each edge casts a vote for every set of straight line parameters that are consistent with its position in the image plane. The set of parameters receiving the most votes is taken to represent the structure that is present in the image. In the linear case these parameters could be the slope and intercept (although a parameterization with better computational properties is usually used). In the circular case, there are three parameters: the radius, and the (x, y) position of the center.

Hough spaces become difficult to manage when the number of parameters is large, and they have several other well documented shortcomings [45], including a relatively high probability of recognizing an incorrect grouping in noisy images.
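As a concrete reference for the voting scheme just described, here is a minimal sketch of the line-detecting Hough transform. It assumes the standard (ρ, θ) normal parameterization, one example of the "parameterization with better computational properties" mentioned above, since it avoids unbounded slopes.

    import numpy as np

    def hough_lines(edge_points, img_diag, n_theta=180, n_rho=200):
        # Each edge point votes for every (rho, theta) line through it:
        # rho = x*cos(theta) + y*sin(theta).
        thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
        acc = np.zeros((n_rho, n_theta), dtype=int)
        for (x, y) in edge_points:
            rho = x * np.cos(thetas) + y * np.sin(thetas)       # one rho per theta
            bins = np.round((rho + img_diag) * (n_rho - 1) / (2 * img_diag)).astype(int)
            acc[bins, np.arange(n_theta)] += 1
        r, t = np.unravel_index(np.argmax(acc), acc.shape)      # most-voted cell
        rho_best = 2 * img_diag * r / (n_rho - 1) - img_diag
        return acc, (rho_best, thetas[t])

    pts = [(i, 2 * i + 5) for i in range(50)]                   # points on y = 2x + 5
    acc, (rho, theta) = hough_lines(pts, img_diag=150.0)
    print(rho, np.degrees(theta))                               # near rho = 2.24, theta = 153 deg

Note how the answer depends on accumulator resolution: coarse bins merge nearby structures, fine bins scatter the votes of a single structure, which is one source of the incorrect groupings in noisy images noted above.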
Biederman has demonstrated that the human visual system responds better to missing edges that can be recovered by perceptual organization than to those that cannot. Figure 2.8 shows three versions of a goblet. If a Hough space approach were used, the middle and right figures would be equally recoverable, since approximately the same percentage of boundary has been removed. The human visual system recovers the middle goblet more easily than the right, thereby demonstrating that the GHT is not what is used in human perception.

Figure 2.8. On the left, a complete goblet is shown. In the middle is a degraded goblet with contours deleted in regions where perceptual organization can recover the original. On the right is a degraded goblet with contours deleted in such a way that perceptual organization cannot recover the goblet. The left and middle versions can be easily recognized, but the right cannot.

A continuity scheme which uses local criteria with a global model is Lowe's implementation of perceptual organization. Lowe emphasized recursively grouping adjacent edge points into lines at multiple scales [67]. This grouping was done by examining triples of points and deciding whether they are related by proximity alone, or whether they should be grouped into a line. This decision was made on the basis of the angle θ shown in Figure 2.9. Lowe uses 2θ/π to represent the probability that the pixels are related only by proximity and not by direction. When θ is close to π/2, it is unlikely that these pixels represent a linear structure. This processing is shown to work very well in the linear case, and in a single example of the circular case. Grouping circular arcs was handled in [103]. Fitting proximate edge points to circles is of questionable value, since there are a limited number of circumstances where projection to circles is achieved. Spheres project to circles, and circles project to circles if the viewpoint is constrained (as in aerial photography, the application that was used for an example in [67]). The work has not been extended to ellipses or other more general models, and as such cannot be used in the piecewise smooth object domain.

Figure 2.9. The middle pixel is likely to be related to the two endpoint pixels only by proximity and not by direction, since the angle θ is relatively large.

2.2.4 Symmetry

Symmetry has been widely studied in psychology, starting with the Gestalt psychologists who recognized its importance in grouping image elements [114, 61], and continuing into more modern studies [8, 57]. Symmetry can be used in at least two different contexts. In the first context, symmetry can be used to recover shape information. In [108], 3D surface shape was reconstructed from the contour of an image using a regularized boundary value problem methodology, given a rough symmetry axis. In the second context, symmetry can be a cue that image entities might be related and should be hypothesized as grouped entities.

A question that has been widely studied is the exact definition of symmetry that should be used. Bilateral symmetry is the simplest form. The image of an object is bilaterally symmetric if a straight axis exists in the image plane such that all lines perpendicular to the axis intersect opposite sides of the figure at points that are equidistant from the intersection of the axis and the perpendicular line. Several methods for the detection of bilateral symmetry have been proposed [71]. In [80], the conditions for existence of bilateral symmetry in the image plane were investigated. It is shown that bilateral symmetry exists in the image plane only when orthographic projection is being used on objects that are local surfaces of revolution, under general viewpoint conditions.
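A minimal sketch of the bilateral symmetry definition above, testing a set of contour points against a given vertical axis. Real detectors such as those in [71] must also search over axis position and orientation; the fixed axis and the tolerance here are simplifying assumptions.

    import numpy as np

    def is_bilaterally_symmetric(points, axis_x, tol=1.0):
        # For each contour point, the mirror image across the vertical line
        # x = axis_x must also lie (approximately) on the contour.
        pts = np.asarray(points, dtype=float)
        mirrored = pts * [-1, 1] + [2 * axis_x, 0]
        for m in mirrored:
            if np.min(np.linalg.norm(pts - m, axis=1)) > tol:
                return False
        return True

    # An ellipse is bilaterally symmetric about x = 0; shearing it breaks this.
    t = np.linspace(0, 2 * np.pi, 400)
    ellipse = np.stack([3 * np.cos(t), np.sin(t)], axis=1)
    sheared = ellipse @ np.array([[1.0, 0.0], [0.8, 1.0]])
    print(is_bilaterally_symmetric(ellipse, 0.0))   # True
    print(is_bilaterally_symmetric(sheared, 0.0))   # False: it is only skew symmetric

The sheared ellipse in the example fails the bilateral test yet remains skew symmetric, which motivates the more general definition taken up next.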
Skew symmetry is more general. If projection is orthographic, the image of an object is skew symmetric if a straight axis exists in the image plane such that all lines that meet the axis at a fixed angle θ intersect opposite sides of the figure at points that are equidistant from the intersection of the axis and the angled line. Perspective projection requires a small modification to this definition. If an object is planar and bilaterally symmetric, its projection in the image plane is skew symmetric. Unlike bilateral symmetry, skew symmetry is ambiguous in that a unique angle θ cannot always be found. The left side of Figure 2.10 shows two different θ values that both satisfy the definition of skew symmetry. When the line figure is closed with ends, as in the right hand side of Figure 2.10, the ambiguity is eliminated. Progress has been made in the detection of skew symmetry [39].

Figure 2.10. There are an infinite number of cross sections for this skew symmetric pair of lines. Two possible cross sections are shown. Connecting the lines into a parallelogram gives a unique cross section. The dotted line is the symmetry axis, and triples of closely spaced lines graphically display possible θ values.

A computationally expensive set of heuristics for finding symmetry within an image was suggested in [96]. Several heuristic definitions of symmetry have been proposed [18, 17, 20] and reviewed and compared in [98, 90, 91]. These definitions are more general, in that they allow objects to be called symmetric that do not fit into the strict mathematical framework of either bilateral or skew symmetry. For example, the gourd shown in Figure 2.11 is neither bilaterally nor skew symmetric. The axis shown, however, does indicate some type of symmetry. Leyton investigated the relationship between curvature extrema and symmetry axes and found that each curvature extremum has a unique symmetry axis that terminates at that extremum [65].

Figure 2.11. This gourd is neither bilaterally nor skew symmetric, yet the dotted axis shown displays some type of heuristic symmetry.

More recent work was done by Leymarie and Levine [64], who use a grassfire transform to find the skeleton of regions. A skeleton is an informal symmetry axis. In this method, a distance transform is found in the interior of a single segmented region. A snake, which is like a weak string or weak rod without the discontinuity terms [59], is placed at the boundary and given a weight. The distance transform is used as a potential surface, where the distance from the boundary is thought of as a height. The snake is permitted to fall down the potential surface into the valleys, which are equidistant from both sides of the region, giving a skeleton. An advantage of skeletonization over strict symmetry definitions is that a heuristic axis can be found even if the region isn't mathematically symmetric. A disadvantage is that no hard and fast decision is made about whether the region is or is not symmetric.
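A minimal sketch of the distance-transform part of this idea, using scipy and scikit-image. The medial axis computation here stands in for the snake of [64]; both settle onto the ridge of the distance transform, i.e. the pixels equidistant from two or more boundary points.

    import numpy as np
    from scipy import ndimage
    from skimage.morphology import medial_axis

    # A rectangular region: its medial axis is an informal symmetry axis.
    mask = np.zeros((60, 100), dtype=bool)
    mask[20:40, 10:90] = True

    # The grassfire arrival time: distance from each interior pixel
    # to the nearest boundary point.
    dist = ndimage.distance_transform_edt(mask)

    # The skeleton follows the ridge of the distance transform.
    skeleton = medial_axis(mask)
    print(np.column_stack(np.nonzero(skeleton))[:5])   # a few skeleton pixels
    print(dist.max())                                   # half the region thickness

As noted above, such a skeleton exists for any region whatsoever, symmetric or not, which is exactly why it offers no accept/reject decision on its own.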
Gross and Boult [47], and Gross [46], developed a system called SYMAN, with two methods for detecting skew symmetries in real images. The first method is moment-based. Since any moment-based method depends on the region being complete, this method will not work in the face of occlusion. They also proposed a local method of finding symmetry. In this method, a necessary condition for a symmetric pair of points and their tangent directions is found. All pairs of points on the contour are then put in a bin on the basis of the slope of the line joining the points. Within each bin, the necessary condition is used to find the sharpest peak in the angle of skew. The bin with the sharpest peak in calculated angles of skew is taken to give the correct symmetry axis. SYMAN, unlike the grassfire transform, gives a hard and fast decision about whether a region is skew symmetric.

Figure 2.12. This region is the union of two skew symmetric regions. SYMAN alone cannot analyze the symmetry. The grassfire transform cannot make symmetry decisions. Together, however, they may be able to segment the region at the dotted line and verify that both subregions are symmetric.

Both the methods of Leymarie and Levine [64] and Gross and Boult [47, 46] will be discussed. These methods have complementary properties that can be used to good advantage in combination. As an example, SYMAN could be used to indicate that the region in Figure 2.12 is not skew symmetric, but SYMAN cannot tell that this region is the union of two skew symmetric regions. The grassfire transform will produce an axis that is broken at the intersection of these regions. If the region is separated at this break in the axis, SYMAN could verify that both subregions are skew symmetric. This cooperation will be explored in Chapter 5.

2.3 Summary of Choices

This chapter explains the group of choices that were made to establish the basis for the POLL system. There were three categories of choices. The first choice was the selection of an integration framework. The second set of choices was the selection of the general types of modules that should be used in the framework. The final set of choices was the selection of a particular paradigm for each of the previously selected modules.

The blackboard framework was selected as the integration framework because it provides modular independence and a flexible control structure. Modular independence is particularly important since research is still being done on all of the modules that were selected for this system. The availability of a public domain blackboard shell also factored into this decision.

Four modules were selected for integration, along with paradigms for each. The first module was edge detection; this module was necessary since the other modules depend on it. The weak membrane [13] will be used to detect edges. The second module selected was line labeling, which was necessary to get the labeled line drawing desired as output. Malik's curvilinear line labeling will be used [69], since it is compatible with the object domain described in Section 1.2. A continuity module will be used to bridge the gap between edge detection and line labeling. The paradigm selected was the author's curvilinear grouping algorithm [110], which will be discussed in Section 3.2. The final module selected for integration was a symmetry module. Since there are two types of symmetry paradigms available, both of which have advantages, two different paradigms were selected to be used in concert. The first paradigm was the grassfire transform of Leymarie and Levine. The second was SYMAN from Gross and Boult. The symmetry knowledge sources were not implemented in software, but uses for these knowledge sources will be discussed in Section 5.2.1.
CHAPTER 3

Obtaining an Initial Line Drawing

Line labeling is a method of extracting 3D information based solely on the legal configurations of junctions that arise in the physical world. A labeled line drawing is a very terse description of image data. This has both benefits and shortcomings. All that is needed to represent an image as a line drawing is an attributed list of junctions and an attributed list of arcs. The attributes of a junction are the junction type (L, curvature-L, phantom, arrow, Y, three tangent, T, or terminal) and the names and ordering of the arcs at the junction. The arcs' only attribute is a line label (jump, limb, convex, concave). While this simplicity helps to limit the computational complexity, it makes it impossible to recover from mistakes made in the creation of the initial representation. Even within the simpler polyhedral object domain, Sugihara had to hand edit simple images in order to be able to analyze line drawings [105]. The piecewise continuous object domain is much more complicated and subtle, as will be demonstrated in Section 3.3. If recovery from early mistakes cannot be made, the system will not be robust.

An overview of the early processing is shown in Figure 3.1. This processing is used to analyze the input image and create a reasonable approximation of the structure of the image for further analysis. Weak membrane edge detection is run on the input image. The detected edges are grouped using a continuity module, and have vertices assigned at their endpoints. These vertices are collected using spatial proximity to group junctions for line labeling. The junctions are then classed and typed. This gives sufficient information for the first pass of the line labeling algorithm that will be discussed in Chapter 4.

Figure 3.1. An overview of the initial processing.

Since it is expected that none of the processes utilized in this thesis will produce perfect results, a richer description of the image data must be built than what would be necessary for a line drawing. This description will be discussed in Section 3.1. In Section 3.2, an algorithm for grouping curves using proximity and continuity will be discussed. Section 3.3 will discuss the processing that is used to create the initial representation of the imaged scene, including creating the first line labeling hypothesis for the image. The final section, Section 3.4, will show examples of this processing being used on both real and synthetic images.

3.1 The Representation of Piecewise Continuous Objects in a Blackboard

In order to perform line labeling on the data that is obtained by processing the input image, a blackboard database will be created. This database will contain a variety of data, including the initial intensity image and the regions, curves, vertices, and objects that are output by the knowledge sources.
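The figure hypotheses ultimately assembled from this database are just the attributed junction and arc lists described at the opening of this chapter. A minimal sketch of how terse that target representation is (the field names are hypothetical, not those used in the actual system):

    from dataclasses import dataclass, field

    JUNCTION_TYPES = {"L", "curvature-L", "phantom", "arrow", "Y",
                      "three-tangent", "T", "terminal"}
    LINE_LABELS = {"jump", "limb", "convex", "concave"}

    @dataclass
    class Arc:
        name: str
        label: str = "unlabeled"       # eventually one of LINE_LABELS

    @dataclass
    class Junction:
        name: str
        jtype: str                     # one of JUNCTION_TYPES
        arcs: list = field(default_factory=list)   # incident arc names, in order

    # A complete (toy) line drawing: nothing but two attributed lists.
    drawing = {
        "junctions": [Junction("v1", "arrow", ["c1", "c2", "c3"])],
        "arcs": [Arc("c1", "convex"), Arc("c2", "jump"), Arc("c3", "jump")],
    }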
To discuss the organization of the blackboard system being used in this research, some vocabulary is needed. The terms used here are those used in GBB (Generic Blackboard), the blackboard shell used in this thesis, and are given in [40]. In [19], similar ideas were discussed but with different terminology. The terms to be used in this thesis are taken from the former. The equivalences to ISR [19], a blackboard shell that was designed specifically for vision tasks, will be given once in parentheses.

The term "blackboard" used by GBB will be replaced with "GBB blackboard" so as not to confuse the generic blackboard ideas discussed in Section 2.1.3 with the specific implementation of those ideas in GBB. A GBB blackboard (hierarchy of frames) is a hierarchical tree structure composed of GBB blackboards and spaces. Spaces (frames) are used to contain and index units. Units (tokens) are the basic blackboard objects. When a unit is created it is stored on a space; such a unit is said to have been instantiated. Spaces are stored, in turn, on a GBB blackboard. Indexing of units on spaces is optional in GBB, but in image processing it can improve efficiency.

Units hold the hypotheses for entities that are found in the image. For the curvilinear line labeling paradigm, for example, there are units for the arcs and the junctions. Each unit can store three types of information. Slots (features) hold the basic data, such as the name and label of a junction. Indexes (the spatial indexing grid from the example in [19]) determine how the unit will be stored on the space. GBB permits a wide range of indexing possibilities, including Frenet box ranges, which are useful for units that are not restricted to a single point. The final type of information is a link (handle). Links relate a given unit to other units. Different types of units may be related using links.

The units used in this blackboard system are as follows.

Frame: the initial input data before analysis.
Figure: a group of regions, curves and vertices.
Region: a two dimensional array of spatially connected pixels.
Curve: a one dimensional array of spatially connected pixels.
Vertex: a point at the ends of a curve.

All of the units could have been stored on a single space, since they are all indexed in the same coordinate system. This would have kept the database overhead to a minimum, but would have made retrieval of a particular unit type less time efficient. By placing each type of unit on a separate space, retrievals can be done more quickly. The price for the retrieval efficiency is some additional overhead for the separate spaces. Since it is the units themselves that take the majority of the storage, and not the database overhead, separating the spaces does not decrease the storage efficiency. The spaces are all children of a single GBB blackboard.

Each of the proposed modules uses a subset of these units for its processing. Table 3.1 summarizes the mapping between modules and blackboard objects.

Table 3.1. The mapping between modules and blackboard units.

    Module         Unit
    Continuity     curves
    Symmetry       curves
    Proximity      curves, vertices
    Line labeling  curves, vertices

The slots for the units reflect the needs of the modules using that unit. Some slots store information that is used by only one module. As an example, the frame unit stores the filename of the input data, and all units have some variables stored to index into the appropriate space. In GBB, information hiding cannot be done, as it could be in a system such as Schema [31], so all slots are accessible to all modules, even though they may not be used by all modules. Most slots store information that is used by multiple modules. All units have a belief slot which gives a rough interpretation of the importance of the unit at the given point in time. If a hypothesis has belief zero, it is not used in further processing, although it remains on the space unless it is explicitly removed.
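A rough sketch of what such a unit looks like, written as a Python class for concreteness. GBB itself is a Lisp system, and the slot and link names below are only illustrative, drawn from the link discussion that follows.

    from dataclasses import dataclass, field

    @dataclass
    class CurveUnit:
        # Slots: the basic data.
        name: str
        pixels: list                   # spatially connected pixel coordinates
        belief: int = 1                # 0: ignored by later processing, but retrievable

        # Links: bidirectional relations to other units.
        neighboring_curves: list = field(default_factory=list)   # adjacency
        bounds_region: list = field(default_factory=list)        # region/boundary
        competing_hypotheses: list = field(default_factory=list)
        supporting_hypotheses: list = field(default_factory=list)

    def link_competitors(a, b):
        # All links in this system are kept bidirectional.
        a.competing_hypotheses.append(b)
        b.competing_hypotheses.append(a)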
Although removing a hypothesis with belief 0 would save storage space and speed retrievals, it would also make it impossible to recover from a mistake.

The links between units can be divided into four categories. All links are bidirectional, although this is not dictated by GBB but rather by the needs of this application. Adjacency links describe spatial adjacency relationships. An example of an adjacency link is the neighboring-curves link in the curve unit, which joins two curve units that are spatially adjacent. Adjacency links can also join different unit types. An example of this is the link between regions and their boundary curves: the region side of the link is called bounded-by-curve, and the curve side is called bounds-region. A second type of link also exists between regions and curves; this link is used for curves that are in the interior of regions. Units of the same type are also linked as supporting-hypotheses and competing-hypotheses. The final class of links is supplementary hypotheses. An example of a supplementary hypothesis is the axis of symmetry of a region, which is stored as a curve.

While Figure 3.1 describes the ordering of the processing, it does not adequately describe the blackboard implementation of the processing. Since the knowledge sources are independent, they do not call each other directly. The initial processing is controlled by the blackboard itself. As an example, the creation of a new region triggers the curvilinear grouping module to create the edges. This is not done by the module that created the new region, but as a consequence of placing a new region on the blackboard. Similarly, the curvilinear grouping module does not directly call the vertex module. The vertex module is automatically called by the blackboard when the curve unit is instantiated and placed on the blackboard. This type of processing greatly enhances the consistency of the blackboard.

3.2 The Continuity Module

Curvilinear grouping was recognized as perceptually important by the Gestalt psychologists [114]. While grouping points into straight lines is frequently done in computational vision [23, 67, 102], curvilinear grouping is less common. In the piecewise smooth object domain used in this thesis, it is necessary to analyze both lines and curves, since the domain objects have both linear and curved edges.

Trytten and Tuceryan [110] designed an algorithm for grouping curvilinear edges. This method takes edges detected using the Canny edge detector [24] and segments them into smooth edgels. Edgels are a one dimensional representation of adjacent edge points as a function of a parameter similar to arc length. A generalized Voronoi neighborhood structure is then found for the set of edgels in the image, and neighboring edgels are tested for continuity. Starting with the longest (and hence most perceptually significant) edgels, hypotheses are created that neighboring edgels should be combined. These hypotheses are evaluated for smoothness, and the best hypothesis is grouped, if any smooth hypothesis is found. The full algorithm is shown in Figure 3.2.

The Voronoi neighborhood is defined as follows. Let H(P, Q) be the half plane containing P that is formed by the perpendicular bisector of the line joining points P and Q. The Voronoi neighborhood of a point P is defined to be the intersection over all Q of H(P, Q). This is shown graphically in Figure 3.3. The Delaunay graph is the dual of the Voronoi tessellation.
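A minimal sketch of computing these neighborhood relations with scipy. The thesis builds a generalized Voronoi diagram of edgels rather than of isolated points, so this point-based version only approximates that structure.

    import numpy as np
    from scipy.spatial import Delaunay

    def voronoi_neighbors(points):
        # Two points are Voronoi neighbors exactly when they share a
        # Delaunay edge (the Delaunay graph is the dual of the tessellation).
        tri = Delaunay(points)
        neighbors = {i: set() for i in range(len(points))}
        for simplex in tri.simplices:
            for a in simplex:
                for b in simplex:
                    if a != b:
                        neighbors[a].add(b)
        return neighbors

    pts = np.random.rand(30, 2)
    nbrs = voronoi_neighbors(pts)
    print(sum(len(s) for s in nbrs.values()) / len(pts))   # small average degree

The small, bounded average neighbor count printed at the end is the property that later makes the grouping algorithm O(n) in the average case.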
In [2], Voronoi neighborhoods are shown to be more perceptually significant than other similar graphs. Implementation details are given in [92].

The smoothness criterion used is similar to the weak string discussed in Section 2.2.2. Since the weak string can only detect breaks, i.e. spatial non-adjacencies, it cannot find corners. A second model, called the weak rod, can be used to detect corners. It uses a second derivative term for regularization instead of the first derivative regularization term used in the weak string. Let u_i be the desired function, let x_i be the input data, and let k_i be a binary function that determines whether the rod is bent at location i. The energy functional E for the weak rod is shown in Equation 3.1.

    E = Σ_i [ (u_i - x_i)^2 + μ(1 - k_i)(u_{i+1} - 2u_i + u_{i-1})^2 + β k_i ]    (3.1)

The weak rod energy functional shown in Equation 3.1 was modified for use in this work. A first derivative regularization term was added to improve the smoothness of the solution. It should be noted that having a bend in the rod where k_i = 1 will not eliminate the first derivative term from the energy functional, but only the second derivative term. The mixed string/weak rod energy functional is shown in Equation 3.2.

    E = Σ_i [ (u_i - x_i)^2 + λ(u_i - u_{i+1})^2 + μ(1 - k_i)(u_{i+1} - 2u_i + u_{i-1})^2 + β k_i ]    (3.2)

    Algorithm 1
    Input: Gray level image I
      Perform edge detection on I
      I_edge <- result of preprocessing the edge image
      Compute the adjacency graph of the edgels in I_edge
      E <- edgels of I_edge sorted in descending order of length
      G <- {}, the set of grouped edgels
      foreach edgel e in E, in descending order of length
        S <- {}
        N(e) <- set of neighbors of e
        foreach e' in N(e)
          de <- Delaunay edge between e and e'
          Construct a grouping hypothesis H consisting of the edgels (e, de, e')
          U <- fit an energy minimizing curve to H, finding curvature
               discontinuities simultaneously
          S <- S ∪ {U}
        end
        Select from S the hypothesis that has no discontinuities and has the
        minimum energy.
        if the result is not empty
          remove the original constituent edgels and replace them with the
          grouped edgel
        else
          remove edgel e from E and put it into G
        end
      end
    Output: G, the set of grouped edgels.

Figure 3.2. The algorithm for grouping based on continuity.

Figure 3.3. A Voronoi neighborhood of point P. The dividing lines of the half planes are dashed.

This energy formulation is different from the mixed string and rod shown in [13], where a second binary function was used to permit both bends and breaks to be represented. The energy formulation with the second binary indicator function was shown in [13] to have performance inferior to both the weak string and the weak rod.

A disadvantage of the energy functional in Equation 3.2 is that it is not convex. Unlike the weak string and weak rod, the GNC method cannot circumvent the non-convexity. There are other methods for minimizing non-convex functions. Simulated annealing, a computationally expensive stochastic technique, could have been used. Another technique that is commonly used is gradient descent repeated from multiple starting positions [55]; this can also be computationally expensive and has no guarantee of reaching a global minimum. The technique used to minimize the energy functional in this thesis is multi-grid relaxation, as proposed in [106].
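Before turning to the relaxation itself, here is a minimal sketch of evaluating Equation 3.2 on discrete data. The data x is treated as a 1-D signal for simplicity; in the continuity module it would be a coordinate function of arc length.

    import numpy as np

    def mixed_string_rod_energy(u, x, k, lam, mu, beta):
        # Equation 3.2: data term, first-derivative (string) term,
        # second-derivative (rod) term gated off at bends, bend penalty.
        data = np.sum((u - x) ** 2)
        string = lam * np.sum((u[:-1] - u[1:]) ** 2)
        d2 = u[2:] - 2 * u[1:-1] + u[:-2]
        rod = mu * np.sum((1 - k[1:-1]) * d2 ** 2)
        bends = beta * np.sum(k)
        return data + string + rod + bends

    # A corner at index 50: the energy is lower when a bend is declared there.
    x = np.concatenate([np.arange(50.0), 50.0 - np.arange(50.0)])
    no_bend = np.zeros(100)
    bend = np.zeros(100); bend[50] = 1
    print(mixed_string_rod_energy(x, x, no_bend, lam=0.0, mu=10.0, beta=5.0))  # 40.0
    print(mixed_string_rod_energy(x, x, bend, lam=0.0, mu=10.0, beta=5.0))     # 5.0

With λ set to zero to isolate the effect, the second derivative at the apex costs more than the bend penalty β, so declaring the bend wins, which is exactly how corners are detected.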
Multi-grid relaxation superimposes a pyramid of grids at different resolutions on a finite element problem. Solutions found at a given resolution are injected into both higher and lower resolutions (if they exist). At lower resolutions, information propagates faster since there are fewer grid elements; this encourages global smoothness and efficient processing. At higher resolutions, more detail can be fit and local smoothness encouraged. This method offers no guarantees about finding global minima of non-convex functions. The parameters for the Canny edge detection, the mixed rod-string model, and multi-grid relaxation are shown in Appendix A.

Figure 3.4. (a) A synthetic input image. (b) The result of running the continuity algorithm without edge detection on the image shown in (a). The grouping results are displayed using color. Grouped curves are connected and share the same color.

Figure 3.4 is a synthetic image that is similar to a figure given in [69]. Canny edge detection was not used on this image. Many gaps are filled in using the Delaunay edges. Most of the corners are found, with the exception of corners with large obtuse angles. The continuity algorithm makes no distinction between straight and curved edges. This uniformity of treatment is useful in images where both curved and straight edges are present. Another example of the processing is shown in Figure 3.5. Both the large and the small sphere were correctly grouped, as were the linear segments. Obtuse corners were not found.

Figure 3.5. (a) An input intensity image. (b) The result of running the continuity algorithm with edge detection on the image shown in (a). Grouping results are displayed as in Figure 3.4(b).

In [75], the same problem addressed in [110] was also solved. The initial proximate edge segments were broken into smooth curves at curvature extrema. These smooth curves are grouped recursively into larger curves. Curves to be grouped are selected from a fixed window around the endpoints. The threshold for joining depends on the difference in tangent direction, proximity, and curve length. This method does not have the flexibility in neighborhood relationships that the perceptually significant Voronoi diagram provides. It also examines more pairs of curves for grouping, since all pairs of curves within a subwindow of the image must be examined. This makes the algorithm O(n^2), where n is the number of edges. The algorithm in [110] is O(n) in the average case, since the average number of Voronoi neighbors is bounded above by a fixed constant. The curvilinear grouping module of Trytten and Tuceryan will be used in this thesis.
In Section 3.3.2, the processing which starts with the region units and creates a competing set of curve and ver- tex units is described. Once the curve and vertex units have been established, the junctions need to have the number of incident arcs in the initial figure hypothesis identified, and type (i.e. three tangent, curvature-L) determined. 3.3.1 Finding Regions The purpose of the initial processing is to create a reasonable representation of the region, curve and vertex units from the frame unit. All of the processing described in this section is of a local nature. Local processing is advantageous at the early stages because it is potentially massively parallel, although this virtue is not exploited in this implementation. The disadvantage of local processing is that information in a small neighborhood is not always reliable in a more global context. Since it is known that the information may not be globally reliable, alternate hypotheses are kept. This makes it possible for later processing to backtrack and reverse decisions that were good locally, but are not good in a more global setting. The processing in this section will be demonstrated using two images. The first image is a synthetic line drawing created with an interactive drawing utility. This (a) (b) Figure 3.6. The input images. (a) is a synthetic image generated using an interactive drawing utility. (b) is a real image. image is as close to a perfect image as quantization allows. The second image is a real image that was captured in the PRIP laboratory. Figure 3.6 shows these images. The first step in the processing of the real image is to find the edges using region based edge detection. As was discussed in Section 2.2.2, the GNC algorithm [13] will be used for this purpose. Although GNC claims to produce thin edges, it didn’t in practice. The reason for this failure was that many of the edges in the processed images were ramp edges instead of step edges. This was caused by two factors. The first factor is discretization error. Since the division between regions does not always coincide with the division between pixels, some pixels will contain a mixture of two regions. This produces a narrow ramp edge instead of a crisp step edge. The second reason for GNC to produce thick edges is that the bath tub toys that were imaged don’t have flat sides. Even the sides of the toys that appear to be planar are actually slightly curved. This can be seen in the left most face of the object in Figure 3.7. This curving means that the edges are not step edges, which GNC expects, but are ramp edges. Figure 3.7 shows an input image with a ramp edge marked. The data 72 Figure 3.7. The intensity values in the area of the image inside the square are shown in Table 3.2 from inside this rectangle is shown in Table 3.2. GNC responds to the ramp edges as repeated step edges and produces thick edges. The edges found by GNC when run on the image in Figure 3.6(b) are shown in Figure 3.8. Parameters for GNC are shown in Appendix A. Although this image was correctly segmented, some images are not. As an example, the image shown in Figure 2.1 is not properly segmented by GNC. This failure is expected since there is very little evidence of any edge in the original image. GNC was not run on the synthetic images. In order to create one dimensional curves from the thick edges of GNC, thinning was done. Although there are many thinning operations available, a two step pro- cess has proven to be effective. The first thinning algorithm used is from [44]. 
In order to create one dimensional curves from the thick edges of GNC, thinning was done. Although there are many thinning operations available, a two step process has proven to be effective. The first thinning algorithm used is from [44]. The advantage of this algorithm is that it thins the edges uniformly in width. Unfortunately, it doesn't completely thin edges that are neither vertical nor horizontal, and it shortens the length of curves. The second thinning algorithm used is from [101]. If this algorithm is used in isolation, it leaves some undesirable hairs connected to the edges. However, it does a better job of thinning diagonal edges than the first algorithm, and doesn't shorten curves. The thinning algorithms were also run on the synthetic image, even though it was already nearly thin, because near the vertices there were some two pixel wide edges. The serial combination of these two thinning algorithms produces thin, nearly hairless edges. Neither thinning algorithm has any parameters.

Table 3.2. A ramp edge from Figure 3.7.

    127 128 131 131 135 136 138 139 143 121  68  14   4   2   3
    127 129 132 134 135 137 141 145 140 104  39   5   4   4   3
    126 129 132 134 138 139 142 139 129  81  21   2   4   3   2
    128 132 133 136 138 141 144 142 113  52   8   4   4   4   4
    131 131 135 136 140 139 139 131  90  28   5   5   3   2   3
    133 133 135 137 137 141 141 124  64  10   3   5   4   4   4
    129 130 133 135 138 142 139 111  42   4   4   4   4   4   3
    132 134 136 138 139 143 129  77  21   2   3   3   3   4   4
    133 137 138 139 142 144 118  56   8   3   5   5   4   3   4
    132 134 138 140 142 136 104  36   5   3   3   4   3   3   3
    131 137 139 141 144 136  86  24   3   5   4   3   3   3   3
    133 137 141 143 144 115  58   8   4   5   4   3   3   3   3
    132 137 141 143 138 103  32   4   4   4   4   4   4   4   4
    132 134 139 139 136  81  20   3   4   3   3   4   4   4   3
    134 134 136 141 121  64   8   3   5   3   3   3   4   4   5

Figure 3.8. This image is the result of running GNC on the image in Figure 3.6(b).

Figure 3.9. The image shown in Figure 2.1 after processing with GNC. Notice that the front edge of the cube is missing, as was expected. The same parameters were used for GNC in all real images shown in this thesis.

The result of running the thinning algorithms on the images in Figure 3.6(a) and Figure 3.8 is shown in Figure 3.10. The thinned edges are the borders of the region units. To get the region units, a connected components algorithm was run [99], using an eight connected neighborhood relationship. Figure 3.11 shows the connected components found from Figure 3.10. Different grey values correspond to distinct regions. Small regions are given a unit with belief 0, and then included in the largest neighboring region. A region is labeled as small if it has fewer pixels than the threshold shown in Appendix A. Any unit with belief zero is not included in further processing, although it is still available on the blackboard for retrieval should it be necessary later.

Figure 3.10. The thinned edges.

Figure 3.11. The regions found by the connected components algorithm.
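A minimal sketch of the region-finding step using scipy. The small-region rule is only reported here rather than applied, and the threshold is a placeholder for the value in Appendix A.

    import numpy as np
    from scipy import ndimage

    def find_regions(edge_mask, min_size=20):
        # Pixels off the thinned edges form regions; label them with
        # 8-connected components, the connectivity used in this thesis.
        eight = np.ones((3, 3), dtype=int)
        labels, n = ndimage.label(~edge_mask, structure=eight)
        # Regions below the size threshold would get belief 0 and be
        # absorbed into their largest neighbor.
        sizes = ndimage.sum(np.ones_like(labels), labels, index=np.arange(1, n + 1))
        small = [i + 1 for i, s in enumerate(sizes) if s < min_size]
        return labels, small

    edges = np.zeros((50, 50), dtype=bool)
    edges[:, 25] = True                      # one vertical edge -> two regions
    labels, small = find_regions(edges)
    print(labels.max(), small)               # 2 regions, none small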
3.3.2 Creating the First Figure Hypothesis

The first figure hypothesis is an initial guess at the proper line drawing that is found using only local processing on the image. The regions that are detected by GNC will be used to create this initial hypothesis. At this point in the processing, the input image has been segmented into region units. From these region units, the competing curve units and vertex units will be found. A coarse initial rating of these hypotheses will give an initial set of arcs and junctions that will form the basis of the first figure hypothesis.

Both the junctions and curves are chosen from a set of competing junctions and curves. Curves that are derived from the same edges in the data are established as competing curves. Only one of a set of competing curves can be in any figure unit. Junctions that are spatially proximate also compete. Each curve and junction has a belief value which gives its importance relative to the other competing hypotheses. The belief values range from one to ten, but the only significance of a belief value is relative to the other competing units of similar type.

The Curves and Vertices

The boundaries of the region units are used to find the initial curve units. First, the complete boundary of each region is given a single curve unit. This curve unit is initially given a belief value of one, meaning that it should be included in processing but isn't considered to be a favored hypothesis at this point. This boundary is then broken into pieces at any point with more than three edge neighbors (triple points). Each piece becomes a distinct competing curve unit with a belief value of two, signifying that these curve hypotheses are more perceptually significant than the initial ones and should take precedence in processing. If no competing hypotheses are found, the initial curve hypothesis has its belief raised to two, meaning that it has gained perceptual significance. The curvilinearity algorithm described in Section 3.2 is then run on the triple point curve units, or on the original boundary curve unit if it was unbroken by vertices. If corners are found by the continuity algorithm, competing curve units are created for each segment, with a belief value of three. If no corners were found, the previous curve hypothesis has its belief value raised to three. This leaves a unique hypothesis that is thought to be best at each of the points on the boundary of the region. Since the continuity algorithm is consistently finding better hypotheses than what triple points alone could provide, these hypotheses are given the highest confidence of the three initial curve generations. At the end of each of these curve units, a vertex unit is instantiated by the blackboard.

There are edge points found by GNC that do not fall on the boundary of a region. Some of these points are caused by hairs left on the edges by GNC that have been thinned. Some of the other edges in the interior of a region are partially detected edges. To collect these edges into reasonable structures, the edges that are not on the boundary of each region are grouped using the continuity algorithm of Section 2.2.3. By grouping only within single detected regions, undesirable groupings between distinct regions are eliminated. These groupings are given a belief value of three if they are more than a few pixels long, with the threshold value shown in Appendix A, and a belief value of zero otherwise.

At this point, a set of competing curve units has been created. Each curve unit will have either no endpoints (as with the boundary of an isolated sphere) or two endpoints. Some of the curve units will be only a few pixels long, particularly those near the corners of the regions. Competing vertex units have not yet been created. Since the continuity algorithm typically leaves several bends near a vertex, particularly when working with real data, these small curves and vertices need to be combined. The initial method for doing this is simple. When a vertex is created, it tries to join with any other vertex in a local neighborhood to create a stronger vertex. The size of this local neighborhood is determined by a threshold shown in Appendix A. If a vertex is combined, the old vertex is given belief zero.
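A minimal sketch of that vertex-combination rule: greedy merging of endpoint vertices within a fixed radius. The radius stands in for the Appendix A threshold, and recentering the combined vertex at the mean of its members is an assumption made for illustration.

    import numpy as np

    def combine_vertices(vertices, radius=5.0):
        # vertices: (x, y) curve endpoints.  Each new vertex joins an
        # existing combined vertex if one lies within `radius`; the old
        # hypothesis would be kept on the blackboard with belief 0.
        combined = []                                      # [position, members]
        for v in map(np.asarray, vertices):
            for entry in combined:
                if np.linalg.norm(entry[0] - v) <= radius:
                    entry[1].append(v)
                    entry[0] = np.mean(entry[1], axis=0)   # recenter the stronger vertex
                    break
            else:
                combined.append([v, [v]])
        return [tuple(c[0]) for c in combined]

    print(combine_vertices([(10, 10), (12, 11), (40, 40)]))   # two combined vertices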
An artifact of this vertex combination is that there are short curves with both endpoints at the same combined vertex. These curve units are given belief zero. This cleans away many short, unimportant curves.

Figure 3.12. An image with approximately ten small groups of edge points that are not on the boundary of regions. The group at the far right corner of the left triangular prism is perceptually significant, and is grouped by the continuity algorithm of Section 2.2.3.

This processing results in an initial set of vertices and curves that will form the first figure hypothesis. The result for the synthetic image is shown in Figure 3.13. The result for the real image is shown in Figure 3.14. The labeling scheme is as follows. Curves are labeled with the letter "c" and an identification number. Vertices are labeled with the letter "v" and an identification number. The label for a curve is placed to the lower left of the middle of the curve. The label for a vertex is placed at the vertex. Around the edge of the image is a frame that completes any junctions that reach the edge of the image. This frame will be shown in all future images, and is labeled a priori as a jump edge.

Figure 3.13. The initial curve and vertex units found from the synthetic image originally shown in Figure 3.6(a).

Figure 3.14. The initial curve and vertex units found from the real image originally shown in Figure 3.6(b).

At this point the blackboard contains regions, a group of competing curve units, and a group of competing vertex units. Within the groups of both curve units and vertex units, the competing units have been labeled with belief values to determine which appears to be best at this point in time. The blackboard also contains the beginning of the first figure unit: the collection of all non-competing curve units of highest belief value and the collection of all non-competing vertex units of the highest belief value. For the figure unit to be complete, the vertices must be classified and typed. These operations are discussed next.

Identifying Junctions

First, vertex units are classified by the number of incident arcs. This classification yields four categories: junctions with one incoming arc, junctions with two incoming arcs, junctions with three incoming arcs, and junctions with four or more incoming arcs. The domain described in Section 1.2 excludes vertices with more than three arcs, so junctions in the fourth category are not within the object domain of this thesis. It should be noted that the domain of piecewise smooth surfaces discussed in [69] does not have this restriction.

A much more challenging problem than classifying the junctions is typing them. A summary of the attributes that determine junction type is given in Table 3.3. The table does not distinguish between Arrow and Y junctions. This distinction is made by examining the angles between the incoming arcs. If one of the angles is greater than 180 degrees, the junction is an Arrow. If all three angles measure less than 180 degrees, the junction is a Y.
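A minimal sketch of the Arrow/Y test just described, given the image-plane tangent directions of the three incident arcs at the junction:

    import numpy as np

    def arrow_or_y(directions_deg):
        # directions_deg: outgoing tangent directions of the three incident
        # arcs, in degrees.  Sort them and measure the three consecutive
        # angular gaps; any gap over 180 degrees makes the junction an Arrow.
        d = np.sort(np.mod(directions_deg, 360.0))
        gaps = np.diff(np.append(d, d[0] + 360.0))
        return "Arrow" if np.any(gaps > 180.0) else "Y"

    print(arrow_or_y([0, 90, 120]))    # Arrow: the gap from 120 back around is 240
    print(arrow_or_y([0, 120, 240]))   # Y: all three gaps are 120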
In order to type the junctions, each of the curve units must have a tangent and a curvature value at both endpoints. Finding curvature is a notoriously difficult problem [36], even in richer range images. Therefore the curvature values that are calculated should be treated with suspicion. Estimating tangent direction is somewhat easier. Since later processing can help to correct mistakes, it is not essential that the curvatures and tangents be perfect. The continuity module has been run on all of the curve units. This improves the estimation of curvature and tangent direction, since the curve being fit has already been smoothed, has had outliers removed, and has been separated at discontinuities. If this had not been done, the curvature and tangent directions could be even more unreliable.

Table 3.3. This table summarizes the tangent and curvature properties of junctions. Arrow and Y junctions (*) are distinguished on another basis. The symbol given will be used to represent this type of junction in later figures.

    Class        Type           Symbol   Tangents            Curvatures
    One arc      Terminal       TM       None                None
    Two arcs     Curvature-L    CL       Equal               Unequal
                 L              L        Unequal             Doesn't matter
                 Phantom        P        Equal               Equal
    Three arcs   Arrow (*)      A        Unequal             Doesn't matter
                 Y (*)          Y        Unequal             Doesn't matter
                 T              T        Exactly two equal   Same two equal
                 Three tangent  3T       All equal           Two equal

The simplest way to find curvature is to fit a circle to a set of points and find the radius and the center. The multiplicative inverse of the radius is the curvature. A normal to the fit circle at the endpoint can be found from the vector starting at the center of the circle and ending at the endpoint of the arc; the tangent direction is perpendicular to the normal. The fitting is done using a linear least squares routine found in [93]. Circular arcs were fit in a neighborhood of the endpoint, with the size of the neighborhood determined by a parameter given in Appendix A.

Although a straight line can be thought of as a circle of infinite radius, this is numerically troublesome. To avoid this problem, straight lines are also fit in a neighborhood of the endpoint, whose size is determined by a parameter given in Appendix A. The curvature for a straight line is zero, and the tangent direction is calculated from the slope. The squared error between the circular fit and the linear fit, calculated in units of pixels squared, is then compared, and the model with the better fit is accepted.

The curvature values calculated in this fashion are rough estimates. Therefore comparisons like those suggested in Table 3.3 must be revised. Even with synthetic data, it is unlikely that two curvatures or tangents would be exactly equal. To compare tangent directions, a threshold was used on the difference. This threshold is given in Appendix A. It was set arbitrarily, but is not critical to the performance of the system, since later processing can change this threshold to correct mistakes.

The curvature values were found to be very unreliable, particularly in the real images that were used. There are several reasons for this. The objects used in the images in this thesis have circular arcs, which project to ellipses. Fitting a circle to an ellipse, particularly at the ends of the major axis where the curvature is changing rapidly, is difficult. Another problem is that the continuity algorithm may not break the upper and lower halves of an ellipse exactly at the corner. This means that the curvature will be calculated starting at different positions, which will certainly lead to different curvatures. To get around this problem, curvatures were placed into only three classes, which are described in Appendix A. Curvatures are considered to be equivalent if they fall into the same class. It would have been possible to fit an ellipse to the curve data instead of a circle, but this is computationally more expensive and mathematically cumbersome.
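A minimal sketch of the circle fit in its algebraic least squares form, which linearizes x^2 + y^2 + Dx + Ey + F = 0; this is one common choice, assumed here as a stand-in for the routine in [93].

    import numpy as np

    def fit_circle(xs, ys):
        # Solve [x y 1][D E F]^T = -(x^2 + y^2) in the least squares sense.
        A = np.column_stack([xs, ys, np.ones_like(xs)])
        b = -(xs ** 2 + ys ** 2)
        (D, E, F), *_ = np.linalg.lstsq(A, b, rcond=None)
        cx, cy = -D / 2.0, -E / 2.0
        r = np.sqrt(cx ** 2 + cy ** 2 - F)
        return (cx, cy), r

    def endpoint_curvature_and_tangent(xs, ys):
        (cx, cy), r = fit_circle(xs, ys)
        normal = np.array([xs[-1] - cx, ys[-1] - cy])   # center -> endpoint
        tangent = np.array([-normal[1], normal[0]])     # perpendicular to the normal
        return 1.0 / r, tangent / np.linalg.norm(tangent)

    t = np.linspace(0, np.pi / 3, 30)
    xs, ys = 20 * np.cos(t) + 5, 20 * np.sin(t) - 3     # arc of a radius-20 circle
    print(endpoint_curvature_and_tangent(xs, ys))       # curvature near 1/20 = 0.05

Run on a short arc of an ellipse instead of a circle, the same routine returns noticeably different curvatures depending on where the arc starts, which is the unreliability the three curvature classes are meant to absorb.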
With the curvature and tangent information, and suitable definitions of equivalence among tangents and curvatures, the classified vertices in the initial figure unit can be typed. This information completes the initial figure unit, that is, the first hypothesis for the arcs and junctions in the imaged scene. Section 3.4 will show examples of images and their first figure units, along with a discussion of some of the problems encountered.

3.4 Examples of Processing

Although errors in the initial processing can be repaired later, it is important that the first figure unit be an accurate representation of most of the important events in the image plane. Perfect line drawings should not be expected. In a test bed of twenty-two real images used to refine the techniques discussed in this chapter, only one image had a perfect first figure unit.

Six real images and their initial processing results will be shown. All of these images were captured in the PRIP Laboratory on a Macintosh IIfx. No modifications were made and no hand editing was done on the images. All images use the same parameter values and thresholds, which are given in Appendix A. No hand editing of any first figure unit has been done. To display the type of the vertices, the vertex type is used to label the vertices; the labels were shown in Table 3.3. The curve names are included as before.

The first image is the only real image processed which had a perfect initial figure unit. This image and the first figure unit are shown in Figure 3.15.

Figure 3.15. (a) The input image. (b) The first figure unit found by processing the image shown in (a).

Many of the real images had missing arcs due to undersegmentation by GNC. The images in this category fall into two groups: those with completely missing edges, as in Figure 3.9, and those with partially missing edges. Figure 3.16 shows an image where the GNC algorithm has failed to segment two regions, although some edge points were found.

Figure 3.16. (a) The input image. (b) The thinned edges from GNC. (c) The first figure unit.

Another problem with the initial figure units is data that should be interpreted as a single junction being interpreted as a pair of nearby junctions. An example of an image with a junction labeling problem is Figure 3.17. In this image, curves c4, c21 and c13 should combine into a single arrow junction. Instead, c4, c21 and c15 form a T junction, and c15 and c13 form an L junction. This occurs when the curvilinearity algorithm puts breaks near triple points, but not so near as to permit the vertices to be combined early in the processing. As an artifact of thinning the GNC edges, the vertices are not sharp. The curvilinearity algorithm is putting breaks at appropriate places from a local perspective.
It is actually the triple points that are misplaced, and not the continuity breaks.

Figure 3.17. (a) The input image. (b) The first figure unit. There are two pairs of mislabeled junctions. In the upper left corner there is a T junction and a nearby L junction; this junction should have been an arrow. In the lower left corner, at the front of the triangular prism, is a T junction and a nearby L junction; this junction should also have been an arrow junction.

Accidental alignments can cause problems with the junction labels as well. Figure 3.18 shows a rectangular prism that is close to a non-general viewpoint. Whether this viewpoint is truly non-general is a question that is much harder to answer in practice than in theory. The L junctions are clearly correct, but the T junctions are suspect. Since it cannot be determined whether the viewpoint is accidental or not, it is impossible to evaluate the junction labels. If the viewpoint is accidental, the junction labels are correct. Line labeling will produce an interpretation that is counterintuitive; this interpretation is shown in Figure 3.20. Physically, it can be interpreted as looking at an object through a rectangular hole. If this prism is in a non-accidental viewpoint, then the junction labels are incorrect: the T junction between c14, c20 and c3 should be an arrow junction, the T junction between c9, c20 and c7 should be a Y junction, and a side of the cube is missing.

Figure 3.18. The rectangular prism shown in (a) is seen from a non-general viewpoint, since neither the right nor the left side can be seen. (b) The first figure hypothesis from (a). Notice that the upper edges of both sides have T junctions. Figure 3.19 shows the junctions labeled with numbers.

Figure 3.19. The numbered curves and vertices obtained from the input in Figure 3.18(a).

Figure 3.20. A legal line labeling for Figure 3.18 that is counterintuitive.

The same processing that is used for polyhedral scenes can also be used for curved objects. Figure 3.21 shows the grey scale image of a ceramic taco holder and its initial processing result.

Figure 3.21. (a) An image of a ceramic taco holder. (b) The first figure hypothesis from (a).

A final image shows a bad processing result. Figure 3.22 shows an image where oversegmentation has occurred on the right hand side. The right hand side has been segmented into six curves: c43, c85, c36, c29, c117 and c119. There are four phantom junctions and one T junction found. The T junction is between curves c28, c79, and c117. This T junction could be an accidental alignment, but the arrow junction between curves c52, c3 and c66 on the left hand side of the image hints that this T junction is mislabeled.

Figure 3.22. (a) The input image. (b) The segmentation by GNC. (c) The first figure unit. GNC segments this image into 42 separate regions, most of them small regions on the right hand side.
Thirty one of these small regions were deleted, since their size was below threshold. The other small regions are larger than the threshold, and were not deleted. This leads to many false triple points, and therefore false vertices.

3.5 Summary and Conclusions

To test this part of the system, the algorithms were run on the test bed of images shown in Appendix B. The results are summarized in Table 3.4. Each mistake is attributed to the module that caused the initial error. As an example, an arrow junction might be missing an arc, causing it to be seen as an L junction. This error would be classified as a missing edge and not as a junction type failure, since the junction classification is correct given the incorrect input. This system avoids double counting of errors.

There are eight different mistakes that can be made by these four modules. Two errors can occur in the weak membrane edge detection: edges can be missing or broken, and extraneous edges can be found. The vertex proximity module can make only two types of mistakes: junctions that should have been grouped can be separated, and junctions that should have been separated can be grouped. The continuity module can make only two errors as well: either an extra break can be added, or a corner can be missed. The junction typing algorithm also has two types of errors. An error can be caused by a mistake in tangents; mistaking a three tangent junction for an arrow junction is an example of this error. Mistakes in curvature can also cause errors; an example is mistaking a curvature-L junction for a phantom junction.

In this set of images there are a total of 245 visible junctions and 323 visible arcs, as shown in Table B.2 and Table B.3. There were a total of 113 errors found in the initial processing, so the majority of the visible junctions and arcs were properly processed. There was only one image with a perfect initial figure hypothesis, Image 8. There was also only one image with a very bad initial figure hypothesis, Image 7. The difficulties in Image 7 arise on the right hand side of the background block. The rightmost surface of this block is close to an accidental alignment and is therefore segmented into a large number of small surfaces that this algorithm has difficulty analyzing.

Table 3.4. The results of processing the test bed of images shown in Appendix B are shown below. ME stands for missing edges. EE is extraneous edges. GJ is grouped junctions. BJ is broken junctions. MB is missed breaks. EB is extraneous breaks. CM is curvature mistakes. TM is tangent mistakes. MP is missing phantom junctions. These nine failures summarize all of the problems found in the test bed of images. The notation "a" was used for Image 7, which had many extraneous edges that were difficult to count.

    Image   ME  EE  GJ  BJ  MB  EB  CM  TM  MP
    1        0   0   0   0   0   2   0   1   1
    2        1   0   0   0   0   0   0   0   0
    3        0   0   1   0   0   0   1   2   0
    4        4   0   2   0   1   1   0   1   0
    5        0   0   0   2   0   0   0   0   0
    6        0   1   0   0   0   0   0   2   0
    7        2   a   0   1   0   3   0   0   0
    8        0   0   0   0   0   0   0   0   0
    9        0   0   0   0   0   1   0   1   0
    10       0   0   0   1   2   1   0   2   0
    11       1   0   0   0   0   1   0   0   0
    12       1   0   0   0   1   0   0   0   0
    13       0   1   0   0   1   3   1   0   0
    14       0   0   0   1   0   2   0   2   0
    15       3   0   0   1   2   0   0   1   0
    16       3   0   0   0   0   1   1   1   0
    17       2   0   1   1   2   0   0   2   0
    18       3   0   0   1   1   1   0   0   1
    19       0   2   0   1   1   3   1   0   0
    20       0   2   3   0   0   3   2   1   0
    21       0   1   0   3   2   0   0   0   0
    22       0   7   0   1   0   4   1   0   0
    Total   20  14   7  13  13  26   7  16   2

Appendix A shows the parameters that were used to calculate the initial figure hypotheses.
Table 3.4 shows that the choice of parameters has balanced the errors. As an example, the vertex proximity threshold was set so that approximately the same number of junctions were incorrectly grouped (7) as were broken (12). The same can be said for the continuity algorithm and the missing (13) and extraneous breaks (13). The weak membrane edge detection is also creating roughly the same number of extraneous edges (14) as missing edges (20).

The missing phantom junctions are not errors in the initial figure hypothesis. In Malik's line labeling paradigm [69], each curved arc was given a phantom junction. These phantom junctions were not included in the initial figure hypothesis, but will be added later, as will be discussed in Section 4.2.

One particular problem noticed in this processing was that all three tangent junctions were mistaken for arrow junctions. Almost all of the curvature-L junctions were mistaken for L junctions. This happens because finding the curvature near junctions is difficult. Better methods for distinguishing three tangent and arrow junctions and curvature-L and L junctions will be proposed in Section 5.2.2.

The initial figure hypothesis is giving a good initial analysis of the significant events in the image. This information will be analyzed in Chapter 4 to discover errors, which will be corrected in Chapter 5.

CHAPTER 4

The Revised Line Labeling Module

Line labeling has long interested vision researchers because it provides a way to use local information about curves and vertices to build a more global interpretation of image data. The practical difficulty in using line labeling, and especially in curvilinear line labeling, is that the local information about curves and vertices is often imperfect. Since the rigorous mathematical framework built in the proof of the junction catalog assumes that a perfect line drawing is available as input, the algorithm is doomed to failure in all but the most contrived cases. This is particularly true since the algorithm was not designed to provide diagnostic information about the nature of the failure.

In this chapter, a revised line labeling algorithm will be presented that can provide some diagnostic information about line labeling failures. Section 4.1 will discuss the method for finding errors in line drawings. Section 4.2 will classify the errors that can occur in line drawings. Examples of line drawings obtained from real images using the method of Chapter 3 will be shown, along with the diagnostic information provided by this new line labeling module.

4.1 Detection of Line Drawing Problems

Section 3.4 showed examples of the types of problems that were commonly seen in the generated line drawings. The paradigms that have been used to produce these line drawings cannot produce perfect output from real images. Therefore it is important to be able to analyze a line drawing and evaluate its strengths and weaknesses. This section will show a modification to the standard line labeling scheme of [69] which provides better diagnostic information about the weaknesses of the current line drawings. This diagnostic information will be entered into the blackboard database and later be used by other modules to fix some of the problems in the line drawing.

In the piecewise smooth object domain, the junction catalog is guaranteed to give one or more correct labelings for each object in the object domain. An object that is not within this domain may still be legally, and perhaps even correctly, labeled, as shown in Figure 4.1.
To restate this in a way that is more relevant to this discussion: there is no guarantee that if a mistake is made in the initial line drawing, line labeling will fail to interpret the image. Therefore, line labeling cannot be the only arbiter of correctness in line drawings. This chapter will demonstrate how line labeling can be instrumental in fixing incorrect line drawings.

Figure 4.1. The cone shown in (a) is not in the domain of piecewise smooth objects. The apex of the cone is an edge point with only one adjacent surface, which is not allowed. When the apex of the cone is away from the camera, an image of the cone would look like (b). The curvilinear junction catalog would label this image as either a limb or jump edge. The jump edge interpretation is correct, but this is only a coincidence.

The visual system has a collection of heuristics that it appears to use when input is ambiguous. As an example, the interpretation of a shaded sphere changes when the image is turned upside down. If the lighter side of the sphere is at the top, the sphere will appear convex. If the lighter side is at the bottom, the sphere will appear concave [95]. The heuristic in use is that, unless there is reason to believe otherwise, light comes from above. This is a reasonable and useful heuristic that removes a source of ambiguity from the shape from shading problem. It can lead to incorrect conclusions, particularly in situations such as Figure 4.2(a), which are the visual equivalent of garden path sentences.¹ These occasional mistakes are a small price to pay for unique interpretations of images under ordinary circumstances.

¹A garden path sentence is one that is intentionally misleading. An example is "The granite rocks with the quake". Having the words granite and rocks together leads the listener to misinterpret the verb as a noun.

Figure 4.2. The round shaded object in (a) is seen as convex because the visual system assumes that lighting is from above in the absence of other evidence. (b) is the same image as (a) rotated by 180 degrees. The round shaded object now appears to be concave. The image is of a crater on the lunar surface.

One of the difficulties with using line labeling as a diagnostic tool is that there are often multiple interpretations of a single object. Figure 4.3 shows four legal line labelings for a cube. Each of the interpretations is physically possible, but the human visual system seems to pick one as its favored interpretation. The preferred interpretation is that the cube is resting on its base, and not adjacent to either the left hand wall (if such a wall exists) or the background. This is a reasonable interpretation since the human visual system, unlike computer vision systems, has a sense of verticality. The human visual system is trained in an environment where gravity reigns. Objects are usually resting on surfaces or other objects, just like they are usually lighted from above. A heuristic like this, which I will call the gravitational heuristic (GH), may be useful in computational vision as well.

A difficulty with using the gravitational heuristic in computer vision is that most camera systems do not have a sense of verticality. They do not know when they are aligned with the gravitational field and when they are not. Even if the mechanical challenges of including this functionality could be solved, the integration into a vision system would entail solving a number of open research questions that are far afield from this thesis.
Rather than open another set of research issues, it may be preferable to look for a similar heuristic that doesn't require an understanding of gravity. Of the interpretations for the cube shown in Figure 4.3, interpretation (d) is unique in that gravity appears to have been suspended. In the other three interpretations, the cube is supported in some way, whether it be by gluing it to the wall or setting it on a table. This suggests that a floating interpretation may be more useful for a computational vision system, since this interpretation doesn't require any naive physics.

Figure 4.3. This simple line drawing of a cube has four different convex interpretations from line labeling. (a) is the cube glued to the side wall. (b) is the cube glued to the back wall. (c) is the cube sitting on a table. (d) is the cube floating in air. The human visual system prefers interpretation (c), but interpretation (d) may be more useful for computational vision.

In order to more precisely describe this interpretation, some definitions are needed. Suppose that the image has been segmented into regions. Further, suppose that these regions have been correctly labeled as either background or foreground. The objects of interest should be in the foreground. The labeling of the background region is a method for performing figure-ground separation, which is one of the Gestalt principles shown in Figure 1.5. It would be preferable to have the system select the background automatically. Although this assumption may appear to be overly restrictive, it can be met by the processing shown in [75].

Definition 4.1 Outside Arc: An image curve is an outside arc if it is adjacent to a background region over its full length.

Definition 4.2 Inside Arc: An image curve is an inside arc if it is not adjacent to the background region over its full length.

Definition 4.3 Outside Junction: A junction is an outside junction if at least one of its incident arcs is an outside arc.

Definition 4.4 Inside Junction: A junction is an inside junction if none of its incident arcs are outside arcs.

Definition 4.5 Floating Object Heuristic: All outside arcs in a line labeling should be labeled as either limb or jump edges with the direction being such that the foreground region will be interpreted as being in front of the background region. The labels on inside arcs are not restricted.

In Figure 4.4, curves c15, c14, c11, c9, c5 and c3 are outside arcs. Curves c21, c22 and c24 are inside arcs. Junctions v25, v13, v28, v29, v26 and v27 are outside junctions; junction v22 is an inside junction. The floating object heuristic would give the labels shown in Table 4.1 as the initial labels for the curves.

Figure 4.4. A cube with the curve and junction numbers identified.

Table 4.1. The curves in Figure 4.4 and their initial line labels. The vertices are ordered in the direction column. The line label should be put between the ordered vertices to determine the direction. For c3, for example, the labeling should be read as v27 → v25 or v27 →→ v25.

Curve  Direction   Label
c3     v27 → v25   →, →→
c5     v26 → v27   →, →→
c9     v29 → v26   →, →→
c11    v28 → v29   →, →→
c14    v13 → v28   →, →→
c15    v25 → v13   →, →→
c21    v22 → v25   →, →→, ←, ←←, +, −
c22    v28 → v22   →, →→, ←, ←←, +, −
c24    v26 → v22   →, →→, ←, ←←, +, −
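Definitions 4.1 through 4.5 amount to a region-adjacency test followed by a restriction of the label sets. The following is a minimal sketch in Python, assuming a hypothetical Arc record that knows which regions it borders; the names and the ASCII arrow spellings are illustrative, not the data structures used in this thesis.

from dataclasses import dataclass

OUTSIDE_LABELS = ["->", "->->"]                       # occluding: figure in front
ALL_LABELS = ["->", "->->", "<-", "<-<-", "+", "-"]   # the six legal line labels

@dataclass
class Arc:
    name: str
    junctions: tuple       # names of the two endpoint junctions
    regions: frozenset     # ids of the regions bordered over the full length

def is_outside_arc(arc, background):
    # Definition 4.1: adjacent to the background region over its full length
    return background in arc.regions

def outside_junctions(arcs, background):
    # Definitions 4.3 and 4.4: a junction is outside exactly when at least
    # one of its incident arcs is an outside arc
    out = set()
    for arc in arcs:
        if is_outside_arc(arc, background):
            out.update(arc.junctions)
    return out

def initial_labels(arcs, background):
    # Definition 4.5, the floating object heuristic: outside arcs may only be
    # limb or jump edges directed so the figure occludes the background;
    # inside arcs keep all six labels
    return {arc.name: list(OUTSIDE_LABELS if is_outside_arc(arc, background)
                           else ALL_LABELS) for arc in arcs}

For the cube of Figure 4.4, an arc such as c3, which borders the background, would start with only the two occluding labels, while an inside arc such as c21 would keep all six, matching Table 4.1.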
Figure 4.5. An image where the floating object heuristic fails. One interpretation for this image is that the front corner of a cube is being seen through a round peep-hole. Another interpretation is that the bottom inside corner of an empty box is being seen through a peep-hole. There are other interpretations as well. If the sphere and the Y junction are labeled as the foreground, as they would be by most systems, including [75], then there will be no legal labeling of this object using the floating object heuristic.

There are examples where the floating object heuristic won't work. One of these is shown as a line drawing in Figure 4.5. This image could be interpreted as the front corner of a cube being seen through a round peep-hole. The confusion between foreground and background in this image causes the floating object heuristic to fail. In fact, the terms foreground and background are somewhat ambiguous in this example. While the cube may be the object of interest, it is located behind the peep-hole. Images where the foreground and background are ambiguous, which are probably rare in most applications, will cause the floating object heuristic to fail.

Line labeling with the floating object heuristic is much less combinatorially explosive than ordinary line labeling. There are two contributing factors. The first factor is that there are fewer interpretations of outside junctions. A casual inspection of Malik's original junction catalog, shown in Figure 4.6, shows that most of the legal interpretations for inside junctions will no longer be legal interpretations for outside junctions. A second factor that improves the combinatorics of line labeling is that phantom junctions are not necessary with this heuristic, since a necessary phantom junction can be identified by the line labeling process.

Figure 4.6. Malik's junction catalog. The "?" label indicates that this line could have any one of the six legal line labels.

Theorem 4.1 If an image can be segmented into foreground and background regions, outside junctions that have been labeled by the floating object heuristic will have at most two interpretations, except for T junctions which may have as many as four interpretations.

Proof: This theorem is proved by systematically examining the junction catalog in Figure 4.6. All references to junction labelings will be from left to right, in the appropriate category.

Terminal junctions have at most two interpretations. However, it should be noted that a terminal junction cannot be an outside junction since the foreground and background are not separated at a terminal junction. Outside L junctions can only be labeled with the first and second labelings. Only one of these labelings will be applicable, depending on the location of the adjacent background region. Outside curvature-L junctions can be labeled only with the second and fourth labelings. There are only two labelings for three tangent junctions; only one of these will be applicable, depending on the location of the adjacent background region. Outside arrow junctions can only be labeled with the first labeling. Outside Y junctions can be labeled with the third, fourth and fifth labelings, but only one of these labelings will be applicable, depending on the location of the background region. T junctions can have the eight labelings shown in Figure 4.7. Four of these labelings will be applicable, depending on the location of the background region. The outside junction catalog is shown in Figure 4.8. The background region for each legal junction label is marked.
The reduction in the number of legal labelings for outside arcs is shown in Table 4.2. This completes the proof. □

Figure 4.7. Eight legal line labelings for an outside T junction. When the background region is known, there are only four possible label choices.

Figure 4.8. The outside junction catalog. The region labeled bkg is the background region.

Table 4.2. The number of legal labelings for outside arcs.

Table 4.2 shows a vast improvement in combinatorics for outside junctions with the floating object heuristic. Five different outside junctions have a unique labeling with the floating object heuristic. These unique labelings of arcs on a local basis will improve the combinatorics of labeling inside arcs as well. This combinatoric improvement can also be exploited to improve the line labeling paradigm. Outside arcs and outside junctions are providing the strongest constraints. When this information is used first in the line labeling algorithm, the legal labelings on the inside arcs are also restricted. Algorithm 2 is the modified line labeling algorithm which takes advantage of this combinatoric reduction.

Algorithm 2
Input: A set of arcs A
       A set of junctions J

A° ← the set of outside arcs in A
Aⁱ ← the set of inside arcs in A
A' ← A° ∘ Aⁱ
change ← true
while change = true
    change ← false
    foreach arc a ∈ A' in order
        L ← the set of labels that are still legal for a
        foreach label l ∈ L
            J ← the junctions adjacent to a
            J° ← the set of outside junctions in J
            Jⁱ ← the set of inside junctions in J
            J' ← J° ∘ Jⁱ (in order)
            foreach junction j ∈ J' in order
                if label l is not a legal label for arc a at junction j
                    change ← true
                    remove l from L
                    if L is empty
                        return j as the failed vertex
                    end
                end
            end
        end
    end
end

Figure 4.9. The revised line labeling algorithm. The notation ∘ is used to indicate that the sets of junctions and arcs are concatenated, while maintaining the order.

The principal advantage of Algorithm 2 over the three step algorithm proposed by Malik in [69] is that diagnostic information is provided for the blackboard in the form of the failed vertex. Since the number of legal line labelings is often reduced to a single set by the improved combinatorics, this diagnostic information can pinpoint the area of the line drawing that is incorrect.

Section 4.2 will show images processed with this system and the diagnostic information returned by this algorithm. It should be noted that after a legal labeling is found using the floating object heuristic, the full set of legal line labelings can still be obtained using Malik's algorithm.
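The control flow of Algorithm 2 can also be expressed compactly as constraint propagation. The Python below is an illustrative sketch only: it assumes a hypothetical predicate legal_at(j, arc, label) encoding the junction catalogs of Figures 4.6 and 4.8, arc objects carrying their two endpoint junctions (each with an outside flag), and label sets initialized by the floating object heuristic.

def revised_line_labeling(outside_arcs, inside_arcs, labels, legal_at):
    # outside arcs are visited first: they carry the strongest constraints
    arcs = outside_arcs + inside_arcs
    changed = True
    while changed:
        changed = False
        for arc in arcs:
            for label in list(labels[arc.name]):
                # outside junctions are checked before inside ones
                ordered = ([j for j in arc.junctions if j.outside] +
                           [j for j in arc.junctions if not j.outside])
                for j in ordered:
                    if not legal_at(j, arc, label):
                        labels[arc.name].remove(label)
                        changed = True
                        if not labels[arc.name]:
                            return j      # diagnostic: the failed vertex
                        break
    return None                           # a consistent labeling remains

Returning the junction at which the last legal label died is what turns the labeling algorithm into a diagnostic tool; a plain success/failure result would not localize the problem.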
4.2 Common Problems with Line Drawings

When line drawings are obtained from real images, there are a number of problems that can occur. In this section, line labeling problems that result from the processing described in Chapter 3 will be shown. These problem line drawings will be analyzed using the floating object heuristic, and the resulting diagnostic information will be discussed.

Since the line labeling paradigm has no knowledge of the inner workings of the other modules, it is not able to fix any of the problems that it detects without violating the independence of knowledge sources. The suggestions for fixing the problems will come from control knowledge sources to be discussed in Chapter 5, and the other modules.

In the example images that were used in testing this system there were four common problems: missing or incomplete edges, mislabeled junctions, missing junctions, and extraneous junctions. There are other problems that can occur in line drawings, such as extraneous edges, but the vast majority of the incorrect line drawings that were used to test this system fell into these four categories. This is a result of a conscious decision to select parameters for the weak membrane edge detection that undersegment the image, i.e. that divide the image into too few regions.

Incomplete and missing edges occur when the edge detection algorithm does not completely separate two regions. An example of an unavoidable missing edge was shown in Figure 2.1. When this image is processed, the curves and junctions found are shown in Figure 4.10. When the floating object heuristic is used to process this image, vertex v25 is correctly flagged as a problem vertex.

Figure 4.10. The junctions and curves found from processing Figure 2.1.

Mislabeled junctions occur in line drawings because it is difficult to find the type of a junction using only local information. Confusion occurs between junction types of the same class, for example three tangent, arrow, Y, and T vertices. These mistakes are inevitable, since in the mathematically rigorous curvilinear line labeling paradigm the only distinction between a T junction and a Y junction is that T junctions have two arcs with perfectly aligned tangent directions. Since this perfect alignment rarely happens in practice due to positional noise, a threshold was used to determine how close to perfect alignment two arcs must be to be considered a T junction. When this is done, a Y junction or an arrow junction that falls within this thresholded range will be improperly typed as a T junction. Figure 4.11 shows a triangular prism with a junction v25 that is incorrectly labeled as a T junction. This junction was flagged as being faulty by the revised line labeling algorithm.

Figure 4.11. (a) The input image. (b) The curves and vertices.

Another example of a mislabeled junction is shown in Figure 4.12. In this image, a pair of junctions are so close together that the vertex proximity knowledge source has grouped them into one junction. This problem is easy to diagnose, since there are more than three incident arcs at vertex v44. The object domain was restricted to junctions with at most three incident arcs. The modified line labeling module correctly flags v44 as a problem vertex.

Figure 4.12. (a) The input image. (b) The curve endpoints near vertex v44 are so close that they are grouped into a single vertex.

A more challenging problem to diagnose is shown in Figure 4.13. In this image, two vertices that should have been grouped into a single vertex are too widely separated for the distance threshold in the vertex proximity knowledge source to permit them to be combined into a single vertex. This problem is the opposite of that shown in Figure 4.12. This demonstrates the fragility of thresholding. The modified junction labeling paradigm detects vertex v24 as the problem vertex.

Figure 4.13. (a) The input image. (b) Vertices v24 and v28 should have been grouped into a single vertex. Vertices v3 and v7 also should have been grouped.

The most difficult junctions to properly analyze are the curvature-L junctions and three tangent junctions. In a local neighborhood, they are very similar to arrow junctions and L junctions respectively. Figure 4.14(b) shows a curvature-L junction that looks very much like the L junction in Figure 4.14(a), particularly in a small neighborhood of the junction.

Figure 4.14. (a) The lower right hand side of this object is an L junction. (b) The lower right hand side of this object is a curvature-L junction. In a local neighborhood of the junction there is very little difference.
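The thresholded tangent test for junctions with three incident arcs can be stated compactly. The sketch below is illustrative; the function name and the default threshold are assumptions, not the values actually used in this thesis (those are given in Appendix A).

def type_three_arc_junction(tangents_deg, t_threshold_deg=10.0):
    """Return 'T', 'arrow' or 'Y' from the outgoing tangent directions
    (in degrees) of the three arcs incident to a junction."""
    # T test: two arcs whose tangents are (nearly) opposite form the
    # straight stem of the T; with the threshold set to zero, as in the
    # repair strategy of Section 5.1.1, this branch never fires
    for i in range(3):
        for k in range(i + 1, 3):
            d = abs(tangents_deg[i] - tangents_deg[k]) % 360.0
            d = min(d, 360.0 - d)
            if abs(d - 180.0) <= t_threshold_deg:
                return "T"
    # arrow versus Y: an arrow has one angular gap between neighboring
    # arcs that exceeds 180 degrees, while a Y does not
    a = sorted(t % 360.0 for t in tangents_deg)
    gaps = (a[1] - a[0], a[2] - a[1], 360.0 - (a[2] - a[0]))
    return "arrow" if max(gaps) > 180.0 else "Y"

A Y or arrow junction whose stem pair falls inside the threshold band will be typed as a T, which is exactly the failure mode discussed above.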
To further complicate matters, junction labeling is vulnerable to paired mistakes in junction types. Figure 4.15 shows three legal labelings for a cylinder. Only the interpretation in Figure 4.15(a) corresponds to the usual interpretation.

Figure 4.15. The cylinder shown in (a) has the junctions correctly typed and the correct line labeling interpretation. The cylinder in (b) has a pair of mistyped junctions. The line labeling shown is legal. The cylinder in (c) has two pairs of mistyped junctions. This line labeling is also legal.

Missing junctions are another problem in the input line drawings. Missing junctions occur when the continuity algorithm does not detect a corner. The only types of junctions that can be missed are L and curvature-L junctions. Junctions with three incident arcs cannot be missed because they occur at a position where three surfaces are joined, and all such points are forced to be vertices. Junctions with three incident arcs may be mislabeled, or misplaced, but they will not be missing. Figure 4.16 shows an image where a pair of curvature-L junctions were missed on curve c2. The modified line labeling algorithm does not detect this problem. It does detect the problem with vertex v3, which was incorrectly typed as a T junction. Vertex v4 was labeled as a P.

Most of the extraneous junctions in these images occur because the weak membrane is not fitting the edge data well, or because the continuity algorithm has found a false corner. An example of a combination of these failures is shown in Figure 4.17. In this image the weak membrane has segmented the image into 38 regions. Most of these regions are along the right hand edge. These small regions could be filled with some morphological operation, like successively expanding and then contracting the edges, but this would distort the junctions. Since the junctions are critical, this sacrifice cannot be made. These extraneous regions create extraneous curves, which create extraneous vertices. Vertices v118, v116, and v117 were all created this way. The modified line labeling algorithm identifies junction v115 as the problem junction, since it is of type M. There are a number of other problems in this image, including c3 and c114, which are noise edgels. Both of these noise edgels have a terminal junction. They will fail when labeled as well.

Figure 4.18 shows an image with a phantom junction on curve c39.
In the original line labeling algorithm, all curved arcs must have an extra phantom junction added, as was discussed in Section 1.1.3. Up to this point, no phantom junctions have been added. Some junctions have been identified as phantoms since their tangents and curvatures are not sufficiently different to be classified as either curvature-L junctions or L junctions, but these junctions are not necessarily placed properly to be good phantom junctions. There has been no distinction made between curved arcs and linear arcs up to this point. In fact, the only place where this distinction is made in the original line labeling paradigm is in the placement of phantom vertices on curved arcs, and in the junction types in the catalog. The modified line labeling algorithm failed at vertex v54, which is adjacent to the arc that needed a phantom junction added. As an informal experiment on the robustness of the detection of the missing phantom vertex, the ordering of the outside arcs and inside arcs was reversed. The labeling failure then occurred at v6, which is the other end of the arc that needed the phantom vertex.

Figure 4.16. A right circular cylinder where two curvature-L junctions were not detected. Neither of the three tangent junctions was properly detected. Vertex v3 is flagged as being in error.

Figure 4.17. (a) The input image. (b) The curves and vertices.

Figure 4.18. (a) The input image. (b) The result of processing. The modified line labeling paradigm failed at vertex v54.

4.3 Summary and Conclusions

The test bed of images in Appendix B was used to test the ability of the modified line labeling algorithm to diagnose line drawing problems. Table 3.4 summarizes the errors that were made by the initial processing in these images. Table 4.4 shows a comparison between the faults that exist in the images and the ones that were diagnosed by the algorithm. Multiple diagnoses for a single image were attained by manually correcting problems as they were discovered to permit further problems to be diagnosed. The correction process is the topic of Chapter 5.

Table 4.4 summarizes how well the problems found in the initial figure hypothesis have been detected by the modified line labeling paradigm using the floating object heuristic. Grouped junctions are easily detected since having more than three arcs at a junction is not permitted within the object domain under study in this thesis. Curvature and tangent mistakes in classification of the junctions are also found well. The exception to this occurs when a pair of mistakes hide a problem. As an example, if a three tangent junction is mistakenly identified as an arrow and an adjacent curvature-L junction is also mistaken for an L junction, then the mistake cannot be diagnosed. In this case, the limb edge will be found to be a jump edge. While this is unfortunate, it should be noted that both limb edges and jump edges represent discontinuities of the scene depth, and it is this similarity that permits confusion.
Table 4.3. A listing of the problems in the test bed of images from Appendix B showing which problems are diagnosed. ME stands for missing edges. EE is extraneous edges. GJ are grouped junctions. BJ are broken junctions. MB are missed breaks. EB are extraneous breaks. CM are curvature mistakes. TM are tangent mistakes. MP are missing phantom junctions. Notation a means that the problem diagnosis was found at a vertex adjacent to the vertex where the problem actually existed. In Image 7, the notation b was used to indicate that one of the problems was correctly diagnosed, but the image was too cluttered to be fixed using the methods of this thesis.

Table 4.4. A comparison between the faults in the images from Table 3.4, and the problems that were diagnosed using the floating object heuristic.

Source     ME  EE  GJ  BJ  MB  EB  CM  TM  MP
Initial    20  14   7  13  13  26   7  16   2
Diagnosed  12  11   7   8   0   3   7  11   2

Figure 4.19. (a) A legal labeling for a detected corner. (b) Another legal labeling without a corner.

The mistakes that are the most difficult to diagnose are the missing breaks and extraneous breaks in continuity. The difficulty with diagnosis is that both labelings shown in Figure 4.19 are legal. This causes problems with the diagnosis of both missing and extraneous breaks. Although the problem in these cases cannot be diagnosed, it should be noted that the line labeling that is obtained is correct. In fact, the correctness of the line labeling without regard to the placement of these breaks is what causes the poor diagnosis result.

The localization of the diagnosis of problems in the figure hypothesis is important. In all of the cases reported in Table 4.3 except three, the problem was diagnosed at the correct junction. In the remaining three cases, the problem was diagnosed at an adjacent junction.

The modified line labeling algorithm has been shown to be useful for identifying problems in the initial line labeling hypothesis. Not all problems can be identified, but many problems can be. In all cases in the test bed of images, the junction that was diagnosed as being the problem either was the problem vertex, or was adjacent to the problem vertex. As would be expected, there were no cases found where a correct line labeling was diagnosed as being faulty.

CHAPTER 5

Updating Line Labeling

Previous chapters have shown how an initial line drawing can be found from a single intensity image, and how problems in this line drawing can be diagnosed. This chapter will focus on the repair knowledge sources that were created to use the diagnostic information coming from the modified line labeling algorithm to fix some of the problems in the line labeling, and the controller knowledge source that governs which repair knowledge source is chosen.

This is the point at which the independence of the knowledge sources starts to pay dividends. While it would be possible for the line labeling module to try to correct problems that are the result of other modules, the complexity of this structure would be high. The line labeling module would have to know about all of the data structures from all of the other modules, and be able to modify its data structures to call them.
As an example, the continuity algorithm represents curves differently than the blackboard. If the line labeling module were to call the continuity module directly, either the line labeling module would have to understand the data structure and routines from the continuity module, or the continuity module would have to understand the data structure and routines from the line labeling module. This would leave one of the two modules vulnerable to changes in the other module. When this is multiplied by all the pairs of modules, the complexity would be unmanageable.

A blackboard system solves this problem by having each module translate to and from the standard blackboard objects. Control knowledge sources are designed to use the standard blackboard objects alone. Control knowledge sources have no knowledge of the detailed inner workings of the other modules. They only know what blackboard objects are legal input to what modules, and the control parameters (such as the parameters for edge detection) that each knowledge source has made available to the blackboard. This division of labor makes it possible to make changes within a single module without propagating them into other modules. The control knowledge sources do need to know how to call individual modules, but they do not need to know anything about the internal data structure of the modules. This structure is ideal for experimentation with controlling the modules, since the modules themselves do not have to be altered.

In order to fix line drawings, two types of knowledge sources are needed. The first type are repair knowledge sources. These knowledge sources know how to fix a particular problem in a line labeling. For example, there are repair knowledge sources that can insert phantom vertices into a line drawing, and hypothesize where edges that are missing could be placed. These knowledge sources are instantiated by the controller knowledge source, which has a strategy for selecting which of the repair knowledge sources should be selected next.

This chapter is divided into three sections. In Section 5.1 the repair knowledge sources that can be used to repair line drawing problems will be investigated, along with the control knowledge source. Section 5.2 will discuss ideas for other perceptual knowledge sources. Section 5.3 will give the computational complexity of all of the algorithms discussed in this thesis, as well as the hardware specifications and run times for the example images given in Appendix B.

Table 5.1. The knowledge sources used to create the line drawing and possible errors made.

Module            Error
Edge Detection    False edges
                  Missing edges
Continuity        False corners
                  Missing corners
Vertex Proximity  Too much grouping
                  Not enough grouping
Junction Typing   Wrong type
                  Wrong class
                  Phantom vertex missing

5.1 Correcting Errors in Line Labeling

A variety of problems with the initial line labelings were found by the modified line labeling module. These problems now need to be identified and corrected. There were four modules used to create the line drawing: edge detection, curvilinear grouping, vertex proximity grouping, and junction classing and typing. Each of these modules can, and does, make errors. These errors are cataloged in Table 5.1. The errors are not mutually exclusive; for example, a missing edge will cause two junctions to be of the wrong class.
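Before turning to the individual repair knowledge sources, the narrow module interface implied by the blackboard organization above can be sketched. The following is a hypothetical rendering of that interface, not the implemented system: each knowledge source publishes its control parameters and translates between its private structures and the standard blackboard objects.

class KnowledgeSource:
    # control parameters published to the blackboard, e.g. an edge
    # detector's smoothness weight or a tangent threshold (illustrative)
    parameters = {}

    def accepts(self, blackboard):
        """Report whether the blackboard holds legal input objects
        for this module."""
        raise NotImplementedError

    def run(self, blackboard):
        """Translate standard blackboard objects into the module's private
        representation, run the module, and translate the results back.
        No other module ever sees this module's internal structures."""
        raise NotImplementedError

Under an arrangement like this, adding or revising a module means writing one new translator; the controller and the other modules are untouched, which is the independence the chapter relies on.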
These errors will be fixed with a set of knowledge sources called the repair knowledge sources. Each repair knowledge source is an expert at fixing one type of problem that can occur in an image. The repair knowledge sources will be activated by the controller knowledge source, which decides which of the repair knowledge sources is most likely to fix the current problem vertex and instantiates the knowledge source. The success or failure of the knowledge source will be reported back to the controller, which can then either accept the solution, make more modifications, or stop the processing.

Sections 5.1.1 through 5.1.4 will describe some repair knowledge sources. Section 5.1.5 will describe the control knowledge source.

5.1.1 Mislabeled Vertices

Since the junctions in the line drawing have their types selected using thresholds for similarity on both the tangents and curvatures, it is not surprising that mistakes occur. An example of a typical mistake is shown in Figure 5.1. There are two failures in this image. The first occurs at vertex v11. The continuity algorithm has broken what perceptually is one curve between v29 and v28 into two pieces. The tangent and curvature approximations at v11 have caused this junction to be labeled as a phantom junction. This problem does not cause a line labeling failure, since the phantom junction has a labeling that matches the correct interpretation. This problem will be solved in Section 5.1.4.

The more serious problem is that vertex v25 has been labeled as a T junction. This is a reasonable initial guess for the local junction labeling module to make, but it is not consistent with the other information in the image. The modified line labeling paradigm fails at vertex v25. When a T junction is suspected of causing the line labeling problem, as in this case, one way to reexamine it is to adjust the tangent threshold that was used to label the vertex. In fact, since it is known that the T labeling is not correct, it is best to set this threshold to zero. If this setting were made initially, T junctions would almost always be mislabeled. After this adjustment is made, only vertex v25 is relabeled. The new junction label is an arrow, which is correct. The modified line labeling scheme succeeds on this new image, and the correct labeling is found, as shown in Figure 5.2. Although this line labeling is correct, it will not be perfect until v11 has been removed.

Figure 5.1. (a) The input image. (b) The labeled curves and vertices.

Figure 5.2. The line labels for the object in Figure 5.1.

This processing shows a possible place where compilation, which can be an important result of a blackboard system, could occur. In both of these examples, a T junction was found with the two arcs with the same tangent directions being outside arcs. This configuration is never legal with the floating object hypothesis, as careful inspection of the junction catalog in Figure 4.6 shows. This information can now be taken and incorporated into the algorithm that decides the types of junctions. While this could have been discovered by other methods, this insight came from experimentation with the blackboard system.

5.1.2 Missing Arcs

Since the modified line labeling algorithm can detect missing arcs, a procedure for adding arcs is needed. The problem is approached by hypothesizing where a missing edge could go and then picking the best of these hypotheses.
This approach will be useful in cases like that shown in Figure 2.1, where edge detection of any sort is unlikely to find an appropriate edge.

Figure 5.3 shows an example of a triangular prism. Vertex v22 is labeled as problematic by the modified line labeling algorithm. Assume that the control knowledge source has decided to hypothesize a missing arc, as will be discussed in Section 5.1.5. A vertex needs to be found as the second endpoint for the missing arc. This vertex must be in a region that is adjacent to the problem vertex. The second type of vertex to be considered is a vertex with two incoming arcs. Adding a third arc to this type of vertex would permit an interpretation that is legal in the current object domain. Phantom vertices will not be used for the second vertex since the phantom vertices that are presently in the line drawing are there as the result of failures in the continuity algorithm and have no perceptual significance. Vertices which already have three incoming arcs could also be used for the second endpoints. This was not done since the object domain specifically excludes objects with more than three faces adjacent to a vertex.

Figure 5.3. (a) The input image. (b) The labeled arcs and junctions.

Once the set of second vertex candidates is found, a curve is hypothesized between the first vertex and each of the second vertex candidates. These curves are added to the initial object hypothesis individually, and each new hypothesis has its junctions typed and labeled. If the labeling does not fail at either of the vertices that were modified, then the new object is put on a list of candidate objects. The best object hypothesis will be chosen from these candidates in Section 5.2.1. For the triangular prism in Figure 5.3, three alternate objects were proposed. These are shown in Figure 5.4.
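This hypothesize-and-test loop can be sketched as follows. The helpers (regions_adjacent, add_curve, type_and_label_junctions, revised_line_labeling) are hypothetical stand-ins for the corresponding blackboard operations, so the sketch is illustrative rather than the thesis implementation.

def hypothesize_missing_arcs(drawing, problem_vertex):
    candidates = []
    for v in drawing.vertices:
        # second endpoint: a two-arc, non-phantom vertex in a region
        # adjacent to the problem vertex; phantom vertices have no
        # perceptual significance, and three-arc vertices are excluded
        # by the object domain
        if v.is_phantom or len(v.arcs) != 2:
            continue
        if not drawing.regions_adjacent(problem_vertex, v):
            continue
        hyp = drawing.copy()
        hyp.add_curve(problem_vertex.name, v.name)   # the hypothesized arc
        hyp.type_and_label_junctions()
        failed = hyp.revised_line_labeling()         # Algorithm 2 diagnosis
        # keep the hypothesis only if neither modified vertex fails
        if failed not in (problem_vertex.name, v.name):
            candidates.append(hyp)
    return candidates   # the best candidate is selected later (Section 5.2.1)

Testing each hypothesis with the revised labeling algorithm, rather than accepting the first legal one, is what allows the symmetry criterion of Section 5.2.1 to arbitrate among several survivors.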
Figure 5.4. (a) Vertex v22 is joined with vertex v27. (b) Vertex v22 is joined with vertex v6. (c) Vertex v22 is joined with vertex v26. Interpretation (b) is accepted as the best.

If the restriction on the number of faces adjacent to a vertex were to be removed, vertices with three incoming arcs could be given a fourth arc. In this case, it would be best to add a fourth incoming arc to a vertex only if there were no other legal interpretations. This could be regarded as a simplicity criterion, similar to Malik's requirement that a junction with more than four incoming arcs be interpreted in such a way that the minimum number of hidden surfaces is used in the interpretation.

When the hypothesis in Figure 5.4(a) is labeled, junction v22 is labeled as a T junction. It is within only a few degrees of the acceptable T threshold. Even if it had missed the threshold, it would have been labeled as an arrow junction. Neither an arrow nor a T junction is a legal interpretation for this vertex. Figure 5.4(c) also has v22 labeled as a T junction, and the hypothesis is rejected. The interpretation that is accepted is in Figure 5.4(b). This interpretation labels v22 as a Y junction, and gives the final labels as shown in Figure 5.5.

Figure 5.5. The arc labels for the triangular prism.

Another example of this problem is shown in Figure 5.6. This image has two problems. The bottom left corner of the block, on curve c5, was missed by the continuity algorithm. This mistake is similar to the problem shown with curvature-L junctions in Figure 4.16, except that this junction should have been an L. This method cannot recover from this mistake, since the labeling is legal regardless of whether a vertex is found on c5 or not. The missing front edge of the block can be recovered. As before, there will be three alternate object hypotheses. If the continuity algorithm had been able to discern the corner on curve c5, there would have been four hypotheses. These are shown in Figure 5.7. The hypothesis in Figure 5.7(a) produces an arrow junction at vertex v38, which cannot be labeled. The hypothesis in Figure 5.7(b) produces a T junction at vertex v38, which also cannot be labeled. The final hypothesis in Figure 5.7(c) produces a Y junction that is correctly labeled. The legal labeling is shown in Figure 5.8. In Section 5.2.1 a heuristic will be discussed that can distinguish between interpretations even if the center vertex is labeled as a Y junction, instead of being labeled as a T.

Figure 5.6. (a) The input image. (b) The initial output.

Figure 5.7. Three hypotheses for the missing edge. (a) v38 is joined to v33. (b) v38 is joined to v32. (c) v38 is joined to v31. Interpretation (c) is accepted as correct.

Figure 5.8. The correct line labels for Figure 5.7(c).

5.1.3 Adding Phantom Vertices

There are two fundamental differences between polyhedral line labeling and curvilinear line labeling. The first is that polyhedral line labeling can be aided by linear numerical analysis, as shown by Sugihara in [105]. Curvilinear line labeling cannot rely on similar help because curved arcs cannot be represented adequately with linear equations.
If the distance between the line and the arc was more than a few pixels, as defined in Appendix A, the arc was labeled as curved. The threshold is kept small 139 so that if an error occurred it would be more likely to produce an extra phantom vertex than neglect a necessary one. All curved arcs adjacent to the problem vertex have junctions added initially since there is no way to diagnose which of the curved arcs needs the phantom vertex. This should still add far fewer phantom vertices to the line drawing than the original strategy proposed by Malik where a phantom vertex was added initially on every curved arc in the image. When Figure 5.9(a) is processed, vertex v54 is flagged as the point of failure. The phantom vertex insertion knowledge source then inserts phantom vertices along any arcs incident to v54 that are curved. At vertex, v54 arcs C47 and C38 were both found to be straight, so no phantom vertices were added on these arcs. A phantom vertex was added in the center of C39. The junctions were typed again, and the line labeling was successful Figure 5.10 shows the object after the phantom vertex has been labeled. The line labels found are shown in Figure 5.11. - The placement of v55 in the center of the arc is suboptimal. The phantom vertex in the image is probably closer to v6 than v54. By using shape from shading, shape from contour or another module that can provide normal directions for the surfaces, it might be possible to provide better placement of the phantom vertex. In Malik and Maydan [70], shape from shading is used to place the phantom vertices properly. A similar idea could improve the placement of the phantom vertex in this image. There are two other small errors in this image that the modified line labeling algorithm cannot detect. The first is that v20 is labeled as an L junction. The labeling of curves c2? and c23 is correct. The second flaw is that v38 is labeled as a phantom vertex. These are small mistakes that do not prevent the proper interpretation of the image. 140 an pfi's r” x“- x’ “a. // a~.\ f/ 3“"- .- a lf'” I“ ," ”fl—5x / M 14 ,r” ,x’ “ vi / t' x” IE I) grid—“r3 J“ a. t“. '1. 5“- " .62 is. “K. ~. . \ L ~_\ n“- .\ .\_ A? a, .45x ,6: ' K WM 1" . ' 43. ‘x .9 an (b) Figure 5.9. (a) The input image. (b) The lines and curves. 141 no ’m.5 I '5..- I '3‘... I -‘- ff’ \ ‘5. r‘la ‘H' .4' I!“ J-b— .l I f f...” 3““ ’uda -- d {W " x" -a’f #M d a .4”; Jr" “0 "a'flp “3 ' + "x 1. 5s“ '3 an em... "K. x a“! N. ‘5. s, Ks... _//7. ~~._ a... ,4» ‘ fi'x‘. I? I" ‘ I X. JV“ -— fit— ‘43 61 Figure 5.10. Phantom vertex v55 has been added. Curve C39 was broken into c63 and CM. The placement of the phantom vertex is at the center of the curve, which may not be perceptually correct. 142 v a 4‘ «M's. ”r H- 3"~_\ 4"! ‘~ a.“ f“ I .. + 4’ ' m . " I!“ 337 ,L .-' ,J“ «a x mm“ an § “-~ u '3’ ““3: “Ru 5 ‘w. I ”H. «a \x + A’ I “~ . (am A: ‘ at.“ ‘9 F.“ 4‘. as ch as Figure 5.11. The line labeling found from the object in Figure 5.9. 143 5.1.4 Other Line Labeling Improvements There are other improvements that can be made to the line labeling that have not been implemented yet. The first improvement is to remove unnecessary phantom junctions from the line labeling. Unnecessary phantom junctions occur in images when the continuity algorithm makes an extraneous break in an edge. This happens most frequently when the edge is noisy. 
When these unnecessary breaks have their tangents and curvatures calculated and are labeled, they are frequently labeled as phantom junctions signifying that they have no significant difference in either tangent or curvature. Removing these arcs is more than a cosmetic improvement, as can be seen in Figure 5.12. In this figure, vertex v37 has been labeled as a curvature-L. This has happened because the adjacent pair of curves c6, and C36 have different approxi- mated curvatures, but the same tangents. When phantom junctions v36 and v38 are removed, the tangents and curvatures will be recalculated for the longer arcs. It is likely that a relabeling of this junction at that point will produce more accurate curvature values, and permit this vertex to be labeled as a phantom vertex, which can then be removed. The method for doing this removal is as follows. The two curves that are adjacent to the phantom junction are joined by a line between the endpoints. This arc will often have zero length, but can have nonzero length when several breaks occurred within a small area in the image. Then the curves will be returned to the continuity algorithm with an additional constraint: no breaks are to be made. The continuity algorithm then will use the energy functional E in Equation 5.1 to prevent breaks from being made. The original curve data is represented as 22;, u.- is the curve of minimum energy. This function is identical to Equation 3.2, except that breaks are no longer allowed. The energy minimization procedure is the same as before, including the use 144 C1. JFA‘A _.H' \. 1" I" x" \. x.‘ \ T‘ b "1. ~. ‘ “a. \ . H‘- .‘ ~ \ "‘--. L" dB at." ‘-.,, -., a m" \x. \\ fl, ‘ x. \. N. G— \‘l. J\.\ ‘5 \‘x ‘hx, 1. "‘ a \» fr” ‘~_ .441 / W , , .- "°- g ,.~"£I: \ , at '1. (b) Figure 5.12. (a) The input image. (b) The labeled lines and curves 145 of multi-grid relaxation. Since this energy functional is convex, multi-grid relaxation was not strictly necessary. E = 2(1‘.‘ - an)2 + A(u.- - um)2 + Mum - 2a.- + um)2 (5.1) This will force the continuity module to smooth the curve, which will produce more reliable curvature values and tangent directions. When the continuity algorithm has finished smoothing the curve, the new curve will replace the old curves in a new figure hypothesis, which will be relabeled to verify the solution. In Chapter 4 it was noted that the vertex proximity module made some errors in grouping vertices. When these errors are diagnosed they can be fixed. Examples of problems with the vertex proximity module were shown in Figures 4.13, where a pair of proximate vertices were not grouped, and Figure 4.12 where a pair of vertices that should have been disjoint were grouped. Both of these cases of mistaken junc- tion identity can be separated from other instances, like the mislabeled junctions of Section 5.1.1, by noticing that there is a curve adjacent to the vertices that is very short in proportion to the other curves. To group the proximate vertices, this curve needs to be removed. To separate the incorrectly grouped vertices, the endpoints of this curve need to be separated. In this later case, if there is no separation between the vertices, an accidental alignment has occurred. Line labeling offers no guarantees when accidental alignments are involved. Figure 5.13 shows an exaggerated example of these processes. 5.1.5 Control of Strategies The repair knowledge sources are each capable of using their specialized knowledge to correct exactly one type of problem in an image. 
5.1.5 Control of Strategies

The repair knowledge sources are each capable of using their specialized knowledge to correct exactly one type of problem in an image. Selecting which repair knowledge source should be used first is the job of the controller knowledge source. The strategy implemented here is to use a fixed ordering of the repair knowledge sources. A more elaborate strategy will be necessary as the number of repair knowledge sources is expanded.

The general strategy for fixing these problems is to address only one problem junction at a time. If there are several problems in the image, they will be solved sequentially. A problem is identified as being fixed by a repair knowledge source when the vertex where the failure occurs moves to a different location. A list of vertices where corrections have already occurred should be kept to prevent endless looping around the image. If a vertex is flagged as being a failure a second time, then the processing should either stop or backtrack to the first correction of the vertex and try a different correction. This is a search process and, like all search processes, it could be computationally expensive. The more closely the repair knowledge sources can be matched with the diagnosed vertex problems, the less risk of combinatoric explosion.

The following repair knowledge sources are available:

1. Re-type junctions
2. Replace missing arcs
3. Insert a phantom vertex
4. Break a vertex into two vertices
5. Join two close vertices
6. Remove a phantom junction

There are two criteria for selecting the ordering of repair knowledge sources. Repair knowledge sources that can be uniquely identified, and are unlikely to be falsely triggered, should be used first. Repair knowledge sources that are risky, meaning that they could produce a correct line labeling without correcting the real mistake(s) in the image, should be tried last.

An example of a repair knowledge source that can be uniquely identified is the repair knowledge source that breaks vertices apart. It can be uniquely identified by the presence of a junction with more than three incident arcs. Similarly, the repair knowledge source that groups proximate vertices together is well identified by the presence of a curve incident to one of the junctions that is much shorter than the other incident curves. These knowledge sources should be tried first.

Risk is another consideration when ordering the repair knowledge sources. A repair knowledge source is risky if it can make alterations that may produce a legal line labeling without fixing the underlying problem in the image. In some sense, all of the repair knowledge sources are risky, but two of them are particularly so: the repair knowledge source that inserts phantom vertices and the repair knowledge source that replaces missing arcs. These strategies should be tried as a last resort.
The risk in this knowledge source can be limited by checking the arc that gets inserted against the raw image data, to look for a subtle edge that the edge detection algorithm may have missed, or using symmetry constraints as will be discussed in Section 5.2.1. The repair knowledge source the re-types junctions is neither unique nor risky. It should therefore be tried after the repair knowledge sources that break apart a vertex and joins close vertices and before the repair knowledge source that inserts phantom junctions and missing edges. The only remaining repair knowledge source is the phantom vertex remover. Since it will not be triggered falsely, it has no risk and can be run at anytime. In the interest of improving the line drawing, it should probably be run first. This gives the ordering used for triggering repair knowledge sources in the con- troller. 1. Remove a phantom vertex 2. Break a vertex into two vertices 3. Join two close vertices 4. Re-type junctions 5. Insert a phantom vertex 6. Replace missing arcs As the number of implemented repair knowledge sources grows, more subtle strate- gies can be developed for selecting among the choices. If a test set of images with 149 ground truth is available, an analysis could be done of the probability of each repair knowledge source fixing a problem for each junction type. This type of experimentao tion could lead to a better understanding of the frequency of each type of error and more sophisticated strategies for calling repair knowledge sources. 5.1.6 Experimental Results And Conclusions Since the full set of proposed repair knowledge sources has not been implemented, testing of the repair knowledge sources was done by manually editing the blackboard data structure. The author was careful to not fix problems using high level knowledge such as the desired final line labeling and apply the repairs mechanically. A comparison can be made between the problems that have been diagnosed in Chapter 4 and the problems that can be repaired using the repair knowledge sources in this chapter. This comparison is shown in Table 5.3. This table shows that most of the diagnosed problems can be fixed using the repair knowledge sources. The extraneous edge problem that was not fixed was in image 7 where a near accidental alignment lead to an excessively cluttered area of the image. The overall performance of this system in detecting correct line labelings will be discussed in Chapter 6. The final measure of the success of this system is the number of images in the test bed of images in Appendix B where correct line labelings are found. Of the twenty-two images in the test bed given in Appendix B, one image could not be labeled, one image had a convex edge mistaken for a limb, and five images had limb edges mistaken for jump edges, and one image had a missing are that wasn’t replaced. Therefore fourteen of the twenty-two images were processed completely correctly. Of the eight failures, only one produced no line labeling. In the other seven failures, only one or two arcs were mislabeled. Integration and diagnosis have dramatically improved the initial figure hypotheses, as can be seen in Table 5.4. 150 Table 5.2. A listing of the problems in the test bed of images from Appendix B showing which problems are repaired. ME stands for missing edges. EE is extraneous edges. GJ are grouped junctions. BJ are broken junctions. MB are missed breaks. EB are extraneous breaks. CM are curvature mistakes. TM are tangent mistakes. MP are missing phantom junctions. 
5.1.6 Experimental Results and Conclusions

Since the full set of proposed repair knowledge sources has not been implemented, testing of the repair knowledge sources was done by manually editing the blackboard data structure. The author was careful not to fix problems using high level knowledge, such as the desired final line labeling, and to apply the repairs mechanically.

A comparison can be made between the problems that were diagnosed in Chapter 4 and the problems that can be repaired using the repair knowledge sources in this chapter. This comparison is shown in Table 5.3. The table shows that most of the diagnosed problems can be fixed using the repair knowledge sources. The extraneous edge problem that was not fixed was in image 7, where a near accidental alignment led to an excessively cluttered area of the image. The overall performance of this system in detecting correct line labelings will be discussed in Chapter 6.

The final measure of the success of this system is the number of images in the test bed of Appendix B where correct line labelings are found. Of the twenty-two images in the test bed, one image could not be labeled, one image had a convex edge mistaken for a limb, five images had limb edges mistaken for jump edges, and one image had a missing arc that wasn't replaced. Therefore fourteen of the twenty-two images were processed completely correctly. Of the eight failures, only one produced no line labeling. In the other seven failures, only one or two arcs were mislabeled. Integration and diagnosis have dramatically improved the initial figure hypotheses, as can be seen in Table 5.4.

Table 5.2. A listing of the problems in the test bed of images from Appendix B showing which problems are repaired. ME stands for missing edges. EE is extraneous edges. GJ are grouped junctions. BJ are broken junctions. MB are missed breaks. EB are extraneous breaks. CM are curvature mistakes. TM are tangent mistakes. MP are missing phantom junctions. A superscript a indicates a correction that was made manually and may be beyond the scope of the repair knowledge sources discussed in this chapter.

Table 5.3. A comparison between diagnosed problems and repaired problems in line labelings.

    Source     ME  EE  GJ  BJ  MB  EB  CM  TM  MP
    Diagnosis  12  11   7   8   0   3   7  12   2
    Repair     12  11   7   8   0   2   6  10   2

Table 5.4. The result of the line labeling process. A "Yes" in the initial column means that a proper line labeling was found from the initial figure hypothesis. A "Yes" in the final column means that a correct line labeling was found after correction. An "A" in the final column means that the line labeling was correct except for a single error. A "B" means that a missing arc wasn't replaced. A "No" in a column means that no correct line labeling was found.

    Image  Initial  Final
    1      No       Yes
    2      No       Yes
    3      No       Yes
    4      No       Yes
    5      No       Yes
    6      No       Yes
    7      No       No
    8      Yes      Yes
    9      No       Yes
    10     No       A
    11     No       Yes
    12     No       Yes
    13     No       Yes
    14     No
    15     No
    16     No       Yes
    17     No       A
    18     No       Yes
    19     No       Yes
    20     No       A
    21     No       A
    22     No       A

5.2 Other Useful Modules

To keep the implementation of the integration ideas manageable, some interesting and useful modules were omitted from the implementation. Two of these modules could be particularly useful in future expansions of the system: symmetry and shape from shading. Symmetry is a higher level module that can help in the diagnosis of line labeling problems; it can also be used to improve the line labeling. Shape from shading can provide the normal directions of surfaces, which is necessary for a more complete 2.5 dimensional sketch, and it can be integrated with the line labeling algorithm to improve the robustness of line labeling.

5.2.1 Symmetry

The only high level process that has been used in this system so far is line labeling. Including a symmetry module in this work would provide another avenue for top down analysis and control. The symmetry algorithm under consideration is Gross and Boult's SYMAN system [47, 46], which can offer a binary evaluation of skew symmetry in an image. Such a symmetry module can provide useful information to the blackboard system.

An example illustrating where SYMAN could be useful is choosing between the possible legal labelings found when filling in missing edges, as in Section 5.1.2. If the tangent threshold were set differently, Figure 5.6 could have two legal labelings instead of one. Both of these labelings would have a Y vertex in the center of the block, which is legal, and an arrow vertex at one of the corners. Line labeling cannot distinguish between these two hypotheses, one of which is clearly preferable to the other. Symmetry can make this distinction. In the hypothesis of Figure 5.7(c), an asymmetric region was divided into two symmetric regions. The hypothesis of Figure 5.7(b) divides an asymmetric region into two asymmetric regions. Since symmetry is preferred over asymmetry, the hypothesis of Figure 5.7(c) is better than that of Figure 5.7(b). A sketch of the kind of binary skewed-symmetry test that could support such a preference is given below.
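The following fragment is a minimal sketch of a binary skewed-symmetry test and is not the SYMAN algorithm; it assumes corresponding point pairs have already been proposed, at least two of them, and that the first pair joins two distinct points. It relies on a standard property of skewed symmetry: the segments joining corresponding point pairs are mutually parallel, and their midpoints lie on a common straight axis, here taken through the first two midpoints. The tolerance is a hypothetical placeholder.

    ;; A minimal sketch of a binary skewed-symmetry test, not SYMAN.
    ;; PAIRS is a list of (x1 y1 x2 y2) corresponding point pairs.
    (defun skew-symmetric-p (pairs &key (tol 1e-3))
      (flet ((dx (p) (- (third p) (first p)))
             (dy (p) (- (fourth p) (second p)))
             (mx (p) (/ (+ (first p) (third p)) 2))
             (my (p) (/ (+ (second p) (fourth p)) 2)))
        (let* ((p0 (first pairs))
               (p1 (second pairs))
               (ax (- (mx p1) (mx p0)))   ; direction of the candidate axis
               (ay (- (my p1) (my p0))))
          (loop for p in (rest pairs)
                ;; both cross products vanish when the segment is parallel
                ;; to the first one and its midpoint lies on the axis
                always (and (< (abs (- (* (dx p0) (dy p))
                                       (* (dy p0) (dx p))))
                               tol)
                            (< (abs (- (* ax (- (my p) (my p0)))
                                       (* ay (- (mx p) (mx p0)))))
                               tol))))))

A preference like the one in the text could then rank labeling hypotheses by how many of the regions they create pass such a test.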
Although the same idea could be applied to Figure 5.3, it would not be needed there. Even if the tangent threshold were low enough to force the center vertex to change its T label, the junction label found would be an arrow. An arrow junction is not legal at this position, as a look at the junction catalog in Figure 4.6 will verify. Symmetry could still be used to select the hypothesis.

Figure 5.6 shows another place where symmetry could be useful. In this image, a corner was not detected along curve c5. Once the missing edge has been reinstated, as shown in Figure 5.7(c) and discussed above, this missing vertex could be detected using the symmetry of the region bounded by v29, v38, and v31. This problem occurs frequently in images. Figure 5.14 shows a skew symmetric rectangle. The vertices at v2 and v4 are much sharper than those at v1 and v3. Since the continuity algorithm essentially uses a curvature threshold to break corners, vertices v1 and v3 are more likely to be missed.

Figure 5.14. A skewed rectangle. Vertices v1 and v3 are more likely to be missed by the continuity algorithm than vertices v2 and v4. If v2 and v4 are found, v1 and v3 can be found using symmetry information.

Line labeling can also provide useful information to the symmetry module. In the case of occlusion, as in Figure 5.15, line labeling can inform the symmetry module that curve c1 is an occluding curve that is part of another region and should not be used in the determination of whether the region is symmetric or not. There are cases where this cannot be helpful, as in Figure 5.16. In this instance, the occlusion has blocked most of the symmetric pairs in the curve from being seen. It is very unlikely that SYMAN would agree that the region bounded by c1, c2, c3, c4, and c5 is symmetric.

Figure 5.15. If curve c1 is labeled as a limb edge, it can be separated from c2, c3, and c4 when symmetry is being analyzed.

Figure 5.16. The object on the left has been occluded by the cube. It is unlikely that the region bounded by c1, c2, c3, c4, and c5 will be flagged as symmetric.

5.2.2 Shape from Shading

Shape from shading uses the detected contours in the image, a model of the illumination, and the image intensities to reconstruct the surface normals. The integration of shape from shading with line labeling was done in [70]. In that work, shading information is combined with a complete, correct, labeled line drawing to give surface normals. The algorithm was tested only on synthetic images.

Shape from shading could be used in at least two different ways to enhance this system. First, it can improve the performance of this system on three tangent and curvature-L junctions, one place where the performance of the system is lacking. Since limb edges have a signature that is different from jump edges, as was shown in [70], they may be identified before junction typing begins. If this can be done, many of the problems with curvature-L junctions and three tangent junctions will be eliminated. A vertex with three incident edges, one of which is a limb edge, must be labeled as a three tangent junction. A vertex with two incident edges, one of which is a limb edge, must be labeled as a curvature-L junction. These two rules are sketched below.
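The two rules just stated amount to a small constraint on junction typing. The fragment below is a minimal sketch of that constraint and is not part of POLL; the limb-edge-p argument is a hypothetical predicate that a shape from shading module would supply.

    ;; A minimal sketch of limb-edge information constraining junction
    ;; typing.  INCIDENT-ARCS is the list of arcs meeting at a vertex;
    ;; LIMB-EDGE-P is a hypothetical shape-from-shading predicate.
    ;; Returns the forced junction type, or NIL when shading provides
    ;; no constraint.
    (defun shading-junction-constraint (incident-arcs limb-edge-p)
      (let ((n (length incident-arcs)))
        (cond ((notany limb-edge-p incident-arcs) nil)
              ((= n 3) :three-tangent)
              ((= n 2) :curvature-l)
              (t nil))))

Either forced type could be posted to the blackboard before the junction catalog is consulted, since limb edges would be identified before junction typing begins.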
Another advantage of the early labeling of limb edges would be that the detection of curvature-L junctions could be improved. Figure 4.16 shows the image of a right circular cylinder discussed earlier. Neither of the curvature-L junctions on curve c2 was detected. Shape from shading might be able to discern that the ends of this curve are limb edges while the central portion is a jump edge. This information could be used directly to break the curve, or it could be sent to the continuity algorithm. The continuity algorithm could be asked to return the single most likely breakpoint by first computing a solution without breaks, as was discussed in Section 5.1.4, and then searching that solution for the point with the highest energy in a small area around the place where shape from shading says that the limb edge changes to a jump edge. Another approach is to have the continuity algorithm drop the threshold for breaks so that break points become more likely.

A more mundane use of shape from shading would be to provide the normal directions that are necessary for a more complete 2.5 dimensional sketch. This could be done as in [70], after the line labeling has been successfully performed.

5.3 Computational Complexity and Timings

The computational complexity of the algorithms discussed in this thesis is summarized in Table 5.5. Table 5.6 gives the computational complexity of the repair knowledge sources. All computational complexities are worst case.

Table 5.5. The computational complexity of the algorithms discussed in this thesis. V is the number of vertices. A is the number of arcs. N is the number of pixels in the image. M is the number of edge points found by the weak membrane edge detector.

Table 5.6. The computational complexity of the repair knowledge sources. AA is the number of vertices in the adjacent regions which have two incident arcs. N is the number of pixels in the combined curves.

The computational complexity of the modified line labeling algorithm is exponential in theory, although the algorithm is much more efficient when a background region is present. As Table 4.8 shows, when a background region is given there are unique labelings for most outside junctions. These unique outside labelings propagate to unique labelings for inside junctions in many cases. This improves the average performance of the modified line labeling algorithm, but does not change the worst case complexity given in Table 5.5. The propagation is sketched below.
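The following fragment is a minimal sketch of the kind of Waltz-style constraint propagation that makes the background region so helpful; it is not the modified line labeling algorithm itself, and the neighbors and consistent-p arguments are hypothetical stand-ins. Junctions with a single candidate labeling prune the candidate sets of adjacent junctions, and the pruning repeats until a fixed point is reached.

    ;; A minimal sketch of Waltz-style label propagation over a junction
    ;; graph, not the modified line labeling algorithm.  CANDIDATES maps
    ;; each junction to its list of candidate labelings; NEIGHBORS maps a
    ;; junction to the adjacent junctions; CONSISTENT-P is a hypothetical
    ;; test that two labelings agree on the arc the junctions share.
    (defun propagate-labelings (candidates neighbors consistent-p)
      (loop with changed = t
            while changed
            do (setf changed nil)
               (loop for j being the hash-keys of candidates
                     ;; keep only the labelings of J with which some
                     ;; labeling of every neighbor is consistent
                     for kept = (remove-if-not
                                 (lambda (lab)
                                   (every (lambda (k)
                                            (some (lambda (lab2)
                                                    (funcall consistent-p
                                                             j lab k lab2))
                                                  (gethash k candidates)))
                                          (funcall neighbors j)))
                                 (gethash j candidates))
                     unless (= (length kept)
                               (length (gethash j candidates)))
                       do (setf (gethash j candidates) kept
                                changed t)))
      candidates)

When the background region gives most outside junctions a single candidate, each pass of this pruning tends to collapse the candidate sets of inside junctions as well, which is the behavior observed in Table 4.8.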
The algorithms discussed in this thesis were implemented in Lisp, C, and Fortran on a Sparcserver 690MP. The majority of the implementation was done in Allegro Common Lisp version 3.1.4, using the GBB blackboard system shell version 1.2. The Lisp was compiled before timings were done. C was interfaced to Lisp to improve the execution speed of the program. Most array operations were implemented in C, as was the energy minimization in the continuity algorithm. Fortran was used only for the program that performs the Voronoi tessellation.

The run times of the edge detection, thinning, and connected components algorithms are all independent of the contents of the image, so run times are given for one image (image 2 in the test set in Appendix B). The weak membrane edge detection ran in 214 seconds (3.5 minutes). The thinning took 0.7 seconds. The connected components algorithm took 10.4 seconds. Table 5.7 shows the run time for the blackboard system after weak membrane edge detection, thinning, and connected components have been performed. The run times of these algorithms were found using the time command in Lisp. All times exclude garbage collection.

Table 5.7. The run times for the sample images from Appendix B in minutes.

    Image  Time
    1       6.3
    2       3.2
    3       5.7
    4      12.8
    5       3.9
    6       3
    7      15
    8       3
    9       2.6
    10      1.7
    11      2.8
    12      2.9
    13      3.9
    14      4.9
    15      7.9
    16      9.5
    17      7.2
    18      6.5
    19     10
    20     14.6
    21      4.6
    22      7.2

CHAPTER 6

Conclusions and Future Work

This thesis described the POLL system for obtaining labeled line drawings from single intensity images using an integrated blackboard system. The processing in the POLL system is consistent with Lowe's additions to the Marr object recognition paradigm. The modules that were integrated in POLL are weak membrane edge detection, a continuity grouping algorithm, and line labeling. Other modules suggested for an enhanced system include shape from shading and symmetry detection.

The current system is able to generate an initial line drawing from an intensity image in the domain of piecewise smooth objects. This initial line drawing is obtained by integrating edge detection, a continuity grouping algorithm, and junction grouping.

The initial line drawings are then analyzed using a modified line labeling algorithm that gives diagnostic information to the global blackboard. The diagnostic information gives a location in the image where an inconsistency in the line labeling occurred. The modified line labeling algorithm can find errors such as missing edges, improperly typed vertices, and missing phantom junctions. In almost all cases, the error in the initial line labeling was diagnosed at the proper junction. In a small number of cases, the error was at an adjacent junction. In no case was the actual error more than one junction away from the diagnosed location.

A collection of repair strategies was created that can modify the initial line labeling in search of a more consistent solution. Repair strategies for missing edges, improperly typed vertices, and missing phantom junctions were implemented. Repair strategies for improving the grouping of junctions and for removing extraneous junctions were discussed.

Perfect line drawings remain an elusive goal in computational vision. This system represents another step towards reaching that goal. The system is unique in that it uses piecewise smooth objects, instead of the more restrictive planar or quadric surface models that are often used, particularly in range imagery as in [37, 62]. This enhances the generality of the system, but also greatly increases the difficulty of the problem. Some images of objects with non-quadric surfaces were used to test the system. The system also works without reference to a set of object models. This distinguishes the system from efforts like the Schema blackboard system [31], where object models were used extensively. Another difference between this work and the Schema system is the scale of the project: the Schema system was developed by a research group over a period of years, unlike the system in this thesis, which was developed by a single researcher.

The POLL system correctly processes sixteen of the twenty-two images presented in the test bed of images in Appendix B. Of the remaining six, five have limb edges mistaken for jump edges and one was not processed well. A suggestion for improving the performance of the POLL system on limb edges was discussed in Section 5.2.2.

Mohan and Nevatia have an impressive system for perceptual organization [75], as described in Section 1.1.2. The Mohan and Nevatia system performs perceptual organization in a more complex environment than the POLL system described in this thesis.
The objects imaged have surface markings and specularities, which POLL does not permit, since Malik's curvilinear line labeling scheme cannot interpret surface markings. Unlike POLL, the Mohan and Nevatia system does not provide a line labeling or any other 3D information from the perceptual organizations. An interesting result of the Mohan and Nevatia system is that individual objects are segmented, even in cluttered environments. This segmentation would be particularly useful in the present system, since it would permit the floating object heuristic to be used on every object in the scene individually. This would correspond to each object floating in space while occluding the others.

6.1 Contributions of the Thesis

The contributions of this thesis fall into three categories. In the first category are the early decisions regarding the selection of the framework and modules, and the combination of modules that creates the original interpretation of the image data. The second category consists of contributions to the modules that were integrated. The final category is a set of strategies used to analyze the diagnostic information returned by the line labeling algorithm to improve the line drawing.

- Establishment of the framework for integrated perceptual organization
  - Selection of the blackboard system framework
  - Combination of initial modules to provide the first figure hypothesis
- Contributions to modules
  - Development of the continuity module
  - Contributions to the line labeling module
    - Definition of the floating object heuristic
    - Modification of the line labeling algorithm to diagnose line labeling problems
- Integration methods for repairing diagnosed junction problems
  - Method for altering junction types that are close to threshold
  - Insertion of necessary phantom vertices into line drawings
  - Strategy for placing missing edges in a line labeling

6.2 Future Work

This system was designed to be expanded into a larger and more robust system as additional modules are integrated. The potential benefits of incorporating shape from shading and symmetry detection were discussed at length in Section 5.2. Of these two modules, the shape from shading module holds the most promise for rapid improvement of the system, since it would improve the handling of the three tangent and curvature-L junctions that are not well handled in the present system.

It would also be interesting to integrate a second type of edge detection to ameliorate some of the difficulties with the weak membrane edge detection, in particular its lack of smoothness of edges. Canny edge detection produces very smooth edges and is more compatible with the continuity algorithm. Corner and vertex detection can be performed on a gray scale image, as done for trihedral vertices in [43]. This could improve the system by providing vertex and corner information from a second, independent source.

There is also some interesting work to be done in the control of the knowledge sources that fix line labeling problems. A small variety of strategies for improving line labelings was demonstrated in this work. As the system expands, the number of strategies for improving the line labeling is likely to expand as well, which will require a better method for choosing which strategy to try on a given line labeling.

There is some interesting work going on in the parallel implementation of blackboard systems [54].
Vision is an ideal domain for such work because of the high computational demands of visual problems and their potential for massive parallelism.

APPENDICES

APPENDIX A

Parameters

Many of the modules and algorithms used in this thesis have parameters. These parameters were set empirically, after experimentation with the test bed of images shown in Appendix B. In order to make this work reproducible, all of the parameters used are given in this appendix. The parameters appear in the order of first discussion.

Section 3.2 describes the continuity grouping algorithm of Trytten and Tuceryan. The parameters for this algorithm fall into three categories: parameters for the energy formulation of Equation 3.2, parameters for Canny edge detection, and the multi-grid relaxation parameters. These parameters and their physical interpretations are summarized in Table A.1.

Table A.1. The parameters for the continuity grouping algorithm of Section 3.2.

In Section 3.3.1, edge detection was performed using the weak membrane edge model and the GNC algorithm for energy minimization. The parameters for the edge detection are shown in Table A.2. Curve hypotheses were created for the boundaries of regions that were sufficiently large; the specific numerical definition of large is given in Table A.2. Edgels that were too short were removed from consideration. The parameter for this is also given in Table A.2.

Table A.2. The parameters from Section 3.3.1.

Section 3.3.2 also used a variety of parameters; their values are given in Table A.3.

Table A.3. The parameter values used in Section 3.3.2.

The circular arc fitting parameters were used to determine how many pixels near the endpoint of a curve should be used for approximating the curvature and tangent direction. The curvature values were found to be unreliable, and were assigned to three classes: straight, curved, and very curved. These symbolic curvatures were used in all comparisons. This classification is sketched below.
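The fragment below is a minimal sketch of such a symbolic curvature classification; the threshold values are hypothetical placeholders, not the values of Table A.3.

    ;; A minimal sketch of assigning unreliable numeric curvatures to the
    ;; three symbolic classes used in all comparisons.  The thresholds
    ;; LOW and HIGH are hypothetical placeholders.
    (defun symbolic-curvature (kappa &key (low 0.01) (high 0.1))
      (cond ((< (abs kappa) low) :straight)
            ((< (abs kappa) high) :curved)
            (t :very-curved)))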
In Section 5.1.3, a parameter for the number of pixels that separate a line from an arc is discussed. The value of this parameter is three pixels.

APPENDIX B

Test Bed of Images

The POLL system was tested on twenty-two intensity images captured using a Macintosh II-fx in the PRIP Laboratory of Michigan State University. These images are labeled and shown in Figure B.1, Figure B.2, Figure B.3, and Figure B.4. The variety and complexity of these images is characterized by examining the number of objects in each imaged scene, as well as the number and type of the surfaces, junctions, and arcs that are visible. Table B.1 gives the raw data on the number of objects, junctions, arcs, and surfaces that are visible in each image. Table B.2 shows the number of each type of junction visible in the images. Table B.3 shows the number of arcs of each interpretation that are visible in the images. The surface types are shown in Table B.4. Although planar surfaces are also quadric, only surfaces that are quadric and not planar are listed as quadric. The surfaces that are labeled as planar in Table B.4 are often not absolutely planar: most of the side surfaces of the bath tub toys used in Images 1 to 18 have shrunk and are curved. These surfaces were still labeled as planar in Table B.4.

Figure B.1. (a) Image 1. (b) Image 2. (c) Image 3. (d) Image 4. (e) Image 5. (f) Image 6.

Figure B.2. (a) Image 7. (b) Image 8. (c) Image 9. (d) Image 10. (e) Image 11. (f) Image 12.

Figure B.3. (a) Image 13. (b) Image 14. (c) Image 15. (d) Image 16. (e) Image 17. (f) Image 18.

Figure B.4. (a) Image 19. (b) Image 20. (c) Image 21. (d) Image 22.

Table B.1. A characterization of the complexity of the images used in this thesis.

Table B.2. A characterization of the visible junctions in the images used in this thesis. TM stands for terminal junctions. P are phantom junctions. CL are curvature-L junctions. L are L junctions. 3T are three tangent junctions. A are arrow junctions. Y are Y junctions. T are T junctions.

Table B.3. A characterization of the visible arcs in the images used in this thesis. Cvx stands for convex edges. Ccv is concave edges.

Table B.4. A characterization of the visible surfaces in the images used in this thesis.

BIBLIOGRAPHY

[1] A. L. Abbott and N. Ahuja. Surface reconstruction by dynamic integration of focus, camera vergence, and stereo. In Proceedings of the Second International Conference on Computer Vision, pages 532-543, 1988.

[2] Narendra Ahuja. Dot pattern processing using Voronoi neighborhoods. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-4(3):336-343, May 1982.

[3] J. Aloimonos. Unification and integration of visual modules: An extension of the Marr paradigm. In Proceedings of the DARPA Image Understanding Workshop, pages 507-551, 1989.

[4] J. Aloimonos and A. Basu. Combining information in low-level vision. In Proceedings of the DARPA Image Understanding Workshop, pages 862-906, 1988.

[5] J. Aloimonos and D. Shulman. Integration of Visual Modules. Academic Press, San Diego, California, 1989.

[6] D. H. Ballard. Generalizing the Hough transform to detect arbitrary shapes. Pattern Recognition, 13:111-122, 1981.

[7] D. H. Ballard and C. M. Brown. Computer Vision. Prentice Hall, New Jersey, 1982.

[8] H. B. Barlow and B. C. Reeves. The versatility and absolute efficiency of detecting mirror symmetry in random dot displays. Vision Research, 19:783-793, 1979.

[9] H. G. Barrow and J. M. Tenenbaum. Interpreting line drawings as three-dimensional surfaces. Artificial Intelligence, 17:47-75, August 1981.

[10] P. J. Besl and R. C. Jain. Three-dimensional object recognition. Computing Surveys, 17(1):75-145, March 1985.
[11] I. Biederman. Recognition-by-components: A theory of human image understanding. Psychological Review, 94(2):115-147, 1987.

[12] A. Blake. Comparison of efficiency of deterministic and stochastic algorithms for visual reconstruction. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(1):2-12, January 1989.

[13] A. Blake and A. Zisserman. Visual Reconstruction. The MIT Press, Cambridge, Massachusetts, 1987.

[14] A. Blake, A. Zisserman, and G. Knowles. Surface descriptions from stereo and shading. In B. K. P. Horn and M. J. Brooks, editors, Shape from Shading, chapter 2, pages 29-52. Massachusetts Institute of Technology, Cambridge, Massachusetts, 1989.

[15] D. Blostein. Recovering the Orientation of Textured Surfaces in Natural Scenes. Ph.D. thesis, University of Illinois, 1987.

[16] D. Blostein and N. Ahuja. Shape from texture: Integrating texture-element extraction and surface estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(12):1233-1251, December 1989.

[17] H. Blum. A transformation for extracting new descriptors of shape. In W. Wathen-Dunn, editor, Models for the Perception of Speech and Visual Form, pages 362-380. MIT Press, Cambridge, Massachusetts, 1967.

[18] M. Brady and H. Asada. Smoothed local symmetries and their implementation. International Journal of Robotics Research, 3(3), 1984.

[19] J. Brolio, B. A. Draper, J. R. Beveridge, and Allen R. Hanson. ISR: a database for symbolic processing in computer vision. Computer, pages 22-30, December 1989.

[20] R. A. Brooks. Symbolic reasoning among 3-D models and 2-D images. Artificial Intelligence, 17:285-348, 1981.

[21] R. A. Brooks. Model-based 3-D interpretations of 2-D images. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-5:140-150, 1983.

[22] L. G. Brown and H. Shvaytser. Surface orientation from projective foreshortening of isotropic texture autocorrelation. In Proceedings of IEEE Computer Vision and Pattern Recognition Society, pages 510-515, Ann Arbor, Michigan, 1988.

[23] J. B. Burns, A. R. Hanson, and E. M. Riseman. Extracting straight lines. In M. A. Fischler and O. Firschein, editors, Readings in Computer Vision: Issues, Problems, Principles, and Paradigms, pages 180-183. Morgan Kaufmann, 1987.

[24] J. Canny. A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-8(6), November 1986.

[25] I. Chakravarty. A generalized line and junction labeling scheme with applications to scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-1(2):202-205, April 1979.

[26] I. Chakravarty. The Use of Characteristic Views as a Basis for Recognition of Three-Dimensional Objects. Ph.D. thesis, Rensselaer Polytechnic Institute, Troy, New York, 1982.

[27] S. W. Chen. 3D Representation and Recognition Using Object Wings. Ph.D. thesis, Michigan State University, East Lansing, Michigan, 1989.

[28] R. T. Chin and C. R. Dyer. Model-based recognition in robot vision. Computing Surveys, 18(1):67-108, March 1986.

[29] M. B. Clowes. On seeing things. Artificial Intelligence, 2:79-116, 1971.

[30] R. Deriche. Using Canny's criteria to derive a recursively implemented optimal edge detector. International Journal of Computer Vision, 1:167-187, 1987.

[31] B. A. Draper, R. T. Collins, J. Brolio, A. R. Hanson, and E. M. Riseman. Issues in the development of a blackboard-based schema system for image understanding. In R. Engelmore and T. Morgan, editors, Blackboard Systems, chapter 8, pages 189-218. Addison Wesley, 1988.
[32] R. Engelmore and T. Morgan. Blackboard Systems. Addison Wesley, New York, 1988.

[33] L. D. Erman, F. Hayes-Roth, V. R. Lesser, and D. R. Reddy. The Hearsay-II speech-understanding system: Integrating knowledge to resolve uncertainty. In R. Engelmore and T. Morgan, editors, Blackboard Systems, chapter 3, pages 31-86. Addison Wesley, 1988.

[34] G. Falk. Interpretation of imperfect line data as a three dimensional scene. Artificial Intelligence, 3:101-144, 1972.

[35] F. P. Ferrie and M. D. Levine. Where and why local shading works. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(2):198-206, February 1989.

[36] P. J. Flynn and A. K. Jain. On reliable curvature estimation. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 110-116, San Diego, California, June 1989.

[37] Patrick J. Flynn. CAD-Based Computer Vision: Modeling and Recognition Strategies. Ph.D. thesis, Department of Computer Science, Michigan State University, 1990.

[38] R. T. Frankot and R. Chellappa. A method for enforcing integrability in shape from shading algorithms. In B. K. P. Horn and M. J. Brooks, editors, Shape from Shading, chapter 5, pages 89-122. The MIT Press, 1989.

[39] S. A. Friedberg. Finding axes of skewed symmetry. Computer Vision, Graphics, and Image Processing, 34:138-155, 1986.

[40] K. Q. Gallagher, D. D. Corkill, and P. M. Johnson. GBB reference manual. Technical Report 88-66, University of Massachusetts at Amherst, Amherst, Massachusetts, July 1988.

[41] E. B. Gamble, D. Geiger, T. Poggio, and D. Weinshall. Integration of vision modules and labeling of surface discontinuities. IEEE Transactions on Systems, Man, and Cybernetics, 19(6):1576-1581, November/December 1989.

[42] S. Geman and D. Geman. Stochastic relaxation, Gibbs distributions, and Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-6(6):721-741, November 1984.

[43] G. Giraudon and R. Deriche. On corner and vertex detection. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 650-655, Maui, Hawaii, 1991.

[44] R. C. Gonzalez and P. Wintz. Digital Image Processing. Addison-Wesley Publishing Company, Reading, Massachusetts, 2nd edition, 1987.

[45] W. E. L. Grimson and D. P. Huttenlocher. On the sensitivity of the Hough transform for object recognition. In Proceedings of the Second International Conference on Computer Vision, pages 700-706, Miami, Florida, 1988.

[46] A. D. Gross. Shape from a symmetric universe. Technical Report CUCS-065-90, Columbia University, New York City, New York, 1990.

[47] A. D. Gross and T. E. Boult. Analyzing skewed symmetries. Technical Report CUCS-064-90, Columbia University, New York City, New York, 1990.

[48] B. Hayes-Roth, B. Buchanan, O. Lichtarge, M. Hewett, R. Altman, J. Brinkley, C. Cornelius, B. Duncan, and O. Jardetzky. PROTEAN: Deriving protein structure from constraints. In R. Engelmore and T. Morgan, editors, Blackboard Systems, pages 417-431. Addison Wesley, 1988.

[49] B. K. P. Horn. Obtaining shape from shading information. In Shape from Shading, pages 123-173. The MIT Press, Cambridge, Massachusetts, 1989.

[50] P. V. C. Hough. Method and means for recognizing complex patterns. U.S. Patent no. 3069654, 1962.

[51] D. A. Huffman. Impossible objects as nonsense sentences. In B. Meltzer and D. M. Michie, editors, Machine Intelligence, volume 6, pages 295-323. Edinburgh University Press, 1971.
[52] D. P. Huttenlocher and S. Ullman. Object recognition using alignment. In Proceedings of the First International Conference on Computer Vision, pages 102-111. IEEE, June 1987.

[53] J. Illingworth and J. Kittler. A survey of the Hough transform. Computer Vision, Graphics, and Image Processing, 44:87-116, 1988.

[54] V. Jagannathan, R. Dodhiawala, and L. S. Baum. Blackboard Architectures and Applications. Harcourt Brace Jovanovich, Boston, 1989.

[55] A. K. Jain and R. C. Dubes. Algorithms for Clustering Data. Prentice Hall, Englewood Cliffs, New Jersey, 1988.

[56] Y. C. Jan and R. T. Chin. Shape from texture using the Wigner distribution. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 515-523, Ann Arbor, Michigan, June 1988.

[57] B. Jenkins. Redundancy in the perception of bilateral symmetry in dot textures. Perception and Psychophysics, 32(2):171-177, 1982.

[58] T. Kanade. A theory of origami world. Artificial Intelligence, 13:279-311, 1980.

[59] M. Kass, A. Witkin, and D. Terzopoulos. Snakes: Active contour models. International Journal of Computer Vision, pages 321-331, 1988.

[60] B. Kim and P. Burger. Calculation of surface position and orientation using the photometric stereo method. In Proceedings of IEEE Computer Vision and Pattern Recognition Society, pages 492-498, Ann Arbor, Michigan, 1988.

[61] K. Koffka. Principles of Gestalt Psychology. Harcourt Brace, New York, 1935.

[62] Greg C. Lee. Reconstruction of Line Drawing Graphs for Objects with Quadric Surfaces using Wing Features. Ph.D. thesis, Michigan State University, East Lansing, Michigan, 1992.

[63] M. D. Levine. Vision in Man and Machine. Computer Engineering Series. McGraw Hill, New York, 1985.

[64] F. Leymarie and M. D. Levine. Simulating the grassfire transform using an active contour model. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(1):56-76, January 1992.

[65] M. Leyton. A process-grammar for shape. Artificial Intelligence, 34(2):213-247, March 1988.

[66] D. G. Lowe. Three-dimensional object recognition from single two-dimensional images. Artificial Intelligence, 31:355-395, 1987.

[67] David G. Lowe. Perceptual Organization and Visual Recognition. Kluwer Academic Publishers, Boston, 1985.

[68] A. K. Mackworth. Interpreting pictures of polyhedral scenes. Artificial Intelligence, 4:121-137, 1973.

[69] J. Malik. Interpreting line drawings of curved objects. International Journal of Computer Vision, 1:73-103, 1987.

[70] J. Malik and D. Maydan. Recovering three-dimensional shape from a single image of curved objects. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(6):555-566, June 1989.

[71] G. Marola. On the detection of the axes of symmetry of symmetric and almost symmetric planar images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(1):104-108, January 1989.

[72] D. Marr. Vision. W. H. Freeman and Co., San Francisco, 1982.

[73] J. D. McCafferty. Human and Machine Vision: Computing Perceptual Organization. Ellis Horwood Series in Digital and Signal Processing. Ellis Horwood, New York, 1990.

[74] R. Mohan and R. Nevatia. Using perceptual organization to extract 3D structures. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-11:1121-1139, November 1989.

[75] R. Mohan and R. Nevatia. Perceptual organization for scene segmentation and description. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(6):616-635, June 1992.
[76] H. Moravec. Sensor fusion in certainty grids for mobile robots. AI Magazine, pages 61-74, Summer 1988.

[77] Sateesha G. Nadabar. Markov Random Field Contextual Models in Computer Vision. Ph.D. thesis, Michigan State University, East Lansing, Michigan, 1992.

[78] M. Nagao, T. Matsuyama, and H. Mori. Structural analysis of complex aerial photographs. In R. Engelmore and T. Morgan, editors, Blackboard Systems, chapter 9, pages 189-218. Addison Wesley, 1988.

[79] V. S. Nalwa. Line-drawing interpretation: A mathematical framework. International Journal of Computer Vision, 2:103-124, 1988.

[80] V. S. Nalwa. Line-drawing interpretation: Bilateral symmetry. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(10):1117-1120, October 1989.

[81] R. Nevatia and T. O. Binford. Description and recognition of curved objects. Artificial Intelligence Journal, 8:77-98, 1977.

[82] H. P. Nii, E. A. Feigenbaum, J. J. Anton, and A. J. Rockmore. Signal-to-symbol transformation: HASP/SIAP case study. In R. Engelmore and T. Morgan, editors, Blackboard Systems, chapter 6, pages 135-157. Addison Wesley, 1988.

[83] T. Pavlidis. Why progress in machine vision is so slow. Pattern Recognition Letters, 13:221-225, 1992.

[84] T. Pavlidis and C. Liow. Integrating region growing and edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-12, March 1990.

[85] G. Pearson. Mission planning within the framework of the blackboard model. In R. Engelmore and T. Morgan, editors, Blackboard Systems, chapter 21, pages 433-442. Addison Wesley, New York, 1988.

[86] A. P. Pentland. Perceptual organization and the representation of natural form. Artificial Intelligence, 28:293-331, 1986.

[87] A. P. Pentland. Local shading analysis. In B. K. P. Horn and M. J. Brooks, editors, Shape from Shading, chapter 15, pages 443-488. Massachusetts Institute of Technology, Cambridge, Massachusetts, 1989.

[88] P. Perona and J. Malik. Scale space and edge detection using anisotropic diffusion. In Proceedings of the IEEE Computer Society Workshop on Computer Vision, pages 16-22, Miami, Florida, 1987.

[89] T. Poggio, V. Torre, and C. Koch. Computational vision and regularization theory. Nature, 317:638-643, 1985.

[90] J. Ponce. Ribbons, symmetries, and skewed symmetries. In Proceedings of the DARPA Image Understanding Workshop, pages 1074-1079, Cambridge, Massachusetts, 1988.

[91] J. Ponce. On characterizing ribbons and finding skewed symmetries. Computer Vision, Graphics, and Image Processing, 52:328-340, 1990.

[92] F. P. Preparata and M. I. Shamos. Computational Geometry: An Introduction. Springer-Verlag, 1985.

[93] W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling. Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press, 1988.

[94] Narayan S. Raja. Obtaining Generic Parts from Range Data Using a Multi-view Representation. Ph.D. thesis, Michigan State University, East Lansing, Michigan, 1992.

[95] V. S. Ramachandran. Perceiving shape from shading. Scientific American, pages 76-83, August 1988.

[96] K. Rao. Shape Description from Sparse and Imperfect Data. Ph.D. thesis, University of Southern California, Los Angeles, California, December 1988.

[97] Irvin Rock. The Logic of Perception. The MIT Press, Cambridge, Massachusetts, 1982.

[98] A. Rosenfeld. Axial representations of shape. Computer Vision, Graphics, and Image Processing, 33:156-173, 1986.

[99] A. Rosenfeld and A. C. Kak. Digital Picture Processing. Academic Press, New York, 1976.
[100] P. L. Rosin and G. A. W. West. Detection and verification of surfaces of revolution by perceptual grouping. Pattern Recognition Letters, 13:453-461, June 1992.

[101] T. Sakai, M. Nagao, and H. Matsushima. Extraction of invariant picture substructures by computer. Computer Graphics and Image Processing, 1:81-96, 1972.

[102] S. Sarkar and K. L. Boyer. A highly efficient computational structure for perceptual organization. Technical Report SAMPL-90-06, The Ohio State University, November 1990.

[103] E. Saund. Labeling of curvilinear structure across scales by token grouping. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 257-263, Champaign, Illinois, 1992.

[104] P. Suetens, P. Fua, and A. J. Hanson. Computational strategies for object recognition. ACM Computing Surveys, 24(1):5-61, March 1992.

[105] K. Sugihara. Machine Interpretation of Line Drawings. The MIT Press Series in Artificial Intelligence. The MIT Press, Cambridge, Massachusetts, 1986.

[106] D. Terzopoulos. Image analysis using multigrid relaxation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8(2):129-139, March 1986.

[107] D. Terzopoulos. Regularization of inverse visual problems involving discontinuities. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8(4):413-424, July 1986.

[108] D. Terzopoulos, A. Witkin, and M. Kass. Symmetry-seeking models and 3D object recognition. International Journal of Computer Vision, 1:211-221, 1987.

[109] V. Torre and T. Poggio. On edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-8(2), March 1986.

[110] D. A. Trytten and M. Tuceryan. Segmentation and grouping of object boundaries using energy minimization. Technical Report CPS-90-07, Michigan State University, East Lansing, Michigan, December 1990.

[111] D. A. Trytten and M. Tuceryan. Segmentation and grouping of object boundaries using energy minimization. In IEEE Conference on Computer Vision and Pattern Recognition, pages 730-731, Maui, Hawaii, June 1991.

[112] M. Tuceryan. Extraction of Perceptual Structure in Dot Patterns. Ph.D. thesis, University of Illinois, Urbana-Champaign, Illinois, 1986.

[113] D. Waltz. Understanding line drawings of scenes with shadows. In P. H. Winston, editor, Psychology of Computer Vision, pages 19-91. McGraw-Hill, New York, NY, 1975.

[114] Max Wertheimer. Laws of organization in perceptual forms. In W. D. Ellis, editor, A Source Book of Gestalt Psychology. Harcourt Brace, New York, 1939.

[115] A. P. Witkin and J. M. Tenenbaum. On the role of structure in vision. In J. Beck, B. Hope, and A. Rosenfeld, editors, Human and Machine Vision, pages 481-543. Academic Press, New York, 1983.