LIBRARY
~ Mlchlgan State
a ,Unlverslty

I

 

 

 

rues IN RETURN aoxm machini- Momnom your room.
TO AVOID FINES Mum on or baton duo duo.

DATE DUE DATE DUE DATE DUE

UN 0 2 11* ' A

I,
~ 2;}

a
ﬁ

MSU IsAn Namath. mm Oppomnny Institwon
WM‘

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

U

 

 

 

 

 

 

_ ____________ __

ﬁ—__

 

 

H
Knowled

Hierarchical Token Grouping In
Knowledge-Based Tubular Object Extraction

By

Qian Huang

A DISSERTATION

Submitted to
Michigan State University
in partial fulﬁllment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY

Computer Science Department

1994

Know

,1»
l .
w- -
-L 3“. _.
~AI.C;T‘\ "‘
.A.
f‘ . ‘
.i‘t‘: , I
“a. .
7v -
«"2"! I" «l
’.:,. It]
Jaf‘zm. ‘ .
I A .
..~,.‘c',‘!
.. . ‘
I‘M. '.
1".“
'w‘ .
n i“
r». v
1’2“.
”3“). v .
J.‘u
.I.‘

|
D
.“I.
J v» -:
”Ii.“

;..‘
1“ , y
‘-.\:w.'
‘I h
.‘ .l)
.
‘\
.lf;..l
. 7‘
‘ ..r .V.
k.
5‘: 'v.
c .
A”

 

ABSTRACT

Hierarchical Token Grouping In
Knowledge-Based Tubular Object Extraction

By

Qian Huang

This dissertation addresses the issues in extracting tubular objects from 2D and
3D images using an integrated model-based approach under a problem solving archi-
tecture called hierarchical token grouping.

Tubular objects exist in important application domains: blood vessels, plant roots,
bacteria, and roads are all tube—shaped. Recognizing this class of objects from images,
quantifying them, and understanding their networks will be of great beneﬁt to many
domains. Unfortunately, a robust recovery of tubular objects from noisy images can
not be achieved using simple segmentation methods. An integrated, hierarchical, and
descriptive approach to the problem is proposed.

Both modularity and integration are indispensable aspects of vision problem solv-
ing. A set of tools for a set of problems has to be effectively integrated to achieve an
overall solution. We propose a formalism called hierarchical token grouping (HTG)
for such an integration purpose. Using the central theme of grouping, we exploit the
homogeneity in vision problem solving. Heterogeneous modules can be treated as ho-
mogeneous operational units that systematically aggregate perceptual tokens across

all levels of abstraction. A formalism is developed that establishes a consistent and

.
.1..o-rr.>‘v ‘

.r‘ :(
:‘_.,..:G..

 

 

I I ‘

l
q-A'.px~“1r)
._,i.u.'r:l .‘liv ‘4: 4‘

V.

I
Xv” 1"; r- 9

a
.Llu In"

' I
.4. . _
in). lfgr gm'

 

 

TI
' I
U3 ..
‘k. A”Ht‘r
~.l P'
try-[i 'T ”--
t
M ( .
1.1.:
.
a.“
m,
‘“ “A z \‘i' 'r
. .. ‘
u. .
A . ‘r .
"a :‘ l u
w. ‘4
‘ ‘4‘"4‘

 

.
a
I'll

systematic framework for integrating modules, cues, and knowledge, all in a glob-
ally coherent mechanism. The formalism is used in, but not limited to, developing a
model-based recognition system for tubular objects.

An integrated model-based recognition strategy is adopted to extract tubular ob-
jects. The geometric shape information is combined with the imaging information in
a generalized stochastic tube model, a parameterized form of a class of Generalized
Cylinders. This model renders the recognition problem as a. parameter estimation
problem which is subsequently solved in a hierarchical fashion using optimal ﬁlters.
Due to the descriptive nature of the model, a new scheme is used to visualize a.
sweeping along a 3D object trajectory.

The model-based tubular object recognition system developed under the frame-
work of HTG is tested and evaluated using a total of 53 2D images and 59 3D volumes
including synthetic data and real data consisting of blood vessels, plant roots, bacte-
ria, and wires. Experimental results demonstrate the effectiveness of the integrated
model-based recognition strategy and the usefulness of the framework of hierarchical

token grouping.

To my parents Hongyi Liu and Shu Yun Huang

iv

ACKNOWLEDGMENTS

Many people have, in many ways, made this dissertation possible. I gratefully
acknowledge their contributions.

Throughout the years of my stay at Michigan State University, each and every
member of my committee has provided me with their continuous support, encour—
agement, and help. I sincerely thank Professor George Stockman for what I have
learned from him, for his sharing many hours of discussions with me, and for always
urging students to do a better job in research. I am very grateful to what Profes-
sor Anil K. Jain and the late Professor Richard C. Dubes have taught me through
their excellent courses in Computer Vision and Pattern Recognition as well as their
insightful questions during seminars and meetings. They inspired my interest in this
field. My special thanks go to both of them also for the opportunity they presented to
me and for their confidence in me through all these years. I sincerely thank Professor
John Weng for generously spending time to discuss various research issues with me.
I appreciate very much what I learned from Professor Alvin J. M. Smucker who got
me engaged in a good yet difficult computer vision problem at the early stage of my
research which has led to many fruitful thoughts and part of the results reported in
this dissertation. I thank Professor Richard Enbody for advising me on various issues
when he served on my committee. Thanks also go to Professor Betty Cheng, Professor
John Geske, and Professor Lionel Ni for their support, guidance, and particularly for
their helpful answers to my cross-field questions. With this opportunity, I especially

like to express my deep respect to late Professor Richard C. Dubes who had influenced

..'f- \\

"\)
y
-

3.31:2 TN
'i ‘ in ~ 14- ~
“till ..;‘I AUX!
o... .. .. , ‘u‘
“I .‘t‘fiﬁf‘ of adj}

we 5mg m-

.4
‘

w‘».,i. '
I M . -
than mi 539,.

" -.. I
3‘1 .‘M‘ ‘

r?"- P ‘

‘ ‘ '7‘ r...
T Lu- .mciuy
s
.
.,~. . .
l,‘ |- .‘ .
... tut. Ih‘ : 7‘“
h' I
...
\Ivl‘ ‘r'
A. ~n x-
‘<|A in. 7‘
I: w 18.
' I
'b._
{kl‘ u h‘r- r.“ l
.A.I “'.
v if
i'i-‘i-
|‘ ' ~
~c ‘. f?» .jg.
A. .
“ ~~*‘l.l
T I
..~
--~. ‘ b
.»‘ c .
NV

' I
‘1
. 3"! N.
‘ ._l?‘i‘;n.
4‘ ‘1'. I.
u
C.
A;- y“-
i .1 '9 n.
i" in:
‘.
'r..
\( ‘~~
h. r .
‘J‘V-"-t
"~4. v

 

'1.
I r .
q .__
"Cu; in
"CW r»
A."
RAIN}. . :-
qa‘lj g. _
«'JPQ

if? I
u'; tr
a .19 1,...
.
.J.
‘
u“,l.'
"Cf
*. In, .
.' If"
T.
,I‘ \ .
‘ Jaw.
‘ QSv ~.
.
P.

many graduate students with his wisdom, decency, and sincerity toward science. He
had always generously volunteered his help and guidance to me. My experience of
doing research with him on the subject of fractals was invaluable.

With my love, I thank my parents, Hongyi Liu and Shu Yun Huang, for giving me
the sense of value of education, for providing me the strength to survive hardships, for
always being with me even though we may physically be so far apart; with my love,
I thank my sisters, Wen Liu and Yi Huang, for constantly standing by me and being
so patient and supportive; with my love, I thank my brother, Xian Liu, for sharing
with me many inspiring thoughts and talks and for always setting good examples
for me; with my love, I thank my aunts, Shu Jun Huang and Shu Shao Huang, for
sacriﬁcing so much to help raise us during the very difﬁcult period of our lives. I
thank my husband, Eric J. Byrne, with all my heart, for his love, understanding,
encouragement, patience, and support. His faith in me has pulled me through many
difﬁcult times. I also sincerely acknowledge the contribution of Mr. Thorn Stone who
helped me tremendously at the initial stage of my stay in the United States. Without
these people, many things would not have been possible, including the completion of
this dissertation.

I sincerely appreciate the Pattern Recognition and Image Processing Laboratory
at Michigan State University. Professor Richard C. Dubes, Professor George C. Stock-
man, Dr. Mihran Tuceryan, Professor John Weng have, and especially Professor Anil
K. Jain have made our PRIP lab an excellent environment for research. A special
thank also goes to our laboratory manager John Lees. His capability and support to
the research activities in the lab have made a difference.

Lastly, but not the least, my thanks go to all my fellow graduate students, in-
cluding the former and the current Prippies. I thank Patrick Flynn, Narayan Raja,
Deborah Trytten, Sateesha N adabar, Greg Lee, Hans Dulimart, and Timthy Newman

for their assistance, warmth, and friendship. Special thanks go to Sally Howden for

vi

D

.

,.u
1’."

m; .Vali. I

" r V .
'h'v s-k.

y ,y I .
.. -..‘..‘...p \h'

l

her always being there, listening, sharing, and helping. Thanks also go to current
graduate students in the lab: Shaoyun Chen, Jinlong Chen, Yuntao Cui, Chitra Do-
rai, Marie-Piere Dubuisson, Sally Howden, Kalle Karu, Jianchang Mao, Sharathcha
Pankanti, Nalini Ratha, Dan Swets, Marilyn Wulfekuhler, and Yu Zhong. I enjoyed
very much our activities together, both academic and social, and I will always value
the friendship we built at Michigan State University. The experience we shared, full

with joys and frustrations, as graduate students is very special and unforgetable.

vii

TOP T.—

Q

ll

5.

HST Of fl
1 Introdum

A. E I). o 2 . ,I. i or. a 4.0 1.4.. .. J .9.) x .. .\. i.
-.. at... «:1. .10. )1. a). .1. .i. )1. 3.1. mm...
a: c a: s .u. a n. . a) a .1 a a: a n) c D...

l. drv) I h A u u. I.
» ~ C I: rm . . . I . .
)) w) )J J . Li. .. a... Al. x.-. .1. a]. T
i 1 \I/ r t v) . . . . .
i I; .l. r\ .d T1 rl. at. 0. n). .1; a... D.
ll.
E
-1“ .hJ vll lid AI 5 q.) I ‘.
q . 1 A H .ﬂly huh ﬂl~ «Hi/h

9..

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES

1 Introduction
1.1 Machine Vision Problem Solving .....................
1.1.1 Tubular Object Recognition ...................
1.1.2 Deﬁning Vision Problem Solving .................
1.1.3 Computational Problems .....................
1.1.4 Characteristics ..........................
1.2 Motivations ................................
1.3 Homogeneity Behind Heterogeneity ...................
1.3.1 Homogeneity in Operation ....................
1.3.2 Homogeneity in Representation .................
1.3.3 Homogeneity in Operational Principle ..............
1.4 Contributions of the Thesis .......................
1.5 Organization of the Thesis ........................

2 Hierarchical Token Grouping

2.1 The Architecture .............................
2.2 Grouping Hierarchy ............................
2.2.1 General Representation: Token .................
2.2.2 Uniform Operation: Grouping ..................
2.2.3 Uniﬁed Grouping Principle: Homomorphism ..........
2.2.4 Grouping Agent ..........................
2.2.5 Generic Token Grouping Algorithm ...............
2.3 Properties .................................
2.3.1 A General Constructive Paradigm ................
2.3.2 A Homogeneous Architecture ..................
2.3.3 Object-Oriented ..........................
2.3.4 Systematic Behavior .......................
2.3.5 Consistent Interaction Interface .................

2.3.6 Cohesive Integration Environment ................
2.3.7 Opportunistic Problem Solving .................
2.3.8 Distributed and Parallelism ...................
2.4 Practical Potentials ............................

viii

xii

)(iii

JRMH

7
10
13
15
17
17
18
19
20
21

24
24
26
27
30
31
42
43
46
47
47
47
48
49
49
49

i. a... 4.. ‘1 9.. 34. 11
I} '1‘. 1.3.. 4“.

VII. v .U. at. a... .A‘. . . l .

l A .l u I) N. l; DI . w 0 r3. Tl \.!- ﬂ...) 0.2 .x.) _.n. .1. 1 .\.I. I.) w. . I L a; ~ 3 n.. a I .\.I ~ 3 I? .v t . 11 at
o . ,7. .v LA . u . n - . Al. . . . bu. . . . n y . . . 4 .l... . u u. I .
‘1 1‘. I1. 1... -_|/ -.\I .1 r U: c a .... lA 1-4 v. A a P \.I— 1.!— ‘nr.~ 4.. .«L 0..) 0.... .H lia ‘1 01 0.1 WA. 5.ch I H \|\ K’s \Irh I) «I'll )J
arr him Aim Tl. Aim nib e.\.. U PI .1). .. I). .. I). .. l. .. I). .. I). .. Pt I). .. )xJ. .. 1.). .. ll: 1.). ._ 7.). l). .. r. x. xi). .. xi). 7). )1. .n\._ l .\J V J - . . . . ill . .. . .. In. . . .
“I“ . . . . . . . u f . I13 1%; 4n“ ‘ 1‘1 14 ‘1. l} I} 4 '1
LU
rdu «Lylw a I d Fl .u 3:un .I‘A o~I.J U I DI; u )4. I?
n; - .\.. . B :14. I), . «.14. and glib. I I I3. IT. IT.
3 4

2.4.1 Formal Method To Build Complex Vision Systems .......
2.4.2 Automation and Quick Prototyping ...............
2.4.3 Software Reusability and Quick Prototyping ..........
2.5 Theoretical Implications .........................
2.5.1 Process Modeling .........................
2.5.2 Dynamic System Behavior Description .............
2.6 Summary .................................

Background

3.1 Representations ..............................
3.1.1 Taxonomy .............................
3.1.2 Point Representations ......................
3.1.3 Region-Based Representations ..................
3.1.4 Boundary-Based Representations ................
3.1.5 Relational Representations ....................
3.1.6 Hierarchical Representations ...................

3.2 Recognition Methodologies ........................
3.2.1 Three Levels of Processing ....................
3.2.2 Taxonomy of Recognition Methodologies ............
3.2.3 Abstraction of Recognition Methodologies: Token Grouping

3.3 Interaction and Integration ........................
3.3.1 Neighborhood Interaction ....................
3.3.2 Interframe Interaction ......................
3.3.3 Intermodule Interaction .....................

3.4 Computational Paradigms For Vision ..................
3.4.1 Marr’s Computational Model ..................
3.4.2 Lowe’s Computational Model ..................
3.4.3 Model-Based Hierarchical Paradigm ...............
3.4.4 A Concrete Mechanism: Hierarchical Token Grouping .....

3.5 Summary .................................

Tubular Object Recognition

4.1 Object Domain ..............................

4.2 Geometric Shape Modeling: Generalized Stochastic Tube Model . . .
4.2.1 Survey ...............................
4.2.2 Modeling Local Shape ......................
4.2.3 Modeling Global Shape Dynamics ................

4.3 Modeling Imaging Processes .......................
4.3.1 Photometric Sensor Modeling ..................
4.3.2 MRA Sensor Modeling For Blood Vessels ............

4.4 Model—Based Tubular Object Extraction ................
4.4.1 4-Stage Recognition Strategy ..................
4.4.2 Initial Segmentation .......................
4.4.3 Automatic Seeding ........................
4.4.4 Local Recognition .........................

ix

50
51
51
52
52
52
53

55
56
.56
56
61
64
66
67
68
68
71
76
76

84
86
87

89
89
90
91
95
102
104
105
106
108

\r"

“3 O

‘ (.1: g,”
1w ., . , ,
5") W _
. '_..o _
v4 '
‘a

'1'

‘—

_< . '7 1.
6’ I

«'p.

" ._.w v
w
.
‘1

u‘rd
' D

4.4.5 Global Recognition ........................ 130

4.5 Summary ................................. 136
Tubular Object Recognition Via HTG 138
5.1 Representation Hierarchy ......................... 139
5.1.1 Object Model Decomposition .................. 139
5.1.2 Tokens and Token Hierarchy ................... 140

5.2 Knowledge Hierarchy ........................... 153
5.2.1 Acquiring Dynamic Environment Information ......... 154

5.2.2 Knowledge Representation .................... 155

5.2.3 Knowledge Hierarchy for Tubular Object Recognition ..... 161

5.3 Grouping Hierarchy ............................ 163
5.3.1 4-Stage System Diagram in HTG Design ............ 163
5.3.2 Initial Classiﬁcation ........................ 165

5.3.3 2D Seeding ............................ 170
5.3.4 3D Seeding ............................ 173
5.3.5 3D Tubes ............................. 175
5.3.6 3D Tubular Objects ....................... 176
5.3.7 Grouping Hierarchy for Recognizing Tubular Objects ..... 177

5.4 Interactions ................................ 178
5.4.1 Interaction Models ........................ 179
5.4.2 Data Flow Models ........................ 181

5.5 HTG Design For Recognizing Tubular Objects ............. 182
5.5.1 HTG System Architecture .................... 182

5.6 Summary ................................. 183
Experiments and Results 185
6.1 Performance Evaluation ......................... 186
6.1.1 Performance Accuracy and Robustness ............. 187
6.1.2 Two Types of Errors ....................... 187
6.1.3 Parameter-Based Evaluation ................... 188
6.1.4 Boundary-Based Evaluation ................... 189
6.1.5 Region-Based Evaluation ..................... 192

6.2 Synthetic Data Generation ........................ 193
6.2.1 2D Synthetic Data Generation .................. 195
6.2.2 3D Synthetic Data Generation .................. 196

6.3 3D Tubular Object Visualization .................... 197
6.4 Experimental Results From 2D Images ................. 199
6.4.1 Results and Evaluation on Synthetic Data ........... 200
6.4.2 Results and Evaluation on Real Data .............. 204

6.5 Experimental Results From 3D Volumes ................ 212
6.5.1 Results and Evaluation on Synthetic Data ........... 212
6.5.2 Results and Evaluation on Real Data .............. 221
6.5.3 3D Visualization ......................... 231

6.6 Summary ................................. 231

7 Conclusion
7.1 Conch:
7.1.1
-, q
I . A --

733 C cats."

I
q .)

IQ .___.. .2
.

‘l-l
.. ..r‘

1 Distance 1
1.1 301'}.

BmHOGRA

7 Conclusions, Contributions, and Future Work 246

7.1 Conclusions ................................ 247
7.1.1 Hierarchical Token Grouping ................... 247
7.1.2 Integrated Model-Based Object Recognition .......... 250
7.2 Contributions ............................... 253
7.2.1 A Formalism for Vision Problem Solving ............ 253

7.2.2 Model—Based Tubular Object Recognition Via Hierarchical To-
ken Grouping ........................... 254
7.3 Future Research .............................. 255
7.3.1 Formalisms in Vision Problem Solving ............. 255
7.3.2 Model-Based Object Recognition ................ 256
A Distance Transformation: Chamfering Technique 258
A.1 2D Chamfering .............................. 258
A2 Extended 3D Chamfering ........................ 259
BIBLIOGRAPHY 262

xi

\‘l'

‘II

1 latrines:

- Ti? Ti“-

ii“? In

4

Q. ‘
1 ~ .. ,
1,! Rex ..
.. N
\i‘V‘r ,‘
A‘ h 2““

LIST OF TABLES

3.1 Taxonomy of representation schemes and examples ........... 57
3.2 The relationship between the three level processing paradigm and the

ﬁve issues in recognition methodologies .................. 73
5.1 Bottom-up information flow on the interaction channels within GH. . 181
5.2 Information ﬂowing on interaction channels between GH and KH. . . 182
6.1 Ground truth for 2D synthetic test images ................ 201
6.2 The results for 2D straight synthetic objects ............... 203
6.3 The averaged results from all 2D straight synthetic objects ....... 204
6.4 The results for 2D curved synthetic objects. .............. 206
6.5 The region-based evaluation of performance on plant root images. . . 207
6.6 Boundary-based evaluation of system performance on plant root images.208
6.7 Ground truth for 3D synthetic volumes .................. 214
6.8 Ground truth for 3D synthetic helixes. ................. 217
6.9 Results from 3D curved synthetic objects. ............... 218
6.10 Results from 3D synthetic helixes ..................... 220

6.11 Summarized evaluation of the system performance on 30 real volumes. 232

xii

3.1 Exam:
1.? Hieran

3.1 The 1:.

31 TWO \‘
poi: :

q ..
'3'- IKPJC
9i.‘ ‘
ML It",
”3:00.356

f0? (if;

I .

llarrs

LL“)
..

4..
h—‘
: ._-,
C3
‘1
.1

l» C
Llalttpr
M
I~~ A‘ {11.1"
4 ‘ ‘
‘3 A Y‘( ~ -
. D Ji-G
:1 R
Q (J: . .
4. ‘ G.
.) T
DE 1!
v ~
4;, 3“: E
.l a
' Isier.
’5- I

1.1
1.2

2.1
3.1

3.2

3.3

3.4

4.1
4.2
4.3
4.4
4.5
4.6
4.7
4.8

4.9
4.10

4.11

LIST OF FIGURES

Example of tubular object recognition. .................
Hierarchical system diagram for tubular object recognition .......

The framework of hierarchical token grouping. .............

Two viewpoints of an image representation. (a) Intrinsic images, (b)
point representation. ...........................
Implicit and explicit spatial relationships among cells. (a) Implicit spa-
tial relationship representation implemented as an array. (b) Explicit
representation for a 4-connectivity relation. (c) Explicit representation
for an 8-connectivity relation. ((1) Explicit representation for spatial
relationship “left”. ............................
Marr’s computational model for vision. The picture is taken from
“Perceptual Organization and Visual Recognition” by David Lowe.

Lowe’s computational model for vision. The picture is taken from
“Perceptual Organization and Visual Recognition” by David Lowe.

2D digital images of (a) 2D blood vessels, (b) 2D plant roots, (c) 2D
bacteria, (d) 3D blood vessels from MRA scan. ............
A tubular object. .............................
A rotated and translated tube model ...................
Rotational relationship between UVW and XYZ. ...........
The images of a cylindrical surface (left), its vmin’s (middle), and
vmax’s (right) ................................
Intensity proﬁles along the cross sections of plant roots. (a) unnor-
malized proﬁles over 15 cross sections, (b) normalized proﬁles over 15
cross sections. ..............................
Ideal blood ﬂow. .............................
The sensor measurements conﬁguration on a cross section within a
blood vessel in (a) real situation, (b) theoretical model. .......
The model-based 4-stage tubular object recognition system diagram.
An example of initial segmentation on a 2D intensity image of plant
roots. (a) An intensity image of plant roots, (b) intensity histogram of
(a), (c) initial segmentation result on (a), ((1) initial edge detection on
(a). ....................................
An example of 3D subvolume of blood vessels from MRA scan. (a) A
3D subvolume projected using MRI, (b) intensity histogram of (a).

xiii

6
7

25

58

60

80

82

90
92
96
97

101

106

108

109
110

111

4.13 Initial
1.13 The in
I'i(’\\‘t“d
4.14 Exam
1.15 Exam};
the res
dlezznir
dim.
4.16 Exam;
1.41sz
VGCIOIS
i:- Range
1.15 Sensiu

. An era

1 Decoy}
2 lhffafr
3 fray
r
COEEEI:
132.); ,-
31 Exam:
A

l
.~l_
eiﬁhjﬁe

2 f ‘

J-J 8.31.”)?

J-l Kr- . '
LUV." .'

,3} _,

' Rr.aty;

a r'-.' ._

p If-Llfc
J \
J r, v
‘r “9 a:
l
1 .,
A .XA. 1:?
)

‘.
' I \
1“]:
k4.) '
w.

.[1

t1

3‘33 .5

fr ' I J
2.31“

4.12
4.13

4.14
4.15

4.16

4.17
4.18
4.19

5.1
5.2
5.3

5.4

5.5
5.6
5.7
5.8

5.9

5.10
5.11
5.12

6.1
6.2
6.3
6.4
6.5
6.6

6.7
6.8

Initial segmentation result (b) from a 3D blood vessel volume (a).

The images of initially detected surfaces from an MRI subvolume,
viewed from front, left, back, and right, respectively ...........
Examples of overlapping (a) and enclosed (b) relationships. .....
Examples of the detected principle curvatures from the front view of
the vessel shown previously. Leftzdepth image of the surface; Mid-
dle:minimum curvature direction vectors; Rightzmaximum curvature
direction vectors. .............................
Examples of the detected principle curvatures from the the back view.
Left:depth image of the surface; Middlezminimum curvature direction
vectors; Rightzmaximum curvature direction vectors. .........
Range of optimization. ..........................
Sensitivity of SNR to (a) radius, (b) orientation. ...........
An example of the empirical distribution of p. .............

Decomposed object model .........................
Hierarchy of decomposed object model ..................
Examples of spatial relationships among point tokens. (a) 4-
connectivity relation among input point tokens, (b) 8-connectivity re-
lation among 3 boundary point tokens. ................
Examples of overlap regions and the four corners of the region. (a)
Region generated by an overlap relation. (b) Region generated by an
enclose relation. .............................
Bounding boxes for (a) a 2D ribbon, (b) a 3D cross section. .....
Bottom-up hierarchy of tokens. .....................
Knowledge hierarchy for tubular object recognition. ..........
Relationship between 2D and 3D tubular object recognition and hier-
archical token grouping. .........................
The Grouping hierarchy for tubular object recognition. ........
Communication channels within GH. ..................
All the communication channels in the system ..............
Tubular object recognition system diagram under HTG .........

Examples of the distance distribution ’DB for image R2 at (a) initial
stage, (b) intermediate stage, and (c) ﬁnal stage. ...........
Examples of helixes generated using different parameters, (a) w = 36,
c = 1.0, b = 0.1778, (c) w = 20, c = 3.5, b = 0.0508. .........
Examples of 2D synthetic tubular objects, (a) two straight objects in
S23, (b) one curved object in S26, (c) two curved objects in S27.

Effects of noise on 6’ and 63 for 2D synthetic straight objects. . . . .
Effects of noise on p" and 6' for 2D synthetic curved objects ......
Estimated local tube centers superimposed on input images (a) S23,
(b) S26, (c) S27. .............................
Examples of man made tubular objects of pipes. ...........
Examples of man made tubular objects of wires. ...........

xiv

114

114
116

119

119
126
127
128

140
141

144

147
149
153
162

164
178
179
180
183

190
197
200

205
207

F -
I

‘-‘.|
. ,.
II x‘
7“
F'.’
A ."_
0.3.:
a \
f, '1
9‘.

t?

4kﬁi \

6.9

6.10
6.11
6.12
6.13
6.14

6.15
6.16

6.17
6.18
6.19
6.20

6.21
6.22

6.23
6.24
6.25

6.26

6.27

6.28

6.29

6.30

Examples of organic tubular objects of bacteria. ........... 210

Examples of organic tubular objects of plant roots. .......... 211
Detected tubes for test images (a) R1, (b) R2, (c) R3, ((1) R4, (e) R5. 211
Detected tubes for test images (a) R6, (b) R7, (c) Bl, (d) B2, (e) B3. 212
Average improvement in 153 and eg. .................. 213
Examples of the distance distribution DB for image R5 at (a) initial

stage, (b) intermediate stage, (c) ﬁnal stage. ............. 215
Superimposed center points of local tubes for (a) MM7, (b) MM8. . 215
The recognition result for input image MM4, (a) input image, (b)

detected boundary, (c) the tubes found. ................ 216
Examples of 3D synthetic volumes. (a) volume S32, (b) volume S34,

(c) volume S35. ............................. 216
Three synthetic helical objects. (a) helix-1, (b) helix-3, (c) helix-6. . 217

Values of ﬁgure of merit along objects (a) 834-1, (b) 835-1 and 835-3. 219
The estimated radius for helix-2 under different noise situations. . . . 221
Sensitivity of the ﬁgure of merit to noise (n). (a) helix-2, (b) helix-6. 222
Experimental results on a helix. (a) the helix without noise, (b) the
estimated helix axis from (a), (c) the helix with noise degree n = 20.0,
(d) the estimated helix axis from (c). ................. 222
Estimated local tube radii with n = 0 along (a) helix-2, (b) helix-6. . 223
The estimated tube lengths for synthetic objects helix-1 and helix—6. . 224
Volumetric blood vessel data displayed using MIP, (a) subvolume sub2,
(b) subvolume sub14, (c) subvolume sub16, (d) subvolume sub33. . . 225
Initial classiﬁcation results from four subvolumes (sub30, sub26, sub5,
and sub21). The ﬁrst column ((a),(d),(g),(j)) shows the original vol-
umes. The second column ((b),(e),(h),(k)) shows the initial segmenta-
tion. The third column ((c),(f),(i),(l)) displays all the initially detected
surface points using MIP technique. .................. 234
Result of initially detected surface points from subvolume sub5 (ren-
dered as range images), viewed from front, left, back, and right, re-
spectively. ................................. 235
Seeding results for volumes sub16 ((a)-(c)), sub7 ((d)-(f)), sub29 ((g)-
(i)), and sub5 ((j)-(l)). The ﬁrst column: original input volumes. The
second column: initially detected surface points. The third column:
seeds detected and superimposed on input. .............. 236
Seeding results for volumes sub14 ((a)-(d)) and sub2l ((e)-(i)). The
ﬁrst column: original input volumes. The second column: initially
detected surface points. The third column: initially detected interior
points. The fourth column: seeds detected and superimposed on input. 237
Effectiveness of the model-based automatic seeding approach. For all
three input volumes, even with very noisy surface detection, there is
no seed detected. The ﬁrst row: original input volumes. The second
row: initially detected surface points. ................. 237

XV

6.31 Effective
detectio
and (6'!
these vc

6.3? [E
{a} the
the sea
inx'aiid;
giobal :

i633 lifted}
(ax \lx
the 9%
in‘va‘fzt
globe;

6.34 s...

{efte
{‘3' S‘Ne‘
"Q. g
53? x r.

6.31

6.32

6.33

6.34

6.35

6.36

6.37
6.38
6.39

Effectiveness of the model-based automatic seeding. The initial surface
detection results ((b) and (f) for input volumes sub17 and subl3 ((a)
and (e)) are very noisy. Seeds are generated only near the objects in
these volumes ((c) and (g)). .......................
Effectiveness of the model-based veriﬁcation of seeds from data sub14
(a) the detected seeds, (b) the seeds that are veriﬁed, the brighter
the seed, the higher the conﬁdence in the seed, (c) the seeds that are
invalidated using the tube model, (d) the seeds that actually activate
global sweepings. .............................
Effectiveness of the model-based veriﬁcation of seeds from data sub34
(a) the detected seeds, (b) the seeds that are veriﬁed, the brighter
the seed, the higher the conﬁdence in the seed, (c) the seeds that are
invalidated using the tube model, (d) the seeds that actually activate
global sweepings. .............................
Sweeping results for volumes sub26 ((a)-(c)), sub7 ((d)-(f)), sub29 ((g)-
(i)), and sub5 ((j)-(l)). The ﬁrst column: original input volumes. The
second column: seeds detected. The third column: recovered object
axes. ...................................
Sweeping results for volumes sub21 ((a)-(c)) and sub14 ((d)-(f)). The
ﬁrst column: original input volumes. The second column: seeds de-
tected. The third column: recovered object axes. ...........
Sweeping results for volumes sub34 and sub33; (a),(c): original input
volumes; (b),(d): recovered object axes. ................
A real volume with the portion of visualized segment marked.

Twenty ﬁve visualized cross sections (frames of a “movie”) .......
3D visualization of nine of the cross sections, starting from the 14th
in the previous ﬁgure. ..........................

xvi

238

239

240

CHAP

Introdi

l'
‘ Y

L, ..
z. .0»,le U.

i I

.

11.. " 11 ‘ . ' ‘

‘ "‘l“Aflif?.,"(‘
In}

I']

”(91“ P" 1
' “Add,” ’ 2‘
‘IA

F - ., V ‘
~-.-. \' f - -
s.. . 0.11)”? ’c‘r ‘

".i'!‘ h .
41‘ I . _
IL. ‘63“).(4
Fag.

v”: .7.
“11.131921 T

.‘L:‘

.IAf‘t.
‘.._.\‘ _.
- «rm-u...

l“§ ‘
‘Q‘_' an
‘d'.1,._
L“f
'~
‘4
4
.='-.,,“.
.q‘l}, \

 

‘5
‘ ii”.
«1;» ~
g -
cw .u
Jul» _
‘y‘ f
‘-
‘HC‘; 11
‘. 16“,; 1
.3

CHAPTER 1

Introduction

The problem of recognizing or extracting tubular objects from both 20 intensity and
3D volumetric data is addressed using a proposed vision problem solving architecture
called hierarchical token grouping. Tubular objects exist in various application do-
mains: blood vessels in the medical ﬁeld, bacteria in biology, plant roots in agriculture,
roads in remote sensing, wires and paths on circuit boards in an industrial inspection
environment. Tubular objects sometimes form complex networks. Biological tubular
objects are often deformable. Automatically recognizing this class of objects from im-
ages, precisely quantifying important features associated with these objects such as
their geometric shape, volume, or branching frequency, reliably describing the shape
dynamics of the objects, and understanding their networks will be of great beneﬁt to
all these application domains.

In attacking this practical vision problem, homogeneous characteristics exhibited
at multiple levels of problem solving are observed. The homogeneity is then examined
carefully from a more general vieWpoint with the motivation of developing a useful
architecture for vision problem solving that facilitates the integration of machine
vision systems in a more systematic and consistent framework. By exploiting the
homogeneity, a constructive and hierarchical architecture that employs knowledge at

many levels as possible is proposed under which vision problems can be solved using

.'..'- . - ..
Hind. . upt‘fa.
m.rETa..LJE PL; 2!

slﬁ~ 9n, '- --<
nu.'\-. .khl \ 1“ III"

...'..'. 5 1
‘ :91. '9 ”f!

‘-ul'\,“\-JL

 

“TN :~ ‘ '
. LJ‘ \ ‘ h- i
'- d H LT

 

 

 

 

\F:
i V
.t—y-c L‘ ,
'I
i'ic'f‘w'.
(TVL -
Hi
In > ,
:1”?
A
£9._
mm.
i' r.nN,_
“‘QHY- __
K‘ ‘J,
:1 .
m -
" ;'I; 1' .
“\I‘w.
v

a uniform operational scheme, a consistent representational interface, and a coherent
integration environment of top-down and bottom-up information. This proposed ar-
chitecture is motivated by application problems and applied to develop a hierarchical
model-based recognition system that extracts tubular objects from input data. The
architecture of hierarchical token grouping has potential as a general vision problem

solving paradigm.

1.1 Machine Vision Problem Solving

Vision is a highly sophisticated sense of human beings. It is achieved in an eye-brain
biological process where eyes provide powerful sensors and the brain forms a marvelous
analysis system. To some, the ultimate goal of machine vision is to accomplish, by
means of machine processing of images, at least, the same capability of perception
as human beings. Sometimes, however, even more is expected of a machine vision
system, such as the quantiﬁcation of objects, something not usually provided by the
human visual system.

How exactly do human beings achieve the task of vision? What is the distinc-
tion between the eye and the brain? What is the explanatory model that correctly
characterizes human vision? While machine vision became a subject of interest and
study several decades ago, researchers in human visual behavior have been trying
to answer these questions for centuries. Even until today, unfortunately, answers to
these questions are still fragmentary. A systematic analysis of human vision is not
yet available.

Nevertheless, results from human vision research have greatly inﬂuenced the re-
search in machine vision. With the dramatic success of digital computers in almost
every corner of modern society, computer scientists hope to achieve the same in the

ﬁeld of machine vision. Much effort has been made toward the methodologies that

 

v
37". 2‘ ”-‘ "
mull -lul $1 ’ ll
.i

1|
.‘A"?"" -'-“
f8:n.ail~>.c
.;.”Xg:lLE a

 

 

. 1
C“? «WAN " '
‘1 .Lgl '"r-y
‘«..“-
.,‘
.,:_:‘
'.‘,I'J‘ 7),.1'
-‘ ' . _
M_' ‘i
A -.IA.-l
ii ‘
Y b,
4‘; ‘\ >|“I.
.4 l y
‘ A
‘G. U
~“‘
‘
db» .1
a,(dp\.C‘
.‘.
"L-
Hun. ‘ .
M. .Fsg v
‘1‘: ‘ .._i
‘A.1l(- a
*g [I
T .
i'; ‘gi
“i ‘i. y
“~> 1r
.

 

U, .‘
“‘1".
Jun; YW-
Alﬁ‘.
“ t“
4.
e .
r,»
. 1‘ i
S (\\ ll“
‘j‘jav—‘v
.
v ‘ I
3‘ .
. ‘—I
\2

attempt to make computers possess the capability of perception. Lacking an exact
model of the human visual system, these methodologies are basically engineering
approaches that correlate input signals with perceptual output.

In this thesis, in search of solutions to application problems, we had to study an
essential issue in building machine vision systems. Integration is a signiﬁcant aspect
in designing and building computer vision systems. Realizing that the major obstacle
in integrating a machine vision system is the heterogeneity in modules, in data, and
in knowledge, we exploit the inherent homogeneous characteristics in vision problem
solving and utilize them to seek a more consistent and systematic way of assembling
machine vision systems. We propose a vision problem solving architecture, called Hi—
erarchical Token Grouping (HTG), that possesses a set of desirable properties. This
proposed architecture is intended to provide a formalism for designing, developing,
and describing machine vision systems. It does not provide solutions for particular
vision problems. The proposed architecture is applied to the problem of extract-
ing tubular objects using the integrated model-based approach. A computer vision
system designed for tubular object recognition is developed under the paradigm of
hierarchical token grouping.

In this chapter, we discuss the problem of tubular object recognition and present
our philosophical view toward vision problem solving that establishes the foundation
for the proposed problem solving architecture. In Section 1.1, the problem of extract-
ing tubular objects from 2D images is introduced. A set of important observations
about how objects of interest are to be recognized in a hierarchical paradigm are
made. Based on these observations, the process of general vision problem solving is
deﬁned as a series of inverse mappings, each of which corresponds to the recovery
of some meaningful perceptual information. The central operational theme of group-
ing is identiﬁed and its relationship with the two essential computational problems

involved in vision problem solving is established. The characteristics of such deﬁned

. . o 1
ESIOD QWME’YIJ 5

1"! .- ' ”w;
.... “6 U3." ”JP
L l~
2"“1757” "'1"
ALMA". “PL! 1;) C . .

cartons of II '

no

u
m.

I U c .'.
mam: 1:1 .‘t‘r

1.1.1 Tub

U; l 1" 'V‘
“k s.»' l‘f‘“‘ "
, ' i
.t .‘ -
I: ,7" ' ' ,A .
-. ‘ui‘ 'Ji' '4‘ ‘ I
‘ .

.ﬂgfplr- ~ , I
t "95. dip‘

nu
'<_(‘.“p V _
‘16 >
‘ (J3) 94'
- ‘

I‘STR 5‘
‘ ON.
\‘\
a}.
i v .‘.”,
. A_L.- _
Lia“ \.
A
“
,4. u
. }I:I
‘\~ ‘5.
\_ rs
'r
. |
a i" .L
.;;. "Q1
‘i v;__
.4.
. gs‘l‘

Vision problem solving process are examined. Based on these discussions, in Section
1.2, we describe the motivations of the proposed paradigm. In Section 1.3, inherent
homogeneous characteristics of vision problem solving are presented. The major con-
tributions of this thesis are summarized in Section 1.4. and its organization will be

explained in Section 1.5.

1.1.1 Tubular Object Recognition

The problem of recognizing or ertracting tubular objects is to (1) identify instances
of the objects of interest, (2) describe the geometric shape of the extracted objects,
and (3) quantify the extracted objects by providing a set of measurements 3' Exam-
ples of tubular objects can be blood vessels in medical images, bacteria in biological
pictures, plant roots in the images taken from underground, roads in remote sensing
imageries, and wires or paths of circuit boards from the images taken under a con—
trolled environment for inspection purposes. Features that characterize the shape of
tubular objects can be their symmetric axes, their cross section parameters such as
radius, the deformation characterizations, or branching frequencies. Measurements
that quantify tubular objects can also be made such as the area (in 2D case) or the
volume (in 3D case) an object occupies. While this research solves the problem of
tubular object recognition in both 20 and 3D cases, only the 2D case is presented
in this chapter for the purpose of illustrating our observations toward vision problem
solving. The complete formulation of the problem is given in Chapter 4.

Let us consider the problem of extracting tubular objects from 2D intensity im-
ages. A given input image is a set of discrete points, each of which carries an inten-
sity measurement. The recognition task is to organize these input points according

to some criterion. Such criterion must be established based on our knowledge about

 

‘In this thesis, we use “extracting” and “recognizing” interchangeably because our viewpoint is
that no extraction can be done without recognition.

. I 1 ‘ '
,..,. orrul- "
.rvl'h. \Jelu‘dr Ll‘l

.' _ .- - .. .)
1:5,!an l li.(‘ -

€§TII€IUC (LIN

.‘,Z .. .’- .
3.5.19" ' m e. s

v. ‘4. A

271-002 . (one

2

 

 

l‘l ‘ '
'.- ‘ i" -
‘ vt v1.1 1‘?“
. "0
m. ‘
9“ ref. . ‘
sung!“ ft- . ,
"th

 

both tubular objects and the nature of the input. From experience (knowledge), we
know that the 2D projection of a 3D tubular object usually produces two parallelly
symmetric curves formed by the shadow of the object. Due to the smoothness of
object surface, such curves should also be smooth. That is, the object silhouette, or
“ribbon”, corresponds to two smooth curves in a 2D image plane. A curve can be
detected from a set of connected edge points and a tubular object may be enclosed
within a pair of parallel curves. Note, this is a top-down knowledge decomposition
process which correlates what we know about the objects to the data that a com-
puter vision system takes as input. From the vision problem solving point of view,
such a decomposition describes a constructive approach to perform tubular object
recognition. Figure 1.1 shows such a constructive viewpoint in recognizing tubular
objects. The task of a computer vision system for recognizing tubular objects is to
organize the input in a meaningful way using the knowledge discussed above. The
recognized objects need to be represented in a descriptive way so that the computa-
tion of volumes, areas, and branching structures can be facilitated. From experience,
we know that simple segmentation processes can not provide a robust solution to this
problem and an incremental process scheme is required. In each step of processing in
Figure 1.1 can be considered as one level of abstraction performed by an individual
module. For example, at the lowest level, an edge detection module can extract the
edge points. Based on its result, another module at the next level may recognize
the straight lines which can be further grouped (aggregated) to form polylines that
approximate curves. Pairs of parallel curves can be identiﬁed through an even higher
level module. The question of whether a pair of parallel curves corresponds to a 2D
tubular object needs to be answered by yet another higher level module whose func-
tionality may be to verify certain intensity conﬁgurations based on the knowledge
about the shape of tubular objects and the imaging condition under which the input

is acquired. This is a bottom-up information recovery process.

 

 

 

 

 

 

-k-c‘.--’ ....... 'ocn

 

 

 

 

    
  

--F.J---J--'f-T-ﬁ---J--q.-.r

--‘Lylon{co+!.,oqgoc'ooo‘ra-‘--..t-
- ‘- ”*;¢o+oo+ocupoo\oco‘o

-’-/u L- -“ncJ ....... L---L--*oo‘+-

l I I ‘
-'Vz’vt-C-f.-- C-D'r-cn‘pocﬂ-no“-oukol ”my,
o ‘ -h

     
 

Figure 1.1. Example of tubular object recognition.

Combining both the top-down knowledge decomposition and bottom-up informa—
tion recovery process, Figure 1.2 gives a diagram of a computer vision system with
two hierarchies, one for the decomposed knowledge (on the left) and the other for the
incremental information reconstruction process (on the right). The recovery process
is a continuous inverse mapping, from a set of discrete points to a partition of the
image plane using the knowledge from the left hierarchy. Each module on different
levels of the recovery process has its own functional role and none of them is isolated.
They interact and together they carry out the recognition task, using the top-down

knowledge, in a bottom-up fashion. We observe, ( 1) different types of perceptual

 

Hierarchy

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

of Hierarchy of
Knowledge Information Recovery
T0940" Elongated Elongated
Object object A
Parallel Parallel
ribbon nbbons
Curves Curves
Lines Lines
Edge points Edge points
B .
Points Pixels ottom up

 

 

 

 

 

 

Figure 1.2. Hierarchical system diagram for tubular object recognition.

entities are extracted at different levels and they are essentially grouped (aggregated)
from lower level entities, (2) conventionally, both the methodologies and the represen-
tations adopted by the modules at different levels can be heterogeneous. While the
heterogeneity does not affect each individual module, it could cause difficulty when
these modules are organized to form a system in which they interact. It is conceivable
that the more homogeneous the individual modules are, the easier to integrate them,

hence, the more manageable the overall system becomes.

1.1.2 Deﬁning Vision Problem Solving

In general, from a computational point of view, a vision system is to recover three
dimensional information from a discrete set of points that carry sensor measurements.

Such a problem solving process can be informally deﬁned as the following.

.l rsmn #2254? .r.

z. . .. . .1 ..
nut. ll.) 7:93: h
I

I ~'
» 4
,_~_.
a A
'5’
‘3-
’s
I
-w—‘

4.
-H
a

A
r - o
If t (r! o
“'4‘ it?
C

. l
r. .f -
‘1 W U“ . "(L -l
‘ ‘“ 11,, I
.. j e
v .‘ - ‘. t
4.. J1 Al-.'_l"l‘ ’A‘I‘

‘DZ ’W"v-~;,,
I“ ii‘lx‘.‘\P ll?
.

if)" ,
I__ cfr!”T" v‘ y
-'“n’. ‘r,
A A‘
1'!“
.31: .A
L; F37“

. I |‘&: lf"
‘lA 0‘

.-.; ' .
"U‘l v .
. : “1E :r[.,'V
v a
1‘.
Jig-ii“:
- “
-;;~3 l“
K i I,
‘1 “.l

A vision system takes a set of discrete points, each of which carries sensed informa-
tion, as input and produces a partition, over all the data points, of NI entities, each of
which is symbolically associated with a label, designated for certain meaning {such as
object name), and a description, which may be hierarchical, about the characteristics
of the corresponding object inferred from the image data, including its 3D physical ap-
pearance, its organization, its spatial or componential relationship with other entities,

or even its functional properties.

Notice, the partition here is semantically meaningful and is realized not merely
based on the characteristics of the visual data but, importantly, is also based on the
use of knowledge that a vision system is equipped with. Such a partition can also
be recursive or hierarchical, i.e., a single region in the partition can have an internal
partition.

If we consider the process of imaging as a mapping from an interpretation of a
3D scene to a discrete image function, the recovery task that a vision system needs
to accomplish is an inverse mapping from a discrete function, a perceived image, to a
3D scene or a meaningful interpretation of the image. Denoting a mapping function
by M, an interpretation of a 3D scene by I, a discrete image function by f, then the

mapping from an interpretation of a 3D scene to an image is:

01‘

Mapping function M can often be formally deﬁned as a concatenation of a series of

'1

I ab 0'

(‘b

(‘5’
-, ‘

:1
HQ
.,,

3' ”V: :Y‘y"- .. ‘
r ‘ “new re
‘9
t),
l\‘-«,l_.1
“nulliu‘x‘ §“ . . .
.\ -tjtu C
‘I‘v-
._1I (f
‘l J a \‘pP (1‘ '
W... l‘
- 4 ~0_
V“‘-:_ll‘"\. .'
Vlkl‘c" ‘Jr:,
b‘.
l"?
f. ._
Ltd?“ : v
.1 013.,“ -1
.A; > ‘ .1“
Stills—WI
“a ,
‘u I
he. (‘9‘, .
.‘_ .
P-
J" .
0 "if .1,
.‘:p
., g ‘
f1" .

“elixir-<1
.rg.‘ lr
J \4‘ v.
‘A
:N ‘Y:\_
we.»
‘ufm . ,
“GL1,
.,‘_
D t -

mappings, each of which carries out one type of data transformation:

M: MIOMQO...OMm,

01‘

f = M1(M2(./\/l3(...(.Mm(I))..))).

Using the same notation, the process of vision problem solving is formally deﬁned

as an inverse mapping:

01‘

Similarly, such an inverse mapping can be accordingly decomposed into a concatena-
tion of a series of inverse mappings, each of which performs one type of inverse data
transformation:

M” : M" 0M"

m m-I

o ...0M,-I,

01‘

I = M7.’(M;’_1( ;’.2(~-(M7’(f))~)))-

Each of the decomposed inverse mappings recovers a certain type of information. For
example, certain intrinsic properties of the 2D image can be retrieved in some early
step of the entire recovery process. These properties can then be utilized to recover
other observable features that may be still non-geometric. Based on these features,
information about the 3D geometric structures may be inferred which can then be
mapped to some 3D entities. The linkage between 3D entities and concepts yields
an interpretation of the input image. Knowledge plays an indispensable role in this

process of vision problem solving. That is, the inverse mappings deﬁned above apply

ms kinds of l

Bull] the int}

l~ L ‘r ( _;.-
new the ~,m< .I

v

'.

hardy what K11

v,‘ y I
~13...» O '1 ]
.‘.‘t‘....'?(l. l3 mt.

1.1.3 Com

3 35212303 car;

Add," ’1‘ ‘
av‘v _ ‘ > '
”min? [13:13!“

i!
1
9" ‘
rpm“ ,
~‘ ualy..r

.,_
Sc 3“ rf .
. J1 . ,9 .
‘1' .
\“i.'(» v-
-~- \ -1
t if [fl-lain”...
‘1.‘t
.-
. - P
“~' ‘-.Cv .
i". A)“ f . ‘
‘4 g.‘
‘ 1
IE r
.‘ ‘
J.lr \ ~.
1,8".
‘ til"? fl E“
‘ ,\»~
A 4.; (‘4
‘ a
lF’.-F_1
CLK 91__
\ A'HE‘ .3
x.
F‘;-,;
h‘ Il‘Y \
_ lak) \ ‘K

10

various kinds of knowledge along every step of the recovery.

Both the informal and the formal deﬁnitions for vision problem solving provide
merely the speciﬁcation of the input and output relationship of a black box system.
Exactly what kinds of computations are performed within the black box is left un—

speciﬁed. In the next section, this aspect of vision problem solving is examined.

1.1.3 Computational Problems

A partition can be achieved in either a divisive or an agglomerative fashion. A
divisive method starts with the entire data set as one entity and then successively
divides each set into smaller sets according to some criteria, often dissimilarity. It is a
decomposition process in a coarse-to-ﬁne hierarchy. An agglomerative method starts
with the individual data point as an entity and then combines or groups entities
recursively into larger ones until some criteria, often similarity, are violated. This
approach is a bottom-up grouping process in a ﬁne-to-coarse hierarchy. Either one of
the methods can be used alone to solve the partition problem adequately.

In practice, however, the conceptual difference between a divisive and an agglom-
erative method, that one is a coarse-to-ﬁne operation while the other is a ﬁne-to-coarse
operation, will undoubtedly affect the implementation of a computer vision system.
On the other hand, the choice of which method to use may be constrained by the
nature of the available data and other considerations. In both cases, however, one
thing is common: the effect of the operational criteria, similarity or dissimilarity, used
to perform either the decomposition or the grouping is crucial.

In our View, the agglomerative grouping method is preferred from points of View
of both human vision and practical computation. Evidence from psychological study
reveals the grouping phenomena inhuman perception. Observations in human per-

ception indicate that similarity among non-geometric attributes is generally correlated

0

lb 90311391 I'lC 51

.'.

' l
.. o. 2,.) 9y
»!.ltr-|J.d'.il.l‘.l§ \Ai
; A

’5‘ \r.~qI - l... r

u s

“ii-6.4-. \ CA. A..
-

renﬁonin"

areasuhpqgi
.£-.',. .l» ‘ 1v ‘
trim “a: (we:

flmllerogy

A

\n

' '7
‘v-v-_.‘-.'. ‘. a
I, .
L,V‘L‘valit d l:~\
. .

n
Gl

. ‘_ .
-r<..
l"

.‘ Y
l-MV'AJv ﬁll-La} I

3313.: PW- .
he
.
a'i'i'

'Y',‘ is -_
sits h a:

I .
.ﬁ

v

. $.3— M
N" UJ’ttia‘ep

 

11

to geometric structures[93], i.e., geometric structures can be detected from those sub-
populations that have similar non-geometric attributes (simplest can be intensity or
intensity changes) observed from image data. This suggests a bottom-up grouping
operation with the attributes of individual entities at a ﬁner level to be computed be-
fore a subpopulation of these entities can be determined. Research in computational
vision has been inﬂuenced by the ﬁndings of such grouping processes in human vision.
From the computational point of view, the input data to a computer vision system is
typically a discrete array. What a computer immediately encounters are the values
at individual quantized points, i.e., what is available at this point in a computer is
not an entirety but the ﬁnest details that a digitizer allows. From these two consid-
erations, it seems more natural to perform the task of vision problem solving in a
ﬁne-to—coarse hierarchy, i.e., start with the ﬁnest level and then gradually achieve an
overall partition through subsequent abstractions by grouping.

How can grouping achieve the overall task of machine perception? The experi-
ments conducted by Stevens et al in human perception showed that there exist two
kinds of scales in human visual processes[93]. One is the scale of the constituent
intensity changes and the other is the scale of local geometric structure. Marr also
noticed the indispensable roles of these two scales in perception[62]. Based on their
observations, Stevens et al propose that these two scales are substantially independent
of each other. Therefore, there are “two distinct computational problems”: detect-
ing intensity changes across spatial-temporal scales, and detecting structures across
spatial scales [93].

The ﬁrst computational problem is to detect perceptual information at certain
levels of abstraction across multiple images, either from a multiple resolution scheme
or from a time sequence. The goal along this dimension is to recover the accurate
properties of certain type of perceptual entity. One example is the detection of zero-

crossings using the Laplacian-of—Gaussian at different scales proposed by Marr[62].

miner exam;

,. i .‘ J-‘
mm it mi

,. . .
jvvr-‘-"' . .- l
M... (“it’ll ll .1.

‘l
I

. . -

.I'.‘ .Fﬂvve'wr

... {1, .‘.,Lui,

: l '

.‘m «a-

.'lI . '1 |(' -"3’ I

.‘.“..-cdltl (1
.0. §

12

Another example is the recovery of motion and structure information from a sequence
of images. This computational problem has been viewed as a grouping problem[62,
34]. In this thesis, we call it grouping across multiple images.

The second computational problem concerns the recovery of 3D structural infor-
mation which may or may not be directly revealed in the image data due to the
limitation of local intensity information, as pointed out by Marr[62]. To reassem-
ble the structural pieces, “global organization is detected, in part, by ‘bootstrapping’
from local organization” [93]. This constructive vieWpoint of human perception is sup-
ported by many other studies in human vision[57]. Perception seems to be achieved
through a continuous process of such bootstrapping or, from the viewpoint of this
thesis, a continuous process (hierarchy) of grouping. We call grouping along this
dimension grouping across multiple levels of abstraction.

Therefore, to solve the computational problems involved in vision tasks, grouping
is performed along two dimensions, one is the dimension of spatial-temporal scales and
the other is the dimension of geometrical scale. A continuous process of grouping in a
space along these two dimensions leads to the perception, hence, the interpretation of
a perceived scene. Although the two computational problems are distinct, they may
also interact. In the literature, some researches have also exercised such an interaction
of the two types of computational problems in perceptual grouping across multiple
resolutions [59, 80, 87, 88, 30, 29]. The process of vision problem solving is a process
in which groupings can be performed alternately along the two dimensions.

Having identiﬁed the computational problems and the grouping behavior involved
in solving these computational problems, what characteristics does vision problem

solving possess? We discuss this topic in the next section.

1.1.4 Char

.’

“"9 r0 .}.I~'- Q
Dtuttﬁltt id. (1' I
‘ .' h "r.“ '.~ ~ ~.
hilt] “Unldn \lP‘4(l
‘ V t u ‘
ttlt‘sfztltilifﬁ if.

' l
'y- r. .1 .
warmth.

Constructive a

5513'

I
r '-.,‘.u
hit-.612: II) ¢-

'v
rs

L"“*‘15Ct"-")tt'--v
"‘f ".d

‘:.~~..\.; 1

.. In)”, “

f H “A fi‘t'i‘g'
. .,,

llOdularitY

\

~L'4‘22K l . .
.f; P‘Vrv

-,_
1w)“ ,
~g ‘ .
.‘.“&A \,.w . .
till-“Y l:-

--‘ ‘L‘iT‘Jh-
:Vz‘w_ \
s|‘<'f‘
.* l" v
M 'V‘ - 7‘
"4m r
t

13

1 .1.4 Characteristics

Different characteristics of vision problem solving have been demonstrated in studying
the human visual system. They are discussed separately below and will be explored
in developing the architecture we propose to solve a wide range of machine vision

problems.

Constructive and Hierarchical

Physiological observations of the human visual system suggest that human perception
is achieved in a multi-level process. Evidence exists that demonstrates that vision is
constructive and hierarchical[57]. From a computational point of view, a. constructive
vision leads to a divide-and-conquer scheme which reduces the complexity of indi-
vidual computational units that comprise a vision system. Perceptual information is

recursively recovered and perception is achieved in an incremental fashion.

Modularity

Studies in psychophics, anatomy, and physiology provide evidence for modularity of
human vision[99, 49]. Individual modules with specialized functional roles in the
human visual system can be identiﬁed [57]. Such identiﬁable individual modules in
the human visual system has led to the shift of research attention to the isolated

study of individual modules.

Integration, Cooperation, and Competition

Although evidence for modularity exists, modules are not independent but rather
they interact. The integration behavior among individual modules has long been ob-
served. In machine vision, even though the research results from studying individual
modules will eventually contribute to the success of ﬂexible computer vision, merely

a bag of tools for a bag of problems will not provide an adequate overall solution to

\u¢$l

.. ~l "

2 J l
u.‘ Act-7"‘le Del-U]
m ‘

‘ 1
I... ‘ 4" .'
KLIrTPZT. lllU‘J 1 .

. , . ,_
”or bend» fur

I
15:; IS arming».

Role of Know
f1? imam \"is

...,._> h“
La: 'lFA‘7719‘tffl

If “Q ltf‘ﬁf‘. 0‘?

l

‘ I
i,._ L . - . I
.3? “tn-41f: .‘. 1‘.

I‘ , l
3:.‘3ii'tr-gge l
.‘,_ ._
Q":E L-G: Ei'l'iY‘J'A
3‘ t~.
5.)“.‘12.’
U100 '
:enelt
:ﬁ 1
‘ o- r ,
‘ I V‘-
, .11...cf. it?
VQ~. ’
~21 ; -,, , _.
"V ' I -.
“‘4 has:
I
Bagel: .’ .
..-v;.h._!l'\ .r ‘
- .h \
V: ‘-!. \“‘.‘
7‘ Gt 1 l

.t
‘ 99-1».
\ u
: MWN,‘
V"..‘\ ‘
\. >
a
Try. .1
‘ army
‘|t-‘
"«_‘ I»

14

machine perception. The cooperation and competition among individual modules is
crucial to achieve perception. In the context of constructive and hierarchical vision,
different modules act both individually and cooperatively. Cooperation and compe-
tition behavior is the outcome of integrating individual modules. An overall visual

task is accomplished by integrating the capabilities of different computational units.

Role of Knowledge

The human visual system is not just a simple feed forward mechanism. It seems
that whenever there is a forward path, there is also a corresponding feedback path[8].
It has been observed that top-down feedback physically exists in the hierarchy of
the human visual system[57]. These observations indicate that knowledge plays an
indispensable role in perception and is applied whenever is possible. While the role
of knowledge has been acknowledged since the beginning of machine vision research,
there has been no coherent way to apply knowledge in the process of vision problem

solving.

Homogeneity

The human brain can perform many heterogeneous tasks with essentially homoge-
neous and simple computational units (neurons). These computational units are ho-
mogeneous in the sense that every one of them computes in a similar fashion and the
computational results are represented in a consistent way. The networks connected
by such homogeneous computational units are therefore homogeneous, consistent,
systematic, and coherent biological computers.

In machine vision research, in order to achieve the same set of different visual
tasks, both the methodologies and the representations adopted are usually also het-
erogeneous. It may not become a concern in the research related to studying isolated

individual modules. However, such heterogeneity will introduce much difﬁculty in

- 'o‘s..' .
lI_E’=_....c.,’._e 1'

.'. r-q] 1r: - .‘.
4‘.Gl 3‘: 6‘ b .

.‘.

g“. ,

l
-~.<1 km 1‘-
f~:. 'tif

Vii‘llt
l.‘ t ,
l. Infill
.17‘,‘)'
J'Cuun,

‘M
rs
fr
"l'Jl'F Q3!
. J"
Plftrry,

V
.1_:‘V-
“(Dr .

g: r“? l

15

integrating these individual modules, causing serious problems in building computer

vision systems.

1 .2 Motivations

Identiﬁable individual modules in the human visual system have led to the shift of
research attention to the isolated study of individual modules. As mentioned before,
merely a bag of tools for a bag of problems certainly will not provide an overall
solution to computer vision. While the view of constructive and hierarchical vision
prevails, the issues of cooperation and competition among individual modules become
crucial in achieving machine perception. It relates to an important issue in computer
vision: how to integrate individual modules, multiple sources of information, and
different types of knowledge. It is impossible for computer vision to succeed without
an adequate solution to this problem. Kanal recently emphasized the need for a

formalism for integration[54]:

Formalisms for developing algorithms and parallel implementations for many of
the individual tools have received much attention in recent years. But the inte-
gration of heterogeneous computational components, multiple sensors producing
diﬂerent types of data, and heterogeneous knowledge bases, is a signiﬁcant sys-
tem design problem for which we currently have only ad hoc techniques. Clearly
more systematic methods and formalisms need to be developed for the design of
complex multilevel systems consisting of heterogeneous modules performing spe-
cialized local computations while interacting with other modules at the same and
different levels of a hierarchical organization. Such interaction involves infor-
mation and decision ﬂowing back and forth, with competition, and cooperation,

all in the context of global constraint satisfaction.

llrsity also poi

. 1 ,
- - - - - 4
”internal flld‘ .

l " .
> x e u . . .. ._
.tm. mluw

l
I

.j‘ l

fl.

1,
'1 Ut‘t

.' 1 i
(l nig'tfr .'..

~' I. .
fluffy]? 9'

to g ,i
f it

fray“ 0,; (it.

nuance: ;-.-..

’ . ’ v .
tl]vl';n,~' .
“L- ;‘ ‘v.,

. Gt (“Ff \l'ﬁ

FRI-l \I
" i'ﬁft‘t v v
.. r-..Gl.v Oink“...
if “ P
{In “9 J
‘ ‘JA Mata
. ‘ ‘
~\.'-I ‘
i ‘41 i‘ ~ .‘
a“.
If“ ‘
...e,i_,0,w_ .
: iii-1'
‘I
.l_“.d \\.
. (A r
.'y,
If ‘lf T'pq .
‘n I- . ‘
'36 iv..-
1]., Diff-l
'1” (’ﬁ \
‘ ‘Hf‘ 7‘.
* 'wlF- l
A
'L‘ _

.lﬂ
.,.
3'21“», .
‘ art-“t
. ‘1‘,r“‘
«J. .
r
A l:

. -‘(
'll".
"tat A

16

Minsky also pointed out the importance of developing the discipline in organizing
managerial machine vision systems so that advantages of different methodologies can

be fully exploited and the disadvantages can be compensated[65]:

It is time to stop arguing over which type of pattern classiﬁcation technique is
the best because that depends on our contest and goal. Instead we should work at
a higher level of organization and discover how to build managerial systems to
exploit the different virtues and evade the diﬂerent limitations of each of these

ways of comparing things.

Cooperative and competitive behavior is the outcome of integration while integra-
tion is achieved through interacting. In the literature, interaction has been attempted
at several scales. The ﬁrst scale is local neighborhood interaction. The second scale
is the interaction across multiple images. The third scale is the interaction among
individual modules across levels of abstraction. To date, there is no general paradigm
that supports ﬂexible interactions among individual modules in a consistent environ-
ment that allows integration of heterogeneous types of information and knowledge in
a globally coherent way. In our View, the major obstacle is from the heterogeneous
nature of data, of knowledge, and of the techniques employed to solve heterogeneous
visual tasks.

Heterogeneity exists due to the inevitably different nature of the visual tasks
involved. We need to eliminate as much unnecessary heterogeneity as possible so
that its negative effect on integration can be minimized. In the meantime, we need
to explore the homogeneous characteristics in vision problem solving and to utilize
them in developing a more consistent and systematic integration environment. To do

so, the following questions need to be answered:

0 What kind of homogeneous characteristics exist?

0 What do they offer and why are they useful?

. How do we till

7 ‘ ~ ‘ i.
12:56 {it’ll .‘t‘tllull.

tie taster to the '-.l

1.3 Homo

sameness dyer

teciinzez'tett

he.

. P- . ‘ v

by av n1>,1. . Q t .

.m .t t ‘ '
Pisa.c.}vl.tf> 1.4(1‘

1.3.1 Homc

_t
\y‘

, .
a ".V" -. 1 A ”u
.. 1339‘1 it: 5!

”iv-u.
: -y~ '
we. I l\ Fr .‘..
JJ+ 3.0%th a
V A ‘A-

TWT‘t‘t‘ "“ ‘ i
" JUN E) A". (1T. 1‘"

ilk "K. t- : [D1

-..‘ ‘

3H"? (M ‘

-.I._ Lu ~ ., .-
«u \lxl‘d] r

.71]
». “1'; lF“ ' .
l. ‘ \‘l‘M—J“
a“,
‘iJuf'Aa‘l‘; i
<3 Jrlr‘

17

o How do we utilize them in the context of integration?

In the next section, we try to answer the ﬁrst two questions. Chapter 2 is devoted to

the answer to the third question.

1.3 Homogeneity Behind Heterogeneity

Homogeneous characteristics can be explored in three aspects of vision problem solv-
ing: homogeneity in operation, homogeneity in representation, and homogeneity in

the principles that govern operations. We examine them separately below.

1.3.1 Homogeneity in Operation

As discussed in Section 1.2, the process of vision problem solving is a process of con-
tinuous grouping, along, maybe alternately, the two dimensions corresponding to the
two computational problems involved in vision. With this perspective, different mod-
ules, which may superﬁcially seem heterogeneous, perform groupings consistently to
solve the visual problems at different levels of abstraction. Here, each module is de-
ﬁned as a computational unit that possesses the functionality of data transformation
or unit transformation from its input to its output.

This uniform vieWpoint toward the heterogeneous individual modules immediately
leads us to a potentially homogeneous architecture for vision problem solving with a
hierarchy of grouping processes. Such a role of the grouping operation in a complex
computer vision system has been speculated by several researchers [106, 34, 93, 17].

Brady indicated explicitly in 1981[17]:

It is equally clear that grouping operations need to be deﬁned at each level of
resolution of each representation in the visual system, in order to impose hier-

archical structure upon the representation.

1:6 mast import :2

law ultimo '- 'l:
M. to 1.,1.‘ 1.

1.3.2 Homo;

l WIRHIUCtit'e in.

1

.l
.4
all

'5. Perceptual

6

' ! c
'0' ""t “l " l
toe \.§ 161 we:

A, a. 7'
‘Li'\ r

..._..i1ti'.s or exit

Sh‘ﬁl ?~ ‘p . l ' 1
f"»4 LU .jljl ii lift";

’9’r- r . ‘
. “'f‘l‘tfeiitdlll't'N

”"TW-o -

“mason mm.

“"9... 1 .

'1 M. if}? ’ ]- [,
HUIJ t‘ I. Y.”

\.
.
{'1' '
. *. U‘-
i“. 7'v
I [Dalila]: ’
41,

J5 3",
-.§.[i];: lli‘l ‘
If [4 .
div)
“A.
"ll" ]|l_
‘l‘if‘s
V. .
ll ,[P-n V
s It] ‘
\uf. ,
‘ All.)
I.
"if";
f ‘4: .‘I-
‘CL lJn
“Jﬂl.

18

The most important advantage offered by a homogeneous architecture for vision prob—

lem solving is the systematic behavior of all individual modules.

1.3.2 Homogeneity in Representation

A constructive problem solving scheme implies that a solution is achieved incremen-
tally. Perceptual entities derived at one level of abstraction bridge the solutions
for the visual tasks that are below and above in the hierarchy. They may provide
constraints or evidence that are useful for other levels of processing and can be propa-
gated to both higher and lower levels along the hierarchy. Therefore, the functionality
of representations for the perceptual entities at different levels is mainly to provide
the interface among individual modules. Such an interface should facilitate different
information requests and efficient information delivery. For complex vision systems
with ﬂexible interactions among modules, heterogeneous representations at different
levels will undoubtly introduce difficulty for different modules to communicate with
each other.

Is the heterogeneity in representations inherent in vision problem solving? Con-
sidering the nature of perception, visual data can be adequately characterized by
limited types of information. For example, any perceptual entity can be character-
ized by its properties, intrinsic or extrinsic, such as surface type or some conﬁdence
measure, and by its relationships with other entities which can be spatial, temporal,
or organizational. In our view, no matter how heterogeneous the underlying percep-
tual entities to be represented are, a homogeneous representational scheme is not only
possible but also beneﬁcial in order to provide a consistent interface among different
modules.

When the homogeneous operation, grouping, is coupled with a homogeneous rep-
resentation, a process of grouping operations systematically imposes a hierarchical

structure on the representation at all levels.

1.3.3 Homogt

life: heterogemn‘.‘
gems grouping 11'.
ties that gown: tl
r attic-3118. In
more fillﬂ'iltf‘fﬁ fruit
”GTE lljt‘ saii'lt-
be governed by Pi?

iiilffon. That is.

was it. “3:3, .3

537.“ or de‘itz‘al

a . -
r. . ,H ‘ ,
01 C1,; djLUI'i ltx“-*3[
‘ ‘ ' - A.

1“?” r .
J”‘WWF 'ri "
J H‘p’lhh

.’
ca“ ("1:-
‘ )L. f O

l l L n
A

Fiﬁ?
— ~ '
' ”in“ [hf-i:
A; 4;
His,
fer?
wt 'r-l‘.
iJH’HlfJI
ﬁ‘ p
"gUTPrI‘ ‘
. l C(‘
.3?»
“44?»
‘gllffd to
Evy“
3t. ,

19

1.3.3 Homogeneity in Operational Principle

When heterogeneous modules are viewed from the operational perspective as homo-
geneous grouping units, the behavior of each grouping module is determined by the
rules that govern the grouping operations. Take the addition operation in algebra
as an example. The computational process that performs the addition takes two or
more numbers from its domain and produces one number in its range. Although
behaving the same way for different domains and ranges, an addition operation can
be governed by either a set of rules for binary addition or a set of rules for decimal
addition. That is, the functionality of the addition is determined by the rules that
governs it. With different ruling principles, an addition process may perform either a
binary or decimal addition. Therefore, the addition is an abstract operation deﬁned
at an axiom level, while binary and decimal additions are operations deﬁned at the
concrete level. Similarly, in this thesis, grouping is deﬁned as an abstract aggregation
operation. The functionality of a grouping process is determined by its employed
grouping principles or grouping criteria because a grouping operation is performed
if and only if its grouping criterion is satisﬁed. Ultimately, the inherent heterogene-
ity of individual modules lies in the grouping criteria. The inherent heterogeneity is
now preserved in a way such that it affects only the internal behavior of individual
modules.

Is there any homogeneity among different grouping criteria? If so, such homo-
geneity must indicate some more abstract principle behind the grouping criteria for
different problems and the grouping operations are actually all, explicitly or im-
plicitly, governed by this generalized principle. In the literature, researchers have
attempted to generalize the principles of organizing perceptual data. Lowe has used
the principle of nonaccidentalness for organizing 2D perceptual data under the as-

sumption of vieWpoint invariance[58]. Haralick has considered the general principle

20

of homomorphism[41]. In their paper discussing the role of structures, Witkin and
Tenenbaum have proposed a uniﬁed organizational principle called least distortion or
fuzzy identity [106]. In this thesis, we argue that an extended concept of homomor-
phism, presented in Chapter 2, can be used as a generalized grouping principle.

The homogeneity in operational principle identiﬁes the general information flows:
top-down and bottom-up, that determine a uniﬁed syntax to specify ﬂexible grouping

criteria.

1.4 Contributions of the Thesis

The primary contributions of this thesis are in two areas of computer vision research:
the area of studying formalisms for vision problem solving and the area of study-
ing model-based recognition methodologies. Our contributions to the area of vision

problem solving are:

e We explored, using the central theme of grouping, the homogeneous charac-

teristics behind seemingly heterogeneous machine vision techniques.

0 We exploited the above identiﬁed homogeneity and proposed a homogeneous
problem solving architecture for vision, called Hierarchical Token Grouping.
Through this proposal, we established a formalism for developing complex

computer vision systems.

0 We showed that this architecture possesses a set of desirable properties that

implicate both the practical and theoretical potentials of the architecture.

0 We demonstrated that the proposed vision problem solving architecture can be

applied to real-world problems.

Our contributions to the area of model—based recognition are:

. \Ve dor'ns'z'

,.
the satietzt i

I We (im’t‘l't’s‘
l

jects. mull“

life desist."

system turf

We tested 1
Ike eizjw-ri
reverting
scheme all.

«1*- ..
F. Jul-PHI .‘l ’i

SOliif? rPEl "]
A. g

21

0 We developed a Generalized Stochastic Tube Model that describes both

the salient and dynamic shape properties of tubular objects.

0 We developed an integrated model-based approach for recognizing tubular ob-

jects, including plant roots, wires, bacteria, pipes, and blood vessels.

0 We designed an automatic multi-level reconstruction strategy and developed a

system under the framework of Hierarchical Token Grouping.

e We tested the developed system and evaluated its performance and robustness.
The experimental results show that the integrated model used is effective in
recognizing the class of objects we are interested in, the hierarchical recognition
scheme allows a more robust object recovery, and the propose machine vision
problem solving architecture is useful in developing machine vision systems that

solve real problems.

Chapter 7 will give more detailed discussion and concrete conclusions about these

contributions.

1.5 Organization of the Thesis

The remainder of this thesis is organized as follows. In Chapter 2, using the central
theme of grouping, the homogeneous architecture of Hierarchical Token Grouping is
formally deﬁned. The desirable properties of the proposed architecture are examined.
Both practical and theoretical potentials of this architecture are speculated. We claim
that the proposed vision problem solving architecture is a generalization of many
existing computer vision techniques that are designed to solve different problems
using seemingly heterogeneous methodologies.

A literature survey is presented in Chapter 3. Four aspects in computer vision

that are related to vision system integration are surveyed: representation, recognition

.5 p — - . . . 3.
I.t‘t..w,lti.c§;t 5 t}

for 'ision. Inﬁll:
ill.r the p8.“
tetniques cart I
to support the .
tag»? of existizn

.. ,. . l
l: 5e: and tier «

1':th Pf“ , . (u
...t.... F‘- I
. kl ill _.td

lhe prunifri;

Q I
. 'T‘ei l 1
71 “.'ULPVT)’ ’

med
""av;\:‘n ‘
-'....zkt‘ and ill

t.., ' ‘
use? mantel cite

“‘1 .Zcf‘e’o v
...M“~ Jvt-a ‘ .
' v I i J“ ft“

*1 - 1
j, <
< .5 i

Q! .
”‘Nﬁy‘l o
‘q ‘lJI _
5 l .
~91",
f'.
erl“:
H W!
“N- s
V 1 .
.__
if‘tf- I
[J'Jflf Q ‘
._ l «
‘0 11
‘3: cf
‘Ji Sh.—
:‘IJlly'tv

,~If‘t

22

methodologies, interaction and integration, and computational models and paradigms
for vision. Techniques in each of these aspects are classified into several categories.
W'ithin the perspective of token grouping, we demonstrate that each category of
techniques can be treated as a token grouping problem. Such a demonstration is
to support the claim that the proposed architecture is a generalization of a wide
range of existing techniques and can be used for a wide range of vision problems.
As yet another demonstration, a real-world problem of tubular object recognition is
considered in Chapter 4 and its solution is posed as a hierarchical token grouping
problem in Chapter 5.

The problem of tubular object recognition is examined in detail in Chapter 4.
A model-based approach is proposed for recognizing tubular objects from both 20
intensity and 3D volumetric data. The modeling includes (1) a generalized stochastic
tube model characterizing the structural properties of tubular objects, and (2) the
imaging process models, predicting the expected cross sectional sensor measurements.
An automatic multi-level recognition strategy is proposed that exploits the power
of the models at different levels of abstraction. In our solution to the problem of
tubular object recognition, many classical vision problems are encountered such as
segmentation, perceptual organization, and matching. In Chapter 5, we demonstrate
how these individual problems can be formulated as token grouping problems and how
conventional heterogeneous techniques can be organized in a consistent and systematic
way to form a computer vision system.

In Chapter 5, the problem of tubular object recognition presented in Chapter 4
is posed as grouping problems at different levels of abstraction. The strategy de-
signed for solving the problem is realized under the paradigm of Hierarchical Token
Grouping. An object model is decomposed into different levels of detail which cor-
responds to the hierarchy of object reconstruction. This hierarchy is mapped to a

set of grouping agents, each of which is responsible for one type of perceptual entity.

‘ -11: -- 9 1
he 30mm m
. _ '1 3‘ '1
mm mom
‘ - . D -|'
us entire sun
r6- Jereraml. u
Imitatror; ma:
'i“':‘al ("'V'VlTrj‘
1.6.35.3 1 ..Ll.li A
aﬁﬁlllC pit-Jill
Exziermotx:
l
(‘T

1 v
..‘_:P}lj an“: {:(“IIQ

‘ ‘
4’3‘J3‘iﬁt‘d. BU'~ '

A n

““ “5 Plant '1
z-- ",1. . ,
“mm-1mm 2
;\

Fl. . ‘ . n
"“"”"“d In C:
1‘ gr‘Ji‘TJITm

.
.‘u’.
?‘_‘ ‘ ‘
H- ex
Li H
“an , ( .
K v
[1 in x u
b _<.‘
.F’ljé-Q:\L .

23

The specifications for this set of grouping agents define the tokens involved, inter-
action models, integrating multiple sources of information, and the architecture of
the entire system. Based on these specifications, various descriptions of the system
are generated, including its knowledge hierarchy, its grouping hierarchy, and its com-
munication channels. Since the problem of tubular object recognition involves many
classical computer vision tasks , Chapter 5 serves as a demonstration in terms of how
a speciﬁc problem can be solved under the paradigm of hierarchical token grouping.

Experiments and results are reported in Chapter 6. To evaluate the system de-
signed and developed in Chapters 4 and 5, three performance evaluation methods are
proposed. Both parametric synthetic data and real data are used to test the system.
Real data includes man made objects such as wires and pipes, and biological objects
such as plant roots, bacteria, and blood vessels. System performance and robustness
are evaluated based on the data for which ground truth is available. A new scheme for
visualizing the recognized 3D objects is also proposed and applied in Chapter 6. The
experimental results have shown the effectiveness of both the model-based approach
proposed in Chapter 4 and the system realized under the framework of hierarchy of
token grouping. The beneﬁts of the descriptive methods are demonstrated.

The major contributions highlighted in the previous section of this chapter are
restated in Chapter 7. Finally, the conclusions and the discussions about the future

research are also given in this chapter.

CHAP'

Hieram

l
,

s Ills clap»,

.‘Wvé
‘- L51

””5 Oi \IlSiu

f
Ll. i3”: if
w J "

ping' Silt

'
a V
'k‘.‘ T» ,
‘LlE‘ » '
- ‘13. ?

~.l -
- t l.

,

1‘, ,1

CHAPTER 2

Hierarchical Token Grouping

In this chapter, a paradigm for computer vision problem solving, called Hierarchical
Token Grouping, is proposed. The paradigm is based on the homogeneous charac-
teristics of vision problem solving presented in Chapter 1 using the central theme
of grouping. Such a paradigm is not intended to provide particular or general solu-
tions to vision problems but rather it provides a formalism by which heterogeneous
solutions for vision problems at different levels can be organized consistently and sys-
tematically, supporting integration of modules, cues, and knowledge, all in a globally

coherent mechanism.

2. 1 The Architecture

The architecture of Hierarchical Token Grouping is shown in Figure 2.1. It has two
distinct hierarchies, the left one represents the knowledge hierarchy, denoted by K H ,
and the right one is the grouping hierarchy, denoted by CH. The imaging model,
denoted by I M and located in the middle of the two hierarchies, emphasizes the
indispensable role of the sensor in perception.

Within the grouping hierarchy, various grouping agents are hierarchically orga-

nized to incrementally recover what is seen in the visual data. In GH, grouping is

24

25

Level of Alisa-action L

 

 

Figure 2.1. The framework of hierarchical token grouping.

extended along two conceptual dimensions, one is the dimension of spatial-temporal
scales, denoted by M, along which multiple images exist from either multiple res-
olutions or a time sequence, and the other is the dimension of geometrical scale,
deoted by L, along which multiple levels of abstraction span. The two conceptual
dimension axes form a grouping space, denoted by V. At any level of abstraction,
say I E L, a certain type of perceptual entity is to be recovered from the grouping
domain of perceptual entities extracted from other levels. A perceptual entity can
be characterized by a set of features some of which may be obtained by grouping
corresponding perceptual entities across multiple images, either multiple resolutions
or a time sequence, along dimension M. The two computational problems of percep-

tion (discussed in Chapter 1) are solved incrementally and alternately along the two

tnnmuudhnms

While each of t
laiESOtiirole.t.
ru: (are at \iari(,
mmumhﬂ.
in the form of u;
ﬁg‘ieprocessri
azdsuch a Ter(V{
Otjﬁfl HiOdEl. l.)
lHnaﬂstolu

._,,
(Ed-ilHEIGthP.

.'

1573121130“ film ii
mad I ,\1_

arr};
“Qiy'l‘i‘pl‘lg = ‘

.. 'A 11
1

kn. ‘
.'.“)ffrgtﬂ.
. 143 On ‘he'

’
"9‘.
. 4 Hr. :
‘NJLdlaI ‘1)?
l I

The

rec (.‘.

0p... .
‘wt‘ 5?!»
- r¢~1‘lil.rm.

26

conceptual dimensions by utilizing the knowledge stored in KH.

While each of the three major components (KH, CH, and I M ) in the framework
has its own role, they interact. Through the imaging model, the two hierarchies com-
municate at various levels by either retrieving knowledge from KH or updating the
contents in K H. Knowledge is organized as a hierarchy in KH and flows downward
in the form of expectations or expected events at different levels of abstraction. Dur-
ing the process of problem solving, information is reconstructed incrementally in CH
and such a recovery makes use of the knowledge stored in KH such as decomposed
object model. Depending on the type of sensor used, the knowledge retrieved from
K H needs to be rendered into a form in which it can be appropriately applied in
CH. Therefore, while information ﬂows bottom-up in CH and top-down in K H , in-
formation also ﬂows between the two into each other’s hierarchy through the imaging
model I M .

In this architecture, the information recovery process (GH) is separated from the
knowledge (K H and I M ) It implies that a solution to a vision problem directly
depends on the solutions to three subproblems. The ﬁrst subproblem is related to the
information recovery process which is treated as grouping problems in this framework.
The second subproblem is related to knowledge engineering, including knowledge
generation, retrieval, and updating. The third subproblem concerns the use and
learning of knowledge during problem solving or the interaction between the above
two. Each subproblem is itself a separate and yet difﬁcult problem. The emphasis of

current study is mainly put on the various aspects of the ﬁrst subproblem.

2.2 Grouping Hierarchy

In Figure 2.1, grouping hierarchy GH consists of a set of grouping agents that are

organized hierarchically along the two conceptual dimensions. Each grouping agent,

1y -,‘ ‘ g - ‘
mentztec b\ one
usual task Ha g
ottextiire. A g
. 4,. . J . “‘I
chalet; at or.

{FEEDIHE arm-,1

~ ‘5. ‘ g
.‘UJEPi Of .I‘P l“
. . ‘ _ A
”A? V‘ A ‘ .. ~ )
kc... JG dCl.9i\\Pi_
9;; J '

“'4‘. ‘46:, ‘ .

z \ Pd pp

‘F‘fl‘il"?" r ~. . ,_
“Mm pruw

ﬁ‘qul:r!nn 0..

‘l (I
.
[LP Erl-le}.r
" ‘ 4:‘~l.g\

_|
- '.
:3“ -...
“ ‘.~|I, '1'.
“r ui P
4'."

27

identiﬁed by one block in GH in Figure 2.1, is responsible for solving a speciﬁc local
visual task via grouping such as extracting straight lines or extracting a certain type
of texture. A grouping agent acts on a domain consisting of the perceptual entities
extracted at other levels and organizes the data in the domain. The output of each
grouping agent includes a new set of perceptual entities, each of which is an organized
subset of the domain entities, representing the abstraction of its input. The output
can be accessed by other grouping agents so that grouping operations continue until
the derived perceptual entities correspond to meaningful objects. This information
recovery process is the inverse mapping described in Chapter 1.

There are several important issues associated with every grouping agent: rep-
resentation of both domain and output perceptual entities, the grouping operation
and the associated principles that govern the grouping behavior, and its interaction
model with other grouping agents. If a grouping agent is viewed as a black box, then
the groupings describe its behavior; grouping principles affect only the internal func-
tionality of the box; the interaction model determines how the box is connected to
other black boxes or other grouping agents; and the representation of both domain
and output entities serves as the interface between it and the outside world. In the
following sections, these issues are formally addressed and are generalized across the

grouping space V.

2.2.1 General Representation: Token

In the context of constructive vision, grouping agents interact through exchanging
information. Therefore, the issue of representation is directly related to the issue of
interfacing among different agents. In this section, a homogeneous representation,
called token representation, is defined for the perceptual entities of all levels.

A token represents a general perceptual entity or a meaningful event in perception,

visual or conceptual. It can be a point at the ﬁnest image resolution such as a pixel, or

ll can i)? an ohjer

+1" ~ ’
writ? mm W"
.:,.. _"
‘.;;«€fl320frdlii\' G
ffr'n‘arf l‘ A i i I

. “up idl UliTJ

V‘ -
W’Rfr . i

"\vx..“.:l‘.1‘g .‘..
a 4.. l~ ‘i'

F l“
tym- _
Qumran. it

‘l; ‘
iry‘fir’z .
ﬁne.

28

it can be an object at a conceptual level. Each token can be adequately characterized
by its organization, its properties which may be intrinsic, extrinsic, relational, or even
functional, and its deformation. Every token can be organized to form one or more
other tokens.

Within the grouping hierarchy, the token is the uniform representation employed
by all grouping agents. Tokens are scattered in an n + 2 dimensional space, called
token space with n spatial dimensions and ‘2 conceptual dimensions, where n. is the
dimensionality of input image(s). Within the grouping space V formed by the two
conceptual dimensions L and Al, tokens form a hierarchy along each dimension,
representing the perceptual entities abstracted at different stages of perception.

Formally, let T,” be token i at v, where v=(l,m)€ L x 1W 2 V. A token is a
3-tuple:

Ti” = {Capan'vaiIL

where C: is a set of component tokens from other levels that specifies the organization
of T,”, F,” is the feature set that characterizes token Ti”, and DE’ denotes the deforma-
tion characterization of T,” at v. Speciﬁcally, denoting the grouping domain at v by

G: and the grouping principle at v by 672’, the organization of T,” is deﬁned as:

c: ={T;;: |1g m 3 mm 6 agency) = T}, (2.1)

where n is the number of tokens to be aggregated, T}: is token jm from vm E V,
and the predicate G:(C:’) = T means that a grouping criterion, denoted by CZ, is
satisfied by Cf at v.

A feature set F,” is a union of two classes of features:

Fit) 2 Pi” U Rf, (2.2)

29

where P,” denotes the set of property features that characterize T,” and R? is the

set of relational features that describe the relationship between T,” and other tokens

I . . o
T,” ,2 # J. Property feature set P,” CODSlStS of, say N , feature measurements:
‘U . a
P,- = {7711,7712,...,mNP}. (2.3)

Each property feature can be intrinsic such as surface orientation, or extrinsic such

as a conﬁdence measure. The relational feature set is deﬁned as:
R? = {(N",{(k-m’,a.~.k,)})}. We NR. 133's god"), 1.37477. (2.4)

where N3 is a set of relation names, (Mn, {(kj,v’,a,-,kj)}) describes relationship N"
between T,” and other g(D<1") tokens, with k,- and v’ together pointing at the kj'th

token at v' and

aiJc, = {air}, 0 S t S qn

being a set of qn attributes that describe the relationship between token Ti” and TEJ'.
Therefore, Rf deﬁnes a set of M NH H relationships, with Nne NR, between token T,”
and other tokens.

The deformation characterization 05’ is deﬁned by some measures that capture the
deformation by describing the deviations of actual observed perceptual events from the
corresponding expected perceptual events. Such explicitly represented deformation
information offers a basis from which a tolerance model for extracting the perceptual
entities at v may be established.

Token representation is homogeneous and general. It provides adequate descrip-
tive power for characterizing perceptual entities at any v E V and offers a consistent
interface among different grouping agents. The feature set of a token determines

what information is accessible from v. The information carried by a token, upon

30

being accessed, gets propagated to other grouping agents. All tokens that are related
according to the component description C,” for different v’s form a hierarchical repre-
sentation for the perceptual entity at the top level. By tracing G,” down to the bottom
of GH, descriptions with various degrees of details about the perceptual entity can

be obtained.

2.2.2 Uniform Operation: Grouping

Grouping is deﬁned as an abstract aggregation operation. Token grouping is a com-
position of a set of tokens into a new token. Consider a grouping operation at an
arbitrary v E V. Let T” be the set of all tokens formed at v E V. Assume the
grouping domain at v, denoted by GE, is a union of all sets of tokens from various
levels v’ E V,

I

G: = UUIELu T” , L” 2 {v1, ..,v1}. (2.5)

A grouping operation at v E V is deﬁned as:

{Tf’} H Tg’, T,” e 0:, (2.6)
The tokens that are aggregated, {Tf’}, become the components of the new token,
C,” = {T}’}, that determine its organization and collectively specify an observed
event extracted from the visual ﬁeld. A grouping operation is constructive due to the
nature of aggregation.
Recall, a token is characterized by its feature set. Characterizing the newly aggre-
gated token T,” is part of a grouping operation. Feature set F,” of T,” can be derived
by a feature aggregation:

II:C,”UT”I——+F”,

2

where H gathers the information about a set of individual tokens, which may include

31

non-component tokens, and establishes the feature set of Ti)-

2.2.3 Uniﬁed Grouping Principle: Homomorphism

Grouping criteria specify the principles of grouping. Deﬁne a grouping agent to be
a process that carries out grouping operations according to given grouping criteria.
For a particular grouping agent at v E V, denoted by GA(v), its grouping criterion

G” can be generally deﬁned as

G” = :j; /\ 6:, (2.7)

where G: and G: are two predicates that are connected by an “AND” operator.
Speciﬁcally, G: speciﬁes a grouping subspace or domain of GA(v). A grouping sub-
space is where the candidate tokens to be grouped reside. The term G‘c’ deﬁnes the
criterion to determine whether the evaluated agreement between an observed event
and an expected event signals a signiﬁcant similarity. Therefore, G: deﬁnes the rules
(principles) that govern the grouping operation at v. A set of tokens from grouping
domain G: can be grouped to form a new token at v provided that the set of to-
kens satisfy criterion G2. Note, deﬁnition (2.7) is a general form of deﬁning grouping
criteria for the grouping agents at all levels.

As mentioned earlier, the general form of grouping domain G: of GA(v) is a union
of sets of tokens from different levels (along either L or M):

G'"

3 O

Uv'eLvT”, (2.8)

where L” is a list of levels v E V and T”, is the set of all tokens from v' E L”. Since
L” explicitly lists the levels with which GA(v) will interact, it deﬁnes the interaction

model at v, either an intermodule interaction model or an interframe interaction

‘ I ‘ ..
node. (1"
the EFOYJDI
K? ‘ 'P 'i '13 ‘
._lC-L.li.:.ie'-
t-e prepare

we present

V
r ~- .‘. -
“M ml: Cf.)

'i;;rﬂ I
m‘ G “0'! t
.._C.,‘J:‘v; g,
.5.
;;_ .
a U APT
l Phi A
., r , .
"ifs/1?"

32

model, depending on along which conceptual dimension (L or M) GA(v) performs
the grouping. The interaction channels between GA(v) and GA(v’),v’ E L” can be
established according to (2.8) so that all the cues accessible from these levels can
be propagated into GA(v). Which speciﬁc cues are to be used and how they are to
be integrated are further made explicit in criterion 0;. In the following sections,
we present a uniﬁed criterion, homomorphism. We ﬁrst introduce the mathematical
concept of homomorphism and its previous use in computer vision. Then we examine

how this concept is to be used as a generalized criterion for grouping.

Mathematical Homomorphism and Exact Matching

Mathematically, a homomorphism is a mapping function [28, 41, 90]. Let [G; *] be
an algebraic structure where G is the domain and * is an operation or relation that
acts on G. One example of an algebraic structure is a graph that consists of a set
G of nodes and a relation * among nodes, represented as arcs connecting nodes. To
deﬁne a homomorphism, assume another algebraic structure, say [G’; #]. Then, the
mapping,

h : G H G', (2.9)

is a homomorphism if

Mg.- * 93') = h(gi)#h(gj)a Vgsgj 6 G. (2-10)

Note, this deﬁnition does not require h to be an injection (one-to-one). Therefore,
the existence of a homomorphism or for two structures to be homomorphic merely
claims a similarity but not necessarily identity. When the mapping is injective, then
h is a monomorphism. The strongest case is an isomorphism when h is bijective (one-
to-one and onto) [28, 90]. One example of a homomorphism is discussed by Doerr

and Levasseur[28]. Considering a “picture-taking” process, a camera is a mapping

gunman {fa
{Mr-LIIIIPTN‘

thrill one d

Ilia is a (4‘1“
be appiied.

la dismiss;
lit 20le of
marlingrll. €-

A».
.“l ..

«air- 1 '
\-\Mi.4‘ls.lt‘ EU“. 1

“El é ~ ‘\.
)t'
H.
k 4" .
“ ‘3 T‘ '
1
cHPlIro

33

function that transfers something three-dimensional onto a photograph, something
two-dimensional. The mapping, h : R3 r—+ R2, corresponds to a homomorphism. Even
though one dimension is lacking, we still are able to recognize much of what is present
in the 2D image due to the similarity. But a question arises: how similar is similar?
This is a question that has to be answered when the concept of homomorphism is to
be applied.

In discussing matching problems in computer vision, Shapiro and Haralick used
the notion of relational homomorphism to describe both exact and inexact structural
matching[41, 90]. We brieﬂy introduce this concept of relational homomorphism and
examine how it is used in deﬁning exact, and later inexact, matching problems.

Let D = (P, R) be a structural description of an perceptual entity where P =
{P1, P2, .., Pu} is a set of primitives, one for each of the n parts of the entity, and R
represents the interrelationships among parts. Each primitive is a binary relation P,- E
A x V where A is a set of attributes and V is a set of values. R = {PR1, PR2, .., P125}
is a set of named N-ary relationships over P. Each PRk,k = 1,2,..,K is a pair
(NRk, R1,) where NR;c is the name for relation Rk and Rt 6 PM“ for some positive
integer Mk. Assume an N—ary relation R g PN over set P. A function, h : P H Q,

maps elements of P into a set Q. Deﬁne

Ro h = {(q1,.,qN) E Q | 3(p1,..,pN) E R with h(p,-) = q,~,i=1,..,N}. (2.11)

Let S C_: QN be an N-ary relation over Q. A relational homomorphism from R to
S is a mapping, h : P H Q that satisﬁes R o h Q 5. That is, under a relational
homomorphism, each element of PN is mapped to an N-tuple in S Q Q”.

Based on the above deﬁnitions, an exact matching can now be formally deﬁned.

Assume a stored model or a prototype structural description Dp = (P, R) and a

(urinate str:

1
Kit

”j

I

:3
:9

tries U

C

-,_‘
sees If .,

‘Adl‘ (L‘-
I‘.

‘ .
‘ . ls
1
9-.
‘1. ..
" “phi .
"MJr"
Pith:

it p, .
«do;
map F
"W."

34

candidate structural description DC 2 (Q, 5). Let

P = {P1,P2,..,P,,}, Q = {Q19Q‘29'°9Qm}’

and

R = {(NR1,R1),..,(NRk,Rk)}, S = {(1VS1,81),..,(lVSk,Sk)).

Dc matches Dp if there exists a mapping

h: P H Q,
satisfying
1) h(P,-) = QJ- => P,- g Q,,an(l (2.12)
2) NR, = N51: R; O h Q 51' (2.13)

The ﬁrst condition (2.12) describes a mapping function that gives the correspondence
between the primitives of the two structural descriptions. The second condition (2.13)
states that the mapping function h must be a relational homomorphism from each
relation of one description to the relation with the same name of the other descrip—
tion. By this deﬁnition, an exact match is achieved through ﬁnding a relational

homomorphism h.

e—Homomorphism and Inexact Matching

The notion of exact matching is only appropriate for perfect data. In real situations,
an exact match of structural descriptions should never be expected. The concept of

inexact matching is necessary.

In an inexar

rionships anion:

-.;,L,:. .-
“(hinting fhflti

ll : i ll

Here. ll'p detirn

“Eli

35

In an inexact matching, possible missing or distorted parts and untrue interrela-
tionships among the parts in a candidate structure are taken into account. Deﬁne a

weighting function:

W = (Wp, Wk), Wp = {w1,w-2,..,uin}, W'R : {VVRIVW W’RK}. (2.14)

Here, Wp deﬁnes a primitive weighting function:

WP: P H [0,1]

satisfying 21;, w,- = 1, that assigns a weight to each primitive in P. W3 is a set of

N-tuple weighting functions in which each l/VRk, 1 S k S K, is a function

WRk: Rk H [0,1]

satisfying 2,63,: WR;c = 1 that assigns weights to the .Mk — tuples of relation R), for
each k,1 S k S K. Suppose R is an N-ary relation over P, S is an N—ary relation
over Q, Wp is a weighting function for P, and WR, is a weighting function for R.
Deﬁne an e — homomorphism from R to 5 with respect to weighting functions Wp

for P and WR, for R:

h : P H Q, (2.15)
such that
2 W12, 3 e. (2.16)
rER,h(r)¢S

That is, the total weight on those N—tuples that are not satisﬁed by h with respect to

S is less than threshold 6. Additionally, for each corresponding pair P,- and Q, such

hathli‘: Ci

with

(41”,

Fug,.fxann
: ;,.
2122' Iridppli.g
0‘ . . 0
.LPUHLTUFIIntT
nimnunreat

in inexact

i~h0momogﬁ

T; l. it) (1
its"? ‘

* nan a
L.
ti r‘,

36

that h(P,-) = Q], the following should be satisﬁed:

with

(amvvmn') E Pia (arnvvinq') E ij “in E at:{(1:n|(1.m 6 Ali

and afn is a threshold on attribute am 6 A by which the value of attribute am from a

I

candidate primitive Q], i can differ from the value of am from a model primitive

:m'j,
P,, vmg. Examine conditions (2.16) and (2.17), an e—homomorphism is an inexact or
fuzzy mapping with respect to the speciﬁed 6 and at, where e is the tolerance for all
the untrue interrelationships among parts while at is the tolerance for total distortion
on primitive attribute values.

An inexact matching can now be deﬁned based on the above deﬁned

6 — homomorphism. Let D}; be a weighted model structural description deﬁned as:
l3: (P? WP: Ra Wit),

where P and R are deﬁned as before. l/Vp indicates the importance of each part P.- E P
and WR speciﬁes the importance of each relation R)c E R. A candidate structural
description D0 = (Q,S) inexactly matches a model structural description Up, with
respect to (l) the tolerance for attribute distortion or at, (2) the tolerance for a
missing part denoted by pt, and (3) the tolerance for structural distortion represented

by rt 2 {6; I R)c E R}, if there exists a mapping

h: P H Q U {null} (2.18)

:33

(’5

[I

.hat sari.

r’y.» i~.. ‘ '
hr». HITTINTEW

on

(I!

;. _: 'i -
\ . .
h 5ﬁvrkinar

in n;
Can be tolerate:

a ‘l I.
at We“ ' ' ‘l '
4....lflm 1;},

iii-‘9.“ H ii
‘ du‘ «:19 P”

l l

'3? U"H ‘
‘ filJ' )Pf‘
- ~. A.

A

v" l.
, v- 4 ,
.ue.il,11"0 ‘
..~ -
\
u.

: hr-
..I,,“!IJ"r. 10”,} .
.. ,(l N,"-
.< 1., ,.A

F-
u. exam] DIP

9:41.23 .‘
r \ISAUI
[' Vi
\ ﬁg]
‘4' ‘3‘.
._ 'p >
Ill. 1
ii

37

that satisﬁes

1) h(P,-) = Q, E Q with respect to at, (2.19)
2) ZP.EP,h(P,)=nuu “’1' S Pu (220)
3) NR; = NS => 12. is c,- -— homomorphism (2.21)

with respect to l/VR, from R,- to 5,.

In this deﬁnition, the tolerance models, (2,, pt, and rt, speciﬁcally deﬁne “how similar
is similar” in matching or indicate the degree of “dissimilarity” (inexactness) that
can be tolerated. Typically, a matched candidate structure will be considered as
an identiﬁed instance of the model structure. Therefore, in effect, a homomorphism
serves as the criterion for deciding whether a set of primitives should be organized,
or grouped.

Extending the precise concept of homomorphism to the concept of e-
homomorphism allows the inexact handling of various visual tasks. A slight revision
to the constraints on the weighting functions provides the capability of also identifying
non-structural or non-relational similarity using the same notion of homomorphism.
For example, we can relax constraint (2,6,3,c WR)c = 1) to (0 S Zregk WR;c S 1).
Then, by letting all WR;c = {0} (weigh all relations zero which means that the interre-
lationships among parts actually do not play a role in matching), the inexact matching
deﬁned in (2.18) represents a thresholding operation based on some attributes.

To what extent does the principle of seeking homomorphisms generalize the or-
ganizational principles across both problem solving dimensions? In the context of
computer vision, a homomorphism is a mapping from an observed visual event to a
model event. The principle of homomorphism states: perform a grouping if and only
if a homomorphism, or a mapping, can be found. Haralick has considered various

structural matching problems at different levels of abstraction and formally proved

that they ca
process of in
are integrah?

of homomor:

-.~ " ‘ '
..‘UfilOfllUf-pill5

uninv~§ - ~l
tidrAf‘d am:

:t‘é.‘ entities is

'li' ”:3 '
«J tweet iv t

”:|'-'."" "” .
-J..iun.'uf;rl11\”‘

SEN": ~ K
”Aa‘ l "‘
..liUl‘, PM

fr». ‘

:."" .‘

- ”JP-HE prtprf's
V

‘L
:‘C‘~ '
x“. ”Diff f v
~\.. 1 f‘
- ' 0 xii“

Honlomorph;E

lav

ifF‘i v , ‘
‘ ‘r‘lf‘JQlN‘p.

a, ]
CT‘ I_" r
u }r\ .
rened 9W"
. l
9....
””5 '30"

“will of h.

‘9 ~-'
». 1_ ii] (I "5 L

i.
“3:1

0 , '

FTOIJPH.E ‘
fi‘z‘i‘ '
lls ‘r,

‘AlvOIXPd I
k.‘3‘ (‘0 \

”190063“:
ll
h J‘f‘i‘ ,

ltd Pr)r.">
3“ Fri»

‘ \ilv -
Jmufllﬂ‘v

38

that they can be formulated as homomorphism ﬁnding problems [41]. During the
process of homomorphism ﬁnding, multiple sources of information and knowledge
are integrated under the framework of structural matching. The original concept
of homomorphism was extended to e-homomorphism in such a way that both the
homomorphism in structure and in non—structural attributes can be simultaneously
evaluated and assessed. Such an extension further equips this principle with the ca-
pability of generalizing the grouping criteria across all levels of abstraction. For the
grouping along the spatial-temporal dimension, various criteria for grouping percep-
tual entities have been proposed [61, 34, 87, 87, 30, 29]. Since these criteria are deﬁned
no differently than many of the criteria used by Haralick in proving them equivalent to
homomorphism satisfaction problems, they can also be formulated as homomorphism
satisfaction problems. Therefore, the principle of homomorphism finding governs the
grouping processes along both conceptual problem solving dimensions. This will be

the topic of the next section.

Homomorphism in Token Grouping

As we introduced earlier, G‘c’ gives the grouping principle or the criterion about when
an observed event and an expected event signals a signiﬁcant similarity. Employing
the concept of homomorphism as a uniﬁed grouping principle across the entire GH,
we call G: the homomorphism criterion that describes a homomorphism validity test
for the grouping at v E V. There are three components deﬁned in G2: the perceptual
events involved, the homomorphism evaluation function, and the validity test. The
ﬁrst component identiﬁes two events between which the homomorphism is evaluated.
The second component deﬁnes a certain measure that quantitatively reﬂects the de—
gree of homomorphism or similarity. The third component is necessarily related to a

test that assesses the signiﬁcance of the evaluated homomorphism.

In our can:
curring in or 1
subspace Ci. 2
C13,. l o
.ere cetec.e

.

v
new! other

tacrmuumfnq

.. L ‘
.0 JE‘ used in r?

reﬁll lﬁ‘ STU?“
.[V' ‘
“if“ EientS art:

.-e taro is to

r». -
.‘.ti[)ﬁ.""' ‘ .
. , ..s .‘.

(U
QT‘W-w
-.. n. -‘
"Hi-<11 hi )Y
‘L Cf):

39

In our context, the homomorphism is evaluated between an observed event oc-
curring in or inferred from the input, represented by a set of tokens from grouping
subspace G3, and an expected event or a model instantiated from the observable ev-
idence detected from visual input. Let’s denote these two events by E: and ES,
meaning observed event and expected event at v, respectively. The speciﬁc type of
homomorphism to be identiﬁed is determined by the types of features of the events
to be used in evaluating the homomorphism. Since each token is described by a set of
property features and a set of relational features, three types of homomorphism may
be deﬁned. The ﬁrst type is structural homomorphism. The second is the homomor-
phism in property. The third is the combination of both. If only relational features of
the events are used to evaluate the homomorphism, pure structural similarity between
the two is to be identiﬁed which may correspond to some subisomorphism. If only
property features are used, the similarity to be identiﬁed among two sets of tokens
is with respect to certain properties, such as surface orientation, without concerning
how the involved tokens are spatially related. The third type is to identify some
structural homomorphism and in the meantime to demand certain properties hold on
the component tokens involved.

Since an observed event seldomly matches exactly with an expected event, homo-
morphism has to be evaluated in an inexact fashion[90]. An evaluation function is
needed to measure the degree of agreement between the observed and the expected.
Given Ef,’ and Eg, a homomorphism evaluation function, denoted by H, measures
the homomorphism between the two based on two parallel sets of features from the

events. Let S}; be a property selection function at v,

I

8;)? = UU’EL”{Sml)Sm27 ",Sval}v a Smi : [091]»
K

5‘ be a rtir

CL.

Ices? two tum
twain is defii;
ﬂ. . .
.. irieract WEI
be unized in If

i ‘-
iffiﬁf‘d as a fl";

:1
rfw '
g‘J‘ ; I
. ‘ q , ..
l» lfllpgv- .
[HM ‘
A._l “»$
\. l~ .
v dirk)“ IIi‘I’

El‘l. [D
f ‘ JD.“ v~
. He‘ll 1(JL‘Q
‘ \
tr
\l.J a : ‘
hgnPflpi

1“»41.
"iill‘t
A Pd
,. O _ .
3rd};

40
and Si? be a relation selection function at v,

I

5;)? : Uv’EL”{Snlasn2a ”a'SnN I}U 3 SM : [091]

v
R

These two functions select a set of features from level 1” 6 L” on which the grouping
domain is deﬁned. That is, L” deﬁnes all the levels that the grouping agent at v
will interact with, while 5}; and S}; deﬁne what features, properties or relations, will
be utilized in the grouping agent at v. Based on these two selection functions, H is

deﬁned as a function of the chosen features of the domain tokens involved. That is,
H(E§,Eé’) = H([(5i3 X PL”) U (5% X 31.1510,“ng PL”) U (5i: X RMlle),

where (SfD x PL”) is a set of selected property features of the involved tokens and
(5}; x R”) is a set of selected relations among involved tokens. [..]° is the set of all
chosen cues from the bottom-up ﬂow, specifying observed event E3. [..]e is the set
from the top-down ﬂow, specifying expected event E: . Information from bidirectional
ﬂows is integrated in homomorphism evaluation function H and the outcome of the
integration inﬂuences the grouping decisions at v.

Note, the selection functions here differ from the weighting function used by
Shapiro and Haralick in deﬁning inexact matching problems. The property selec-
tion function is to choose a set of features that characterize certain aspects of the
component tokens that are important in deciding whether they can be abstracted
into a higher level perceptual entity. The relation selection function is to choose spe-
ciﬁc interrelationships that have to be satisﬁed among the component tokens of the
potential new token. In effect, these two selection functions together determine an
attributed graph to be used in identifying a homomorphism. On the other hand, the

weighting function described in [90] is to weigh the importance of each node and each

arc in a gm“ g

The ‘lt’gree (
dlf The hon}
Let HP dt‘TIOIe

I»? expressed in

mean

' V
”f ' v
I” .-. 911.1?"

I} Y
‘14 l". . ‘ - '
‘* \Uht‘ﬁc 1

fun."-
"“lPOEEm 5.21

All lllligii'

A'L l \‘t’

enamel if 9}»-

5 1mm .
a fUC ‘, 1‘

41

are in a given graph. Therefore, they play different roles and both are important.

The degree of agreement between E: and E: is speciﬁed by the evaluation outcome

of H. The homomorphism exists if and only if the evaluation outcome is signiﬁcant.
Let H}; denote the validity threshold of H at v. Then the grouping criterion G: can
be expressed by a generic homomorphism test:
1:: H(E:, E:) E Hf), (2.22)
where ’_—‘_’ means “comply with” or “satisfy”. That is, an observed event E: is homo—
morphic, either in structure or in properties or in both, to an expected event E: if and
only if the homomorphism evaluation outcome H(E:, E:) complies with the validity
threshold H}. A new token T,” can be formed if and only if E: is homomorphic to
E: with respect to the given criterion G:. Once the homomorphism test is passed,
all the tokens involved in E: become the components of T,” and together form the
component set C,”.

An intuitive interpretation of deﬁnition (2.22) is: a new perceptual entity (T,”) is
extracted if the visual data (E: ) agrees with (C‘—: H};) what we are looking for (if E:
is from a focus of attention) or what we think we see (if E: arises from evidence in
data). The expected event E: is stored in and accessed from the knowledge hierar-
chy. Note, the validity threshold H)”; can be a set of thresholds with respect to the
chosen features. They are similar to the thresholds used in deﬁning inexact matching
problems and specify the tolerance to the inexactness between the observed data E:
and the expected E: . H}; is part of the knowledge associated with E: and can be
learned from training. In the case study presented in Chapter 4 and 5, we will show,
by examples, how this can be done.

The grouping criterion deﬁnition (2.7) is uniform across all grouping agents. At

any v E V, it is where the bottom-up (E:) and top-down (E:) information ﬂows meet

:er I, -

.. its lf'
,f

Hut

‘5 i

C7
1.)..-

it“ ,
J De (I.
ll

42

and get integrated. It speciﬁes simultaneously (1) the interaction model at v (L” in
deﬁning G:), (2) the sources of information that are propagated into GA(v) (G:, E:,
and Ht), (3) the speciﬁc cues to be used (5:) and 5%), and (4) how the cues are to be
integrated (H). Notice, while different grouping agents can deﬁne ﬂexible interaction

and integration schemes, they all use exactly the same speciﬁcation form as defined

in formulae(2.7),(2.8), and (2.22).

2.2.4 Grouping Agent

Having deﬁned representation, operation, operational principles, and interaction
model (the issues associated with every grouping agent), we now formally deﬁne
a grouping agent. A grouping agent is a computational unit that (1) takes a set
of inputs, (2) organizes its inputs via grouping according to some principles, and (3)
generates a set of outputs. Denote a grouping agent at v E V by GA(v). It is formally
deﬁned as:

GA(v) = (I:,G:,O:,), (2.23)

3, to be:

with its input I: and output 0

Iii : {GZvEgingla 0v : {Tvaisj’ngla

at

and T” to be the token set formed at v:

T” {Ti} if Ci” 6 G§,GZ(CE’)
0 otherwise
where i E [1,Nv] and N, is the size of T”. The input of GA(v) is speciﬁed by two

general sources of information, the bottom-up ﬂow (G:) and the top—down ﬂow (E:

and Hit). The output is speciﬁed by (1) T” that contributes to the bottom-up ﬂow

50111211 new mix“
I" [if any) and I
i333). the home
fordifierent r's.
grouping decisio
mung GA: 1‘ is
Every token

careptual diam
”Vial dimemt

l ..1
rate. 162815 Di 3

PIA“.- - '
,. 12:)”‘(1 to fern:

imzl

P t-- \v‘
If f‘.\dl‘..,)i

51*: f '
..ciure featurv‘

43

so that new tokens can be carried out of GA(v) and (2) the new expectation E: at
v' (if any) and the revised validity threshold Hf; (both will be posted in KH). In
(2.23), the homomorphism criterion G: determines the internal behavior of GA(v).
For different v’s, inherent heterogeneity exists in G:, but affecting only the internal
grouping decisions of GA(v). Therefore, it does not interfere with the interaction
among GA(v)’s (that is through I: and G:,).

Every token in T” is abstracted from either the tokens from lower levels along
conceptual dimension L or the corresponding tokens from multiple images along con-
ceptual dimension M. That is, a token in T” may be a composition of tokens from
lower levels of abstraction or may be a token that represents a reﬁned version, via
grouping, of a set of corresponding tokens collected along conceptual dimension M.
For example, a set of corresponding curves extracted at multiple resolutions can be
grouped to form a reﬁned curve at the same abstraction level [59, 80, 87, 88, 30, 29].
Another example is to group a set of point correspondences to extract motion and
structure features for a given pixel token.

Definition GA(v) = (1:, G:,O:,} is concise and adequate for describing a group-
ing agent. More importantly, it is uniform for any v, serving as a general form of
formal speciﬁcation for all GA(v), v E V. With this general form, even though differ-
ent gl‘Ouping agents can be individually responsible for different types of perceptual
entities, they can all be uniformly deﬁned as a grouping process. Therefore, a vision
problem solution can be described by a set of such speciﬁcations, each of which deﬁnes

one grouPing agent and the collection of which describes an interconnected hierarchy

of token grouping or GH.

2.2.5 Generic Token Grouping Algorithm

With the generally deﬁned token representation, grouping operation, and grouping

criteria, we now give a generic token grouping algorithm for any GA(v), v E V.

44

Generic Token Grouping Algorithm

GA(v)={G”,T”}
{ GSSPACE <—_—. G; ;

T” <= (0 ;
Nv = 0 ;
Loop {
{35,113 <= K_Retrieve(G:) ;
C,” <= O_Event(G") ;
F,” <= H(C.-”) ;
E: <= {G:,Ff} ;
H <_—. H_Evaluate(E:, Eg’) ;
if (H 9—4 HE)
D? <= Deform(C.paFiva€) ;
T,” <= {05,3803} :

E: <: K-Instantiate(T,-”) ;

T” <2 T”UT,-”;
NU <= Nu +1;
endif.

} while(NOT Terminate) ;

} End of GA(v).

- Token Grouping Agent at v

- Form grouping domain

- Initialize output token set

- Retreive knowledge

- Choose candidate tokens

- Compute features

- Form observed event

— Homomorphism evalaution
- Homomorphism validation
- Collect deformation

- Form a new token

- Generate hypothesis

- Add the new token

In this algorithm, “GSSPACE” denotes the grouping subspace determined by G:;

T” is a set of N, newly formed tokens at v; “K_Retrieve” is a knowledge retrieval

module that accesses an expected event and its associated validity condition from

l
the time
deternizix
ation lien
1
15a: com
E, and c

l . ‘
v-LM-r. _
*Lli.‘ ll Kid

91,;
‘I‘il‘ v. '
‘ ”firm
“i. ‘
if]: T” Sir.
L. .l

45

the knowledge hierarchy; “O_Event” is a grouping candidate selection module that
determines the observed event E:; “H_Evaluate” conducts the homomorphism evalu-
ation between E: and E:; II is a feature extraction module; “Deform” is the module
that computes the deviation of the newly formed token from the expected event
E: and collects it as deformation information about the tokens at v; and, ﬁnally,
“KJnstantiate” is a module that dynamically instantiates a new expected event E:',
which may be evidenced by the new token, at level v’ of K H . The condition “Termi-
nate” is satisﬁed when a consistent interpretation of the scene is reached at the top
level and such a decision is cascaded in the form of a termination signal to all the
lower level grouping agents. As we can see, before each grouping agent is terminated,
it keeps generating tokens some of which may represent alternative interpretations
for the perceptual entities at that level. Since a grouping agent does not stop until it
is suppressed by high level decisions, it will maintain all the grouping options open
which facilitates possible backtracking.

This generic algorithm is structured. It decomposes the solution to a visual task at
any level into six subsolutions, each of which has a distinct and well defined functional
role. Therefore, the algorithm establishes, within the context of token grouping, the
principle of decomposing the solution for a speciﬁc visual task at any level: decom—
posing by functional roles (six roles). Due to the uniformity among grouping agents,
this principle is applicable to all levels. By following this decomposition principle,
any vision problem solution can be posed as a grouping problem.

The algorithm describes how a grouping agent is assembled from distinct func-
tional modules. For any v, the algorithm shows exactly what functional modules
should be working where in a grouping agent. The identical algorithmic structure for
all grouping agents indicates that it is feasible that (1) the creation of grouping hier-
archy GH can be automated and (2) all the interaction channels can be established

automatically based on speciﬁed G:,‘v’v E V. In this way, computer vision systems

w m r. a ..
e . - . . ..m

«A . l .C . HT. rt.

2. H y . . I

‘l.

n,

'Y‘r

'!

,l
u

u

a

. ua

.r.
.P.

. . . .L

..U .5
.n ,. lam .

46

can be effectively integrated or assembled in a globally coherent and structured way.

Hierarchical Token Grouping: Vision Problem Solving

Having all the notations deﬁned, let’s now go back to Figure 2.1 to reexamine grouping
hierarchy GH. Each grouping agent is responsible for a speciﬁc visual task. It takes
input from both bottom-up (G:) and top-down (E:,H};) information flows, where
the bottom-up ﬂow provides a set of so far extracted perceptual entities and the top-
down ﬂow provides expectations in the form of instantiated expected event with its
associated criterion for such an observed event to be identiﬁed, and integrates them
in homomorphism evaluation function “H_Evaluate”. The homomorphism evaluation
governs the internal behavior of a grouping agent that ultimately determines how the
data in grouping domain should be organized. If a new entity (T,- E {T}) is formed
via grouping, it is made available to other grouping agents. A newly formed token
may provide evidence for another perceptual entity. In this case, a new expectation
(E:') can be instantiated in the knowledge hierarchy KH. The deviation of the new
entity from the corresponding model (E: ) stored in KH (if any) can be collected to
characterize the deformation (D). It can be seen from Figure 2.1 that all the grouping

agents are identically structured.

2.3 Properties

The goal of the proposed framework is not to provide solutions to vision problems but
rather it provides a general architecture under which vision problem solutions can be
realized in a principled way. The proposed architecture possesses certain properties
that makes the process of realizing a machine vision system more standard, more

efﬁcient, more ﬂexible, and more manageable.

2.3.1 A C

llelnerarchicz
.'. ..,. i- .-
.uinauune\is
n properties (
THIJETLlIlOIl lllt‘
i'r ‘

in next (liaz.
”fired as a ,1

97‘ .« ~~i
pr»mEaUd

In

”(‘N‘ " ~9l
limpet \i‘.

irrli ,
whiwune I

if ~11 .
i a, gm'u‘plm

aft—1""; '
' ”“495 1F c
‘ s"

.rg. -
tqllzed Q
‘. \ i d
"2L-
'tL (.
.t

“Jeni. -I

10;:
'l'lfteS§ ’
a 31,»?

47

2.3.1 A General Constructive Paradigm

The hierarchical architecture lends itself to a paradigm of constructive problem solving
for machine vision. It is general because (1) it is established based on the abstraction
of properties of vision problem solving, (2) it is a generalization of most existing
recognition methodologies and recognition paradigms (this will be demonstrated in
the next Chapter), and (3) GH can have an arbitrary number of levels. It can be
viewed as a uniﬁed architecture for the paradigms of both conventional symbolic

processing and artiﬁcial neural networks.

2.3.2 A Homogeneous Architecture

The perspective of grouping establishes the basis for a homogeneous problem solving
architecture. It is homogeneous in representation, in operation, and in the structures
of all grouping agents. Such a homogeneity is achieved by exploiting the common
properties in solving vision problems at different levels of abstraction and fully uti-
lizing them. The homogeneity leads to overall vision systems that are structured,
more manageable, and easier to build. The inherent heterogeneity is preserved in
such a way that it only affects the internal behavior of individual modules but does

not hinder the interconnection or integration of overall systems.

2.3.3 Object-Oriented

The modern concept of object-oriented software development means that software is
organized as a collection of discrete objects that incorporate both data structure and
behavior. Therefore, in an object-oriented system, the data structure hierarchy is
identical to the operation hierarchy. Object-oriented development is a conceptual
process that is independent of a speciﬁc implementation. That is, it is fundamentally

a way of thinking but not a programming technique [83]. Based on the abstraction

or genrraliuzlion of
token grouping pres

systems. Each taker

 

but not in a pll}‘8l<‘h
panning operat inn .
lnohjectorient ,.

neufels:
0 Object mmie l r
0 D‘ynnrrirr him!
0 functional m:
training It

1" arch

in" - ‘ '
995 the object mu

P[Warp . .
‘ -Ull§ dillOIﬂ

S“? .
.'.,EYD. ‘he Clelff’ﬂ ;(

are l0 1'1 ‘
.-e transform

”the <-
.\Q '
,.lem_ 8,1,

851:;
Ling beneﬁt; I,

lt’

l.
3.4 System;

n .
in, Urpf_
A“ Ur 07‘ . .
a 5.01;},
:‘ Ell D[ r, a
5 ””héﬂ‘ior n

48

or generalization of the essence of vision problem solving, the theme of hierarchical
token grouping presents a methodology of object—oriented modeling of machine vision
systems. Each token grouping agent is an object (in an object-oriented modeling sense
but not in a physical object sense) with its token representation as data structure and
grouping operation as behavior.

In object-oriented modeling methodology, a system is described by three kinds of

models:
0 Object model describes the objects in the system and their static relationships,
0 Dynamic model describes the interactions among objects in the system, and
0 Functional model describes the data transformation within the system.

Examining the architecture of hierarchical token grouping, the grouping hierarchy GH
gives the object model of the system; the collection of L”, Vv E V, that describes the
interactions among grouping agents (objects here), provides the dynamic model of the
system; the collection of G:, v E V, each of which deﬁnes how a set of domain tokens
are to be transformed into another set of output tokens, describes the functional model
of the system. Such an object-oriented modeling methodology for vision problem

solving beneﬁts the development of computer vision systems in practice.

2.3.4 Systematic Behavior

The uniform grouping operation results in systematic behavior. Such a systematic

grouping behavior naturally imposes a hierarchical structure on object representation.

fl

2.3.5 Consis

liken represenrar
mag grouping an

ten of perception. a

 

I
an: possesses acleqr:
l .. l. '
reaes of abstract rur:
contains most of I'm

138:? 2011.

2.3.6 Cohesi

1
fr" ‘
lr .-e proposed art

5?,_,_ \-
..i.LG(t‘C m El 0,)

rl"‘

5453'"? 4' .
.G‘Ga‘is It: emu,"

p»:

iliich can in

-i
Ur Fowl»: '
. ”“hfllnﬂ (l-
: ‘dt .,

d

tucrf‘r ~
d\UUH among

Since inteemri

acti- -

iffy!” -

a N” lhe lllit’ra
e--'
“a,

form of form-
(

t“

a},- r
A. L r
“m comm.

”iﬁiﬁ’rqy. -
3 Wit};
. ”n

OPDC

|l‘ilri. ,
"5 prrjm

Lt

ill

”in s

Err
”In” 09“,. r
» ‘1 ‘ I.

49

2.3.5 Consistent Interaction Interface

Token representation is consistent at all levels. It provides a homogeneous interface
among grouping agents that facilitates the interactions at all scales. Within the con-
text of perception, a token is a generalized scheme for representing perceptual entities
and possesses adequate descriptive power for characterizing perceptual entities at all
levels of abstraction. We will demonstrate in Chapter 3 that token representation
contains most of the representation schemes used in the literature, hence a general-

ization.

2.3.6 Cohesive Integration Environment

In the proposed architecture, integration of modules, cues, and knowledge are all fa-
cilitated in a cohesive environment. A grouping agent is an encapsulated entity that
separates its external aspects, which are accessible by others, from its internal prop
erties, which can be hidden from others. This encapsulation is the natural outcome
of combining data structure and behavior within one single entity and makes the
interaction among grouping agents more easily deﬁned.

Since integration is achieved through interaction, a cleaner encapsulated inter-
action environment will support integration more effectively. As indicated earlier,
because the interaction models for different grouping agents are all deﬁned in a gen-
eral form of formal speciﬁcations, interaction channels can be established automati-
cally from concise speciﬁcations. Therefore, the integration can be facilitated more

efficiently within a globally cohesive environment.

2.3.7 Opportunistic Problem Solving

Vision problem solving is opportunistic by nature. The hierarchical token grouping

architecture offers an opportunistic problem solving paradigm. Grouping agents start

to function Wh
are described i
don scheme ca
Once the data
is a hierarchy

to a data ﬂair

2.3.8 Di

lhe proposer;
’1“ _ y . v
plulf? rne rrihe'

One eras-191,.

1a..
A, rh€$\\

”T13 if; L‘ .,

. dilllf]
‘lf v-i
K . \

UH Prr
”~90."

50

to function whenever appropriate opportunities arise. In general, such opportunities
are described by the readiness of domain tokens. Such a data dependent synchroniza-
tion scheme can be extracted from the interaction models of all the grouping agents.
Once the data dependency is guaranteed, an entire machine vision system organized
as a hierarchy of grouping agents is synchronized on flow of tokens, a scheme similar

to a data flow architecture.

2.3.8 Distributed and Parallelism

The proposed architecture can be implemented in a distributed environment to ex-
plore the inherent parallelism. Often, some grouping agents can work independently.
One example is the grouping agents that are responsible for connected components
and straight lines. They both are grouping results from pixel tokens but they do not
rely on each other. Such independent grouping agents can be sent to distributed pro-
cessors so that inherent parallelism can be exploited. Given a set of grouping agents,
the classiﬁcation of the independent sets of grouping agents can be derived from the
data dependency relationships extracted based on the speciﬁcations for the grouping
agents. As far as the data dependency can be ensured through synchronization, even

dependent grouping agents can work in a distributed computing environment.

2.4 Practical Potentials

2.4.1 Formal Method To Build Complex Vision Systems

The framework of Hierarchical Token Grouping presents a disciplined and engineering
way of building complex machine vision systems. It is developed based on the nature
of vision problem solving. While the behavior of different modules may not be an

important issue in studying individual modules, it becomes an crucial matter when

integrating indie
at The propu-
tle behavior of d

to: integrating rt

rcenections earn:

it“ 9 t- ‘
lax/ll; UlllDlll (if

r... t -

.0 a specifica
.rt

1.):

es user speci.‘

' I v.

1%.

.uUill‘lh -.H
my. all ill?

9

RH ' ‘
wfh a lilEIllV a
99131 0f mach "

t .
.lfil '
JWI-Orrentp
1i
P'usneq :
’t r. -
eurdpg‘lll t
c
.. V
‘ PT'Z'Ule
‘ In R 1
..r) ‘
x.
.159pr
“(JIQ‘ ml?
in

" ~(JI‘Q:
M r
9‘- .
swim“
.‘o '
4 ‘l
3 -f a n
\
a:rl \-
< CO“
Are r
‘[)(J
is t? -
{9,11% “~
”la O’v-r
,«U‘
t‘r ‘
R1 or: y

51

integrating individual modules to produce a larger system that achieves a bigger
task. The proposed framework introduces a formalism that captures the essence of
the behavior of different modules and utilizes it in developing a homogeneous method

for integrating machine vision systems.

2.4.2 Automation and Quick Prototyping

U
‘U.

Since a grouping agent at it can be completely deﬁned by (1:, G:, O ,} and the inter-
connections among a set of grouping agents can be extracted from their corresponding
input/output deﬁnitions, the structure of an entire system can be described by such
formal speciﬁcations. Therefore, it is feasible to develop an environment that com-
piles user speciﬁcations and automatically generates the desired system architecture,
including all the speciﬁed grouping agents and the interaction channels among them.

Such a highly automated system development environment will beneﬁt the develop-

ment of machine vision systems, including quick prototyping, debugging, and testing.

2.4.3 Software Reusability and Quick Prototyping

Object-oriented development methodologies promote information sharing and offer
the prospect of reusing designs and codes. Because of the emphasis on abstraction
and encapsulation (concentrate on object structure instead of on procedural structure
of problem solving), common structures can be shared, even by different systems.
Therefore, ultimately, a system can be built by assembling reusable components from
libraries. Consider solving vision problem under the framework of hierarchical token
grouping, if a generic set of perceptual entities is identiﬁed (such as the set of geons)
and a corresponding set of grouping agents that extract these generic perceptual
entities via grouping is developed, it is conceivable that many computer vision tasks

can be solved by assembling different grouping agents together to form vision systems

that are suitable t1

2,5 Theor

2.5.1 ProceS

 

 

1""

oer; though thr- pr
icieat dereloprrienf
tision problem solt

:zartretical poi“

1'
Hit ll.

ed as an al

, .
,,. , .
(3%)de

or the element: in

is appropriately in‘

u. algebraic 5N er'

:(‘Y‘ru‘it- l '
..mnf. etemerit "‘

n. " 4

\' .‘h ‘ . I
« ..tEII opera: EU!) 1
r, .

.2 T ‘

.'...ce the process (
{tau 3)“?

. em at an

if»: ‘

.ec. product of

h. "Y

..trntar modelino r
‘3

lv
E‘
dug”-
‘—.

1.
”45'.“

“eff GVent dvv-

52

that are suitable for particular application needs.

2.5 Theoretical Implications

2.5.1 Process Modeling

Even though the proposed architecture possesses practical potentials in terms of ef-
ﬁcient development of vision systems, it is essentially an architecture for modeling
vision problem solving processes rather than an implementation by itself. From a
theoretical point of view, a grouping agent, say GA(v), can be mathematically mod-
eled as an algebraic system with domain G: and the grouping operation performed
on the elements in the domain. Grouping operations are associative. If a null token
is appropriately introduced, this algebraic system becomes a monoid which is a class
0f algebraic systems deﬁned at an axiomatic level with associative operations and an
id'iintity element[28]. Since the direct product of the monoids defined using grouping
as their operation is still a monoid, the entire grouping hierarchy GH is also a monoid,
hence the process of vision problem solving can now be modeled (abstract) as an alge-
braic system at an axiomatic level, a monoid. This agrees with what theory dictates:
direct product of algebraic subsystems allows us to create larger systems. Such a
formal modeling method for vision problem solving allows the utilization of the rich
literature on algebraic theories, especially its applications in formalizing behaviors of

discrete event dynamic systems.

2.5.2 Dynamic System Behavior Description

A machine vision system may be distributed. Different computational units may
OPEI‘ate both synchronously and asynchronously, depending on the dynamics of the

SyStem. The state transition of a system can be opportunistic, depending on the

53

input data. Describing these different aspects of the behavior of a dynamic system
will be of great beneﬁt.

There are various formal methods in the ﬁelds of concurrent systems, software
engineering, and parallel computing, that can be used to describe systems consist-
ing of computing agents that communicate through channels. These formal methods
provide mathematical semantics for process, concurrency, nondeterminism, and com-
munication. For example, Hoare’s T-CSP (Communicating Sequential Processes),
Petri nets, and I/O automata are good candidates to provide behavior descriptions
for complex systems. Based on speciﬁcations for grouping agents, different types
of information can be compiled, such as interaction models, data dependencies, and
state transition conditions, and used to establish descriptions about different aspects
0f the system. For example, Petri nets can be built from a set of grouping agent spec-
iﬁcations. At run time, dynamic system behavior can be visualized by allowing the
tIokens ﬂow within the Petri nets built. Data dependency relations can be established
based on the input / output speciﬁcations. Control points can then be set accordingly
to ensure necessary synchronizations. Parallelism can also be maximized once the
data, dependency is known. All these can be compiled automatically from the formal

SPeCiﬁcation of a vision system.

2 .6 Summary

In this Chapter, using the central theme of grouping, the homogeneous architecture of
Hierarchical Token Grouping is formally deﬁned. There are three major components
in this paradigm: the Grouping Hierarchy, the Knowledge hierarchy, and the Imaging
Madel. Each of these three components has a distinct role in vision problem solving
bUt they also interact. The knowledge hierarchy has essentially the same organization

as the grouping hierarchy and both interact at various levels of abstraction, sometimes

54

through the imaging model.

The Grouping Hierarchy consists of a set of grouping agents, each of which is
an encapsulated object that combines both data structure and behavior, extracting
a certain type of perceptual entity via grouping. From the perspective of grouping,
heterogeneous visual modules can now be viewed as homogeneous operational units. A
token representation is designed to represent the perceptual entities at different levels
of abstraction. Such a generalized representation scheme offers a consistent interface,
facilitating the interactions among different modules. The concept of homomorphism
is extended to be used as a uniﬁed grouping principle across the entire Grouping
Hierarchy. It identiﬁes two general sources of information flow so that a general
Syntax of formal speciﬁcation for deﬁning ﬂexible grouping criteria is established.
This speciﬁcation provides (1) interaction models among grouping agents, (2) sources
0f information to be integrated, and (3) the knowledge to be used.

The proposed architecture possesses a number of desirable properties with respect
to efﬁcient system design, quick prototyping, testing, and system component sharing.
Both practical and theoretical potentials that the proposed architecture may offer are
speculated.

The proposed architecture is a generalization of many existing computer vision
techniques that are designed to solve different problems using seemingly heterogeneous
methodologies. In Chapter 3, we will survey the literature from the perspective of

token grouping and demonstrate the generality of the proposed architecture.

CHAPTER 3

Background

In this chapter, we survey four important aspects in vision problem solving: rep-
resentation, recognition methodologies, integration, and computational paradigms.
Representation is concerned with how a computer denotes or symbolizes objects and
relationships. Recognition methodologies discuss how individual visual tasks can be
achieved using a computer. Integration deals with the issues of how to combine dif-
ferent kinds of information and methodologies in solving complex vision problems.
Computational paradigms are related to general theories about what needs to be com-
puted and when during the process of vision problem solving.

Instead of merely surveying the literature related to these areas, we will also put
the literature into the perspective of hierarchical token grouping. The goal is to show
that the proposed architecture of hierarchical token grouping provides an adequate
paradigm. Speciﬁcally, we will argue that (1) the token representation possesses ade-
quate descriptive power for all the representation schemes surveyed in this chapter, (2)
token grouping is an abstraction of most recognition methodologies, (3) integration
at different scales is coherently addressed in the generic grouping criteria proposed in
this thesis, and (4) hierarchical token grouping offers a concrete and systematic mech-
anism under which various theoretical computational paradigms for vision problem

solving can be realized in a globally homogeneous way. The literature survey related

55

 

th
.4“

x
..,4
..

 

56

to the area of tubular object recognition is treated in Chapter 4.

3.1 Representations

Representation is one of the most important aspects in vision. It deals with issues
of how a perceptual entity is abstracted, stored, and utilized during the process of
perception. To organize the presentation, we classify different representation schemes

into several categories and examine them from the perspective of token representation.

3.1.1 Taxonomy

In the literature, various representation schemes have been proposed for representing
heterogeneous types of perceptual entities. They essentially fall into ﬁve categories:
point representation (for 20 pixels and 3D voxels), region representation (for 2D areas
and 3D volumes), boundary representation (for 2D curves and 3D surfaces), relational
representation, and hierarchical representation. Table 3.1 lists these ﬁve categories
and Provides a few examples for each category. Some of the representations may
belong to more than one category. For example, the Generalized Cylinders (GC)
can be either a region (3D volumetric) or a boundary (3D surface) representational
scheme, depending on how it is used. The Constructive Geometric Structure (CSG)
can be classiﬁed as either a relational or a hierarchical representation because it
describes both the relationships and the hierarchical organizations of the components
Of an Object to be represented. We now examine individually each of the above ﬁve

categOri es.

3.1.2 Point Representations

conventionally, a digital image function f0 is described as a discrete representation

Of a SCene being imaged. It is deﬁned in a sample space, denoted by S, of either two

57

Table 3.1. Taxonomy of representation schemes and examples.

 

 

 

 

 

 

 

 

 

Category Examples
Point Intrinsic images
representation
Region-based Connected component, Quad/ Oct trees,
representation Axial representation, GC, Superquadrics
Boundary-based Freeman chain codes, B-splines, GC,
representation superquadrics, Fourier descriptor, EGI
Relational Aspect graphs, CSG, Cell decomposition
representation Quad tree, Oct tree
Hierarchical CSG, Cell decomposition, Quad tree,
representation Oct tree

 

or three dimensions. In this sample space, each quantized cell x, either a pixel at
x = (:c,y) E S or a voxel at x = (.r, y, z) E S, has some digitized sensor measurement
which can be intensity, depth, or speed[4]. Such an image f0 can be processed to
derive other useful features for each cell such as gradient magnitude, curvature, surface
normal, or velocity of optical flow. Thus, a set of equal sized images, called intrinsic
images[6], can be formed each of which represents one feature across all the discrete

cells where f0 is deﬁned. Denote this set of intrinsic images by

f= {f03f19f23‘ﬁfd}, (3.1)

where d is the number of features computed for every cell and each feature image
f;,1 S i S d, is computed based on the original image function f0. Using this deﬁni-
tion, the original image function f0 is a special case with d 2 0. Such a representation
is depicted in Figure 3.1(a).

Instead of considering f as a collection of d + 1 feature images each of which

58

£(x) - (£000, £1(x), ...., £d(x)}

 

Figure 3.1. Two viewpoints of an image representation. (a) Intrinsic images, (b)
Point. representation.

describes one aspect of the 3D scene being imaged, it can be viewed as a collection
0f N cell representations each of which characterizes a discrete cell using (1 + 1 fea-
tures. Here, N is the number of discrete cells in the sample space. This viewpoint
is Visualized in Figure 3.1(b). Each cell x in Figure 3.1(b) is represented by a point

representation

f(X) = {f0(x)if1(x)a "’fd(x)}a X E S (32)

Organizing information this way reﬂects the perspective of treating each cell as an
independently manipulatable entity with its own characteristics. This exercises the
CotlCept of object encapsulation of object-oriented design, starting from the smallest
data unit.

Although organized differently, information is preserved. The features captured
in intrinsic images are all obviously preserved in the point representation. What is

more implicit is the spatial relationship among individual cells. In an intrinsic image

59

representation (3.1), spatial relationships are embedded in the coordinates of cells.
For example, the spatial adjacency relationship n-connectivity is a binary relation

over {x} E S which can be expressed by
anx' iff IX—X'ISI,

Where D4” denotes an n-connectivity relation and l is a distance measure whose value
depends on the value of n. The expression states that a cell x is n-connected to cell
X’ if and only if the Euclidean distance between the two is less than or equal to value
1. For example, if f is deﬁned in a 2D space, when n = 4, l = 1; when n = 8, l = \/2.
In a 3D space, when n = 6, I = 1; when n = 26, l = x/3. Another example of a
Spatial relationship is “left”. Suppose the width of S is deﬁned along the X axis. In

this case, relation “left” can be inferred using
x W x' iff .T < (17’,

Where x = [:r,y,z]T, x' =2 [:r’,y’,z’]T, and ML! denotes the “left” relationship. Since
the coordinates of cells from representation (3.1) to representation (3.2) are not
Changed, the spatial relationships among individual cells can be inferred in the same
Way from (3.2). Hence, spatial relationships are also preserved in point representa-
tion_ Due to the preservation of information, the two representations ((3.1) and (3.2))
are equivalent.

The implicit representation for spatial relationships can be made explicit. F ig-
Ure 3.2 gives one example in which (a) is a set of 2D cells in a 4 x 4 array (the
ConVentional array representation of a 2D image); (b) is the explicit representation

for relationship D44 among the cells in (a); (c) is that for relationship N8; and (d) is

that for ML; based on only two rows of (a). The process of making a relation, say N",

60

explicit is to assign a set of indices to every cell x, representing relation N” between
x and other indexed cells. For example, to make N4 explicit, cell (2,2) is assigned

four indices pointing to cells (1,2), (2,3), (3,2), and (2,1).

 

(0.0) (0.1) (0.2) (0,3)

 

(1.0) (1.1) (1.2) (1.3)

 

(2.0) (2.1) (2.2) (2.3)

 

(3.0) (3.1) (3.2) (3.3)

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

0.1 (0.2 0,3)
W 1.1 a (1.3)
(2.0 2.1‘ (2;) (2’3)
( (3.1 3,; 3'3)

 

(C)

Figure 3.2. Implicit and explicit spatial relationships among cells. (a) Implicit spatial
relationship representation implemented as an array. (b) Explicit representation for
a 4-connectivity relation. (c) Explicit representation for an 8-connectivity relation.
(d) Explicit representation for spatial relationship “left”.

It can be easily shown that a point representation is contained in token represen—

tation. An image f0 can be represented as a set of N tokens, each of which is

11cc” : {Cicell’EcelDchell}, 1 S 2 S N,

61

where

Cicell = xi, Ficell : Picell U R56”,

and its intrinsic and relational features are

Pfe“ : {xi’f0(xi),..,fd(xi)}a

Ric” 2 {NR9 {vaajllv Mne AfRal S] g 9(Mnla

where N3 is a set of relation names that need to be made explicit for future use,
x,- points to another token to which token T?" is related under N", a, is a set of
attributes that describes the relation between Tfeu and T196” . The number of other
tokens, g(l><l"), that Tics“ is related to is a function of N". The organization of a

smallest cell at this level is itself which can be seen from the singleton deﬁnition of

component set Cfe”.

3.1.3 Region-Based Representations

Region-based representation refers to the scheme in which the geometric space occu-
pied by underlying perceptual entities is explicitly represented [4, 82, 10, 25, 51, 95].
In a two-dimensional space, the represented region is an area; while in a three—
dimensional space, the represented region is a volume.

Commonly used region-based representation schemes can be in three forms. The
ﬁrst takes the form of occupancy which describes a region by explicitly marking the
space the region occupies. Examples are connected component, spatial occupancy
array, and Quad tree [4, 84]. The second is a parametric form which, based on a small
set of parameters, describes the region to be represented. Examples of this form can
be superquadrics or spherical harmonics [72, 4]. The third form is to represent a region

by marking some subregion that carries the information about the entire region to be

62

represented [13, 81, 14]. For example, a region can be specified by some symmetric
axis, along which every point may carry the information about the space occupied
along the direction perpendicular to the axis at that point. The axial representation
belongs to this form. We will show that all three forms of region-based representation
can be contained in token representation.

The regions to be represented can be characterized by some features such as the
size (area or volume in 2D or 3D case), the centroid, the average intensity, the shape,
or the aspect ratio of the region. Various kinds of relationships between a region and
other perceptual entities (not necessarily regions) often need to be explicitly repre-
sented. For example, spatial relationships among a set of regions can be important
in the paradigm of recognition by components. It is also often desirable to make an
explicit connection between a region and its boundary (another type of perceptual
entity). Based on axial representations of a. set of regions, their spatial relationships
such as branch and cross-over relations may be analyzed. All the features mentioned
above can be easily embedded in a token representation.

We now demonstrate all the above region-based representation schemes are con-

tained in a token representation.

Occupancy Representation

This type of region representation explicitly marks all the cells that a region occupies.
This marking can be done by either enumerating all the cells once such as in connected
components or marking them hierarchically such as in Quad trees. In either case, a
region is composed of a set of components, cells or subregions, and is described by a
set of features. Denote a region by rg. Using the notion of token, we can represent a
region by

T?" = {059,163, Dig}-

63

For a connected component, its component set is a set of cell tokens that are all
related by an n-connectivity relationship where n is 4 or 8 in the 2D case and 6 or 26
in the 30 case. For a Quad tree, a hierarchical token representation is appropriate
with each C,” down the tree containing four indices pointing to four subregions until
the subregions consist of uniform labels[84].

To characterize a region, its feature set is

szPWURﬂ

I l

where the property set Pig may consists of features such as the size, centroid, or even
the texture measurement of the region, and the relational feature set Rf” may contain
relation “left” which relates T,” to all, say other regions, that are located on the left

of T,” .

Parametric Representation

A region can be speciﬁed by a parameterized equation. For example, a sphere is
deﬁned by four parameters, one for size and three for its position. Using such a
parameterized deﬁnition, a region consists of all the points that satisfy the speciﬁed
sphere equation. Denote a parameterized equation by ’R(p1,p2, ..,pm) _<_ 0 with m
parameters. The region described by R(p1, p2, .., pm) g 0 can be expressed as a token

grouped from a set of cell tokens that satisfy:
Cirg : {Tfeu I R( chell; P131729 '°3pm) S 0}°

Various features can be computed and stored in the feature set of a token to charac-

terize the region.

64

Axial Representation

An axis, denoted by A, is a general curve. In an axial representation for regions, every
point on this curve carries some information about the cross section which forms a
certain angle with the tangent of A at that point. The information carried is used
to specify the region occupied by the cross section and the entire region represented
by A is then the union of all such cross section regions. There are various criteria to
derive axis A. Examples are various kinds of symmetries. Depending on the type of
symmetry, the criterion can be distance based or angle based [39, 18, 37].
The region described by an axial representation can be deﬁned as a token with

the component set to be:

0:9 : {Tjd’ I Gf(CI9)},

where G: is the criterion for detecting the axis. For example, for Right Straight
Homogeneous Generalized Cylinders (RSHGC) with circular cross sections, their axes
can be deﬁned by the set of points that are connected and have equal distance to the
boundary of the cross section that is perpendicular to the tangents at axis points.
Similarly, various features that describe a region indicated by an axis can be easily

contained in a token representation.

3.1.4 Boundary-Based Representations

Boundary-based representation is a scheme in which the geometric space occupied by
the boundary of objects is explicitly represented [4, 10, 25, 95]. A Boundary is either
a SilhOUette in a 2D space or a surface in a 3D space. Table 3.1 gives some example
representation schemes of this category [78, 5, 64, 36, 19, 11, 102, 103].

A b011ndary entity is usually a set of connected points. Its appearance or shape can
be Characterized in three ways. The simplest way is to explicitly list all the discrete

points that comprise the boundary. The second way of representing a boundary entity

is through a parameterized form. Examples can be various types of parameterized
Curves and surfaces such as bicubic surfaces or superquadrics. The third way of
representing a boundary entity is through the use of some salient features. Examples
can be Fourier Descriptors for curves, fractal dimensions for curves or surfaces, or
Extended Gaussian Images deﬁned in a surface normal space called the Gaussian
sphere. All of the above three can also be combined to represent a boundary entity.

A boundary entity can be characterized by various properties such as length, area,
or curvatures. Relationships among boundary entities can be explicitly represented
that describe how different entities interact with each other, spatially or temporally.
Examples of spatial relationships are collinear, symmetric, parallel, or adjacency re-
lationships among curves or surface patches.

Similar to regions, each boundary entity can be represented by a token. All the
three ways of characterizing a boundary entity can be expressed in a token repre-
sentation. The ﬁrst form of a boundary representation can easily be shown being
contained in a token representation. The component set of such a boundary token
includes all the points that comprise the boundary.

Consider a boundary entity that is characterized by a function [3 with m pa-
rameters (p1,p2, ..,pm). A set C of edge points that comprise the boundary satisﬁes
8(0; P1 , ..,pm) 2 0. Then the token representation for such a boundary entity has
component tokens pointing to all the elements of C and has (p1, p2, ..,pm) included in
its Property feature set. Depending on the form of B, such a boundary can be either

a curve or a surface.
For the boundary entities that are speciﬁed by a set of features, token represen-
tation iS clearly appropriate. For example, assume a curve is described by a starting
point and a set of coefﬁcients from a Fourier series. Then all the points on this curve

can be included in the component set and the set of coefﬁcients can be included as

66

the property features of the token. One less obvious example is an Extended Gaus-
sian Image. A surface speciﬁed by an EGI can be expressed as a token in which all
the surface points are explicitly included as component tokens and all the points on
the unit Gaussian sphere, that are intersected by the translated unit normals (at the
center of the sphere) of the underlying surface are recorded as property features of

the token.

3.1.5 Relational Representations

Relational representation schemes are designed for expressing relationships among
perceptual entities [4, 10, 25, 95] . In vision problem solving, analyzing and under-
standing the relationships among different perceptual entities can be important. The
Graph is often used as a tool for this category of representations. For example, an
aspect graph describes the transition among distinct object appearances determined
by different view points [38, 92, 16, 24, 74]. Constructive Solid Geometry (CSG) rep-
resents explicitly the geometric (structural) relationships among different 3D parts of
an Object[104]. The Quad-tree and Oct-tree specify the occupancy in space by relat-
ing adj acent subregions in a tree structure[84, 50]. Some decomposition schemes such
as tetrahedral cell decomposition also describe spatial adjacency relationships among
tetrahedral elements [10]. Other implicit relational representations can also be found
in the literature[68, 91].

Dile to the design of the relational feature set R? in the token representation, it
pOSSeSSes the power to describe a set of relations among tokens. Suppose a relational
representation describes a spatial relationship that connects a set of entities into a
gr aph. Denote this graph by [G; R] where G is a set of nodes, representing the entities
inVoiVed, and R is a relation, represented by a set of arcs that connect the nodes in
0” Graph [G, R] can be represented as a token in the following way. Let the token

representation for an arbitrary element in G be T," whose relational feature set R?

67

includes (R,{(i,~,v',a,))}), Then, [G, R] is implicitly specified by {R,{(i,~,v’,a,-J)})
where

G = UT,"EG{Ti:,}?

with UTiveg{T,-‘J’I} the set of all tokens that are interrelated under R. Because the
spatial relation R here is arbitrary, the token representation is capable of expressing
any relationship.

A structural relationship can be adequately described in a token through the
component set. Tokens, instead of being isolated data structures, form a network or
a graph, via both relational features and component sets, in which nodes are tokens
and arcs are relationships. That is, relational representations can be contained in a

token representation.

3.1.6 Hierarchical Representations

A representation scheme in which no individual level of perceptual entities forms a
complete representation, but rather an entire hierarchy of abstractions is used to
form a full description is called pyramid representation or hierarchical representation
[22, 94, 33].

Some of the representation schemes discussed earlier can also be considered to
be variations of pyramid representation. For example, a 2D curve can be formed
hierarchically from Other curves which are formed by lines that again are formed by
pixels. A region can be viewed as constructed hierarchically from smaller regions down
to the most primitive regions — pixels or voxels. The Quad—tree, Oct-tree, and CSG
are also eXamples of hierarchical representations. The pyramid representation scheme
allows a hierarchical organization or aggregations of either the same or different types

0f representation schemes.

68

Token representation is naturally hierarchical. In a paradigm of constructive prob-
lem solving, a hierarchical representation describes the organization of the perceptual
entity being represented. Token representation differs from the conventional hierar-
chical representation in the following aspects. Firstly, the design of the component
set makes the structural relationship among different levels of abstraction explicit.
Secondly, tokens at different levels are uniformly structured so that while they are
capable of representing heterogeneous types of perceptual entities they offer a homo-
geneous representation across all levels of abstraction. Thirdly, besides describing the
represented perceptual entities, token representation is designed to also represent the
deformation information of the entities.

As we can see, all the representation schemes surveyed in this section can be
contained in the token representation. In the next section, we will examine various

recognition methodologies from the perspective of token grouping.

3.2 Recognition Methodologies

Various object recognition techniques have been proposed in the literature. In this
section, we first summarize these techniques from the conventional perspective of three
levels of processing. Then, following the classiﬁcation of ﬁve categories of recognition

methodologies in[42], these categories will be surveyed from the perspective of token

grouping.

3.2.1 Three Levels of Processing

Based On the conventional computational models for vision, there are three levels of
processing: low level, intermediate level, and high level [62, 58, 57].-They are deﬁned

and Summarized brieﬂy below.

69

Low Level Processing

The task of low level vision is generally to recover intrinsic information from given
input image(s) [4, 57, 42]. The processing techniques used at this level are usually
considered to be application independent. Operations are typically applied uniformly
to the entire image space. Examples of intrinsic properties to be recovered are depth,
information about surfaces, optical ﬂow, edges, or restored or enhanced sensor mea-
surements.

The techniques used to derive these intrinsic features are edge detection, stereo,
shape from X, optical flow, and various restoration techniques. These techniques
detect different features, often dense, from given input data for the smallest percep-
tual unit, a pixel or voxel. For every such smallest perceptual unit, the detected

features form a set of augmented descriptors which can be utilized later in organizing

percept ual data.

Intermediate Level Processing

Starting from intermediate level of processing, perceptual data is to be organized[4,
62]. The intrinsic images from low level processing are utilized to determine how
data should be organized. Various intermediate perceptual entities are derived at this
level of processing and associated features are extracted. Examples of the perceptual
entities derived at this level include regions, curves, ribbons, surfaces, or parts.

The techniques used to recognize intermediate perceptual entities at this level are
typically segmentation techniques, feature extraction, boundary detection, percep-
tual grouping, and various pattern recognition methodologies. The role of extracted
intermediate perceptual entities is to provide cues in a more abstract form for high

level processing.

70

High Level Processing

Conventionally, high level processing is where the correlation between given visual
data and previously known objects takes place [4, 62, 42]. All the perceptual entities
derived so far, together with their features, are integrated for the purpose of recogni—
tion via, usually, matching techniques to ﬁnd the best correlation between data and
the stored object models. Depending on how object models are represented, different
matching strategies can be used. We classify matching strategies into four types:
geometric matching, iconic matching, relational matching, and statistical matching.

When an object is modeled by a parametric description, geometric matching is
often used to measure the similarity between data and the object model [97, 9, 72].
Geometric matching techniques are applicable to both 2D and 30 problems. Match-
ing techniques of this type are often performed through a ﬁtting. For example, a
set of 3D points can be ﬁtted with a theoretical surface function with a number of
parameters. Fitting is an optimization problem. That is, the object parameters are
estimated such that some predeﬁned criterion function is optimized. For example,
the cost function can be deﬁned as the total squared Euclidean distance between a
set of data points and a parametric model surface. In this case, a fitting using this
criterion function is to ﬁnd an optimal parameter setting of the model so that the
distance between the model and the set of given data points is minimized. In geo-
metric matching, the correlation between a model and data needs to be established
based on some ﬁgure of merit designed according to speciﬁc application needs. In the
ﬁtting example mentioned above, the reciprocal of the total distance can be used as
a ﬁgure of merit, indicating the degree of similarity between data and a model.

An Object model can be built directly using the image of the object. Such a model
is called a template. Iconic matching strategies are often used in matching sensor

data against this type of object model. Template matching and matched ﬁltering

71

both fall into this category. Similarity between sensor data and a template model is
often determined by a correlation—based measure. One example of such a measure
is Signal-to-Noise Ratio in ﬁltering the sensor data using a matched ﬁlter. This
type of matching technique only measures how much the sensor data. looks like the
model. Therefore, it works only for limited conditions such as a small range of viewing
angles or similar imaging conditions, and it demands an essentially non-occlusion
environment.

Models for complex objects can be graph-based where nodes represent parts of
objects and arcs represent the spatial or topological relationships among different
parts. To determine a match between such an object model and data requires a graph
matching. The correlation is then speciﬁed by graph isomorphism or, more generally,
homomorphism found between two graphs, one built from the data and the other
from a. model.

Objects can also be modeled using their distinct features. Such an object model is
a feature vector or a point in a d—dimensional space where d is the number of features
used. The matching between data and this type of model is often determined through
a statistical matching. Pattern classiﬁcation and clustering are typical matching tech-
niques used in this category. The metrics designed to measure the similarity between
a model and data are usually deﬁned based on the distance (Euclidean or Mahalono-

bias) between two points, one representing the model and the other the data, in a

d—dimensional feature space.

3.2.2 Taxonomy of Recognition Methodologies

Haralick and Shapir0[42] have considered the essential issues that a recognition

methodology usually addresses:

72

Conditioning: A conditioning operation determines some informative

Labeling:

Grouping:

Feature

extract ing

Piatching:

pattern on the basis of the observed image.

Labeling determines the spatial events in which

each pixel participates.

Grouping identifies events by collecting together
or identifying maximal connected sets of pixels

participating in the same kind of event.

Extracting operation computes a list of properties

(features) for a group of pixels.

Matching operation determines the interpretation
of some related set of image events, associating
these events with some given three-dimensional

object or two-dimensional shape.

These steps constitute a canonical decomposition of a recognition problem at any

level of abstraction. They represent common issues that recognition methodologies

address at every level of abstraction[42]. This observation reveals the homogeneity or

commonality among recognition methodologies at any level of abstraction.

Table 3.2 demonstrates, through examples, that the above ﬁve issues are repeat-

edly encountered at all three levels (low, intermediate, and high levels) of processing.

Every recognition methodology addresses, explicitly or implicitly, these ﬁve issues.

73

 

 

 

 

waives: ~3me wane—:5 6.8.“
3963:. 252.5% cosetomow 3030 62:2: .033 .391 1.5088 335
$5638 _eosmsﬁm .momﬁoaoa 092m £839? .Eonamﬁoczzzo 320:0 :3:
$233.2: saeuw 665:8 .ﬁwcﬂ weasew 85qu 835m
9533.: quEOow doggone Suﬁsm 3398qu 62:3 .82: #9088 .264
@5538 _wusmsﬁm .1250: 83.5w .meuoE dcvcoaﬁoU .580 8:30 SEEEHSE
waive—c
winoaeﬁ wee—QES x 80¢ madam .cosgcoﬁmom $3223.58 36:
.moocowaoamotOQ 9:538 088 98% .cosuoaoc owvo ..Eoﬁoonwsco 304
@5332 :osueﬁxm— 838m $3326 mE—onmq mﬁcomﬁccoO £264

 

 

 

 

 

 

 

 

.momwobcoﬁoﬁ coﬁmcwoooe E 853 o>c 2: can Ewmceeem mammmoooa ~32 085 2: ocean—ob 3:28:28 23. .m.m @568

 

74

Conditioning

Examples of conditioning at low level include various image enhancement techniques,
ﬁltering, and morphological processing. At the intermediate level and high level,

conditioning can be in the form of outlier removal according to some criterion.

Labeling

Labeling is the process of deciding which perceptual entities participate in what event
according to some criteria. Assigned labels signal a classiﬁcation of the perceptual
entities in some given space. A typical example is thresholding in which the criterion
can be an intensity threshold value. A labeling can also occur at higher levels of
processing. For example, classifying detected perceptual entities into a set of known
Object parts. The criterion used for labeling speciﬁes some kind of homogeneous
Property among the perceptual entities that are assigned the same label.

From the perspective of token grouping, functional module O_Event(G”) in a
grouping agent plays a similar role. Given criterion G” = G: /\ G2, it identiﬁes a set

Of tokens from grouping domain G"; that comprise an observed event E0.

Grouping

The grouping deﬁned in[42] is simply an operation that mechanically collects all the
p erCeptual entities that have the same labels. Such a deﬁned grouping does not decide
which entities participate in which spatial event, that is done in labeling. A token
grouping agent, however, is an independent decision making mechanism. Therefore,
the Simple grouping deﬁned in [42] performs only part of what a token grouping agent
is Capable of, Speciﬁcally, it performs C,” <2 E, where E, is identiﬁed in the labeling

Step (see Section 2.2.5).

75

Extracting Features

This operation computes a set of features of perceptual entities. In a token grouping

agent, this task is performed by functional module H.

Mat ching

As discussed earlier, there are essentially four types of matching techniques: geometric
matching, statistical matching, graph matching, and iconic matching. Each of them
can be considered as a homomorphism ﬁnding problem, hence can be performed in
functional module H_Evaluate(Eo, E6) in a token grouping agent.

For a geometric matching, since correlation is usually made between two parallel
sets of object parameters, the set from data needs to be estimated from the grouping
candidate before the matching takes place. For example, if a tube-shaped object is
modeled by a 3D cylindrical surface, the set of perceptual entities hypothesized to
comprise a 3D cylinder surface should be ﬁrst characterized using six model param-
eters in order for it to be matched against the model. Having the parallel sets of
features from both data and the model, a geometric matching is to ﬁnd a homomor-
PhiSm between two single node graphs, one from the model and the other from the
data, Where both nodes are attributed by their corresponding sets of parameter val-
ues_ In this case, the homomorphism is deﬁned based on the property features, hence
a Similarity of the two nodes is to be identiﬁed. Statistical matching can be viewed
the Same way. A graph matching lends itself to a homomorphism problem. With

each individual point in an input image represented as a token, an iconic matching

can be formulated as a graph matching.

76

3.2.3 Abstraction of Recognition Methodologies: Token
Grouping

Note, in effect, all ﬁve steps together perform some kind of unit Imnsfo-rmation[4‘2].
Such a unit transformation process repeats across levels of abstraction until the or-
ganized data is identiﬁed as what the recognition system is looking for. That is, at
every level, the central task of all these ﬁve steps is common: to enable an appro-
priate unit transformation. The ﬁrst three steps choose appropriate candidates for a
unit transformation (grouping). The fourth step prepares some measurements that
are needed in order to carry out the unit transformation. The last step is where the
transform is completed provided that there is a signiﬁcant similarity between the set
of chosen candidates and what the unit transformation operator is looking for.
The process of unit transformations is the process of token groupings because
both organize data through means of aggregating a set of candidate perceptual en—
tities into a new perceptual entity. These two viewpoints are based on the common
observation of the homogeneity among different levels of processing. However, instead
of considering the ﬁve steps as loosely coupled, each grouping agent uniﬁes them and
encaPSUIates the functional behavior exhibited in these ﬁve steps and data represen—
tation. Another difference is that, in the framework of hierarchical token grouping,
all levels of grouping are put into the perspective of reaching an overall solution so
that the interactions, within GH and between GH and KH, are explicitly treated,

a1 ' . . . . . . .
I In a Coheswe and conSIStent manner, to fac111tate a more systematic. integration

scheme.

3’3 Interaction and Integration

111 te .
g1‘a'tlon is a very important aspect in solving vision problems. With different

indi V ‘
1 dual modules to be responsible for detecting different types of cues, integration

k

77

can be facilitated through interactions among modules. In the literature, interaction
has been attempted at several scales. The ﬁrst scale is neighborhood interaction which
is local by nature. The second scale is interframe interaction which across multiple
images. The third scale is intermodule interaction among individual modules across
levels of abstraction. The following sections survey the interactions at these three

scales and show that they can all be supported under the paradigm of Hierarchical

Token Grouping.

3.3 - 1 Neighborhood Interaction

The interaction at the scale of local neighborhood occurs among neighboring entities.
Contextual information is integrated to describe the compatibility among neighbors.
The goal of such an interaction is to maintain some predeﬁned consistency within the
neighborhood. A typical framework for achieving this type of interaction is relaxation
[109, 71, 2, 85, 31, 75].

Neighborhood interaction is supported under the proposed HTG architecture.
Since the smallest perceptual entities are represented as manipulatable tokens, in-
tegration of the information from neighboring tokens can be easily implemented in
a homomorphism evaluation function. Haralick has proved that the interaction and

integration at this scale can be realized through homomorphism satisfaction [41].

3°3-2 Interframe Interaction

The next scale of interaction is across multiple image frames. There are two situa-
tions where multiple frames exist, (1) frames from a hierarchy of multiple resolutions
and (2) frames from a time sequence. In the ﬁrst situation, information from dif-
ferent resolutions gets integrated in the fashion of coarseto-ﬁne, ﬁne-to-coarse, or a

corleination of both. The purpose of such interframe interaction is often to obtain

78

or reﬁne some visual data through integrating the equivalent type of information
derived from different image frames. During the interaction across temporal space,
correspondence information is often integrated. A typical framework used for in-
terframe interaction among different resolutions is the pyramid or processing cone
[62, 100, 40, 96, 80, 59, 60, 105, 3]. Interframe interaction is explicitly facilitated by

the groupings along conceptual dimension M (see Figure 2.1 in Chapter 2).

3.3. 3 Intermodule Interaction

The third scale of interaction is among individual modules. Zucker has termed the
interaction at this scale as a vertical process through which horizontal processes (mod-
ules) that are responsible for different perceptual entities interact[108].

While the literature has provided fairly consistent frameworks for the interaction
at the previous two scales, the interaction among individual modules have been so
far essentially ad hoc and heterogeneous. Individual modules interact by exchanging
a Wide range of heterogeneous types of information. For example, data from differ—
ent sensors can be integrated in a module to estimate or to characterize 2D or 3D
structures [56, 27, 1, 2, 63, 108]. Different types of knowledge can be utilized and
integrated in a module to infer invariant perceptual regularities from input images
[60, 86]. Cues from different processing modules can be combined in a higher level
module in order to identify salient features of objects [52, 26, 15]. Due to the het-
erogeneous nature of different visual tasks, there is so far no coherent mechanism
developed for supporting the interactions at this scale.

The framework of hierarchical token grouping is our attempt to provide an envi-
I‘Onment in which intermodule interactions can be supported in a more coherent way.
The token representation offers a consistent interface so that interaction models can

be Speciﬁed by using a list of levels (22’s) and the feature selection functions. From the

SPe ‘ . . .
efﬁcation for grouping agents, these types of information can be eaSily extracted

79

and compiled. Therefore, while allowing the freedom of deﬁning flexible intermodule

interactions, the proposed framework supports them in a structured and manageable

way.

3.4 Computational Paradigms For Vision

Researchers working in the ﬁeld of vision have been trying, over the years, to an-

swer important questions concerning the nature of vision. Aloimonos summarized as

follows:

There are those who ask the empirical question (what is), i.e., they are
trying to ﬁnd out how existing visual systems are designed; those who ask
the normative question (what should be), i.e., they are trying to ﬁnd out
what classes of animals or robots would be desirable (good, best, optimal)
for a set of tasks; and ﬁnally there are those who address the theoretical
question (what could be), i.e., what range of possible mechanisms could

exist in intelligent visual systems.

Various computational models for vision have been proposed in the literature. Several
have important impact in the literature among which the three level paradigm, origi-
nally proposed by Marr and later revised by Lowe, remains to be the most inﬂuential.
In subsequent sections, we review some of them and, more importantly, we examine

the relationships between them and the framework of hierarchical token grouping.

34:. 1 Marr’s Computational Model

Marr’s computational model for vision answers theoretical questions of vision (what).
18 Work was the ﬁrst attempt to make an analogy between human v1310n and machine

vi ‘
81011 [62]. Marr proposed the three level computational model for vision in 1982 (see

80

 

| Image Features I

 

Stereo
notion
Shape from x

 

 

 

 

2 100 Sketch
lntrlnelc Images

 

l Recoqni t ion

 

Object Models

 

Figure 3.3. Marr’s computational model for vision. The picture is taken from “Per-
ceptual Organization and Visual Recognition” by David Lowe.

Figure 3.3). The model dictates that perception is achieved within a hierarchy of
three levels of bottom—up computations and representations.

The computation at the lower level provides image features that are directly avail-
able or computable from input images. Marr terms these feature as “Raw Primal
Sketch” (RPS) and “Full Primal Sketch” (FPS). The concept of place token was pro-
posed to represent individual entities of RPS and grouping is suggested as a means
of “aSSernbling” FPS from RPS. But both the concept of place token and the concept
0f grouping were neither pursued further nor made concrete in Marr’s works.

Intrinsic properties are computed at the intermediate level and represented by a

set of intrinsic images[6]. Shape information of visible surfaces comprises the rep-
resentation of the 2%D sketch at this level which bridges what is available from the
low level and what is t0 be derived at high level (3D structures). From the cognition

po' , ,
int of view, however, the 2%D sketch at this level does not prov1de much more

81

abstract information than the original input image because the representation is still
an array and data is not yet organized.

Based on the 2%0 sketch, primitive surface patches and compact volumes can be
extracted and assembled (grouped) to index to the instances of the objects that are
similarly structured. How the derived 3D shapes are utilized toward cognition is not
made speciﬁc in Marr’s model. As Witkin and Tenenbaum pointed out, in contrast
to the previous two levels that are data-driven, iconic, and domain-independent, high
level processing is basically goal-driven, symbolic, and controlled by the expectations
of the perceiver[106].

Marr’s computational model describes a constructive approach to vision. From
an original input image to the recognition is a recovery process. The information
flow is essentially bottom-up. The model speciﬁes what is to be recovered when. It
emphasizes the role of 3D structure reconstruction and considers it as an indispensable
step toward cognition. The high level is described as a semantically involved process
and the explicit participation of knowledge in cognition occurs, in Marr’s model, only
at this stage of processing. Some important issues are not addressed in this model,
such as top-down influence, interprocess interaction, integration, and cooperation and
Competition.

While this model addresses many theoretical aspects of vision, to apply it effec-
tively in practice, the normative questions such as “what kind of mechanism could
it be in order to realize such a recovery process?” have to be seriously answered.
Considering the mechanism of neural networks in the human brain where massively
Connected networks consist of all homogeneous computational units which together
can Perform complex and heterogeneous vision tasks, a homogeneous and coherent

Inechzi—Ilism for machine vision should be possible. Marr has proposed grouping as a
concrete mechanism fOr low level processing, essentially from RPS to FPS. To date,

meg . . . . . .
hahlsms for realizmg the recovery theories for various reconstruction problems at

y

82

lmege Feeturee I
I Perceptual
orgeni set ion

 

 

 

 

 

 

 

l l l
: : -:
E
El El .9. I
8 3| 3| I
a l I g ' Perceptual
‘8’ l I no I Groupinge
:3 ii i L
I} - - .
:2 2 1’20 5km“ an inference
30 Groupinge
l
I Recognition
l
i

 

 

 

 

—+ |_ Recognition
. Object Modele I‘

Figure 3.4. Lowe’s computational model for vision. The picture is taken from “Per-
ceptual Organization and Visual Recognition” by David Lowe.

different levels have been local and heterogeneous.

3.4.2 Lowe’s Computational Model

Believing that perceptual organization is an essential level of inference and that non-
accidentalness plays an important role in human perception, Lowe presented a revised
computational model for vision[58]. This is shown in Figure 3.4.

In this revised model, Lowe emphasized the role of perceptual organization in
Vigion and de—emphasized, but did not eliminate, the use of depth information. From
the general vision point of view, Lowe deﬁned the functional roles of perceptual
organiZation to be for segmentation, three-space inference, and indexing to world
knowledge. He presented a computational model for vision similar to Marr’s model
but Contained the pathway that bypasses the intermediate 2%D representation in

Mal-108 model along both bottom-up recognition and top-down veriﬁcation directions,

reﬂecti , , , .
11g the observation that 30 recognition can often be achieved adequately and

83

directly from 2D information. Nevertheless, the recovery of depth information is
retained as an alternative pathway even though it is no longer treated as a necessary
condition for perception.

In Lowe’s theory, such a bypass pathway is possible due to the non-accidentalness
derived from the assumption of viewpoint invariance. The major contribution of
Lowe’s work was to identify a concrete set of non—accidental or causal relationships
among 2D and 3D structures based on perceptual regularities and to utilize them in
indexing object models (index 3D part or object models directly from 2D perceptual
structures). Such an indexing scheme generates a reduced search space for high
level processing where indexed objects are veriﬁed using the constraint of viewpoint
consistency. The role of grouping was made explicit in Lowe’s work: to uncover the
causal relationships in the visual ﬁeld. Perceptual grouping served as the mechanism
under which signiﬁcant 2D structures are identiﬁed via grouping at the intermediate
level-

The organizational (grouping) principles used in Lowe’s work were developed
based on the theories of the Gestalt psychology. The Gestalt psychologists demon-
strated that people tend to reach the simplest possible interpretation for given visual
data. The Gestalt psychology revealed that groupings seem to follow the principle
of Simplicity. But the concept of simplicity was never made concrete. Another qual-
itatiVe description of this general principle is the minimum principle which states
that grouping is performed among those perceptual entities “which requires the least
amount of information to specify” [44]. In their important paper on the role of struc-
tur e in Vision [106], Witkin and Tenenbaum proposed a uniﬁed principle of grouping
Called fuzzy identities or least-distortion across both space and time which is, concep-
tually, Similar to the minimum principle. Although all intuitively appropriate, these

prop Osed general grouping principles are essentially concepts which are difficult to ap-

Pl ' . . .
y In Computation. Lowe’s work has been Viewed as an effort to make these princ1ples

84

more concrete and was an important contribution to the literature in this respect.
The set of causal relationships or perceptual regularities he developed serve, under
the constraint of viewpoint invariance, as a generic basis of perceptual groupings that
are “likely to survive intact through later stages of interpretation” [58].

Lowe’s model also represents a constructive approach to vision, in which what
information is recovered when is explicitly deﬁned. In Lowe’s model, processing is still
mainly bottom-up except in veriﬁcation where indexed objects serve as hypotheses
or expectations. The issue of interprocess cooperation/ competition is not explicitly
addressed in his computational model. Besides dealing with the theoretical questions
about. vision, the issues related to the normative questions were also considered.
Speciﬁcally, Lowe used grouping as a mechanism for perceptual organization at the
intermediate level. But, by what overall mechanism the vision tasks can be achieved

IS not further examined.

3.4.3 Model-Based Hierarchical Paradigm

While both Marr’s model and Lowe’s model acknowledge the use of knowledge but
essentially only at high level of processing, another viewpoint emerged in the litera-
ture that makes the role of knowledge explicit at every level of information recovery.
Rather than being a conﬂict with the previously discussed models, this view separates
the information recovery process from the knowledge that is utilized by the recov-
ery Process. We call this kind of computational scheme a model-based hierarchical
paradigm.

BrOOks ﬁrst presented this vieWpoint in his work ACRONYM [20]. He designed a
hierarchical representation for the object model used (knowledge about objects) and
separated it from the information recovery process. The computational paradigm

0f ACRONYM consists of two parallel hierarchies, one for the bottom-up object

1‘ ec .
Over y process and the other for the top-down knowledge representation. These

85

two hierarchies interact horizontally at multiple levels during which the decomposed
object models can be used to assist in extracting the perceptual entities that are
signiﬁcantly similar to the model at those levels [20].

With the same view, Jain and Binford generalized the object model hierarchy and
extended it by allowing other types of knowledge to also be stored in it in a hierarchical
fashion. The observation that one can never experience or measure the physical world
directly suggests that a vision system must build a model of the world (not just
objects) using its past experience and, at any given time instance, must instantiate
the model for the environment of that moment by combining the learned experience
and the currently sensed information[53]. In this context, Jain and Binford conclude
that knowledge must be applied at every level of information recovery. They further
categorized the types of knowledge applied in visual information recovery[53]. The
knowledge related to perception is divided into three different categories: knowledge
about objects, knowledge about sensors, and knowledge about the domain in which
the vision system is to function[53]. Here, knowledge is classiﬁed according to the
nature of what it reveals about the world.

Besides concerning what is to be recovered when, such a model-based hierarchical
Paradigm also is concerned with how perception at different levels can be achieved,
including how the two hierarchies interact. In this paradigm, the role of knowledge
at eVery level is acknowledged and the integration of the top-down knowledge with
the bOttom-up cues is explicitly addressed. For example, edge detection methods
depend on the nature of the sensor data. With the knowledge about the type of
Sensor Currently being used instantiated in the model for the world environment, an
appropriate edge detection method can be invoked.

The separation of knowledge from the computations that apply knowledge yields

a. divide_and-C0nquer scheme at a even higher level. To solve vision problems, three

Ca .
tegol‘les of distinct problems have to be solved. The ﬁrst category deals with the

86

issues of knowledge engineering, including knowledge generation, updating, represen-
tation, and retrieval. The second category of problems concerns the theories and
techniques for recovering perceptual information using knowledge. The third cate-
gory is related to the problems of how to effectively establish necessary interactions
between the information recovery hierarchy and the knowledge hierarchy. This leads

to a more structured computational model for machine vision.

3.4.4 A Concrete Mechanism: Hierarchical Token Group-
ing

The framework of hierarchical token grouping is our attempt in answering the norma-
tive question of “what kind of mechanism should it be?”, a fundamentally different
question from the ones answered by the above surveyed computational models. The
Proposed architecture presents a concrete mechanism under which various computa—
tional theories for information recovery can be realized. The purpose of this architec-
ture is not to provide solutions for vision problems but rather to (1) offer a globally
COherent and consistent mechanism, through which integration of heterogeneous mod-
ules, different sources of information, and heterogeneous types of knowledge can be
achieved in a more systematic way, and (2) provide a formalism for organizing complex
Systems in order to produce structured and manageable vision systems.

The proposed architecture is capable of supporting all the computational models
Surveyed in this chapter. This is because the proposed architecture indiscriminately
realiZeS any information recovery technique by uniformly dealing with a grouping
p r Oblem. The proposed framework establishes the principles of constructing a vision
System but gives the vision system designers the freedom of deciding “what to build”.

That is, the decision about “what to be recovered when” is not a concern of the

at ‘ . . . .
chltecture, Such decisions reflect which computational model IS used and how it IS

87

applied. In order for the architecture to be able to support a wide range of ﬂexible
decisions, hence to realize different computational models, it is necessary for the
principles of constructing systems to be independent of the content of the systems
to be constructed. With a flexible number of levels, the architecture is capable of
supporting various vision systems with different purposes.

One issue that is seriously addressed in the proposed architecture is homogeneity
in vision problem solving and utilization of it in developing a homogeneous envi-
ronment with uniform interface (representation) among levels in order to facilitate
interactions and integrations. The homogeneous representation allows a consistent
constraint propagation interface among all levels so that interaction, backtracking,
or explanation can be systematically realized. The uniform token grouping operation

naturally yields a hierarchical structure of the token representation for objects. The
r016 of grouping as a general mechanism to organize or abstract data in a hierarchical

CorDIDIJtation environment was speculated by Brady in 1981[17]:

It is equally clear that grouping operations need to be deﬁned at each level
of resolution of each representation in the visual system, in order to impose
hierarchical structure upon the representation. The advantages that should
accrue from imposing such structure are likely to be precisely those which have

inspired the development of data structures generally in computer science.

Ver all, the proposed framework is a general engineering paradigm for machine Vision.

I t; -
l S Concrete, consistent, systematic, and structured.

3 ‘ 5 Summary

Th
6 literature survey is presented in Chapter 3. Four aspects in computer vision
th
6.1; are related to vision system integration are surveyed: representation, recognition

ﬁle
t 11Odologies, interaction and integration, and computational models and paradigms

g

88

for vision. We demonstrated that (1) token representation is a generalization of a
wide range of representation schemes proposed in the literature, (2) token grouping
generalizes the recognition methodologies at different levels of abstraction, (3) the
uniﬁed grouping principle of homomorphism is capable of describing the interaction
of all three scales, and (4) the framework of Hierarchical Token Grouping can support
ﬂexible computational models for vision.

As a concrete demonstration of the applicability of the proposed framework, it
will be shown in Chapter 4 and 5 that the framework of HTG provides a realistic

and useful problem solving architecture to the real-world problem of tubular object

recognition.

CHAPTER 4

Tubular Object Recognition

In this chapter, a model-based approach is described for recognizing tubular objects

from both 2D intensity and 3D volumetric data. The modeling includes (1) a general-

ized stochastic tube model characterizing the structural properties of tubular objects,
and (2) imaging modeling, predicting the cross sectional sensor measurements. An
automatic multi-level recognition strategy is proposed that exploits the power of the
nFIOCIEIS at different levels of abstraction. Region-based and boundary-based recogni-

tion techniques are integrated in order to achieve reliable performance under difficult

t1 ‘ ..
0136 conditions.

4 ‘ 1 Object Domain

hape is an intrinsic property of three-dimenSional objects. Even though often de-

f0
I‘Illél-ble, tubular objects possess distinct morphological properties. They are smooth,

C .
urvl linear, and often symmetric. The elongatedness can be characterized by a large

as
Dect ratio. Figure 4.1 shows a few examples of biological tubular object networks.

The

geometric structure of tubular objects can be mathematically modeled so that
1:11 -
61 1‘ distinct shape properties can be quantitatively characterized.

111 image understanding tasks, object shape needs to be inferred from observable

89

90

 

  

. ' 3.3%

i j;

7 :1".
\.

 

ss-zm-wamammmaﬂ

(C)

   

Figure 4.1. 2D digital images of (a) 2D blood vessels, (b) 2D plant roots, (c) 2D
bacteria, ((1) 3D blood vessels from MRA scan.

features. Since how objects are perceived in input images depends on both their 3D
shapes and the imaging condition, sensor plays an equally important role because
it determines what kind of image features can be observed. In the following sec-

tions, both the geometric shape of tubular objects and the sensing conditions are

m at hematically modeled.

4-2 Geometric Shape Modeling: Generalized

Stochastic Tube Model

There are two aspects in terms of the shape of tubular objects. One is about the local
sh ape of the objects that are distinct and salient. The other aspect is about their
Sh ape dynamics when we consider this class of objects from a more global point of
View ~ Modeling tubular objects geometrically needs to capture both of these aspects.
In the following sections, after introducing related background in geometric model-
111g techniques, we present an object model for the class of tubular objects, called
C;"(37le7‘(ilz'zed Stochastic Tube Model, that characterizes the geometric shape of tubular

0b ‘
J eCts using an explicit parameterized approach.

 

91

4.2.1 Survey

Geometric Modeling
In the literature of computer vision, Generalized Cylinders (GC) have been stud-
ied extensively [70, 89, 73, 101]. Since this representation scheme was proposed by
Binford, it has evolved from straight homogeneous cones to the recent much more
ﬂexible deformable models such as superquadrics or geons[72, 12]. Sliafer[89] and
other researchers [77, 73, 101] have analytically studied various invariant geometric
features of this powerful model to provide constraints that, ultimately, can lead to
the localization of objects of interest.
A Generalized Cylinder is defined as follows[l3]. Let A be a smooth axis in a
three-dimensional space (either straight or curved) and X be a cross section. Deﬁne
a Sweeping rule 5 of X along A (not necessarily containing A) where both X and 5
ar e functions of the length of A. A Generalized Cylinder, denoted as GC, can then
be deﬁned uniquely, up to a translation, by CC = 5(X,A). Figure 4.2 visualizes
this deﬁnition. Note, there is essentially no constraint to any component involved
in deﬁning a GC. The axis is arbitrary. The cross section can be any form such as
Sclllares, circles, symmetric, or asymmetric and it can deform along the sweeping.
The sweeping rule speciﬁes the relationship between cross section X and axis A. For
example, the cross section can form an arbitrary angle with axis A at any point
of SVveeping and it can even rotate while being swept so that a twisted object can
be generated. Therefore, the GC model has tremendous descriptive power and it
provides a general representation. Although the GC model was originally proposed
as a volumetric representation scheme, its 2D projected form is also widely used. A
Commonly used class of 2D GC is called ribbon. While CC offers a powerful modeling
r11ethod, using it effectively for recognition purpose requires a mathematically more

Dr . .
gclse formulation of the representation.

92

 

 

 

Figure 4.2. A tubular object.

The theory of Recognition-By-Components (RBC)[12] utilized the concept of Gen—
era'lized Cylinders and deﬁned a set of 36 components, called geons, that can be
differentiated on the basis of perceptual properties in 2D images. Biederman demon-
Str E1'ted that a wide range of objects can be modeled as assembly of this generic set of
C()l—Ilponents[12]. While RBC offers a theory for constructive object recognition, since
geons are qualitatively deﬁned, it is difﬁcult to apply the theory in practice for recog-
I1it‘i‘3n purpose. To explore the power of this general representation for recognition
purpose, it is necessary to directly parameterize the geons. That is, mathematically
(2

11 Crete or quantitative deﬁnitions for geons need to be developed so that recognition

be achieved by explic1tly assoc1ating a set of model parameters With observable

 

93

image features for different geons.

Superquadrics is a parameterized model. It represents a subset of 3D geons that
have symmetric cross sections and an eccentricity of 90 degree. With its simple
parameterized form, superquadrics can capture various global shape deformations
with a small number of ﬁve parameters. However, use of this elegant model for
recognition demands the segmentation of data points before the model can be applied
and the model itself represents a necessarily closed object instead of being able to
grow in space (like what is modeled using sweeping in Generalized Cylinders).

The Generalized Cylinders, superquadrics, and geons have close relationships.
While the cross section function, the axis, and the sweeping rule in the GC model

Can all be completely arbitrary, geons and superquadrics are both subsets of GC’s
With speciﬁc constraints on cross sections, axis, and the sweeping rules. Both GC’s
and geons are deﬁned qualitatively. In recognizing the objects represented using
these qualitative shape models, the relationship between the geometric shape and
the observable image features of the objects have to be made concrete. One way of
doi 11g so is to explicitly parameterize the models and to relate the model parameters to
obServablesBS, 47, 48]. Some research has also been done in the literature that utilized
the parameters of superquadrics to differentiate geons using statistical classiﬁcation

teC1111iques[76].

Ql'QeBased Object Recognition

Q Q ‘ based reconstruction of 3D objects aims at capturing the relationship between the
Obj eCt shape and the invariant properties in a given space. If the space is a gradient
Space, geometric surface features of a GC reveal the orientation of the underlying
Obj eCt; while in a shadow plane, the visible shading information is often analyzed to
CQIIStrain or determine the surface orientations of the objects in the corresponding

gr - . . . .
a”'(ilent space. A commonly used recognition strategy is to utilize the detected

m

94

features, either surface features or shadow features, to constrain or hypothesize the
pose of the objects and then verify [45, 35, 19, 12, 102, 55, 23]. A robust 3D object
recovery requires that (1) features used adequately characterize a 3D object and (2)
they can be detected reliably.

In a gradient space, most GC-based recognition methods are essentially boundary-
based approaches. Important issues in these methodologies are to segment surface
points and to extract surface features reliably. The explicit relationship between sur-
face features and the parameters that characterize, qualitatively or quantitatively, a
GC object can lead to a correct recovery provided that the surface features are iden-
tiﬁed accurately. Practice has shown that both these issues often represent difﬁcult

pI‘Oblems [107, 85, 67, 66]. Obtaining precise surface features relies on the quality of

Segmentation which often can not be guaranteed. Noisy data can easily corrupt not
only the segmentation of surface points but also, consequently, the computed surface
featUIes.

In a shadow plane, GC—based object reconstruction with 2D input relies on the

COIIStraints in gradient space estimated from shadows using shadow geometry. Two
kinds of shading information that are most often exploited are the shading conﬁgura-
tion and the contours of objects. The shading property along the cross sections of a
G C has been used to search for blood vessels and coronary arteries in medical imag-
ing [55, 23], or for plant roots in processing agricultural imageries[32]. The shading
111°C1e] is, however, either collapsed onto proﬁle patterns to reduce the dimensionality
of the computation, causing an inﬂexibility in recognition and sensitivity to noise,
or Ihatched with data exhaustively in the image to identify object shadow regions,
leading to an expensive computation. Due to the low level processing scheme, these
approaches lack an effective method to localize objects, hence, they often demand
ei t her a manual seeding from users or a brute force search.

Using contour information, a set of invariant 2D perceptual regularities such as

m

95

collinearity, symmetry, or parallelism have been utilized to reveal the relationship
between a 3D GC and its 2D image [58, 12, 69, 19]. Another invariant contour
primitive, the ribbon deﬁned based on parallel symmetry, has also been used [19,
101, 94]. The theories of perceptual grouping and part—based recognition propose
the reconstruction of objects directly from 2D contours. Although elegant in theory,
recovery can be difﬁcult in practice. Additionally, ambiguities exist when only 2D
contours are used to recover 3D objects. Veriﬁcation is thus needed. Most 2D GC-

based recognition methods in the literature have applied heuristics for veriﬁcation.

4.2.2 Modeling Local Shape

Use of a geometric model for objects provides a priori knowledge to organize sensor
data. Based on the GC representation scheme, we develop a geometric model for the

Class of tubular objects. A special class of GC, called Right Straight Homogeneous

Generalized Cylinder, is parameterized to establish a generalized stochastic tube model

that, directly facilitate the recognition.

G e n eralized Tube Model

Deﬁne Straight Homogeneous Generalized Cylinders (RSHGC) to be a class of GC’s
that have straight axis, ﬁxed cross section functions, and that its cross sections and

aXiS are perpendicular to each other during the process of sweeping[89]. Using a

RS HGC of some length, called a local tube, as a primitive, a tubular object can be,

Wi t hout loss of generality, approximated as a set of smoothly connected RSHGC’S. A

t.
111)lllar object consisting of k local tubes is deﬁned as:

3,. = {T1,T2,...Tk}.

 

96

Each local tube Ti, 1 g i S k, is described by a 3—tuple:

Ti : {PiaXiilili

(4.2)

where ’P; denotes its 3D pose, x,- denotes its cross section function, and l,- is the length

of the tube.

 

c . (x0.y0.20) MK

 

 

Figure 4.3. A rotated and translated tube model.

Fir St, we model a local tube T,- in an object-centered coordinate system. Assume

an elliptical function is used for cross sections. Let UVW be an object-centered

coordinate system such that T.- is located at the origin of UVW and lying along the

V “is (See Figures 4.2 and 4.3). In UVW, T,- is a set of 3D points that satisfy the

following:

(—)2+(%)2 < 1. vi) $1..

(4.3)

97

where a and b are the major and minor axes of an ellipse.

We now express T.- in world coordinate system XYZ. There are two ways to rep-
resent T, in XYZ. One is based on the homogeneous transformation between UVW
and XYZ and the other is to express T,- directly in XYZ.

Assume an arbitrary homogeneous transformation from UVW to XYZ with trans-
lation c = [2:0 yo 201T and three rotations: the rotation with respect to the U axis by
a, Rum), the rotation with respect to the new V axis by 0, RUM), and the rotation

with respect to the new W axis by )3, RAH). Figure 4.4 illustrate these rotations.

The translation c = [1:0, yo, z0]T represents the new location of T, in XYZ. The homo—

 

 

 

 

y Y
V
l W V
\ ’4
:U >X ......... N;U >X
W
Z Z

RWY)

 

 

 

Figure 4.4. Rotational relationship between UVW and XYZ.

98

geneous transformation is:

f (not case —Sﬁ 0
SaSﬁCg — C059 505559 + COCQ SOC/3 0

cmwﬂsﬁgdma—aocmno

 

 

~ 1‘0 yo 30 1 ]

where C, stands for cos(.r) and Sat stands for sin(.r). The relationship between a 3D

point p in UVW and its transformed image p’ in XYZ is:
p'=[.ryzl]=px’H;,p=[uvw1]=p’><’H,-_I, (4.4)

where H,- is a function of six parameters. Figure 4.2 and 4.3 illustrate the relationship
between XYZ and UVW. An elliptical cross section in XYZ has eight degrees of
freedom: {2:0, yo, 20, a, b, a, B, 0}.

In our current research, a special case of ellipse, the circle, is used for the cross
sections. In this case, a = b = r and the rotation 12,,(9) vanishes. Combining (4.3)

and (4.4), a tube T.- with circular cross sections is the set of points (r, y, 2) such that:

[(y - yi) C08(0) + (z - 2:) si10(a)l2 + [(36 — an) C08(5) + (4-5)

(y — yi) sin(a) sin(;8) — (z — 2,) cos(a) sin(ﬂ)]2 3 r2, (4.6)

Where (58,-, y;, zg)’s are the points along the transformed new V axis. In this formula-
tlon’ the pose and size of any local tube is uniquely described by six model parameters.
Therefore,

Ti = {piaXiali} : {Harali}, (4-7)

99

where H; specifies the pose of T,- and r,- describes the circular cross section.
To represent T,- directly in XYZ, note that the pose of T,- can be determined by its
location, with respect to the origin of XYZ, and its orientation vector which coincides

with T,’s straight axis. Therefore, T,- can be represented by a 4-tuple in XYZ:

T, = {c,,d,,r,,l,}, (4.8)
where c,- is a chosen reference point on the axis of tube T, that describes a translation,
d.- is the orientation vector, 1‘,- is the radius, and l,- is the length of the tube. Figure 4.3
visualizes this deﬁnition of a tube. In this representation, c,- and (1, specify the pose
of T; and, similarly, r,- adequately describes a circular cross section. The orientation
vector d,- can be equivalently speciﬁed by two angles, 7 and C, denoting latitude and
longitude, respectively.

Relationships between (4.7) and (4.8) exist. Recall, in coordinate system UVW,

with circular cross sections, T,- is a set of all points (u, v, w) that satisfy:

u2 + w2 g r?, Vv 6 [0,1,] 6 V. (4.9)

Here, the transformed V axis implied by "H, in (4.7) aligns with the orientation vector
d,- in (4.8) and the origin of UVW in (4.7) coincides with the reference point c,- in
(4'8). Figure 4.3 visualizes these relationships.

EVen though both parameterizations in (4.7) and (4.8) are adequate to characterize
a local tube with an arbitrary orientation and size, (4.8) leads to a more direct
linkage between its object parameters and the observable image features through
an explicit set of relationships between invariant surface features (observables) and
Object parameters (to be discussed in the next two sections). Since a more direct

1 ° . . . .
Inkage leads to a clearer recognition strategy, the parameterization deﬁned in (4.8) 18

100

used from now on. With this parameterization, the geometric structure of a tubular

object consisting of 1: local tubes is modeled as:

Ek = {T1,T2,...Tk} = {{C1,d1,r1,11},.., {Ck,dk,rk,lk}}. (4.10)

Invariant Surface Features of Tubes

Shafer[89] and other researchers [73, 101] have analytically studied, in both gradient
space and shadow plane, various invariant geometric features of the GC model. These
fundamental properties provide constraints on object surface orientations that, ulti—
mately, can lead to the localization of the objects of interest. In this study, some
of the fundamental surface features of the tube model are fully and quantitatively
exploited.

Several intrinsic surface features of this object model are invariant to rotations
and translations. Examples are surface curvatures and radius. Curvature at a surface
point p is deﬁned as the second derivative (at this point) on a C2-continuous surface.
For each surface point p, there is a tangent plane TD and an inﬁnite number of
orthonormal basis vector pairs for Tp. Among them, there exists a unique pair of
directions along which the surface curvatures reach a maximum value in one direction
and a. minimum value in the other. Denote the two extreme curvature values by mm”
and "3 m in, , called principle curvatures, and their corresponding directions by vmax and
Vmina Called principle directions. Figure 4.5 gives an example of principle directions of

a cyliIICirical surface. The surface normal at point p, denoted by np, can be obtained

f
rom the cross product 0f Vmgn and vmax:

up = vmin X vmax

ere We choose the surface normal pOinting inward to the object.

101

 

Figure 4.5. The images of a cylindrical surface (left), its vmin’s (middle), and vmar’s

(right).

A cylindrical surface can be characterized by some distinct surface properties. For
example, all the points on this surface have continuous surface normals, zero minimum
curvature value, identical maximum curvature value, and parallel minimum curvature
directions. The orientation of a straight tube is parallel to the direction of minimum
curvatures; while the cross sections of a tube lie on the tangent plane formed by the
two principle directions. Therefore, a 3D cross section is bounded by a set of surface

points that have similar lem-n, Km”, and vmin and that are spatially conﬁgured in a

shape similar to a circle in 3D space.

Relationship Between Surface Features and Model Parameters

There exists explicit relationships between model parameters {c, d, r} of a tube and
the abOVe deﬁned surface features {nmim 1cm”, vmin, vmn, up} at a surface point p on
the tube surface. The radius of a tube is the reciprocal of its maximum curvature value

and the orientation vector of a tube is parallel to the minimum curvature direction.

 

r: 1 , (4.11)
Ema:
= vm.,.+c_ (4.12)

102

The orientation vector d of a tube can be derived by simply translating a minimum
curvature direction vector to a reference point c and normalizing it, where the refer-

ence point c can be a point on the axis of the tube which can be obtained from the

features of a surface point p from the same cross section, that is:

1

”max

(4.13)

 

c=p+ronp=p+ .(Vminxvmar),

where up is the inward surface normal at point p computed from the cross product of
minimum and maximum curvature directions at p. These relationships are invariant.
Since a local tube can be completely speciﬁed by d, r, and 1, all with respect

to a reference point c in the world coordinate system, the above relationships can

be quantitatively exploited to constrain the poses of tubes, hence to localize object

segments.

4.2.3 Modeling Global Shape Dynamics

In using parameterized RSHGC as a primitive to model objects, the shape dynamics
can be expressed in shape of axis A and the shape of cross sections which is determined
by both the type and the size of cross section function x. In this context, the shape
deformation can be captured by considering the dynamics along two dimensions: the
dynamics in axis and the dynamics in cross sections.

Having mathematically modeled each local tube, in order to model an entire tubu-
lar Object, we now formally model the transitions between adjacent parameterized
RSHGC’s. Deﬁne a transition to be the changes or dynamics of object model pa-
rameters between two adjacent tubes. Consider a transition from tube T,-_1 to tube

T‘: the tube after the transition T .- can be described based on the tube before the

103

transition T.-_1 with some deformation in model parameters,
T,- = T,_1 x 65-13, (4.14)
where 6,-“- represents the transition. Speciﬁcally,
6i-1.i 2 mic—lgi‘sii—im‘si—im(Si—m}- (4-15)

Individual terms in (4.15) correspond to the changes in different tube parameters

which are constrained in speciﬁc domains:
(5,31,,- = c.- — c;_1 = [6:13,6y,6z]T, 6r,6y,6z E [—C, +C], C = ma:r(M, N, O),

631,,- = cos‘1(d.~_1 o dg) E {—9, +0],

(5r = I‘.‘ - 13-1 6 [—R,R],

i—l,i
6L“. = 1.- — 1,.1 e [—L, L],
Where M x N x O is the size of a constrained 3D subspace in X YZ ; 9, R, and L

are some constants. Based on (4.14), a recursive deﬁnition for a tubular object can

be deri ved:

E]: = (C0,d0,r0, 10) X {60,1a"a6k-l,k}

= To X {5:414}; (4-16)

Ek_1 X 6k_1,k, (4.17)

Where To = {co, do, r0, lo} denotes a starting tube and {6,-_1,g}f=1 represents a process
of transitions that captures the dynamics of the geometric shape of object Ek. The

a. O o I I
bOVe formulation conveys the VleW of conSIdering an object to be the result of a

104

sweeping: the overall structure of a tubular object is formed based on a reference tube
T 0 that travels in 3D space and undergoes a series of deformations. Each deformation
is described in terms of To’s trajectory and size, or in terms of the changes in tube
model parameters as deﬁned in (4.15).

For recognition purposes, this continuous process of deformation is later modeled
using several independent stochastic processes, that are functions of the length of the
underlying object, or simply time. Each process corresponds to one term in (4.15).
Together with the stochastic process modeling (discussed in Section 4.4.5), we call
(4.16) a generalized stochastic tube model. It captures the geometric shape of not only

local tube-like object segments but also the global curvilinearity and elongatedness of

the objects to be modeled.

4.3 Modeling Imaging Processes

Since recognition is an inverse mapping from observable images features to meaningful
instances of objects, it has to apply the knowledge about not only the objects to be
recognized such as shape but also the condition under which the objects are to be
imaged. Therefore, the role of sensor information is essential in object recognition.
In the context of model-based object recognition, while the Generalized Stochastic
TUbe Model describes the geometric structure or shape properties of tubular objects,
to ca’Pture the shape information from observables, sensing process also needs to be
expliCitly modeled. In this study, we deal with two types of sensing conditions, one
is photometric sensing condition and the other is MRA imaging process. Together
With the geometric shape model developed previously, such developed sensing models

Will be integrated in the process of mapping observable image features to instances

of t"lbular objects.

105

4.3.1 Photometric Sensor Modeling

Shading of an object in a 2D image is determined by its shape, its position, as well
as the lighting condition, specifically the angles between the normals of the visible
surface of the object and the light source direction [46]. The surface normal at a
surface point p = [:13,g,2]T on local tube Tj, denoted by up, is the cross product of

two vectors v, and vy on the tangent plane at p. They are deﬁned as:

v. = (1,0,1)? = (1.0, 33?,
v,, = (0,1,q)T = (0,1, 2;)?
Then
np = V; x vy = (g, g;,-1)T.
The unit surface normal is obtained by:
. np __ (1M1, -1)T

 

 

n = _ . 4.18)
p Inpl x/1+p’+qz (

Suppose the light source is located at [x,, y,, 2,]T. Then the light source direction at

surface point p = [:r, y, z]T is:

r T

 

8 —’ a (4.19)
y/Pi + (13+ 7‘3
Where,
Ps=$-£B,, (lazy—ya, 1322—23. (4.20)

The iIltensity value at the corresponding point [.r’, y’, 2’]T in the shadow plane (where

I
[1. ’ 3": z’]T is the projected point of 3D point p on the image plane) is pr0portional

106

to the dot product of rip and ri,, i.e.,

1(x', y’, 2') oc rip o Ii, = cos 7, (4.21)

where 7 is the angle between surface normal rip and the light source direction Ii, at
point [:r, y, z]T. Considering that only part of the surface can be seen, I (:r' , y’ , z’) is
deﬁned whenever cosy Z 0. Figure 4.6 shows the intensity proﬁles extracted from
real data of plant roots. It can be seen that the proﬁles have shapes close to a cosine

function.

 

 

 

 

 

 

 

 

Figul‘e 4.6. Intensity proﬁles along the cross sections of plant roots. (a) unnormalized
Proﬁles over 15 cross sections, (b) normalized proﬁles over 15 cross sections.

4332 MRA Sensor Modeling For Blood Vessels

Blood ﬂow usually exists only within vessels. Consider a cross section of a blood

vessel. Ideally, maximum velocity of blood ﬂow occurs near the center of the cross

107

section and zero velocity occurs near the boundary of the cross section. This ideal
ﬂow pattern is depicted in Figure 4.7. An example of blood ﬂow pattern from a
chosen cross section of a real blood vessel is shown in Figure 4.8(a). To utilize this
information in identifying blood vessels, a bivariate Gaussian density function, erected
from a cross section with an arbitrary size and orientation, is employed to model the
ideal blood ﬂow pattern. Formally, in UVW, on every cross section of a tube that is
parallel to the UW plane, the following Gaussian density function models the speed
distribution of the blood ﬂow on the cross section:

b(u) = I x [—e:rp(—%uTZ'lu)], O S v 31;, (4.22)

where I is a scaling factor corresponding to the height between the maximum and
minimum blood ﬂow speed, u = [u, v, w] is a point on the cross section in the object-
centered coordinate system, and 2 is a 2 X 2 symmetric covariance matrix with
diagonal elements equal to the square of r/ 3.0 and others equal to zero, where r
is the radius of the cross section. In this model, on the plane of a cross section, the
speed measurements form a flow ﬁeld, consisting of a set of concentric equal energy
rings with respect to the center of the cross section, where the sensor measurement
at each ring is determined by the Gaussian density function value, modeling the ﬂow
speed. If a 95% percentile is used to truncate the tails of the Gaussian, Figure 4.8(b)
displays a theoretical model for the blood ﬂow. With an appropriately estimated I,
(422) speciﬁes an expected conﬁguration of the sensor measurements within a cross
SeCtion. Since the model in (4.22) is expressed in UVW which is determined by the
or ientation of the object and the radius is used to constrain the distribution, (4.22)

integrates geometric shape model with the ﬂow model.

108

C

W
0

Figure 4.7. Ideal blood ﬂow.

 

 

4.4 Model-Based Tubular Object Extraction

The models developed in the previous two sections characterize different aspects of the
objects to be recognized. In this section, we designed a model-based system in which

these models are utilized in an integrated fashion during the incremental process of

recognition.

4.4.1 4-Stage Recognition Strategy

The vision system designed for recognizing tubular objects consists of four stages:
initial segmentation, automatic seeding, local recognition, and global object recovery.
Each stage aims at identifying certain types of perceptual events that are distinct
in recognizing tubular objects. An overall recognition is achieved in an incremental
f«‘IShion. The purpose of the initial segmentation is to identify the points in an input
whiCh are (1) likely to represent object points as opposed to background and (2) likely
to represent object boundary. Seeding is to extract meaningful perceptual events
that are to be used to hypothesize the poses of segments of objects. Such perceptual
eveIlts are identiﬁed by ﬁtting parameterized shape model to boundaries of the regions

where the initial segmentation produces clusters of points with high intensity. Such

109

 

(a) (b)

Figure 4.8. The sensor measurements conﬁguration on a cross section within a blood
vessel in (a) real situation, (b) theoretical model.

generated hypotheses are veriﬁed at the stage of local recognition in which geometric
shape model is integrated with the sensing model under the framework of optimal
ﬁlters. The goal of model-based seeding and local recognition is to extract a set of
reliable object segments, serving as good initial guesses of the entire objects, from
which the tubular objects can be recovered via a global sweeping process.

The system diagram of this 4—stage recognition strategy is shown in Figure 4.9. It
is hierarchical, model-based, and the sensor information is explicitly used during the

Process of recognition. In the following sections, each stage of processing is introduced

in detail.

4 - 4 . 2 Initial Segmentation

Due to the nature of the two types of sensing conditions dealt with in this study,

t . . . . . .
he Sensor measurements w1th1n object reglons are, in general, higher than the sensor

110

 

Global
Recognition

1.

 

 

Local
Recognition

 

Object
Geometric

Model Sensing
) Model

 

 

 

Seeding

 

 

 

 

 

 

 

 

Initial
Classiﬁcation

 

ll ill/ll

 

 

 

Figure 4.9. The model-based 4-stage tubular object recognition system diagram.

measurements from non-object regions. Empirical study shows that both the sensor
measurements from object and background regions often have statistical properties

close to Gaussian distribution. For 2D case, Figure 4.10 (a) and (b) illustrate a 2D

         

(a) («1)

Figure 4.10. An example of initial segmentation on a 2D intensity image of plant
l'OOts. (a) An intensity image of plant roots, (b) intensity histogram of (a), (c) initial
segmentation result on (a), ((1) initial edge detection on (a).

root image and its intensity histogram. Both background and root regions have his-
tOgra-Ins shape like Gaussian distribution and together they form a histogram similar

to a. bimodal Gaussian distribution. For 3D case, Figure 4.11(a) is a subvolume of 3D

111

MRA input consisting of only non-vessel voxels and (b) is the corresponding inten-
sity histogram. Due to the fact that blood vessels occupy much smaller volume than
the background tissues, the second mode of histogram corresponding to the intensity
distribution of vessel regions is much smaller. Therefore, the histograms established

based on 3D inputs present the shape similar to a unimodal Gaussian distribution.

   

(b)

Figure 4.11. An example of 3D subvolume of blood vessels from MRA scan. (a) A
3D subvolume projected using MRI, (b) intensity histogram of (a).

At the initial stage of detection, the statistical properties of the sensor data can
be estimated by ﬁtting an appropriate Gaussian distribution function to the intensity
hiStogl‘arn established from input. In 2D case, we use a bimodal Gaussian distribution
flinction which simultaneously captures two distributions: G(I ; 110,02) representing
the distribution of the sensor measurements of objects and G(I; wharf) representing
the distribution of the sensor measurements from background. Here, I is a random

Variable Whose values represent sensed intensities. Denote the intensity histogram by

h(i) where i is a speciﬁc intensity value and h(i) is the number of points in the input

112

that have intensity gray level i. A bimodal Gaussian distribution function

 

 

k k = 1,2 (4.23)
k ”k
is ﬁtted with h(i) where
1 ‘*
u —— — h I X I
k N. I; ( > >
1 ‘*
: _ __ 2
0:. N1. gill“) X (1 “kl ]
Nk X O'k tk .
Pk = "“!22’ N :: h 2 '
tk —(I2xcrk k i=§l ()
[:1

If the bimodal test is passed, the optimization, with respect to minimizing classiﬁ-
cation error, is performed by determining an intensity threshold I, using the Bayes

decision rule to minimize the classiﬁcation error. That is, It is determined such that:
C(It; umog) = G(It; what?)

This threshold value is used to initially classify all the pixels into classes of interior
alid exterior regions of tubular objects. Figure 4.10(c) and ((1) show the initial seg-
ITlentation result and initially detected edges using a Canny edge detector[21] on the
input image displayed in (a) of the same ﬁgure.

In 3D case, the statistical properties of the sensor data from non-object regions
Can be estimated by ﬁtting a unimodal Gaussian distribution function, G(I; ub,ab),
with the left mode of the intensity histogram of the input. It is a ﬁtting process
SiInilar to (4.23) except here k = 1. The ﬁtted parameters are used to automatically
determine a threshold:

It:ub+2'aba

that is to be used to initially classify all the voxels into classes of interior and exterior

113

regions of tubular objects.

Due to the known pattern of blood ﬂows, the gradient magnitudes of interior vox-
els are expected to be high compared to the gradient magnitudes of exterior voxels.
The statistical properties of background gradient magnitudes can be estimated by
ﬁtting a unimodal Gaussian with the left mode of the histogram of gradient mag-
nitudes established using the 3D edge (surface) operator proposed by Zucker and
Hummel[107]. A threshold of magnitude, m,, can then be determined in a similar

manner:

mt=um+2-om,

where um and am are the parameter values of the ﬁtted Gaussian distribution function,
G (m; um, am). By combining the two thresholds, It and mt, blood vessel surface points
can be initially identiﬁed. A 3D point p at (:r, y, z) is initially classiﬁed as a surface

point if and only if:
I(p) 2 It, and m(p) 2 rm,

and

3p', p' :,£ p, such that I(p') < It and p’ N6 p,

Where I () and m(.) are the intensity and magnitude measurements, respectively, and
[x36 denotes a 6-connectivity spatial relationship in 3D space.

Figure 4.12 and Figure 4.13 show examples of initial processing results on a 3D
Volumetric data. Figure 4.12 gives the initial separation of interior and exterior
r egions for the input shown in Figure 4.12(a). Figure 4.13 illustrates initially detected
vessel surfaces, rendered as range images from four different views (90 degrees apart

horizontally). In Figure 4.13 the brighter the points are, the closer the surface points
a"? e to the viewer. Due to noise, the initial classiﬁcation of interior, exterior, and

S 1lrface point is not reliable and it can be used only as partial evidence.

 

Figure 4.13. The images of initially detected surfaces from an MRI subvolume, viewed
from front, left, back, and right, respectively.

4.4.3 Automatic Seeding

A robust recovery of the invariant features of tube surface is needed to enable a
reliable estimation of local tubes. We call this process seeding or object localization.
Although the initial segmentation discussed previously may not provide a good global
Separation of object interior, exterior, and boundary, it is useful in locating a small
set of reliable seeds. Below, we describe how useful surface features in both 2D and

3D cases are extracted and how they are used to localize segments of objects.

115

Seeding in 2D Images

A tubular object perceived in a 2D image plane can be necessarily (but not sufﬁ-
ciently!) located by identifying the corresponding parallel ribbon. We ﬁrst deﬁne
parallel ribbons and then show how they can provide well constrained situations for
localizing 2D tubes.

Parallel ribbons are deﬁned as pairs of parallel symmetric contours. Parallel sym-
metry was proposed in [102] and deﬁned mathematically in [79]. Following their
notations, in a continuous domain, assume two curves CJ- and Ck parameterized by

Curve length 3:

01(3) = (15(3), 30(3)); Ck(3) : ($k(5)9yk(3))s

Where 01(3) and 0;,(3) are the tangent orientations along the two curves. CJ- and 0;,
are parallel symmetric if and only if there exists a continuous monotonic function f (s)

Such that

its) = we».

Being identiﬁed as parallel symmetry, C J- and Ck form a parallel ribbon, denoted by
R8,“. The symmetry axis A.- of RB.- is the collection of the middle points between
CJ-(s) and Ck(s) for all s where f(s) exists [79].

Two spatial relations, namely overlap, denoted by $20, and enclose, denoted by
33., between a pair of parallel symmetric curves are proven to be sufﬁcient conditions
for ﬁnding a parallel ribbon [98]. These two relationships will yield a region, called
overlapping region, that corresponds to a ribbon RBg. Figure 4.14 illustrates these
two relationships and the shaded areas in this ﬁgure are the overlapping regions. A
special case of a parallel ribbon is straight parallel ribbons. In this case, both Cj and

C1. are straight lines characterized by 03(3) = 0;,(3) = 0, where 0 is a constant.

116

   

(b)

Figure 4.14. Examples of overlapping (a) and enclosed (b) relationships.

The features of RB.- provide constraints in localizing a tube. A 2D straight ribbon
can be characterized by its center (mo,yo).-, its orientation 0;, its radius r;, and its
length l,-. As we can see, these features are directly related to the parameters of a 2D
tube. Therefore, each ribbon yields a hypothesis, denoted by Hg, for a tube TB;

H, = {c?’,0?’,r?”,l]"} (4.24)

{C?’,d?”.r$".l?”}, (4.25)

where d?” is the hypothesized orientation vector and the hypothesized tube model

parameters can be computed as

c, _ area(RB,-) rag-FE.- p ’

,_ = 01(6) + (h(s)
t 2 1
1
r,- = §d(CJ‘, Ck),

1.: min{l(Cj)J(Ck)},

117 .

where p’ is a point within ribbon R8,, d(CJ-, Ck) evaluates the distance between the
two lines, and 1(0) measures the length of line C. Whether H.- corresponds to a 2D

tube depends on the veriﬁcation of the intensity conﬁguration of the pixels within

RB, .

Seeding in 3D Volumes

For an initially classiﬁed surface voxel p0 = {:ro,y0,zo}, deﬁne a small spherical

neighborhood on around p0:
on : {p = ((13,352) i ll P0 — p ll-<- C},

where e is the size of the spherical neighborhood. All the initial surface points within
on are used to compute the surface features at p0. A local coordinate system X’Y’Z’
is established using the principle components method[35]. The origin of X’Y’Z’ is
located at p0 and the Z' axis is perpendicular to the tangent plane of po. The surface
points collected from on are projected into this local coordinate system and a bicubic

surface is ﬁtted to this small surface patch using a least—square solution[35]:

Z, : f(xl,yl) : “133,3 + agxay’ + asxlyﬂ + (My/3

0530 + Gs$’y’ + (173/2 + 0833’ + 093/, + 010-

The Hessian matrix H is directly related to the measure of the Gaussian curvature,

”G, and the mean curvature, MM:

1
[CG =| H I, KM = 5 - trace(H).

118

The principle curvatures can then be obtained from:
5min = min(|n1I,In2|),

I"imam: : max(| ”'1 ii] K2 i)?

where

531: KM + ﬂag; — NM, and
n2 2 KM — V520 —h:M.

At the origin of X’Y’Z’ (p0), the two principle curvature directions are derived from

the linear combination of the basis vectors, v; and v;,:

 

(a — c+ \/(a —c)2 +4b2)v;. +2bvg, a 2 c

vmin =

 

2bv§c+(a—c—\/(a—c)2+4b2)-v;, a<c

 

—2bv;c + (a —c— \/(a—c)2 +4b2)v;, a Z c
(a—c—\/(a—c)2+4b2)v;+2bv;, a<c

Vmar =

 

with
.r’rc’ 0,0
a = r._;__> =
b : fr’y’va) = fy'x’mvol = a_6
2 2 2 i
c = fv_u;9_‘1). __. a,

Figln‘e 4.15 and Figure 4.16 display a few examples of such detected principle curva—

ture pairs for the front and the back views of the blood vessel shown in Figure 4.13.

with computed surface features, a 3D cross section can be inferred based on

the relationships speciﬁed in equations (4.11), (4.12), and (4.13) in Section 4.2.2.

119

 

Figure 4.15. Examples of the detected principle curvatures from the front view of the
vessel shown previously. Leftzdepth image of the surface; Middle:minimum curvature
direction vectors; Rightzmaximum curvature direction vectors.

 

Figure 4.16. Examples of the detected principle curvatures from the the back
View. Leftzdepth image of the surface; Middle:minimum curvature direction vectors;
Rightzmaximum curvature direction vectors.

To improve the reliability of the automatically generated hypotheses, our seeding
method demands that a. local tube only be hypothesized collectively by a set of
surface points that result in consistent estimates of model parameter for the same
“038 section. Each such set of consistent surface points gathered from a 3D thin
disk representing a hypothesized cross section is ﬁtted with a 3D parametric circle to

reﬁne the hypothesis. Denote the hypothesis for a tube T.- by Hg,
H:- = {C?’,d?’,r?’,l?’}, (4.26)

that provides the estimated position, size, and orientation of a local tube along a

120

blood vessel.

For a tube hypothesis Hg, initiated from either a 2D ribbon or a 3D cross section,

the corresponding axis of the hypothesized tube,
hp hp
At = {“4131 a "'v Aisbﬂs}l
can be conveniently derived from the hypothesized model parameters:
)2 h . h .
Ag? = Cip +1 .dipa 0 S] S 1i,

Where A13 is a point on the hypothesized axis A?” . Once hypothesis H.- is veriﬁed, it
represents a local tube, serving as a seed to grow into a global recognition. Compared
to the manual or semi-automatic heuristic—based seed ﬁnding methods frequently

employed in the literature, our seeding method is model-based and automatic.

4.4-4 Local Recognition

Hypotheses of local tubes established based on invariant surface features need to
be Veriﬁed. Since veriﬁed seeds represent only segments of objects, this stage of
Processing is called local recognition. Based on the hypothesized geometric shape of
an Object segment, this stage of processing not only veriﬁes the hypothesis but also
Optimizes the recognition of the corresponding local tube by integrating the region and
boundary information using the sensing model developed in Section 4.3. The goal of
this Combined model-based seeding and integrated local recognition is to locate a set
of Teh'able object segments which serve as the initial guesses or the starting tubes for
pro(lesses that recover the rest of the objects. The incremental model-based strategy
is designed to be robust to combat noise in data with the hope of achieving an efﬁcient

and reliable recovery of objects.

121

Hypothesis Veriﬁcation and Optimization

Each seed is represented by a set of tube model parameters that characterize a rec-
ognized tube. To recognize a local tube, an optimization is performed to obtain best
estimated model parameters. The hypothesis generated based on surface features
serves as an initial guess and the recognition of the underlying tube Tg is optimized

by maximizing the following posterior probability or likelihood:

P(Hif)
P(f)

P(f | Hg)P(H.-)
P(f)

 

max(P(Tg)) = max{ } = mas:{ }

Where f is an image function representing the input and Hg is a hypothesis. Because
the denominator is not a function of Hg, this optimization is equivalent to maximizing

the numerator. That is,

P(Hiaf)

P(f) } o< max{P(f I Hg)P(H.-)}. (4.27)

max{

Term P(Hg) in (4.27) reﬂects the prior knowledge about the shape of the objects of
interest, while P( f | Hg) measures the likelihood of f to have an expected distribution
0f the sensor data, conditioned on the given hypothesis Hg. In effect, Hg speciﬁes a

region on which P( f | Hg) is evaluated. Taking the logarithm of (4.27), we have:
max{ln[P(f I Hg)P(Hg)]} = max{ln[P(f | Hg)] + ln[P(Hg)]}. (4.28)

In general, P (H g) is conditioned on its neighboring tubes due to the causal relationship
b('3‘13‘7W23en adjacent tubes. In this context, hypothesis Hg is viewed as being constructed
from two parts, one is the model parameter setting inherited from an neighboring

t“be, say Tg-1, and the other part is the deformation 55-1,; occurring during the

122

transition from Tg-] to Tg. Because these two parts are independent, we have:
P(H.) = ants-..) = P(T.-.)P(6._.,.-).
As P(Tg_1) is not a function of Hg, we further have
P(Hi) 0‘ P(5i—1,i)- ' (4-29)
Substitute (4.29) into (4.28), we deﬁne
Pi“) = 771015“an” I Hill +1n[P(5:‘—1.4)l}, (4-30)

as the criterion in optimizing the recognition of a local tube. At the stage of local
recognition (i = 0), since there is no previously recognized tube, the second term in
(4.30) vanishes. In this case, local recognition optimizes only ln[P( f I Ho)].

The optimization in (4.30) is often achieved through a geometrical surface ﬁtting
using a ﬁgure of merit designed based on the Euclidean distance. In this case, the
absolute overall distance between a theoretical surface and the data surface directly
affects the similarity measurement even when they actually have an identical shape
but situate at different positions. The theoretical surface has to be gradually moved
towards the data surface until an exact match. Such a process of ﬁtting can be
computationally costly. In this study, we employed the theory of optimal ﬁlters for
matching purpose. With the hypothesis initiated from observable image features,
Optimal ﬁlters can be dynamically generated that avoid the expensive exhaustive
Search scheme often associated with the use of optimal ﬁlters in the literature [23]. In
next section, we brieﬂy introduce the theory of Optimal ﬁlters and then discuss how

it is applied for our recognition purpose.

123

Theory of Optimal Filters and Its Use in Object Recognition

According to the theory of optimum receivers, to recognize a signal r = s + n at the
receiver end, where s is the original signal transmitted from the sender and n is the
noise, Signal-to-Noise-Ratio (SNR) will be maximized when the ﬁlter used to match
with 1‘ equals the input signal 5 [43]. Such a ﬁlter is called a matched ﬁlter or optimal
ﬁlter.

Let us consider one dimensional case. Denote a ﬁlter by f and its input signal by

r(t) = s(t) + n(t). The the ﬁlter output, r'(t), is given by

r’(t) = r(t) * f(t), (4.31)

where at represents a convolution. In discrete case, for a causal input signal and a

Causal ﬁlter, (4.31) becomes

N

T'lNl = 2 7‘Lille - i], (432)

i=0

where r'[N] is the N-th output, r[i] is the i-th sample of r(t), and f[i] is the i-th

SEuriple of f(t). Let

kj = f[N _ Iii (4'33)
We then have
N
r’lN] = 2: arm. (4.34)

which is a weighted sum of all the N input samples. Here, (4.34) gives the signal
associated with a single pixel at N. The corresponding noise n associated with a

discrete point in the ﬁltering process is

I N
n = Elsi-0‘2, (4.35)
i=0

124

where a is the standard deviation of the noise which is uncorrelated from sample to

sample. In order to ensure that the DC component is zero, the following condition

is imposed:
2 kg = 0. (4.36)

With condition (4.36), the signal (S) at a single point in ﬁltering is

N N
s = r’lN] = z W] = 2; arm — c). (4.37)

Where c is an arbitrary constant. Therefore, the expression of signal to noise ratio is:

N
S/n = (2k( (r[j] — c) )(l: £7202). (4.38)

It can be shown that, to maximize signal to noise ratio S/ n, the choice of kg,i =

O, 1 , .., N can be derived directly from the input signal r[i],i = O, 1, .., N. To optimize

S/n, set
(9(5/71)
8kg

 

= 0, 4': 0,1,...,N. (4.39)

Sllbstitute equations (4.35) and (4.37) into (4.39) and solve for kg, we obtain

 

”:0 k} , .
kg: Eli—.02: (Tm _ c) x (r[z] — c),i = 0, ..,N. (4.40)
Let
N 2
2 =0 k] =g({kj}), (4-41)

 

szzo k401i] - C)

VVhich is a function of {kg}. Equation (4.40) becomes

kg = g({l€j}) X (1‘[2]— C), 2: 0,1,...,N. (4.42)

125

Notice, from equation (4.41), we have

N . Ely—0k? N N
2mm — c): ——-{———g; DJ 21.; = g({kJ-}) x 231.jpg] _ c). (4.43)

i=0

Substituting them into equations (4.37) and (4.35), then the signal to noise ratio S/n

can be shown to be independent of g({kj}), i.e.,

 

 

 

 

s 235-iota ({k) }) I(rlz‘] - c) >2
_ = (4.44)
n (ha--019 (.{k })1( r121 — era?

£0014] — .)2 (4.45)

\/Z§v _0(r[i] —- c) )202

Therefore, in choosing kg that maximize S/n, g({kJ-}) can be an arbitrary value with-

out affecting the evaluation of S/n. If we set g({kg}) = 1, from (4.42), we obtain
k,- = r[i] — c, i = 0, 1, N. (4.46)

Based on condition (4.36), we can derive the value of c,

1 N

: N_:T i=0 r[i], (4.47)

Which is the mean of signal samples {r[i]}. Therefore, the choice of optimal ﬁlter f
e(luals the signal itself, up to a translation in signal’s magnitude. Such a translation
will not affect the evaluation of S/n because of condition (4.36). This leads to a
desirable property mentioned earlier: when the theoretical model has an identical
shape as the data, the matching score or the ﬁgure of merit should remain the same
e"en though the two may position differently.

A tube model, either 2D or 3D, rendered according to a given hypothesis Hg

1Iﬁipresented by a set of model parameters to be optimized and the sensor model, can

126

be treated as a matched ﬁlter. The correlation between this ﬁlter and the sensor
data indicates the degree of match in their intensity conﬁgurations. Therefore, max—
imization of P( f | Hg) can be achieved through maximizing SN R. In matching the
sensor data with a series of 3D rendered tube models, the derived SNR’s can be used
to establish an empirical distribution simulating P( f | Hg) and the Hg that corre-
sponds to the best SNR yields the optimized local recognition that we are seeking.
Figure 4.17 shows an example of the range of optimization which looks like a cone.
Figure 4.18(a) plots the values of SNR with respect to different radius of tubes. The
peak value corresponds to the correct radius. Figure 4.18(b) shows the sensitivity of
SNR to the direction of tubes. Similarly, the peak SN R value corresponds to correct
Orientation. Combining both parameters (orientation and radius), Figure 4.19 shows
an example of such an empirical distribution estimated based on some synthetic data.
In Figure 4.19, radius and orientation vary along X and Y axes and the computed

SNR is plotted as the dependent variable along Z axis (height).

 

 

Figure 4.17. Range of optimization.

127

 

 

 

 

 

 

 

 

 

Semitivitynhdius SauitivityioOtiemnion
'3 I I I I I I I I I I I I I U T ﬁ I T l T T
radius=3.0 — radim=3.0 —-
1.2 b name-4.0 ~ 1.2 - mums ~
”$5.0 ----- ruins-15.0 .....
g H g 1.1
E i
‘5 I .......... 3' l
a --------- a
E 0-9 a: 0.9
0.8 0.8
0.7 l J J l l l l l l l l J 1 0.7
l152253354455556657758 8
Radius

 

(a) (b)

Figure 4.18. Sensitivity of SNR to (a) radius, (b) orientation.

Matched Filter Generation

We now discuss how a 2D and a 3D matched ﬁlter can be generated based on a tube
hypothesis and sensor model. Let Mg be a theoretical 3D tube model and M; be the
Corresponding matched ﬁlter to be generated. The process of dynamically generating
a matched ﬁlter can be considered as a mapping, denoted by G), which transforms Mg
into Mg, that is:

Q: MtHMf.

Speciﬁcally, C9 is deﬁned as:

. N o 52 o R(Mg) if 2D input
M! = @(Mt) =

N o S3 o R(Mg) if 3D input
Vvhere R(.) is a 3D volumetric model generator, N is a normalization operator, S 3()

is a 3D rendering (sensing) operator, and Sz(.) is a 2D rendering (shading) operator.

lDenote a 3D volumetric model by M”, a rendered model by Mr, and a normalized,

128

 

Figure 4.19. An example of the empirical distribution of p.

hence ﬁnal, model by NI]. We deﬁne:
RI Mt = Hi H Mu = {(11), 3172)},

Where Hg is a set of model parameter values deﬁned in (4.26) and Mg, is a set of
30 points that satisfy inequality (4.9). To render the 3D volumetric model, the 3D

rendering operator:
335 Mt; H Mr ={($,y,z,1($$y1z))}={($,y,z,b(u1v’w))}’

a-Ssigns each point (2:, y, z) in Mg, 3. value I (:c, y, 2) determined by the expected blood
ﬂow speed determined using (4.22). The relationship between (u, v, w) in the object—
Qentered coordinate system and (2:, y, z) in the world coordinate system can be deter-
11'lined through hypothesized object parameters Cg and dg.

A 2D matched ﬁlter Mr is rendered based on the theory of shading discussed in

Section 4.3.1. The value of M f at a point p = [x,y, z]T in the shadow plane (where

129

z is a constant) is proportional to the dot product of the unit surface normal nip and
the light source direction Ii, at p (see (4.21)). Assuming the number of gray levels is

256, we have
$23 MvHMr = {($3y7z31($ay’z))}?

where
1 + P3P + 93‘]

\/1+P2+‘12\/1+P3+(1§.

 

 

I(x,y,z) = 256 x

The computations of p, q, p,, and q, are described in Section 4.3.1. [(27, y, z) is deﬁned

Whenever I + p,p + q,q Z 0.

Finally, the normalization operator is deﬁned as:

N: MrHMf:{(xayazaj(xayvz))}’

with
A _I $13,}. I 1 ital/,2
I(xvyvz)‘ (1‘,y,z)-u, u— ( y )lT ill-I(l )a

Where a is the mean of the predicted sensor measurements within tube T5. Therefore,

ﬁlter M; is M, normalized such that the expected value of MI is zero.

Figure of Merit

I“ilter M; is matched against input data to determine the similarity between this
d3'namically generated model and the sensor data. An iconic matching is performed
for this purpose. For any particular ﬁlter, the maximum SNR we can possibly obtain
is the autocorrelation (a perfect match). On the other hand, with the estimated
properties of the background intensities distribution, the matching between M, and
the background provides the lower bound of SNR. The Gaussian function G(ub,a§)
es-tima.ted in Section 3 can now be utilized to determine such a lower bound with

beepect to each hypothesis. Assume a background patch 39(2), 3]) is created based on

 

130

G('ub,a§). Set a range for SNR be [SNRf-,SNR}‘] where
SNR: = Z M, x 3,, SNR? = Z MK.

Note, this range is adaptive to both input data and individual hypotheses. With this

adaptive range, we deﬁne a ﬁgure of merit to be:

(2.) _ SNR — SNR:
‘0 ' ‘ SNR: —SNRf-’

Where 0 3 p(2) S 1. We use 39(2) as an estimate of P(f | H.) Therefore, at i = 0:

P'(0) = ma${1nlp(0)l}- (4-48)
A system parameter, 1;, is chosen as the validity threshold for veriﬁcation: H,- is

accepted if and only if p*(0) 2 17. Such a matching is local, hence its computational

Cost is reasonable.

Since a matched ﬁlter is determined by the models of both the object shape and the
Sensor used, this local recognition method veriﬁes, simultaneously, both the expected

geometric shape of a tube and the expected conﬁguration of the sensor measurements.

4.4.5 Global Recognition

Identiﬁed local tubes are isolated segments of objects each of which is used as a
St'arting point of a sweeping process at the stage of global recognition whose goal is to
a-(111ieve a recovery of the entire underlying objects. Initiated from each starting tube,

global recognition sweeps the seed in space along the underlying object according to

t he criteria discussed below.

131
Recursive Optimization

The overall structure of a tubular object is considered to be formed by a starting tube
traveling in 3D space and undergoing a series of deformations. In order to recover
the trajectory and the deformation along the trajectory, we designed an optimization
scheme with respect to two criteria, one is the goodness-of-ﬁt between the sensor
data and the model for sensed information, generated based on the deformed object
parameters, and the other is the smoothness of the trajectory.

Consider a tubular object E, = {Tb ...,T1,To}, its recovery is through an opti-

mi zation:

max(P(Tk, ..,To)), (4.49)

Where To is the seed tube. The task of global recognition is to ﬁnd a series of tran-
Sitions, starting from a seed, so that P(Tk, .., To) is maximized. The model adopted
for tubular objects deﬁnes a causal system in which the alphabet of T.- is the set
of all combinations of object parameters in a discrete optimization range of a cone
determined by a tube hypothesis. In general, the probability for a particular tube
i to occur is related to the presence of the previously detected tubes at j <2 i or

P (T,- | T,._1,.., To). According to the Markov assumption,

P(Ti I Ti—h-uTO) = P(Ti I Ti—l)- (4-50)
Applying the chain rule, we obtain:

P(Tk,..,T1) = Hf=1P(T,- | TH).

I‘Iere, P(T; I T;_1) is the probability of the transition from T,_1 to T,'. Using the

132

logarithmic form, we have

I:
max{P(Tk,..,To)} 0< max{Zln[P(Tg I Ti—1)l}

i=0

I:
= Emax{ln[P(Ti I T;_1)]}. (4-51)

In Section 4.4.4, we have deﬁned p"(i) to be the criterion to achieve an optimal local

recognition. Applying it here, we deﬁne the criterion of an optimal global recovery

to be:
I: k
p*(k...,o> = gmawnwm l 7*.-.)1} = 23pm)
k
= :maxﬂnlmf I HO] +1n[P(6.-_1,.-)]}
+ma:c{ln[P(f I Ho)]}. (4.52)

Note, max{ln[P( f | H0)]} is accomplished in the local recognition stage. Hence, the

task of global recognition is to maximize:

k
p‘(k, .., 1) = Zma${ln[P(f | Hg” + ln[P(6,-_1,,')]}. (4.53)

i=1

This can also be written in a recursive form:

p*(k, .., 1) = p"(k — 1, .., 1) + ma:1:{ln[P(T;c | Tk_1)]}

p'(k - 1, 1) + ma${1n[P(f I Hk)l}

+max{ln[P(6k_1,k)]}. (4.54)

Therefore, sweeping is a recursive operation and the global recognition problem is a
recursive optimization problem which leads to a constructive recovery scheme.

The two optimization terms in (4.53) correspond to the two criteria discussed

133

earlier. The first reﬂects the goodness-of-ﬁt and the second is related to choosing a
transition that agrees with what we know a priori (or learned) about how object shape
typically deforms in space. The overall optimization expressed by (4.53) reaches a best
balance between the two criteria. The optimization of the ﬁrst criterion is equivalent
to the maximization of the ﬁgure of merit deﬁned in Section 4.4.4 and the optimization

scheme introduced there can be directly applied. That is,
p’(k, .., 1) = p'(k — 1, .., 1) + max{ln[p(i)]} + ma${ln[P(6,-_1,,-)]}. (4.55)

Next, we discuss the issues related to the optimization of the second term, ln[P(6,'_1,,)].

Sweeping: A Stochastic Process

Recall, the deformation of a tube is described in terms of several random variables
(deﬁned in (4.15)) each of which is a function of time and corresponds to one object
parameter of the geometric tube model. Since the translation shift 6,21,,- from T,_1 to

T,- can be determined from the parameters of T,_1 by:
619-13 : li—l 0 di-la

it does not affect the optimization, we discard it in our further discussion. Lacking
otherwise knowledge, we assume the independent deformation behavior of different

model parameters, that is,

P(6.-_1..-)=P(6?..,.,6*' 63-1,.)=P(6?_.,.-)P(6*' )P(6L.,.-). (4.56)

3—l,iv t—1,i 1
So, we have,

1n[P(6e—1..-)] =1an(69..,.-)1 +1n[P(6*‘ )1 +1n[P(6!-.,.->1.

:—l,i

134

Here, 631g, ,-_1, ,, and 6,-l_ ,are random variables in independent stochastic processes.
Their statistical properties are determined by the geometric characteristics of the
objects and can be either set a priori or learned during the sweeping.
Our previously reported work [47, 48] examined how the three distributions,
P6(d ,-_ _1 ,-), P(,-"_1',-), and P(6,1_1,,-), can be emipirically estimated based on an on-line
maintained sweeping history. Deﬁne a weighted sweeping history, denoted by H w, u
to time i:

Hw = {(Wi—la Iii—1): ..., (W13 ’11)}?

where h,- = {61.141} is the jth piece of history and W,- is the weight that speciﬁes the

impact of h, on the current sweeping front. Weight W,- for 1 S j < i is deﬁned as:

G

W- = ,
J theH‘” 1" X Zk>=j 1"

 

lngi—L

Wj is small whenever tube j is relatively insigniﬁcant (short in length) in history
or tube j is relatively far away from the current sweeping front at i.

Based on this history, the three distributions can be empirically estimated. For
each independent term P(A) in (4.56), A E {6,-_,6 ,-, f_ -1 ,-, 6,-_ _1 ,-}, deﬁne E? to be the
expected value of A and 0,4 to be the standard deviation of A estimated at moment

i. Their unbiased estimates from the weighted history are:

E-A = Z Wj X Aj,j_1, (4.57)

t
1$j<i

 

z —

 

E,- and a? characterize the distribution of A E {6,-_1 ,-, 6L1, ,-, 6,_l ,-} or the statistical
properties of the stochastic process associated with A. All three distributions together

describe the shape dynamics of the object to be recovered. The estimation of these

135

distributions can also be performed using the mechanism of a Kalman ﬁlter.

Adaptive Control of Sweeping

The empirically estimated statistical properties of shape deformation of objects can
be utilized to develop an adaptive control strategy for the sweeping. If certain degree
of smoothness is desired for recognized objects, it can be achieved by limiting each
A E {6,-_1 ,-, 6,-__ 1. ,-,6 _1, } within some constrained range.

Consider the transition between Tg-] and Tg. We can establish certain bounds for

A’s using the estimated Ef’s and of’s. Deﬁne
8?. = 1A1”. Are]

to be the dynamic bound for A at moment i. We can set:
A?” = E? —1'°;” 0?,

A??? ___. EA + yuppo. A

Here, A’s are the parameters that specify the degree of smoothness in sweep. A
sweeping at moment i enforces that every A E {6,d_1,,-, {_1 ,, 6,-_1 ,-} falls within its

dynamic bound, i.e.,

A68;

Therefore, tube i can be extended from tube i - 1 along the direction of a best-ﬁt if
and only if the deformation the transition causes is within the bound speciﬁed.
Parameters AIM." and A”? can be determined adaptively if prior knowledge about
the object shape dynamics is available. We propose an off-line training procedure in
which the knowledge can be learned. All the A’s can be initially set to large values so

that the system is not biased. A set of representative input data is provided to the

136

system and tubular objects are recognized by the system through sweeping. Statistics
EA’s and oA’s are collected. These statistics, learned from the training, reﬂect the
typical shape dynamics of the objects contained in the training data. Such learned
knowledge can be used to determine the A’s. If a > 95% conﬁdence interval is used,

we can set:

E? _ 11:30? = EA _ 20A,
E? + 123%? = EA + 20“.
At each step of sweeping, we can adaptively compute A’s as:

E? — EA +20A

A 3
0.“

EA _ E? + 20A

0',A ’

 

 

low 111
AA; = AAip :
where EA and 0A are the statistics learned from off-line training and E? and or:3 are
the statistics from on-line sweeping history. Therefore, the derivation of bound 3,3,,
is automatic, dynamic, and adaptive throughout the entire process of global object

recovery.

4.5 Summary

A model-based recognition scheme for tubular objects is designed in this chapter. The
explicit integration of the models that characterize different aspects of the objects is
intended to improve the reliability of the recognition. Shape information is combined
with the sensor measurement conﬁguration, yielding the potential for a recovery of
objects that is both more reliable and more precise than what is achievable using
a single feature. The incremental reconstruction approach allows a continuous re-
ﬁnement of recognition. By using a parametric form of the geometric model, when

objects are recognized they are simultaneously described and quantiﬁed.

137

In the designed recognition scheme, various classical vision problems are encoun-
tered. Examples are initial simple segmentation, 2D line and ribbon structure de-
tection, 3D surface ﬁtting, 2D and 3D matching, and various feature extraction
problems. In the next section, we will show how these typical vision problems can
be posed as grouping problems, and how the recognition strategy discussed in this

chapter can be realized under the framework of Hierarchical Token Grouping.

CHAPTER 5

Tubular Object Recognition Via
HTG

In Chapter 4, we developed a 4-stage recognition strategy for the problem of tubular
object recognition. In this chapter, this recognition strategy is posed as a hierarchi-
cal token grouping problem and the detailed system design under this paradigm is
discussed. In model-based recognition of tubular objects, various conventional vision
problems are encountered such as thresholding, matching, and segmenting instances
of objects. This chapter shows how the methodologies to solve these problems can be
realized under the paradigm of hierarchical token grouping.

In posing a speciﬁc problem as a token grouping problem, the following issues
are considered: representation problems, recognition methodologies, integration of
information, and interaction among different modules. Speciﬁcally, token representa-
tion for each type of perceptual entity needs to be designed. Different methodologies
that extract various perceptual entities are designed as grouping agents with their
input, output, and grouping criterion deﬁned. Such deﬁned grouping agents provide
information about the interactions so that the system communication network can be
extracted from their speciﬁcations. These will be discussed in detail in the subsequent

sections.

138

139
5.1 Representation Hierarchy

In a vision system built under the architecture of hierarchical token grouping, there are
two hierarchies, one is top-down, representing knowledge, and the other is bottom-up,
representing the organization of the objects to be recognized. Although their content
differs, the two are similarly structured. In the top-down hierarchy, each level holds
the knowledge related to the perceptual entities at the corresponding level of the
bottom—up hierarchy. In the bottom-up hierarchy, each level holds the instances of a
particular type of perceptual entity, each characterized by a set of features. Below,

we design these two hierarchies for the problem of tubular object recognition.

5.1.1 Object Model Decomposition

In a hierarchical paradigm, a model-based recognition strategy is to use the model
information decomposed at different levels to assist the process of organizing visual
data. To recognize tubular objects via hierarchical token grouping, the object model
for this class of objects needs to be decomposed. The generalized stochastic tube
model developed in Chapter 4 captures the geometric shape of a tubular object, ap-
proximating an object as a set of connected tubes, each of which is modeled by a
RSHGC. Each RSHGC can be further decomposed into its boundary and interior
region. Both object boundaries and interior regions consist of a set of image points.
Such a hierarchical decomposition of the generalized tube model is shown in Fig-
ure 5.1.

Since both 2D and 3D data will be processed, the decomposition can be made
more speciﬁc for both cases as shown in Figure 5.2. In the 3D case, a straight tube is
a volume enclosed in a cylindrical surface which consists of a set of 3D cross sections
each of which is composed of a collection of connected surface voxels. In the 2D

case, the decomposition is based on the perceptual regularities when projecting a 3D

140

 

Elongated Object s

\I’

Straight Tubes

 

 

 

 

 

 

 

 

 

 

Tube Boundary Tube Region

 

 

 

 

 

 

 

 

 

 

Image Points

 

 

 

Figure 5.1. Decomposed object model.

tube onto a 2D image plane. The visible surface of a 3D cylinder is projected onto a
rectangular region in a 2D plane and the shadow produces two straight lines that are
parallel. Therefore, the 2D image of a 3D straight tube consists of a straight ribbon
and its interior region, where the ribbon is composed of two parallel straight lines,
individually formed by a set of edge points. During the process of recognition, which

branch of the hierarchy in Figure 5.2 is used depends on the type of input (or sensor

used).

5.1.2 Tokens and Token Hierarchy

Having decomposed the object model into different levels of abstraction, a hierarchy
of tokens can be designed accordingly because the correspondence between the decom-
posed model and tokens facilitates the recognition task at different levels. We classify

tokens into three categories based on their geometrical dimensionality. Within each

141

 

Elongated Object :

1

Straight Tubes

1

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

3D 20
1 \V 1 \1
JD Region: Cylinder Surface 20 regions Ribbon:

1 1 1 1

3D Object Points Cross Section: 20 Object Points 501138,” lines
1 1

Surface Voxel Boundary Pixel:

Voxels E Pixel:

 

 

 

 

 

 

Figure 5.2. Hierarchy of decomposed object model.

category, there is more than one kind of token.

Category I contains tokens of zero-dimension, called point tokens, including orig-
inal input point tokens, either for pixels or for voxels, boundary point tokens, either
2D edge or 3D surface points, background point tokens, as well as object point to-
kens. Category 11 contains four types of two-dimensional tokens, corresponding to
perceptual entities of connected component, straight line, curve, and ribbon. All
three-dimensional tokens belong to category III. They are cross sections, tubes, and

tubular objects. These three categories of tokens are listed below.
0 Category I: Point Tokens

TP‘ : the set of input point tokens.

de : the set of boundary point tokens.

142

Ti" : the set of interior point tokens.

Tbkg : the set of background point tokens.
0 Category II: 2D Tokens

TCc : the set of 2D connected component tokens.
TI : the set of straight line tokens.
Tc : the set of curvilinear structure tokens.

Trb : the set of ribbon tokens.
0 Category III: 3D Tokens

Tac : the set of cross section tokens.
Ttb : the set of tube tokens.

Tel : the set of tubular object tokens.

These tokens form a token space V at different levels of abstraction which can be

expressed using a set of names for the above deﬁned discrete levels:

V: {pt, bkg, itr, cc, bd, I, c, rb, 3:,tb, el}.

In Chapter 2, each token is characterized by its organization, its feature set, and its
deformation characterization. We now introduce the design of each type of token

listed above.

Point Tokens

In this category, a token represents a single cell, pixel or voxel. Each cell in an

input image is represented as an input point token described by a set of property

143

features and some spatial relationships such as n-connectivity. Due to the zero-
dimensionality of point tokens, the organization of any token in this category has a

singleton component set.

Input point token: An input point token is designed as:

 

Tipt = {CFtaFiptaDiDtla
with component set C3” = {TF} and the feature set F,“ = P,” U Rf” where
Pipt : {X,I,m,gd}, and th:(Mna{(j1ptaai.j)})a aid = d,

where x represents the coordinates of the cell to be represented, I is the sensor
measurement at x, m and g; are the gradient magnitude and the gradient direction
at x, and N" is a spatial n-connectivity relationship. In the 2D case, n is 4 or 8. In
the 3D case, n is 6 or 26. For each n, token T," is related to 17. other point tokens;
that is, for each N", set {( j, pt, a,-,j)} contains n elements, each of which is a 3-tuple
(j, pt, am) that relates token T," to another input point token T3?” (its neighbor) with a
distance measure d between the two tokens as the attribute of this particular relation.
For example, when n = 4, token T,“ is related to four other input point tokens and
all four attributes (d’s) have distance of one. This example is shown in Figure 5.3(a).
Another case is for n = 8 where four of the eight distance measures equal one and
the other four equal V2. These features are useful for higher level token grouping.

The deformation characterization for the tokens at this level is null.

Boundary point token: A boundary point token, denoted by T,bd, represents a single

 

cell that is initially identiﬁed as a point on the object boundary. Such a token has a

component set with a single input point token Cf"i = {Tft}. Its property feature set

 

Figure 5.3. Examples of spatial relationships among point tokens. (a) 4-connectivity
relation among input point tokens, (b) 8—connectivity relation among 3 boundary
point tokens.

is deﬁned as:

P“ _ {x,1,m,gd, k} if 2D case

i {x, I, m, gd, kM, mg, ngn, km”, vmin, vmax, n} if 3D case
For both 2D and 3D input, the property feature set contains coordinates x of the
point, sensor measurement I , gradient magnitude m, gradient direction gd, and the
curvature is at that point in 2D case. In the 3D case, the property feature set also
includes a set of invariant surface features. They are mean and Gaussian curvatures
(kM, kg), minimum and maximum curvatures (ngn, km”), minimum and maximum
curvature directions (vmimvmax), and inward surface normal (n).

The relational feature set for a boundary point token is designed as:

Rid = {94”, {(j, bd,a.,,~)}}.

The n-connectivity relationship is with respect to two other boundary point tokens
(rather than to other point tokens). Figure 5.3(b) shows an example in which the
shaded cells are boundary point tokens. The token at the center is T,“l and it is related

to two other boundary point tokens, T}? and T3" under an 8-connectivity relation

32’

(M8). The value of n is 8 for the 2D case and 26 for the 3D case. The attribute set

145

for an n-connectivity relation is deﬁned as:

0M = {t1 ‘1}:

where t is a tangent vector, indicating the linking direction, and d is the distance
between the two points. There are no deformation features deﬁned for this class of

tokens.

Interior point token: An interior point token, denoted by TE”, represents a single

 

cell that is initially identiﬁed as an interior point of an object. Similar to a boundary
point token, its component set contains only one input point token, Cf" = {TJPt}. Its

property feature set is deﬁned as: Pi" = {x, I}. The relational feature set is:

12:“ = {04", {( j, bd, am), (k, m, a,,,,), (1, bkg, a”) }}.

That is, the relationship is established with all its n-connected neighboring point
tokens that can be boundary, interior, or background point tokens. The attribute is
the distance between two points. Again, the value of n depends on the type of input

image.

Background point token A background point token, denoted by 71,-bkg, represents a

 

single cell that is initially identiﬁed as a background point. Its component set, feature
set, and its deformation characterization are all deﬁned similar to those designed for

an interior point token.

2D Tokens

In this category, each type of token represents a 2D perceptual entity, including a

connected component token, a line token, a curve token, and a ribbon token.

146

Connected component token: A connected component token T,-CC represents a con-

 

nected 2D region. Its component set contains a group of all 8—connected interior point

tokens. Therefore,

Cfc : {TJitr I GitT(C,-itr)},

where grouping criterion G‘" will be discussed in Section 5.3. Each such region is

characterized by its feature set which is designed as:
Fix : Picc U Ric = Rec : {C, Iavga Area},

with c the centroid of the region, [avg the average sensor measurement, and Area the
area of the region. There is no relational and deformation feature deﬁned for this

type of token.

Line token: A straight line is represented as a line token, denoted by T}, with a
component set containing a group of boundary point tokens C,’ = {Tj’d I GI(C,')}. A

set of property features is used to describe a line token:

1):: {(Xh,Xt),l, d9 mavgagd},

where (xh, xt) gives the two ending points of the line, 1 is the length, d is the
orientation, mm,g is the average magnitude, and gd is the gradient direction of the

line. The relational feature set of a line is:

Ri’ = {(M8) {(3.111 ai.j)})1 (Maul? {(191 1’ aivk)})},

where N8 denotes an 8-connectivity relation, and Wm” an overlap relation. For each
relation, say T,’ N8 T}, a set of attributes is used to describe it. For example, if two

lines are 8-connected when one end of one line touches one end of another line, the

147

\‘1

   

Figure 5.4. Examples of overlap regions and the four corners of the region. (a) Region
generated by an overlap relation. (b) Region generated by an enclose relation.

attribute set for this relation is

at)“ = {X17},

where 7 is the the angle formed by the two lines at the coordinate position x where
the two lines touch. For overlap relation NW1 , the attribute set contains a set of four
coordinate points

01,3' = {X11,X12,X21,X22},

which represents the four corners of the region generated by this relation. Figure 5.1.2
shows an example of such a generated region (shaded) and its four corner points. As
discussed in Section 4.4.3, the sufﬁcient condition for ﬁnding a ribbon is the existence
of either an overlap or an enclose relation between two straight lines. The common
property of these two relations is the overlap region yielded in between. The regions
shown in Figure 5.1.2 that are speciﬁed by the four corner points (x11,x12,x21,x22)

illustrate this.

Curve token: A curve token, denoted by Tf, represents a 2D curve structure. Such
a structure can be approximated using a polyline which is formed by tracing all 8-

connected straight lines. So, the component set of a curve token contains a set of line

tokens

when
other
rated
such

6‘): am

Where E
e“Close .

f0rarib

Where C I

1’19}. are

18 the me

148

tokens:

Cf = {T} | 0705)}.

where the grouping criterion GC demands that all the components relate to one an-
other by an 8-connectivity spatial relationship. Other conditions can also be incorpo-
rated in grouping criterion 0: to make the formed curves possess certain properties
such as curvilinearity. The design of the grouping criterion for curve tokens will be

examined in Section 5.3. The feature set of a curve token is:
Fic = Pic U R? = {(xhaxtlalv navgvmavg} U {N81 {(Jalvai.j)}}a aid = {x17}:

where navy denotes an average curvature and others have the same meaning as deﬁned

for a line token.

Ribbon token: As deﬁned in Chapter 4, a straight ribbon is a region formed by a

 

pair of straight lines that either overlap or enclose one another. A ribbon token T,”

represents such a region. Its component set consists of two straight lines:
05" = {(T}»Tl) | T157): 6 T',G'b(C.-’b)},

where grouping criterion 02" speciﬁes the condition that either an overlap or an
enclose relationship must exist between line tokens T} and Ti. The property features

for a ribbon are designed to be:
Rrb = {(3, d1 17 W9 Ilow, Ihigha Iavga Bb}:

where c is the centroid, d is the orientation, 1 is the length, w is the width, 110w and
[high are the minimum and maximum sensor measurements within the ribbon, 1m,g

is the average sensor measurement, and 35 is the bounding box for the ribbon. A

  
  

bounding h
be represen
and x2 are

ﬁgure.

FlgUft‘

The rela1

“hi 1
Ch Tj ant

 

 

C00, -
dlnates I

 

149

bounding box is the smallest rectangle that contains the ribbon. This rectangle can
be represented by the two extreme points of the box. See Figure 5.5(a) in which x1
and x2 are the two extreme points of the bounding box for the ribbon shown in the

ﬁgure.

 

X2

 

(b)
Figure 5.5. Bounding boxes for (a) a 2D ribbon, (b) a 3D cross section.
The relational feature set of a ribbon connects each ribbon to the two curves on
which TJ-I and T]: reside:
33" = imp, {(jlacs ai.j1)v(j2acv wall},
where MP is a “part of’ relationship, (jq,c),q = 1, 2 points to curve token T J: , and
‘1qu = (quaxq’l), q = 1,2,

characterizes the relation between T,” and chq where (xq,,xq2) represents the two

coordinates between which a straight line is used to form a ribbon.

3D Tokens

9mm
posed of a 51

properties:

A cross sect

Where c is t
the normal
maximum 51
that enclose
by tw0 extr

relational f,

M

0f input POirl

Where A l ,

IdlSCUssed in

 

150

3D Tokens

Cross section token: A 3D cross section is represented by a token Tf. It is com-
posed of a set of 3D boundary point tokens which possess consistent cylindrical surface

properties:

C.” = {de l G’(Cf)}-

A cross section token is characterized by the following property features:
Pix 2 {C9 1', d7 Ilowa Ihighs Bb}:

where c is the center of the cross section, 1‘ is the radius of the cross section, d is
the normal vector of the cross section plane, 110w and [high are the minimum and
maximum sensor measurements within the cross section, and 3;, is the bounding box
that encloses the cross section. A bounding box for a 3D cross section is represented
by two extreme points of the box. This is illustrated in Figure 5.5(b). There is no

relational feature designed for cross sections.

'Ihbe token: A token T,“ represents a 3D straight tube which is composed of a set

of input point tokens:

05" = {Tf‘ | G‘b(C.-‘b)},

and is characterized by a set of property features:
P.“ = {c.r,d,1,A,p*, 3.},

where A is the straight axis of the tube and p“ is the optimized ﬁgure of merit
(discussed in Section 4.3.4) serving as a conﬁdence measure of the tube. The relational

feature set of a tube token describes an adjacency relation between tube token T,“

and other

where 2111]

Object 1:1

The comp.

Each tubul

Where A is
radius lnf0r1
describes th
Spatial felat

WhEre the lV

 

SlDCe a l
Space With a

explicitly Cl]

 

151

and other tube tokens:

Rib = {Nadj1{(]1tbvaid)}}v aid = X,

where attribute x is the coordinate where the two tube axes meet.

Object token: Finally, a tubular object is represented by a token, denoted by T5”.

 

The component of T," is speciﬁed by a set of connected local tubes:
0:” = {231 I G:'(C:”)}.
Each tubular object is described by its feature set:
F18, = Pie, U Rf],

with
Pie, : {“4}, RI, = twin", {(3, 61), “Lilli

where A is the attributed axis of the object with each point on A carrying the
radius information about the corresponding cross section; relational feature set Rf’
describes the intersection relationship among tubular objects. For each particular
spatial relationship, between two object tokens, it can be described by the position
where the two intersect:

a,,j = X.

Since a tubular object is modeled as the outcome of sweeping a local seed tube in
space with a series of deformations along the sweeping trajectory, the deformation is

explicitly characterized in the token representation for a tubular object. Deﬁne a set

D1 of ﬁfSl-t

and an over

Where D,“ c
order defori
Parameter .
measured a«
the deforme
a tube mod
example. ll‘lr
indication a
ObIECts from
intlicate 50,:
can he imp
Elm-e much
of corn, Tl,

tubular obj,

Token Hier

lllllll all Ca:
IOkenS' Con

two are iden

Along th

 

152

D1 of ﬁrst-order deformation parameters:

0‘ = {£1.13 1', p‘},

and an overall deformation characterization:

Di] = UAeDI{(EA,0A)},

where D?1 characterizes the deformation using the statistical properties of these ﬁrst-
order deformation features deﬁned in D1. The statistical properties of deformation
parameter A 6 D1 are captured by the mean (EA) and the variance (0A) of A
measured across the entire tubular object. The four deformation features in D1 reﬂect
the deformation in orientation, radius, length, and the degree of homomorphism to
a tube model so that they capture the shape variation of the underlying object. For
example, the statistical characterization of the deformation feature in orientation is an
indication about how sharply the underlying object curves in space. For the tubular
objects from different domains, the corresponding deformation characterization may
indicate some statistically signiﬁcant difference in object shape. Such information
can be important in applications. For instance, the roots from one type of corn may
curve much more sharply, in a statistical sense, than the roots from another type
of corn. Through the deformation characterization, knowledge about the shape of

tubular objects from different classes or different domains can be learned.

Token Hierarchy

With all categories of tokens deﬁned, Figure 5.1.2 shows the organization of these
tokens. Comparing it with the decomposed object model shown in Figure 5.2, the
two are identically structured except the arrows are reversed.

Along the hierarchy, objects are incrementally identiﬁed. Tokens across different

relationship

5.2 K

KHOWledge I

that is useli

gm”Ping hi.

as an expert

aISO stored

 

 

l3 (lWOmPQg

 

 

153

 

el

 

 

 

a”

 

B-

 

 

 

 

 

30 ZD

 

 

F-

 

 

 

3-

CC 8

"I
"l
"I
g
*1

 

 

 

 

 

5H
, a
)9

5
a
3

———9~1.9
~12;

——:u
”1&9

H
H
"I

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

_I

Figure 5.6. Bottom-up hierarchy of tokens.

 

 

 

 

 

 

 

levels of abstraction are related by component relationship. All levels together de-
scribe a recognized object at different levels of detail. By tracing down the component

relationship from a top level token, the organization of an object can be recovered.

5.2 Knowledge Hierarchy

Knowledge hierarchy K H is a hierarchy in which each level provides the knowledge
that is useful in recognizing the perceptual entities at a corresponding level of the
grouping hierarchy. At each level of K H, a decomposed object model is instantiated
as an expected event (E:). Its associated homomorphism validity conditions (HE) are
also stored at that level. In Section 5.1, we have shown how the object tube model

is decomposed at various levels of abstraction. In this section, we discussed how

thes
cond

CUI‘I’E

5.2.2

There
necess
ofinpr
\ohnne
the (ha
The
the type
abstractj
against c
render tt
generate j
[llcan b
ACCluir
hmsmne]
tHOWledge
Shrenjenxs
eﬂahhghing
I3) Charact,
depend,ng (
onsenSOrml

154

these expected events are represented in K H and how the homomorphism validity
conditions can be established by dynamically acquiring the information about the

current recognition environment.

5.2.1 Acquiring Dynamic Environment Information

There are two types of information that are acquired in our system to establish a
necessary description about the current recognition environment. One is the type
of input data which, in our case, is either a 2D photometric image or a 3D MRI
volume. The second type of knowledge about a dynamic recognition environment is
the characteristics of current input data.

The ﬁrst type is associated with an environment variable whose values encode
the types of sensor used. This variable can be stored in I M . At certain levels of
abstraction, such information is needed. For example, to match a 3D tube model
against data, depending on the type of sensor used, a different approach is used to
render the model before matching. Chapter 4 discusses the different methods to
generate 2D and 3D matched ﬁlters. In this case, the environment variable stored in
I M can be used to decide the appropriate method to render a model.

Acquiring the second type of knowledge is necessary in order to dynamically estab-
lish some homomorphism validity conditions. Currently, this type of environmental
knowledge is acquired by (1) obtaining the maximum and minimum sensor mea-
surements and the gradient magnitudes from an input (Imam, Immmmammmm), (2)
establishing the histograms for both sensor measurements and gradient magnitudes,
(3) characterizing the histograms using a unimodal or bimodal Gaussian function,
depending on the type of sensor used, and (4) determining the dynamic thresholds
on sensor measurement and gradient magnitude, {I¢, mt}, from the parameters of the

estimated Gaussian functions. The process of acquiring these thresholds is described

in
rep

5.2.

Know
any 16
phism
con di t i
eXpectr
GXpecte
We no“
represen
How Sue,

the next

Interior

ﬁll this let
Sensor mea

“how high i

W t
here Eetr (
pblsm Validi

Slimmer], lo

 

155

in Section 4.4.2. The obtained characterization is represented by a set of environ-
mental measurements {Imam 1min, mmammmgn, Inmt}. It instantiates the knowledge

representation at the lowest level of K H (see Figure 5.7).

5.2.2 Knowledge Representation

Knowledge hierarchy KH is structured identical to the token hierarchy. A model at
any level of K H is described by a set of features, each has a corresponding homomor-
phism validity condition or tolerance associated with it. The homomorphism validity
conditions specify how much deviation is allowed between an observed event and the
expected event. Object models at different levels often need to be instantiated as
expected events before they can be utilized by the grouping agents at parallel levels.
We now examine, starting from the lowest level, how the decomposed models are
represented in K H and what are their associated homomorphism validity conditions.
How such stored knowledge is to be utilized by grouping agents will be discussed in

the next section.

Interior Point Level

At this level, a decomposed model is a cell. Due to the nature of the sensor types used
in this study, a cell belonging to an object should, ideally, have a substantially higher
sensor measurement than an non-object cell. The acquired knowledge It speciﬁes

“how high is high”. That is:
E:;": 1.; Hi." : [o,1,,,.,,],

where Ei” describes the expected event for an interior point, H}? is the homomor-
phism validity condition associated with E:;", It is the expected lowest sensor mea-

surement for object cells, 1mm, is the maximum sensor measurement, and [0, 1mm]

gives a

lower t

then thi
Eétr. T
homomc

feature ,

Exterio

AD exter

token. T.

156

gives a range of allowed deviation of a real sensor measurement from the expected

lower bound 1,. Together, this means that if
I(x)—It€ 3’, or I-ItZO

then the cell at x (an observed event) is considered to be similar to expected event
E:;". That is, x is identiﬁed as an interior point token. Here, I — It is used as a
homomorphism evaluation function based on feature I from an observed event and

feature I, from an expected event.

Exterior Point Level

An exterior point is a cell that does not satisfy the condition deﬁned for an interior
token. That is,
Efkg : 113 Hi)?” : I—Imaxa—lla

where Hg” denotes an opposite range compared with that for interior points. That
is, if a point has a sensor measurement that deviates from It by some value within
[—Imax, —-1] or I — I, < 0, then this point is identiﬁed as a exterior or background

point.

Boundary Point Level

A boundary cell is modeled as a point token whose gradient magnitude is high with

respect to mt. Such a model is represented as:
E2", : m,, H?! : [0,mmax].

That is, if a point has m - m, 6 Hi? or m — mt Z 0, then it is considered to be a

boundary point.

Conner

For a 2.

8-conne<

This the
connectr
token is

Note. Sin
P0nent 0}

Condition

Line LEV

A 20 stra
a Straight

:1 Straight.

Where the \

That js‘ a li
CtlrtvdtUre a

r a
alige SPEC”

157

Connected Component Level

For a 2D connected component token, its model can be represented by a set of all

8-connected interior points, that is,

EEC: N8 ({E:"}), Hf)C : True.

This model denotes a graph with nodes being interior points and arcs being 8-
connectivity relationships. Therefore, the identiﬁcation of a connected component
token is to ﬁnd an isomorphism between this model graph and a graph from data.
Note, since the number of nodes (size of {E;"}) is not speciﬁed, a connected com-
ponent of any non-zero size can be identiﬁed as far as the homomorphism validity

condition is satisﬁed by all the components that are related under N8.

Line Level

A 2D straight line is a set of 8-connected boundary points. One distinct property of
a straight line is its zero curvature everywhere along the line. Therefore, a model for

a straight line can be represented by two features:

Ei= {r({Efidlh'xl8 ({Efd})}.

where the value of n({Ede is the curvature measured on its components which should

be zero. Accordingly, the homomorphism validity condition is designed as:

H); : {I—Cn, +eK], True}.

That is, a line is ideally composed of a set of 8-connected boundary points with zero
curvature along the line. The requirement of zero curvature is relaxed within the

range speciﬁed in H j; when a line is grouped from a set of quantized points.

Curve

A mod

8-conne

Where 1

formed '

Where 3.,
Can be C.
to be grr

56‘ to be

Ribbon

A 2D ribl
gradient (

either all '

The 130,110]

ture values

158

Curve Level

A model for a curve is approximated by a polyline which consists of a group of all

8-connected lines. It is represented by two features:
E§= {7(Eé‘,E§‘“),N8 ({E§})},

where Ei‘ and E:;“ are two adjacent lines along the polyline, and 7 is the angle

formed by these two lines. The homomorphism validity condition is given as:
H; = {Mme},

where 7, is a threshold on the angle 7 through which the shape of the desired curves
can be controlled. When ’7: is a smaller value, the curve structures that are allowed
to be grouped are curvilinear. Due to the shape property of a tubular object, 7, is

set to be 25 degree in this study.

Ribbon Level

A 2D ribbon is modeled using two features, one measuring the absolute difference of
gradient directions of two component lines and the other describing the existence of

either an overlap or an enclose relationship between the two lines. That is:
E§b= {I ME?) - 9.103.?) I, (”“0”1 (Ei‘aEi’)V NW (199,139)}-

The homomorphism validity conditions describe the constrained ranges on both fea-

ture values:

11,3”: {[180 — a“, 180 + cw], True}.

Ideally, th
the value
in H I? by

directions

Cross Seq

A 3D cross
orientation
C: the radii

denoted b\

Therefore, E
evaluated (11

Q I .

in a real situ

range can be

 

data points {

159

Ideally, the two lines that form a ribbon have opposite gradient directions. Therefore,
the value of | gd(E,f,‘) — gd(Ei3’) I in K H should be 180. This requirement is relaxed
in HIE” by 69,, which means that the two lines may not have exactly opposite gradient

directions but the allowed deviation is within 69,.

Cross Section Level

A 3D cross section is a thin disk in three dimensional space with an arbitrary location,
orientation, and size. Such a disk can be uniquely characterized by its center point
c, the radius r, and the normal vector n. Using a parameterized model for this disk,

denoted by ’P(c, r, n), a set of points {xi} from an ideal cross section satisﬁes:
7”({Xj}; 6.1311) = 0-

Therefore, a model cross section is represented using two features, one is the above
evaluated disk function value and the other is a connectivity relation among all cross
section points:

Ex. {P({Er}; c,r,n).t><126 ({Ern).

C

In a real situation, the evaluated function value may not be precisely zero. A tolerance

range can be speciﬁed. Hence, the homomorphism validity condition is:
H}? : {[—ex, +ex],True}.

That is, whenever model ’P(c, r, n) is used to match against a set of 26-connected real

data points {Xj}, a cross section is identiﬁed if

79({><j};c,r,d) E l-éx.+€xl-

'liibe Le‘

As we des.

in a parar

 

7(c. d. r. l

The Object

Here. inst:
because a
with 0111' (l
hot 9 tllbi

ConﬁdenCe

where 7} git
4.44 that 1
Object Le

Slice a tub

as folio“:

160

'Dube Level

As we described in geometric modeling of tubular objects, a 3D tube model is also
in a parameterized form with parameters c, d, r, and l. Denote such a model by

T(c,d, r, 1). All the 3D points {xi} that form the boundary of a tube satisfy:

T({x,-}; c,r,d,l) = 0.

The object model at the tube level can then be represented by:

Ezbz 1— 7({Ef‘}; c,d,r,l).

Here, instead of directly using the evaluated function value 70, we employ l — T()
because a good match makes [1 — T()] m 1 , meaning a high conﬁdence. This agrees
with our design of the ﬁgure of merit in matching a tube model against sensor data.
For a tube, the homomorphism validity condition is deﬁned by the lowest allowed

conﬁdence in matching a tube model with data:

Hiab= n,

where 1] gives the lower bound on p“, the optimal ﬁgure of merit developed in Section

4.4.4 that quantitatively reﬂects the degree of match between a model and data.

Object Level

Since a tubular object is modeled as a set of connected tubes, it is represented in K H

as follows:

E? I (C0,d0,r0,10) X {5i—1,i}f=1,

Will] (C0

process c

Since a ll
deformati
should mg
fore, only
set of bou
hounds are
lected alor
A E {6d, 6
Parameters

of the Objfﬂ

Combining 1
token repres

deformation

5.23, K

with all the
deﬁned, the it

(0118thde

 

161
with (co,do,ro,lo) a reference tube (or a seed) along the object and {6,43}le is a
process of deformations, each of which is described by 4 deformation parameters:

(Si-1,1" 2 {6c 6d 5313361143}-

i—1,z'7 i—1,i9

Since a tubular object is viewed to be formed by the reference tube and a series of
deformations when it is swept along the trajectory of the object, these deformations
should maintain the distinct morphological properties of the tubular object. There-
fore, only certain deformations are tolerable. Such tolerance can be expressed by a
set of bounds on the deformation parameters. As discussed in Section 4.4.5, these
bounds are currently established dynamically from the deformation information, col-
lected along the recognized object, and some system parameters (A2”, A2”) for all
A 6 {6d,6’, 61}. Therefore, the validity condition at this level provides the system
parameters (A2”, A2”) that are used to establish the bounds or to control the shape

of the object that is allowed:
Hf}: {(AfngZPPﬂ, for A E {6d,6r,61}.

Combining the given A’s and the deformation characterization (Df’) provided in the
token representation for each tubular object, the dynamic bounds that constrain the
deformation can be adaptively established. For detailed computations, see Section

4.4.5.

5.2.3 Knowledge Hierarchy for Tubular Object Recognition

With all the decomposed models and associated homomorphism validity conditions
deﬁned, the knowledge hierarchy for the problem of tubular object recognition can be

constructed. Corresponding to the levels of token hierarchy, K H consists of following

levels:

Figure .5.

-l

these level:
served 9\'81
to GH. 1.-
instantiate
50. FOr eXa
to instantia
Sectior1 par:

the 101\‘en l

0hj€ct T€c0

162

 

 

 

 

 

 

 

el: e ”1’
tb E? H,”
x E: H;
rb E? H:
c E: H;
I E,’ H;
M E? 11,.“

 

cc: E “' [1,?
G

 

.. a: at

b
bk: E f“ 1.1,."Kl

 

 

I I
. Mill, max,
‘ env. M

 

 

Figure 5.7. Knowledge hierarchy for tubular object recognition.

levels:

KH: {(E:,H,'§)}, v E {env,itr, bkg,bd, cc, l,c,rb,:r,tb,el}.

Figure 5.7 describes the designed knowledge hierarchy. The information stored at
these levels can be accessed by grouping agents and integrated with bottom-up ob-
served events in making grouping decisions. This is the information ﬂow from K H
to GH. Information can also ﬂow in an opposite direction. Grouping agents can
instantiate the models stored in K H whenever evidence detected from input suggests
so. For example, invariant surface features of boundary tokens (in CH) can be used
to instantiate an expected event of 3D cross section at level at of K H using the cross
section parameters estimated from surface features. Having developed both K H and
the token hierarchy, we design the grouping hierarchy for the problem of tubular

object recognition in the next section.

5.3 t

In this sec
4) can be
designed .
next sect:

extracted

5.3.1

for each 1
is to iden‘
4-stage re
ﬁgure 5.8.
that for 3,

In the
various t a:
formal m,

“6 ﬁrst re

A gtOu

Here, 1; SF

163

5.3 Grouping Hierarchy

In this section, we examine how the 4-stage recognition strategy (discussed in Chapter
4) can be realized via a set of token grouping agents. A set of grouping agents are
designed and formally deﬁned. Based on their formal speciﬁcations, we show in the
next section that different types of description about the system can be directly

extracted and visualized.

5.3.1 4-Stage System Diagram in HTG Design

For each type of token, there is a corresponding grouping agent whose responsibility
is to identify all the instances of that type of token. The relationship between the
4-stage recognition strategy and the grouping hierarchy is shown in Figure 5.8. In
ﬁgure 5.8, the left part is the 4—stage processing for 2D data and the right part is
that for 3D data. The middle hierarchy is the grouping hierarchy.

In the subsequent sections, we deﬁne all the grouping agents designed to perform
various tasks at different levels of abstraction to achieve tubular object recognition.
Formal methods of deﬁning these grouping agents developed in Chapter 2 is used.
We ﬁrst review the syntax of the formal speciﬁcation for a grouping agent.

A grouping agent at level U can be deﬁned as:
GA(v) = {13,633,031}-
Here, I}: speciﬁes the input to GA(v) from outside:
1:: {GiafEé’iHi’nlh

with G: to be the source of input from the bottom-up ﬂow within GH and {E:, H;}

to be the source of input from the top-down ﬂow from K H . Term 03, speciﬁes the

Figure 5.
Chicai toi

Olliput Of

Where Te

E; is a ll!

i0 lhe tOp

prlHCip]

98

thegenGri

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

2D HTG 3D
cl.
Global ‘ﬁ Global
. lb: . .
Recogmtion Recognition
x
rb:

Local —— Local
Recognition 0: Recognition
l:

. cc:
Seeding __ , Seeding
.1— bd:
__ bkg:
Initial -_ itr: I Initial
Classiﬁcation Classiﬁcation
‘— pt:

 

 

 

 

 

 

 

 

 

 

Figure 5.8. Relationship between 2D and 3D tubular object recognition and hierar-
chical token grouping.

output of GA(v):
011. = {Tva {Bill},

where T” is a set of tokens formed at v that contributes to the bottom-up ﬂow, and
E:’ is a new expectation instantiated from some token formed at v which contributes
to the top-down ﬂow. The behavior of each grouping agent is ultimately determined
by the grouping criterion G: because it answers the question “by what organizational
principles should the tokens at this level be grouped?”. This criterion is deﬁned in
the generic form of homomorphism satisfaction:

G”: H(E;;,E:) s Hg,

C

165

with the homomorphism evaluation function:
H(E0,E.) = H(l(5i3 X PL") U (SE >< RIO], [(5% X E?) U (57; X EDI),

Where the feature selection functions for property and relational features are:

Si; 2 (anllsijnzl'"811ilNK)’ Sin, = [031],
and

SI? : (3:113:21°'33:NR)1 8:, = i011l°

In this deﬁnition, both bottom-up input (tokens from grouping domain G3) and top-
down input (E:;,Hj’3) participate in grouping decisions. The former is speciﬁed by
{(5}; X PL”) U (57;, x RL”)] while the latter by [(S}; x E:;) U (5;, x E:)].

Each grouping agent can be viewed as an encapsulated object (in the object-
oriented design sense), combining data structure and behavior together. The data
structure of each grouping agent is T” and the behavior is determined by group-

ing criteria G13. Next, we deﬁne all the grouping agents at v E V where V =

{bkg, itr, cc, bd, l, c, rb, 1:,tb, el}.

5.3.2 Initial Classiﬁcation

As discussed in Chapter 4, in the initial stage of tubular object recognition, each cell,
pixel or voxel, is classiﬁed into one of the three classes: interior, exterior, and bound-
ary. Three grouping agents are designed to perform the task. They are (1) GA(itr)
which is responsible for interior point tokens, (2) GA(bkg) which is responsible for
background point tokens, and (3) GA(bd) which is responsible for boundary point

tokens. We now deﬁne these grouping agents separately.

Initial Seg

Initial 36911
and exterio
each of whi

point token

Interior P.
The groupir

with

The inPtlt cc

The criterior
feeltures of st

selection flint

flincthn to be

166

Initial Segmentation

Initial segmentation means to classify all the cells into classes of interior of objects
and exterior of objects. Grouping agents GA(itr) and GA(bkg) perform the task,
each of which extracts one type of token, either interior point tokens or background

point tokens.

Interior Point Tokens
The grouping agent for interior point tokens is deﬁned as GA(itr) = {1:37, 0:", 03,7}
with

1;," : {Gitr, (Eétr, ng)}, 0:31“ : Titr.

8

The input consists of a grouping domain
G1" : web-"T“, L"tr 2 {pt},
of all the input point tokens and the knowledge from K H :
E;tr : I, Hg," : [0, [max].

The criterion for forming an interior point token merely depend on the property
features of sensor measurements of an input point token. Therefore, the feature

selection functions are designed as:
53;:r _—. {1,1,o,0}P‘, 53’" = {0}“,

where S}? is to choose property features of an input point token. With respect to
the property feature set P," = {x,I,m,gd} for input point token, setting selection

function to be {1, 1, 0, 0} means that only the ﬁrst two property features are selected;

while 3:" 1
meaning tl.

selected fee

and the co

whiCh says
eXceeds (h

Ohio) is

EXterior ]

The STOU}
ifs)”. Gfkg.

Th6input

Of All the if

The Conditi.

167

while Si" is with respect to the relational feature set of input point tokens with {0}
meaning that no relational feature is used by grouping agent GA(itr). Using these

selected features, the homomorphism evaluation function is designed as
H : I (x) — It,
and the corresponding homomorphism criterion is:
G:," : H(E., E0) 6 Hf." => I(x) — I. e [o,1,,,,,],

which says: an input point token is an interior point token if its sensor measurement
exceeds (homomorphic to) the expected value It. The output of grouping agent

GA(itr) is a set of interior point tokens.

Exterior Point Tokens

The grouping agent for exterior point tokens is deﬁned as GA(bkg) =
{13kg , G3” . 031°” } With

1:19 = {0:19. (Era, so}. 02:9 = The.
The input consists of a grouping domain
G?" = u,,€L...T”', Lbky = {Pt},
of all the input point tokens and the knowledge from K H :
E2,” : It; Hgkg : [—ImaI,—1].

The condition of forming a background point token also depends only on the sensor

168

measurement of an input point token. As in the design for GA(itr), feature selection

functions are:

52kg = {1.1.0.0}"‘, St.” = {0}“-

The homomorphism evaluation function for GA(bkg) is also designed similar to
GA(itr):
H = I(X) — It.

Its homomorphism criterion is:

off-1 : I(x) — I. e Hf.” = Hm, —1],

which says: an input point token is used to form a background point token if its
sensor measurement is below (homomorphic to) the expected value It. The output of
grouping agent GA(itr) is a set of background point tokens.

Because relational features are not involved in these grouping agents, a non-
structural homomorphism is sought in both GA(itr) and GA(bkg) with respect to the
conditions speciﬁed by H}? and 171%”. Since such a non-structural homomorphism

is allowed, a conventional thresholding operation is now formulated as a grouping

operation.

Initial Boundary Detection

Initial boundary detection means to identify all the cells that may he points on the
boundary of an object. We now deﬁne a grouping agent, GA(bd), whose responsibility

is to identify all boundary point tokens.

Exterior Point Tokens

The grouping agent for boundary point tokens is deﬁned as GA(bd) = {[31, 02", Off}

169

with
12d = laid, (Essen, 0:: = T“.

Its grouping domain is the set of all input point tokens:
ng 2 UvreLdeU’, Lbd 2 {pt}.

The knowledge Efd and Hgd are retrieved from K H. Grouping agent GA(bd) chooses
all the properties features except sensor measurement of an input point token. Some

of the chosen features are used in making grouping decisions and some are used to

compute features for boundary point tokens. Therefore,

33;“ .-= {1,0,1,1}P‘, 5,2" = {0}“.

Boundary point tokens can be formed in a way similar to interior point tokens except
the property feature that participates in grouping decisions is gradient magnitude.

The chosen feature 9,; is used for computing the curvature feature for a boundary

point token. The homomorphism evaluation function is:

H: m(x) — m,.

The homomorphism criterion for GA(bd) is:

Glc’d : m(x) — rm 6 [0,mmax].

The output of grouping agent GA(itr) is a set of boundary point tokens.

170

5.3.3 2D Seeding
The tasks involved in this stage are to identify various types of intermediate perceptual
entities that may evidence the objects of interest. From 2D boundary point tokens,

bigger perceptual entities such as lines and ribbons can be formed. A set of grouping

agents, GA(I), GA(c), and GA(rb), are designed to extract these different types of

tokens. They are deﬁned separately below.

Straight Lines

Grouping agent GA(I) is responsible for line tokens and is deﬁned as GA(I)

{171116190th With
IS. = {G:,(Eéﬁl’an. 01.. = T’.

The grouping domain of GA(l) consists of all the boundary point tokens:
G’, : was“, I.1 = {bd}.

The top-down knowledge E; and H}, from K H are used in forming line tokens. The
property feature of curvature and the relational feature of 8-connectivity of boundary

point tokens are used in deciding which sets of boundary point tokens should be

grouped. Hence, the feature selection functions are deﬁned as:

s}. = {1,o,o,o,1}bd, S}, = {1}“.

Because an ideal straight line should have zero curvature everywhere along the line,

the homomorphism criterion for grouping agent GA(l) is deﬁned as:

G: = {(405) — m(EiDMs (05)} 6 Hi» = {l-ém +€~l,True}.

171

where n(Ei) is set to zero in K H and C,I is the component set of line token T}. The

output of grouping agent GA(itr) is a set of line tokens.

Curves

Grouping agent GA(c) is responsible for curve tokens and is deﬁned as GA(c)

{I§,G§,0,ﬁ,} with
1::{T13(E::HP)}3 Olen : TC:

Since a curve is approximated by a polyline which can be formed from a set of adjacent

lines, the grouping domain of GA(C) is the set of all line tokens:
G; : leeLch', LC = {I}.

To group a set of lines, the adjacency relationship among lines is used to identify the

candidates. That is:

5;, = {0, ..,0}’, 5;, = {1,0,0}’,

where the chosen spatial relation is N8 which is characterized by an attribute set
a, j = {7,x} with 7 to be the angle formed by two adjacent lines at coordinate

position x (see Section 5.2.2). With the attribute values of this relational feature, the

homomorphism criterion for grouping agent GA(c) is:

G§= (7(T},,T}.,,).N8 (0.9)} 6 H?» = {VuTrue},

Where Cf is the component set of curve token Tf. The output of GA(c) is a set of

curve tokens.

172
Ribbons

Grouping agent GA(rb) extracts ribbons and constructs their token representations

It is defined as GA(rb) = {I,’,5,G’C'b,0,’,l,’} with

15." = {T1, (Bib,Hf)b)}, 0;? = T“ U {525}-
Since a straight ribbon is a pair of lines, the grouping domain of GA(rb) is the set of

all line tokens:

Gg”: u,.6,,..:rv’, L” = {1}.

Two lines form a ribbon as long as the lines have opposite gradient directions and
are related by either an overlap or an enclose spatial relationship. To test these two

conditions, property feature of gradient direction and relational feature N0“ of line

tokens need to be propagated from level I to level rb. Hence,
5;: = {0,0,0,0,1}’, 5;? = {0,1}’.

Identifying ribbons is a process of ﬁnding all pairs of lines that satisfy the above

stated conditions. Therefore, the homomorphism criterion is:

Gib: {I gd(Tj) — 94(Ti) I,N°”1(TJI,T,:)} E H” = {[180 -— 69d, 180 + 69d],True}.
This criterion describes both structural and property homomorphisms simultaneously.

The output of GA(rb) is more than just a set of ribbon tokens. It also includes a
set of expected tubes instantiated in knowledge hierarchy K H :

or}; = T” u {E31}.

173

Since a ribbon provides evidence of a tube, the grouping agent at this level will
instantiate an expected event Eﬁb at level tb of K H whenever a ribbon token is formed.
Such an expected event will be used by grouping agent GA(tb). The instantiation

is performed by the functional module KJnstantiate(T,-’b) of grouping agent GA(rb)

which maps ribbon features to a set of tube parameters.

5.3.4 3D Seeding
The purpose of 3D seeding is to identify 3D cross sections that reveal distinct prop-
erties of cylindrical surface. Such identiﬁed cross sections are used to hypothesize

3D tubes or to instantiate an expected event of 3D tubes. Grouping agent GA(r) is

designed for this task.

Cross Sections

The grouping agent at this level is GA(r) = {Iff, G:, 05,} with

1§= {de9(E:aHP)}v 0:. = T” U {193’}-

A cross section is formed by grouping a set of 3D boundary point tokens as well as a

set of input point tokens. So, the grouping domain of GA(r) is deﬁned as:
G: : leeLxTv’, LI 2 {pt, bd}.

A boundary token is described by a set of invariant surface features as deﬁned
in Section 5.2.2. Every input point token carries its 3D coordinates. The boundary
point tokens on a cross section have surface features that consistently predict the
same cylinder. All points on a 3D cross section should form a disk in 3D space.

Therefore, to identify 3D cross sections, the set of invariant surface features from

 

 

 

 

174

boundary point tokens and the coordinates of all point tokens involved need to be
propagated into this level. That is, the following set of property features are chosen

from levels pi and bd:

S; = {1,0,..,0}P‘ u {1,1,0,0,1,1,1,1,1,1,1}bd.

Since all the point tokens on a 3D cross section should be connected, the relational

features of 26-connectivity from both domain levels are also chosen:

8: = {1}“ u {1}“.

The homomorphism criterion for grouping agent GA(x) describes the condition
when a set of connected domain tokens can be grouped to form a 3D cross section
token:

Gf: {7,(C.; é,f,ri),l><1“ (05)} 6 Hi5,

where

H}? : {[—63, +63], True}.

In the homomorphism criterion, Cf is a set of component tokens consisting of both
boundary point tokens and input point tokens c, i', and n are the cross section
parameters estimated from the surface features of boundary point tokens.

The output of GA(x) includes a set of cross section tokens as well as a set of
expected tubes to be instantiated in knowledge hierarchy K H. Such an expected
tube is instantiated at level tb of K H using a set of tube model parameters estimated
from a cross section token formed in grouping agent GA(x) and will be used by

grouping agent GA(tb).

175

5.3.5 3D Tubes
‘5}. Its input and output

at

The grouping agent for tube tokens is GA(tb) = {135, 6'2”, 0
are defined as:

[rib : {TptvEnva(E:baHlt’b)}a Ottlbt : Ttb U {Eébi'

Note, GA(tb) uses the knowledge about the type of sensor used (EM) as part of
its input. This information is needed in deciding how to render a tube model (an

expected tube E?) in order to match it against an observed event. The grouping

domain consists of a set of input point tokens:

Gib I Uv’EL‘bTv’a Lib = {pt}.

A tube token is formed by a set of connected point tokens whose position conﬁg-
uration exhibits the shape of a cylinder and their sensor measurement conﬁguration
exhibits a certain distribution. To see whether these conditions are satisﬁed, two

property features, coordinate and sensor measurement, and one relational feature of

input point tokens are selected:
5;? = {1, 1,0, ..,0}P‘, 5;: = {1}“.
An expected tube E? is a parameterized 3D cylinder. To test whether a set of
input point tokens is homomorphic to an expected tube event, the following homo-

morphism criterion is designed:

02'” {P'(C.-‘b; c,d,r,l)aw" (035)} 6 H“ = {mTrue},

where p“ is the measure developed in Chapter 4 that quantitatively evaluates the

 

 

176

degree of match between a set of points (0,“) and a parameterized tube model

({c,d,r,l}). Therefore, p“ serves as a conﬁdence measure which has to exceed a
certain expectation level speciﬁed by H}? in order for C,“ to be considered as being
homomorphic to the expected tube. The homomorphism validity condition speci-
ﬁes both a structural homomorphism (“l><l"= True” makes sure that all points are
connected) and a property homomorphism (p‘ Z 17).

Besides a set of tube tokens (T‘b), the output of GA(tb) also includes instantiated
expectations {E3} at level tb of KH. For every grouping operation performed at
this level, grouping agent GA(tb) generates a new expectation (E?) by extending the

current tube along a smooth trajectory due to the morphological properties of tubular

objects. Such generated expectation is represented in a parametric form in K H.

5.3.6 3D Tubular Objects

The grouping agent that is responsible for tubular object tokens is deﬁned as GA(el) =
{1,55 65’, 06$} with

12’ = {T‘b,Te‘, (E:’,Hio‘)}, 0:: = T“ u {193}.

Since a tubular object is approximated by a set of connected tubes and it grows, the
grouping domain of GA(eI) consists of all the available tubular object tokens and

tube tokens:

05‘ : u,,eL..Tv’, L61 = {tb,el}.

lt this level, a grouping is performed between two component tokens because (1)
Le deformation caused during every growing step needs to evaluated in order to
aintain the morphological properties of tubular objects and (2) the deformation

[aracterization needs to be dynamically updated after every growing step. The

177

feature selection functions at this level are designed as:

5;; = {1}cl u {1,1,1,1,1,1}"’,

5;,l = {1,0,..,0}cl u {1}“,

which select all the property features of domain tokens and the adjacency relation

 

among tubes and tubular objects. The homomorphism criterion at this level is de-

signed as:

 

05‘: {{A},NC(Cf’)}€{{[A’°w,A“pP]},True}, VAe{5,6r,6P’},

where the ﬁrst condition:

A e [Manner], VA 6 {6d,6’,6”'},

constrains the deformation caused by the current grouping within some bound, the
second condition NC (C?) = True demands that the two tokens to be grouped are
spatially connected. The dynamic bounds [A’°‘”, A"””] are established using the defor-
mation characterization from the tubular object token representation and the A’s used

to compute these bounds are retrieved from the knowledge hierarchy K H. Whenever

a grouping is performed, the deformation information is collected and used to update
the deformation characterization. The detailed computation is described in Section

4.4.5.

5.3. 7 Grouping Hierarchy for Recognizing Tubular Objects

faving deﬁned all levels of token representations and the grouping agents that are

esponsible for extracting these tokens, we can present the grouping hierarchy for the

 

178

 

 

 

 

 

 

 

 

 

 

 

el: GA(el)
tb.‘ GA(tb)
XI GA(x)
rb: GA(rb)
c: GA(c)
1: GAO)
bd: GA(bd)
CC: GA(cc)
bkg: G A(bkg)
itr: GA(itr)
pt: Input Point Tokens

 

 

 

Figure 5.9. The Grouping hierarchy for tubular object recognition.

problem of tubular object recognition. Figure 5.9 shows the hierarchy. In the next

section, we will extract the interaction models from the formal speciﬁcations for all

the grouping agents.

5.4 Interactions

The I: and 0:, speciﬁcations for all u E V deﬁne how the communication channels
among grouping agents should be established. Based on the formal speciﬁcations for
all the grouping agents given in the previous section, interaction models of the entire
system can be easily extracted. We classify interactions into three types: interactions
within GH, interactions within K H, and interactions between GH and K H . In the
following sections, we show how the interaction models of these three types can be

extracted from speciﬁcations of grouping agents and the extracted interaction models

are visualized.

179

5.4.1 Interaction Models

From the grouping domain speciﬁcations for all the grouping agents, the interaction
model or information ﬂow channels within G'H can be completely determined. Fig-
ure 5.10 visualizes all the interaction channels within GH among grouping agents.
Each channel shown in Figure 5.10 is represented by a directed vertical line which
is labeled in the form of source — —- > destination, meaning that the information
being delivered ﬂows from the indicated source level to the indicated destination level.

For example, for the communication channel from source level pt to destination level

tb, it is labeled as pt — — > tb.

 

—————1

 

—_————‘

 

 

 

 

 

 

 

Figure 5.10. Communication channels within GH.

180

Combining the I: and 0:, speciﬁcations at all u E V, an overall interaction model
can be formed. Figure 5.11 visualizes the overall information ﬂow in the system. It
can be seen that all the arrow of the interaction channels point at the same direc-
tion, signaling a bottom-up information ﬂow or recovery. Most channels in K H point
downward, signaling a top-down information ﬂow. Some in K H are also bottom-up
which indicate how the acquired knowledge about the current recognition environ-
ment is used to instantiate the knowledge hierarchy. The parallel communication
channels between GH and K H facilitate the integration of top-down and bottom-up

information at almost every level of grouping within GH. At some levels of GH,

expectations are generated dynamically and used to update K H.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

K}! (3FI
01n-—01n
ebu l
————— i 01--01
ek
tb—-tb
------ *‘ ﬂ>-—tb
1b: V i
r x: U """ : x—tb
_-;;:- -— Til tb--rb 44.. n>_4b
— — — — l- d
c--c
C.’ 3
I: 1"1
_ _ — — i l" 4
bd: V “'4” ,
_ _ _ _lL i
cc--cc
cc: __
_ - — l 1tr--1tr
ttrr ll
— — — — bk --bk
bkg: g g
env.
bd-x

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Figure 5.11. All the communication channels in the system.

 

 

181

Table 5.1. Bottom-up information ﬂow on the interaction channels within GH.

 

 

 

 

 

 

 

 

 

 

 

Channel Information Flowing on the Channel
pt — itr {x, 1}"t
pt — bkg {x, 1}”t
pt — bd {x, m}?t
itr — cc {x, N8?"
bd -— l {N"}bd
bd — a: {1}”t U {x, 52M, Kg, mm”, mm”, vmimvmax, n?
l _ 6 {N8}!
1— Tb {91,040va
pt — tb {x, 1, N8ipt
tb — 61 {c, r, d, l, A, p, 81,, WWW

 

 

 

 

 

5.4.2 Data Flow Models

The information about exactly what kind of data, including both bottom-up cues and
top-down knowledge, ﬂows in which channel can be further extracted from the feature
selection functions (Sf; and 5%) deﬁned for all v E V. Table 5.1 lists the bottom-up
information ﬂow on all the communication channels among grouping agents. The left
column indicates the speciﬁc channel and the right column gives the set of features
that are propagated from the source to the destination through a speciﬁc channel.
Table 5.2 describes the speciﬁc information that ﬂows between the grouping hierarchy
GH and the knowledge hierarchy K H along either direction. The top part of the table
lists the data that ﬂows from K H to GH on each parallel level of the hierarchies.

The lower part of the table describes the data ﬂows from GH to K H which are not

necessarily along parallel levels of the two hierarchies.

 

182

Table 5.2. Information ﬂowing on interaction channels between GH and K H .

Information Flowing on the Channel

Direction Channel

KH—GH el—el
t—t
rb—r
:r—cr
c—c

bkg—bkg

itr—itr

GH—KH tb—tb
rb—t
cc—t

—:1:

 

5.5 HTG Design For Recognizing Tubular Ob-

jects

5.5.1 HTG System Architecture

Using all the designs presented so far, including the overall designs for the hierarchies
and the detailed designs for involved tokens and the grouping agents, the architecture
of the vision system for recognizing tubular objects is shown in Figure 5.12. In this
ﬁgure, at all levels of abstraction, the grouping agents (levels within GH on the right
of the ﬁgure) are all homogeneously designed. The entire architecture is structured.
The communication channels shown in Figure 5.11 can be superimposed on Fig-
ure 5.12 to form a complete graphical description of the system. Different types of

information can be extracted to describe various aspects of the system. For example,

 

183

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

& 9L
v: E. Hp v: In Go out
eln: E:;, Hg" eln: cg”; 5%”: 5:" H(E6”,' E31" ) Teln
E:;, 6’: 133,113 el: G:,sfiﬁﬁ H(Eg, ES’) 'r"
tbs E2, H? m :b: of. 811.8% umes.") T’fEii’
l? x: EévH; : x: G:,s;,s; Hues. Eé) Tie?
rb.-Efﬁéb rb: Gisﬁsg” n(Eé". E?) 'r"’, El.”
c:E£,H; = c: G:,S;,S; n(ES.E2) T‘
1: Ban; 1: G’..sj,,S£I meant) T’
bd: Efﬁid bd: Gﬁs’i‘lﬁ'ﬁ’ n(eg‘iEi") TbilE: *—
cc: Bic,pr cc: G:, Seas: H(ESC.E§C ) T“
itr: Edi", Hf." itr: G:,,Sjiiﬁiii H(E<i>”,E2r) Ti"
bkg: $3,113“ bkg: cﬁgs’iifsﬁ‘ a (Bob? E2“) Tbkg
env: "‘ E: pt: Tpt

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Figure 5.12. Tubular object recognition system diagram under HTG.

it is conceivable that, with appropriately developed tools, data ﬂow activities on any
communication channel can be monitored or even visualized. Such dynamic data flow

information can be used to describe or visualize the dynamic behavior of the entire

system.

5.6 Summary

In this Chapter, the problem of tubular object recognition presented in Chapter 4
was posed as grouping problems at different levels of abstraction. Speciﬁcally, the

generalized stochastic tube model developed for tubular objects was decomposed into

 

 

 

 

184

different levels which determines (1) the number of levels of both the knowledge hier-
archy K H and the grouping hierarchy GH, and (2) the tokens that are to be formed
during the recognition process. The knowledge hierarchy was constructed based on
the representations of the knowledge decomposed at different levels. Token represen-
tation for various types of perceptual entities and associated features were designed.
Individual grouping agents, each of which is responsible for one type of perceptual
entity, were formally deﬁned. Their input, output, and the grouping criteria were
given, all in the format of formal speciﬁcations. Based on these speciﬁcations, inter-
action models, within GH or within K H or between GH and K H , are extracted. The
data ﬂow model that describes what data, including bottom-up cues and top-down
knowledge, ﬂows in which channel was also extracted from the formal speciﬁcations
of the grouping agents. The interaction models of the entire system was visualized.
Finally, the architecture of the HTG system design for recognizing tubular objects
was constructed.
Since the problem of tubular object recognition involves many classical computer
vision tasks , this chapter serves as a demonstration in terms of how a speciﬁc com-

puter vision system can be developed under the paradigm of hierarchical token group-

ing. Results of using this system are given in the next Chapter.

 

 

 

CHAPTER 6

Experiments and Results

In this chapter, experiments and results are reported. First, we examine the is-
sues related to performance accuracy and robustness evaluation. Three performance
accuracy evaluation methods are proposed. Robustness is assessed in terms of the ac-
curacy degradation with respect to varying degrees of noise added to the input data.
In order to control the quality of the testing data, synthetic data is generated using
parametric methods. A visualization scheme for 3D tubular objects is presented that
allows one to view a “cross section movie”.

The system has been tested on both 2D images and 3D volumes, synthetic and

real data for both cases. Performance and robustness are evaluated based on the

testing data for which ground truth is available. Experimental results from many real
data are also visually evaluated at every stage of the processing. Some recognition
results are shown using the proposed 3D visualization method. Experimental results
from noiseless synthetic data show that accurate object recovery is achieved. This
is concluded from the observations that (1) the object parameters such as radius
and orientation estimated by the system from the test data are very close to the
ground truth parameters, and (2) the system nicely recovers the shape dynamics of
the underlying objects because the estimated parameters change with the change of

true object parameters. The integrated model—based approach and the incremental

185

 

 

 

186

recovery of tubular objects leads to a reliable and robust system behavior. When
different degrees of Gaussian noise are added into the synthetic test data, the system
performance degrades gracefully with the increase of noise because the deviation
between the estimated object parameters and the ground truth parameters gradually
increase at small paces. These conclusions also apply to real data. The system makes

mistakes whenever the data does not agree with the object model used.

In the following sections, experimental issues and results are presented in detail.

6.1 Performance Evaluation

To assess the performance of a vision system, quantitative evaluation methods are
needed. In this section, we propose three types of evaluation schemes, namely
parameter-based method, region-based method, and boundary-based method. The ﬁrst
scheme is suitable only when the objects of interests can be modeled in a parametric
form. The evaluation is conducted in terms of the precision of the model parameters
estimated by a vision system with respect to the ground truth model parameters.
The deviations between the two sets of model parameters are indications of the sys-
tem performance. The region-based approach evaluates performance in terms of the
accuracy in classifying object regions. The boundary-based approach measures the
precision in localizing objects and the correctness of the extracted object shape to
quantitatively assess the system performance. All three evaluation schemes can be
used only when ground truth is available.

The measures developed in both region-based and boundary-based evaluation
schemes may change drastically in some extreme cases with the number of points
involved in ground truth. For example, if there is only one labeled pixel point in

some ground truth, the evaluation measure will take value of either 100% or 0%.

187

Nevertheless, the proposed evaluation methods will serve the purpose in most situa-

tions.

6.1.1 Performance Accuracy and Robustness

In evaluating a machine vision system, two aspects of performance need to be con-
sidered: one is its accuracy and the other is its robustness. The accuracy ought to be
evaluated in terms of the precision of recognition with respect to ground truth. The
robustness should be assessed in terms of corresponding accuracy degradation with
respect to the degradation of the quality of the test data. In either aspect, the key
issue in evaluation is ground truth.
Currently, synthetic images generated with predetermined model parameters are
used not only to test the system but also to serve as the ground truth in evaluating the
accuracy and the robustness of the system. Ground truth for some plant root images

was established for evaluation purposes based on discussion with several panels of the

domain experts.

6.1.2 Two Types of Errors

There are two types of errors which must be considered in evaluating performance
accuracy. Conceptually, they are associated with hypothesis testing. In a hypothesis-
testing problem, one is presented with two claims about the true value of a variable,

of which exactly one must be true. Based on experimental evidence, one wishes to
decide which of the two contrasting claims is correct. One of the two claims is called
the null hypothesis, denoted by H0, and the other is called the alternative hypothesis,
denoted by Ha. For example, within the context of image understanding, these two
claims can be established with respect to either object or non-object classiﬁcation of

a pixel (voxel). Assume the variable that labels the classiﬁcation of a pixel (voxel)

 

188

is L. Let the value of this labeling variable for object class be one and the value for

non-object class be zero. In this case, the two hypotheses are:

A type I error consists of rejecting the null hypothesis when it is true. A type [I
error arises if H0 is not rejected when it is false. With the two hypotheses in the
previous example, a type I error is committed if a pixel (voxel) has been classiﬁed as
an non-object point when it is indeed an object point. On the other hand, a type II
error occurs if a pixel has been mistakenly classiﬁed as an object point. We will use

these concepts to establish some performance measurements in the next few sections.

6.1.3 Parameter-Based Evaluation

Assume an object to be recognized can be precisely and uniquely described by a set of
known parameters. This set of parameters forms the ground truth about the object.
Let the corresponding object recognized by a vision system be represented by a set of

parameters that are estimated from the input image. The discrepancy between these
two sets of parameters speciﬁes the performance of the vision system to be evaluated.
We now formally deﬁne this evaluation scheme. Let GP be the set of ground truth

parameters:

GP = {9“,9”, ..., 9”},

and P be the set of estimated parameters:

P : {plapib "'9 pk}

Then the difference between 9” and P, denoted by 6(QP, P), is deﬁned as:

5(QPJ’) = {5(QP‘,p.-)| 1 S i S kl,

189

where

5(gp‘mi) =l g“ —Pi I-

That is, 6(QP,P) is a set of deviations measured as the absolute differences between
pairs of corresponding parameters, one from the ground truth set and the other from
the estimated set. A smaller deviation indicates a better performance. The amount
of increase in these deviations with respect to the degree of noise added into the input
image reﬂects the sensitivity of the system to noise. That is, the performance degra-
dation with respect to the quality degradation of input data measures the robustness
of a vision system. The parameters used for evaluating tubular object recognition

system developed in Chapters 4 and 5 are the object model parameters.

6.1.4 Boundary-Based Evaluation

This evaluation scheme is designed to assess the quality of an object recognition
system in extracting the boundaries of the objects of interest. Let f (x) denote an
input image and B be the boundary point set derived from f by the machine vision
system to be evaluated. Assume the boundary ground truth for f, 93, is available.
The goal of this evaluation method is to describe the discrepancy between 93 and B

using certain criteria. We propose three measures to form this description, namely

distance distribution index, ’DB, type I error, eg, and type II error, eg.

Overall Performance Index

We define the distance distribution index, denoted by D3, as a discrete function
whose distribution characterizes the discrepancy, measured in distance, between the
boundary ground truth and the boundary derived by the underlying recognition sys-

tem. The distance from an arbitrary point x in set B to ground truth 93 is deﬁned

190

as the minimum absolute distance from x to 93 :
d(x,gB) = min{dz(x,y)}» W 6 G”,

where dE(x,y) denotes the Euclidean distance between point x and point y. The
distribution of 193 can be constructed from the histogram of distances between in-
dividual points in B to the ground truth boundary QB. Distribution D3, its mean
E, and its standard deviation M together characterize the degree of match (or
mismatch) between B to 93. Figure 6.1 shows several examples of this distribution.
Since means are known to be sensitive to outliers, the median, denoted by D3, of

distribution 733 can also be used to describe the degree of match.

 

(a) (b) (C)

Figure 6.1. Examples of the distance distribution D3 for image R2 at (a) initial stage,

(b) intermediate stage, and (c) ﬁnal stage.

In this research, to implement this evaluation scheme, the technique of Chamfer-
ing distance transformation [7] is adopted and extended to 3D volumetric space to
compute a. distance map with respect to Q”, denoted by d(x,gB) based on which
D3 can be constructed. Assume both 93 and B are deﬁned on a grid, 2D or 3D,

{x,}, 1 S i S MXNXD(MXNxDisthesizeofthegridandD=1in2D

case). Derive d(x,gB) by performing a distance transformation based on QB using

191

the Chamfering algorithm. Both the original and extended Chamfering algorithms
are given in Appendix A.
From the resultant distance map d(x, 93), whose value at x represents the mini-
mum distance from this point to ground truth 93, D3 is derived from the histogram
of d(x,d(x,gB)). The mean and standard deviation of D3 can be computed as:

— _ 2x68 d(X, 98))
DB " l B l

 

 

as, : \szesd(x,98)2— I B l >613;
| B l -1

A perfect match between g3 and B will yield D3. = 0 and 038 = 0. Generally, a

D3 with a zero mean and a small standard deviation indicates a good performance

of the machine vision system to be evaluated. A large standard deviation may signal

the existence of outliers. In that case, median D3 of distribution D3 may provide a

better indication of the performance accuracy of the system.

Type I Error

Type I error in boundary detection speciﬁes the percentage of the points on 93 that

have been mistakenly classiﬁed as non-boundary points. It is deﬁned by:

E

 

where

B1={x|(x€QB/\(X¢B)}.

192

Type II Error

Type II error indicates the percentage of the boundary points on B (x) that have been

mistakenly classiﬁed as boundary points. It is calculated from:

where

B2={x|(xeB)/\(x¢93}.

6.1.5 Region-Based Evaluation

This approach evaluates the performance of a vision system in terms of the accuracy
in extracting object regions. Let the recognition result labeled by the vision system be
R(x) and the corresponding ground truth labeling be gR(x). Both R(x) and QR(X)
are binary functions of x with function value of H as the label for the object region
and L as the label for the background region. The goal is to quantitatively describe
the degree of mismatch between R(x) and gR(x). The smaller the degree of mismatch,
the better the system performance. Similar to the boundary-based approach, three
indices are designed to serve the purpose. They are performance index, p, type I error,
e19, and type II error, cg.
Overall Performance Index

We deﬁne a performance index, denoted by p, to measure the degree of the agreement

between gR(x) and R(x):
_ 2.6,[1— W1
— If I

i

193

where f is the image function. The measure p = 1.0 when a perfect match occurs
and p = 0.0 when R(x) and 93(x) completely mismatch. This index actually gives

the percentage of the mismatched points.

Type I Error

The type I error in a region-based evaluation method indicates the percentage of the

object points in QR(x) that have been mistakenly classiﬁed as non-object points in

B(x). That is:

where

HI = {x | (X E GEM) A (X ¢ R(X))}-

Type II Error

This index describes the percentage of the object points in R(x) that should be

background points, according to ground truth QR(X). Therefore,

2 152;,
R | R(X) I

where

R2 = {x | (x E R(X)) /\ (x ¢ 9R(X))}-

6.2 Synthetic Data Generation

Synthetic data is needed to (1) test the tubular object recognition system and (2)
provide ground truth in evaluating system performance. Motivated by the axial
representation for Generalized Cylinders, all the synthetic tubular objects in this

study are generated using a parametric approach described below.

194

Let E denote a tubular object to be generated and it is represented by its center

Every point on A is attributed, carring a set of

line or axis, denoted by A(E).

parameter values that describe the corresponding cross section centered at that point.
In our study, we only deal with a special class of tubular objects with circular cross
sections that are always perpendicular to the axis. Therefore, only radius information

is carried on each axis point. The trajectory of A describes the shape of E and the

radius attributes on all the points of A specify the size of E. Therefore, such an axial

representation completely characterizes object E.

Due to the descriptive power of an axial representation for tubular objects, the

following algorithm is used in generating a synthetic tubular object, either in 2D or

3D.

1. Generate axis A using a given set of control parameters.

2. Derive a minimum distance map d(x) with respect to curve A.

3. Segment d(x) using a given radius r to derive 3D region R(E) occupied by E.

4. Generate E by rendering R(E) using some appropriate sensing model.

The region R( E) consists of all the points x whose minimum distance to curve A is

smaller than or equal to the given radius. Based on the sensing models used in this

study, the rendering operation here uses the techniques described in Section 4.3.

To generate synthetic objects with different shapes, we adopt a parametric ap-

proach through which object shape and pose can be controlled via a set of parameters.

Below, we examine the parametric approaches proposed to generate various axes with

different shapes and poses in both 2D and 3D cases.

195

6.2.1 2D Synthetic Data Generation

In 2D image plane, we use a cubic curve to represent a general axis. A cubic curve

is deﬁned as:

yzAx(:r—xo)3+BX(x—xo)2+CX($—$o)+Da

where A, B, C, and D are the coefﬁcients and (.730, yo) where yo = D is the translation
of the curve in the image plane. All the points (z, y) that satisfy the above equation
form curve A. Through different combinations of the coefﬁcients, curves with various
shapes can be derived. For example, the bigger the coefﬁcient A is, the more curved
the axis becomes. If A = 0.0 and B 75 0.0, the axis generated is symmetric with
respect to 3:0. Another special case is a straight line when A = B = 0.0. In this case,
the slope of the curve is a constant which is the orientation of the line. The tangent
at every point on the curve can be computed from the ﬁrst derivative of the curve at
that point:
y' = 3A(:r — mg)2 + 2B(:1: — $0) + C,

which provides the ground truth of the orientation of the object at that point can
then be computed and will be used as ground truth.

The choice of using such a parametric cubic form for generating an axis is because

(1) the shape of the curve it represents can be easily controlled by changing the

coefﬁcients and (2) the cubic form is the lowest order that guarantees a second order

differentiable curve CZ, hence smooth everywhere.

196

6.2.2 3D Synthetic Data Generation

To generate a 3D axis, two approaches are used. One is to apply a 3D homogeneous

transformation to a 2D curve generated in a cubic form discussed above. That is,
A3 = A2 x ’H,

where It represents a homogeneous transformation with translation T = [X0, Y0, Z0],
rotations 123(0) (rotate (1 degrees with respect to the the X axis), Ry(0) (rotate 0
degrees with respect to the Y axis), and RAE) (rotate ,3 degrees with respect to the

Z axis. That is,
H.- = Rx(a) x Ry(0) x RAB) + T

an. as, —m oi
smwrcmgmmm+dm.mno
casgcg+sasg 005359—5009 0.0,, 0
X0 n 25 1

 

 

.l

The other approach is to use directly some 3D parameterized curve. One such
example is the mathematical helix. It is formulated as follows. A helix, denoted by

h, is parameterized by 0, for 0 E R:

where

r(0) = gxcosw),
y(9) = bx0,

2(0) 2 —Xsin(0).

 

 

 

197

There are several parameters that can be used to control the shape of a helix. Param-
eter 11) controls the radius of the helix and parameter b controls how rapidly a helix
rises. In this particular application, since a synthetic object is generated within a
normalized 3D cube, parameter b can be computed from a speciﬁed number of cycles

of the helix, denoted by c, that are to be seen within the cube. So, b can be derived
by:
1
b = —.
360 X c
Figure 6.2 shows two discrete helixes generated in 3D volumes using different sets of

parameters.

 

Figure 6.2. Examples of helixes generated using different parameters, (a) w

= 36,
c 21.0, b = 0.1778, (c) w = 20, c = 3.5, b = 0.0508.

6.3 3D Tubular Object Visualization

One of the advantages of a concise axial representation for tubular objects is that it

supports an effective visualization scheme for 3D blood vessels. This visualization

 

198

scheme allows a viewer to “travel” along the axes of chosen vessels. The sensed
blood ﬂow pattern for every cross section can be directly visualized by projecting
the sensor measurements in the cross section onto a plane that is perpendicular to
the viewing direction. This can be achieved by aligning the V axis of the object-
centered coordinate system with the viewing direction and then transforming all the
points (3:, y, z)’s on a cross section into (u,v,w)’s in UVW. Since each cross section

has a known intrinsic dimensionality of two, the proposed visualization scheme can

be realized through an eigenvector transformation. In this way, there is no need to

explicitly compute the rotation matrix needed to transform (2:, y, z)’s to (u, v, w)’s.

Let A be the axial representation for a blood vessel:
A = {(Aia 73)}, 1S i S 19 Ai : [xiiyiazilTa

where I is the total length of the axis and Ag is a 3D point on axis A, rg is the
radius information that Ag carries, and l is the length of A. To achieve the proposed
visualization scheme, every cross section xg, represented as (Ag, rg), 1 S i _<_ I, needs

to be visualized individually. A cross section Xg = (Ag, rg) resides on a 3D plane Pg

which can be described by

Pg: agxx+bg><y+ngz+dg=0, (6.1)

where (ag, bg, Cg) is the normal of Pg and xg is translated at Ag =(xg, yg, zg)€ Pg. Note,
vector (ag, bg,c.-) is parallel to the tangent vector of A at Ag, denoted by Ag. All the

points on Xg should simultaneously satisfy (6.1) as well as the following condition:
|| XI: - «45 HS re,

Where xi = [$k,yk, ijT, X]; E )5.

 

199

In order to align the tangent vector Ag with the viewing direction, an eigenvector
transformation is performed on the set of cross section points X = {xk}, xk E Xg.

The transformed set of points, denoted by Y = {yk}, is computed using

Y = [y1, ..,y,,]T = [x1, ..,x,,]T x HT = x x HT, (6.2)
where yk = [:cjc, yjc, zle. The matrix H is obtained by solving
HRHTzAm

where R is the covariance matrix of X and AR = diag(A1, A2, A3) is a diagonal matrix
with Ag,i = 1,2,3, the eigenvalues of R. Ideally, all the transformed points should
have 2;, = 0. By translating all the points in Y to (:rg,yg,i), where i represents the
2D viewing plane, cross section xg is centered at (:rg,yg) on the viewing plane and
all the sensor measurements within Xg are projected onto the viewing plane that is

perpendicular to the viewing direction. This visualization scheme should be especially

useful for clinical diagnostic purposes.

6.4 Experimental Results From 2D Images

Experiments on 2D test images are conducted and the results will be presented and
evaluated in this section. Three types of test data are used: images of synthetic
tubular objects, images of man-made tubular objects, and images of organic tubular
objects. Four object parameters are used in these experiments. They are translation
(mmyo), orientation 3, and radius r. The system performance is evaluated based on
the recognition results from the synthetic images and some of the images of organic
tubular objects. The robustness is analyzed based on the degradation of the recog-

nition results from synthetic images under different degrees of Gaussian noise added

200

into images.

6.4.1 Results and Evaluation on Synthetic Data

Test Data

The 2D synthetic data are 256x256 images. Table 6.1 lists all the parameters used to
generate ten tubular objects in seven 2D synthetic images, of which four contain only
straight tubular objects (i.e., A=B=0.0). A table entry value “na” means “not ap-
plicable”. Table 6.1 will also serve as ground truth in evaluating system performance
using a parameter-based scheme. To examine the sensitivity of the system to noise,
different degrees of white noise were added to all 20 synthetic images. Four levels of
noise, corresponding to a=0,10,15,20 are used. The total number of 2D synthetic test

images is 7x4+5=33. Some of these synthetic test images are shown in Figure 6.3.

Figure 6.3. Examples of 2D synthetic tubular objects, (a) two straight objects in
S23, (b) one curved object in S26, (c) two curved objects in S27.

       

(b)

Performance and Robustness

To evaluate the performance and robustness of the system on synthetic test data, the

parameter—based evaluation scheme is applied because of the nature of the parametric

201

 

m

 

 

 

 

 

 

 

 

 

 

 

 

 

as 2 8.8- 88.8- 88.8- 88.8

3: 8 88.2 883 88.8- 88.8 8 Km
3: S 88.8 88.8- 8.8- 88.8 H 88
3: 2 88.8 88.8 88.8 88.8 a am
of 8.2: 888 8a.? 88.8 88.8

8.2 8.8 88.8 88.8 88.8 88.8 a 8m
3: 8.8 88.8 888 88.8 88.8

3: 8.8 88.8 885 88.8 88.8 a 8w
8.2 8.88 88.8 828. 888 88.8 H an
8.2 8.8 88.8 88.8 88.8 88.8 a 8m
.. n Q o m 3 use; @882

 

 

 

 

 

 

 

 

domes: awe... US$35? Gm .8“ 5:: 9580 go via.

 

202

synthetic data generation and the speciﬁcity of their parameters. When a 2D syn-
thetic object is straight (A = B = O), the differences between (1) the ground truth
radius r and the estimated radius 1“ and (2) the ground truth slope t = tan-I(C) and
the estimated orientation ,8 are used as indices for the performance. If a synthetic
object is curved, t and B can be compared in a piecewise fashion. We denote the
absolute difference in orientation by 66 and denote the absolute difference in radius
by 6'. Since the input data is quantized, the estimated 1" often differ from r by one
or two pixels. The smaller these index values are, the better the performance is.

These indices are also used to assess the robustness of the system. Another index
used for this purpose is the optimized ﬁgure of merit p“. With the increase of noise,
the degradations in these indices reflect the robustness of the system. Both 6‘6 and
6' may increase and p" may decrease when noise is added to input images. The
quantities of these changes measure the sensitivity of the system. The smaller the
changes in these index values, the more robust the system is.

Table 6.2 lists the experimental results for the seven 2D straight synthetic objects
in four test images. The table includes both the estimated parameters, the evaluation
indices, 6’s, measured by the absolute difference between the ground truth and the
estimates, as well as the optimized ﬁgure of merit (2* deﬁned in Section 4.4.4. In
Table 6.2, column labeled by n indicates the degree of noise added. Column “obj”
lists the objects to be processed. For example, S23-2 represents the second object
from input image S23.

Examining Table 6.2, we see that the degradation in p" is not obvious and does
not have a consistent and signiﬁcant trend. Since all the values of p“ are close to 1.0,
the system seems to be fairly robust against noise in this set of experiments. The
errors in estimating r increases with the increase of the degree of noise. So does the
errors in estimating the orientation. Table 6.3 gives the averaged 6’0, 6', and 9" under

different noise situations. The effect of noise on averaged 5" and on averaged 63 are

 

aim—

)_.-_-

-.'..z_'~ (.’.-1L.
1'

 

203

Table 6.2. The results for 2D straight synthetic objects.

 

obj. n E 65 r 6" p“
321-1 0 28.1 2.5 15.0 0.0 .9625
10 27.9 1.3 15.4 0.4 .9628
15 31.0 4.4 15.2 0.2 .9405
20 29.1 2.6 15.3 0.3 .9617
822-1 0 335.0 15.0 15.1 0.1 .9545
10 335.0 15.0 15.2 0.2 .9569
15 336.2 14.8 15.3 0.3 .9744
20 330.1 20.0 15.4 0.4 .9608
323-1 0 60.8 0.8 9.8 0.2 .9946
10 59.8 0.2 9.6 0.4 .9890
15 59.8 0.2 9.6 0.4 .9755
20 59.8 0.2 9.7 0.3 .9515
823-2 0 67.8 7.2 13.2 0.2 .9852
10 68.2 6.8 13.1 0.1 .9777
15 67.1 7.9 12.8 0.2 .9832
20 60.0 15.0 12.7 0.3 .9885
824-1 0 39.1 0.9 12.4 0.4 .9323
10 40.8 0.8 12.6 0.6 .9065
15 38.9 1.1 12.7 0.7 .8956
20 40.2 0.2 12.6 0.6 .8785
S24-2 0 131.3 1.3 11.9 0.1 .9640
10 130.0 0.0 11.9 0.1 .9447
15 131.4 1.4 11.8 0.2 .9817
20 131.3 1.3 11.6 0.4 .9711

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

visualized in Figure 6.4.

The experimental results for the curved synthetic objects are shown in Table 6.4.
Since the curved objects are approximated as a series of local straight tubes, the
estimated local centers are immaterial to the evaluation. The estimated orientations
of local tubes are currently not used in evaluation. The degradations in both p“ and
6' for the curved objects show a fairly consistent trend: 9" decreases and 5' increases
with the increase of noise. Figure 6.5 visualizes this observation based on the averaged
50* and 6'. Figure 6.6 visualizes the precision of the local tube centers estimated from

synthetic images shown in Figure 6.3.

 

-22...-ji

_-
"s...-

-7 ‘f.’
I

 

 

204

Table 6.3. The averaged results from all 2D straight synthetic objects.

 

n M 6" so"

0 4.6 0.17 .9555
10 4.0 0.30 .9562
15 5.0 0.33 .9585
20 6.5 0.45 .9520

 

 

 

 

 

 

 

 

 

 

6.4.2 Results and Evaluation on Real Data
Test Data

Two kinds of real tubular objects are used as test data. One kind is man-made
tubular objects such as wires and pipes. Eight images of man-made tubular objects
were taken in the Pattern Recognition and Image Processing laboratory at Michigan
State University. Figure 6.7 shows the test images of man-made pipes and Figure 6.8

gives the images of wires.

Another kind of real tubular objects are organic. They are eight plant root images
provided by the Plant and Soil Department at Michigan State University and four
bacteria images. Figure 6.9 and 6.10 illustrate the images of both bacteria and plant

roots.

Performance and Robustness

A total of eleven of this category of input images have been processed. The system
parameter 1] was set to 0.3 due to the low contrast and noisy nature of this type of
data. Such a low threshold of p will increase type II error.

The derived tubes for plant root images and bacteria images are given in F ig—

ure 6.11 and Figure 6.12. With the ﬁve ground truth images for plant roots (given in

205

 

 

 

 

 

 

 

 

Sensitivity of Radius Estimation Sensitivity of Orientation Estimation
0.5 , , g 7 , , t n
0.45' I! *
6.5+ at
0.4’
035* i S
.5 n g 6’
g 0.3- x g
A s
30%» E15
3 5
°‘ '5
G
E 0.2’ .5
’l‘ h 5 x
0.15. E
0.1 . . alt
4.5r
005*
G 1 l 1 1 4 r 4 l 1
0 5 10 IS 20 0 5 10 15 20
Degree of Noise Degree of Noise

Figure 6.4. Effects of noise on 6" and 6B for 2D synthetic straight objects.

Appendix B), we applied both region-based and boundary-based evaluation schemes
to assess the performance of the system. The ground truth is built for R2, R3, R4,
R5, R7. Table 6.5 gives the evaluation results from the region-based scheme. The
performance index p is almost 95% on average. Both type I and type II averaged

errors are near 15%.

For 2D data, a hierarchy of processing is designated to extract more reliable
boundary information, hence to produce quality hypotheses. The initial stage is the
grouping of line tokens. When curvilinear boundary tokens are formed, background
noise were discarded based on the decomposed object model at these levels. Grouping
boundary tokens further resolves ambiguities of gaps among these tokens. Hypotheses

are initiated from reliable boundary tokens and are further grouped into tubes. In

 

in“ a
_ 1

'ﬁ.‘
mu.

‘F
1’.

206

Table 6.4. The results for 2D curved synthetic objects.

 

t

obj. n r 6" p
325-1 0 13.1 0.1 .9886
10 12.9 0.1 .9853
15 13.1 0.1 .9788
20 12.9 0.1 .9775
826-1 0 13.1 0.1 .9759
10 13.1 0.1 .9717
15 13.0 0.0 .9702
20 13.0 0.0 .9595
827-1 0 10.1 0.1 .9784
10 10.2 0.2 .9603
15 10.5 0.5 .9585
20 10.4 0.4 .9526
327-2 0 9.5 0.5 .9653
10 9.3 0.7 .9370
15 9.4 0.6 .9273
20 9.5 0.5 .9114
Avg. 0 0.20 .9771

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

10 0.28 .9636
15 0.30 .9587
20 0.25 .9503

 

 

 

 

 

 

 

order to examine how these processing steps gradually improve the quality of recogni—
tion, we evaluate the results from the following three stages: initial stage (obtaining
line tokens), the middle stage (deriving reliable boundary tokens), and the current
stage (deriving tubes). The boundary-based evaluation results for these stages on ﬁve
plant root images are shown in Table 6.6.

It is obvious that the quality of recognition gets improved when the processing
proceeds. The distance distribution D3 describes the discrepancy between detected
boundary and the ground truth so that the smaller its mean and standard deviation
the better. D3 drops with the processings stages. This measure can be dramatically

affected by outliers. A large 01) B may suggest the existence of outliers. To avoid this

207

 

 

 

 

 

 

 

 

Sensitivity of Radius Estimation Sensitivity of Proximity
0.35 v v 1 T l , f .
0.975r
0.3' I! 4
0.9%
at
5- ..
g 0.25 a“ E 0.965» x
s i
.3 z 0.95
‘8 z. §
n: g
G
.... 0. -
5 ”E g 0.955»
0.95' "‘ .
0. 15’
0.945’ 4
0. l J l 1 1 0.94 1 m l l
0 5 10 15 20 0 5 10 15 20
Degree of Noise Degree of Noise

Figure 6.5. Effects of noise on p“ and 6" for 2D synthetic curved objects.

sensitivity, we also measured the median of D3 that provides a better indication in
terms of performance. This median is D3 in Table 6.6. It drops rapidly from 18.60 to
5.2 and then to 1.0 on average when the processing proceeds. The value of 1.0 may
be caused by the bias the Canny edge detector has (it can be proven by applying it to

a checkerboard image). Type 11 error also drops sharply which means that spurious

Table 6.5. The region-based evaluation of performance on plant root images.

 

Measure R2 R3 R4 R5 R7 Average
p .9820 .9325 .9056 .9497 .9763 .9492

cf; .0513 .1548 .2028 .1616 .1177 .1376

eg .2003 .1362 .0889 .1584 .1683 .1504

 

 

 

 

 

 

 

 

 

 

 

 

208

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

88. 88. 88. 8.8. 88. 82. ms
82. 88. 88. 82. 82. 88. as
3 3 3 3 3 3 an
8.2 8.2 8.8 8.... 2.... 8.8 mes
88 8.8 8.8 88 8.8 8.8 mm 8250
82.. 2.8. 82.. 88. :8. 82.. ms
88. 88. 8:. 82. 88. 88. ms
8.. 8.8 3 3 3 3 an
8.8 8.8 8.8 8.... 2.8. 8.2. 8:.
8.2 2.8 8.2 8.8 8.2 8.8 mm 22.8
88. 88. 88. 82.. :8. 88. m...
88. 88. 88. 88. 88. 28. «a
8.2 8.8 8.2 8.2 8.8 8.3. 3.9
8.8 2.8 8.8 8.8 8.8 8.8 8:.
8.8 2.8 2.8 88 8.8 8.2. mla 822
oweuo>< Sm mm 3m 2 mm c.5302 oweam

 

 

 

 

 

 

 

 

8.225 803 883 no eoneEuotoa 8828 Mo 220383.86 nonmebevcwom 66 @388

 

209

 

(a)

Figure 6.6. Estimated local tube centers superimposed on input images (a) S23, (b)
S26, (c) 527.

(a)

Figure 6.7. Examples of man made tubular objects of pipes.

 

(d)

 

boundary information continuously gets discarded. Type I error actually increased
slightly when the processing proceeds. This may be partially due to the side effect of
discarding spurious information.

In general, the absolute values of measured Type I and Type II errors are not as
signiﬁcant as the trend of their changes due to the following reason. Recall, both types
of errors are measured based on the number of pixels that are mistakenly classiﬁed
either as boundary points or as non-boundary points. Because of the quantized points
in an image, many pixels that are classiﬁed wrong may be only one pixel away from the
correct positions. Therefore, absolute percentage of the points that are not classiﬁed
correctly is a very conservative measure. The most signiﬁcant measure in Table 6.6 is

averaged D3 that indicates the median distance between ground truth boundary to

 

210

   

(b)

Figure 6.8. Examples of man made tubular objects of wires.

Wﬂw-zswmﬁ-amﬁ

        

Figure 6.9. Examples of organic tubular objects of bacteria.

estimated boundary. Figure 6.13 visualizes the improvement of recognition accuracy
in terms of D3 and eg based on the averaged measures in Table 6.6. For D3, the

desired distribution is:

|B(f)| d=0
0 (1750

D3(d) =

That is, the closer the D3 and 033 are to zero, the better. Figure 6.1 and Figure 6.14
show how D3 changes when processing continues for the plant root image R2 and
R5. The center columns of these distributions in the two ﬁgures correspond to zero
in distance. Therefore, it is clear that the distance distributions for both test images

gradually changes toward the desired distribution.

211

 

Figure 6.11. Detected tubes for test images (a) R1, (b) R2, (c) R3, (d) R4, (6) R5.

We now present some of the results obtained from objects of wires. Due to lack of
the ground truth for this type of data, performance is currently assessed through visual
evaluation. The results for this type of data were obtained using system parameter
77:0.65. The precision of the estimated center points of local tubes can be seen from
the superimposed images for input MM7 and MM8 in Figure 6.15.

One example of the effectiveness of the system in disambiguating 2D ribbons
because of the use of 3D model information is shown in Figure 6.16. There are
ribbons in Figure 6.16(c) that do not correspond to object. Only two of the ribbons

Were veriﬁed as tubes and labeled as local tubes in Figure 6.16(c).

212

.. J \\
r / l
2‘ — "‘ \ y \
I X,
a) (b) (C) (d) (6)

Figure 6.12. Detected tubes for test images (a) R6, (b) R7, (c) B1, (d) B2, (e) B3.

 

1‘1"“ .3

6.5 Experimental Results From 3D Volumes

_Q . cum.
1
A

The model-based recognition system using hierarchical token grouping is tested on 3D
synthetic data and 3D real blood vessels from MRA scans. To evaluate the system,
synthetic tubular objects with different structural properties are generated using a
parametric approach. The object parameters used in generating synthetic data are
also used as ground truth in evaluating the system performance and robustness. The
robustness is assessed at different levels of noise. Different degrees of Gaussian noise
are added with n = 0,10,15,20, where n is the standard deviation of a Gaussian
white noise distribution. For synthetic data, 17 = 0.5 is used. For real data, 1} = 0.3 is
used. In all experiments, A’s are set to 2.0. A total of more than 50 synthetic volumes
and 30 volumes of real data with arterial blood vessels from MRA brain scans are

tested.

6.5.1 Results and Evaluation on Synthetic Data
Test Data

As introduced before, two methods can be used to generate an axis of a 3D synthetic
tubular object. One is to obtain a 3D curve by applying a homogeneous transforma-

tion to a 2D curve whose shape can be controlled through a set of parameters. The

213

 

 

 

 

 

 

 

 

Improvement in Distance to Groundtruth Improvement in Type 11 Error
zcl Y T 1' I V I U
18* .
0.6»
16’
14+ _,
= 05*
8 1-3
5 12’ g
“5 10 0.4t
a i
'3 8* =3
2' a
g3.“ 0.3»
6+
x
4 . L
02, .
2 . 4
G L l. I T 0. l l 1 l
0 0.5 l 1.5 2 0 05 1 1.5 2
Stages of Processing Stages of Processing

Figure 6.13. Average improvement in D3 and 6119’.

other method is to use directly the parameterized 3D helix formulation. Different
axes with different shapes can be generated by controlling two helix parameters (w
and c).

Table 6.7 lists the parameter settings for the 3D synthetic test data generated
using the ﬁrst method. Three of the synthetic test volumes are shown in Figure 6.17,

displayed using Maximum Intensity Projection.

Table 6.8 gives the parameter settings for generating 3D helixes using the second
parametric method. Figure 6.18 shows three helical synthetic objects generated.
Similar to the 2D case, all 3D test volumes were corrupted by different degrees of

Gaussian noise and then used in assessing the robustness of the system.

214

 

 

 

 

 

 

 

 

 

 

 

 

88 S 8.:- 888- 888- 888
88 s. 883. 83 888- 888

3. 8 8888.8 888- 888- 8888 8 88
88 2 8888.8 83- 888 888 8 8.8
3. 882 8888.8 83- 888 8888.8

88 8.2. 888 883 8888.8 8888.8 8 88
88 s. 888- 888- 888- 888-

88 S 88.2 888 888- 888 8 88
8.2 8.2. 8888.8 883 8888.8 8888.8 8 88
.. 8 Q Q m 8. .3088 88.5

 

 

 

 

 

 

 

 

82839» 03059? Qm no.8 58.5 “836.80 so @389

 

215

(a) (b) (c)

Figure 6.14. Examples of the distance distribution 133 for image R5 at (a) initial
stage, (b) intermediate stage, (c) ﬁnal stage.

(a) (b)

Figure 6.15. Superimposed center points of local tubes for (a) MM7, (b) MM8.

 

Performance and Robustness

The results for the synthetic volumes were obtained by setting 71:0.50 and A’s = 2.0.
The experimental results for the ﬁrst set of 3D synthetic objects (in Table 6.7) are
shown in Table 6.9. All 65’s and 6”s are small and all but one value of 32' are above
0.92. The values of 6”s and p"s reported in Table 6.9 are weighted averages taken
over the entire length of an object. The weight put on the measures from each local
tube is determined by the ratio of the tube length to the total length of the object.

Figure 6.19(a) and (b) plot the individual values of p‘ for local tubes along three

‘216

- l
(a) (b) (F)

Figure 6.16. The recognition result for input image MM4, (a) input image, (b) de-

tected boundary. (c) the tubes found.

(a) (b) (C)

Figure 6.17. Examples of SD synthetic volumes. (a) volume S32, (b) volume S34, (c)

 
  

        

Volume S35.

Syr‘ithetic objects. It is clear that values of 33" from individual connected local tubes
are stable and smoothly distributed. Similar phenomenon has been observed for 6"
and 6‘3 (whenever it applies).

Table 6.10 gives the experimental results from 3D helixes. For each test volume,
different degrees of noise were added (indicated in column two of Table 6.10). At
every situation of noise, three measures are listed: average error in estimating radius
5r, average estimated length of local tubes law, and average value of 53'. All the

a"fe‘l‘ages are taken over the entire length of the objects. The degradation of these

'31 Lle-s w1th the 1ncrease of n01se indicates how sensmve the system is to the norse. The

217

Table 6.8. Ground truth for 3D synthetic helixes.

 

Figure 6.18. Three synthetic helical objects. (a) helix-1, (b) helix-3, (c) helix-6.

error in estimating object radius does not have an obvious change with the increase
0f noise but their corresponding 30‘ values do decrease, signaling less conﬁdence in
estimated radius. The average lengths of local tubes seem to decrease slightly with the
increase of noise. But with the increase of object curvature (the more cycles a helix
has, the higher curvature it has), the average local tube lengths decrease, meaning
that the sweeping operation adaptively adjusts itself to an appropriate situation in
Order to track the objects. The optimized ﬁgure of merit p“ has obvious trend of
degradation when the noise level increases but the amount of degradation is fairly
SIT1311], indicating that the system performance is fairly robust. Figure 6.20 describes

”1 more detail how the estimated radii along object helix-2 ﬂuctuate under different

218

Table 6.9. Results from 3D curved synthetic objects.

 

Obj. ﬂ 63 i‘ 6" p“

831-1 45.0 0.0 11.7 0.3 0.9314
832-1 na na 6.0 0.0 0.9284
832-2 na na 8.3 0.3 0.9877
833-1 45.0 0.0 6.1 0.1 0.9735
833-2 135.0 0.0 4.3 0.3 0.9593
834-1 na na 9.1 0.1 0.9458
835—1 na na 4.2 0.2 0.9819
835-2 na na 4.8 0.2 0.8982
835-3 na na 6.0 0.0 0.9522

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

noise situations. The higher the noise degree, the more often the estimates deviate
from the ground truth. Signiﬁcantly higher ﬂuctuation occurs at one point only at
noise degree level n = 20. Since the step length used in optimizing estimated radius

is 0.5, the minimum ﬂuctuation is 0.5.

To see the detailed degradation in p“, individual values of p“ along two helical
Objects under different degrees of noise are plotted in Figure 6.21. It is demonstrated
that the ﬁgure of merit degrades gracefully with the increase of noise. This can also
be seen in Figure 6.22 which visualizes the estimated object axis for helix-6 under
different degrees of noise. Figure 6.22(a) and (c) are the input volumes of helix-6

With noise degrees n = 0 and n = 20. Figure 6.18(b) and (d) are the estimated object
33:68 for the input in (a) and (c), respectively. From the ﬁgure, we can visually see
that, even with corrupted data, the object axis was estimated fairly precisely with

Very slight deviation in positions.

The experimental results suggest that it is more difﬁcult to track more curved
Obj ects than less curved objects. We compare the estimated radius values of helix-2

and helix-6 (more curved) because these two objects have the same radius of 3.0.

 

219

 

 

 

 

 

 

 

 

Figure of Merit AlongObjectlcngth FigurcofMuitAlongObjecthgth
l-3Iiiirrirrriilri l-3IIIIIIITTllllT—T
S34— $35-2—
l.2 . ., [.2 - 535-3 ----..

5 1.1 - - g 1.1 - -

2 2

“5 1" " “6 [FM-— 'l

a \________ a ________________________________

1: MP q ii". 0.9. 1
0.8- 1 0.8 - -
0.7JJ_LLL1J_1LJI_1111 mininiiiiiiiiiii

l 61116212631364146515661667176 I 6Ill6212631364l46515661667l76
Length Len

(a) (b)
Figure 6.19. Values of ﬁgure of merit along objects (a) 834-1, (b) 835-1 and 835—3.

The estimated radius values along objects helix-2 and helix-6 when n = O are shown
in Figure 6.23 (the values of p“ from helix-6 when n = 0 is also plotted in Fig-
ure 6.23(b)). Compared the variation of the estimated radius values for helix-2 with
that of the estimated radius values for helix-6, the latter seems to vary more. How-
ever, examining the plotted ﬁgure of merit values along helix-6 in Figure 6.23(b),
Whenever the estimated radius values deviate from the ground truth, the ﬁgure of

merit drops, signaling less conﬁdence in the estimated object parameters, a desirable
Property.

The estimated local tube lengths should vary with the curvature along an ob-
ject. Since a helical object has a constant curvature (theoretically but may vary
in quantized data), we use two helixes with obviously different curvatures to show
that the optimization scheme we used will adapt to the changing curvature so that
the estimated tube lengths will be adjusted so that the object is correctly tracked.
I:‘igllre 6.24 plots the estimated lengths of local tubes along synthetic objects helix-1

and helix-6. The plot is made based on the ﬁrst 30 local tubes of both objects. The

,1

220

Table 6.10. Results from 3D synthetic helixes.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Helix Noise 6" lavg @269
0 0.0500 3.8300 0.9635
helix-1 10 0.1333 3.1667 0.9209
15 0.0833 3.1667 0.8634
20 0.1167 3.0667 0.7956
0 0.1067 2.9600 0.9613
helix-2 10 0.0773 2.7667 0.9047
15 0.1311 2.7333 0.8615 ..,
20 0.1133 2.7943 0.7491 i;
0 0.0667 3.2600 0.9788 1;"
helix-3 10 0.1500 3.1000 0.9321 ll.
15 0.1333 3.1000 0.8801
20 0.1333 2.9800 0.7865
0 0.1636 3.2500 0.9652
helix-4 10 0.2734 2.9444 0.9224
15 0.1944 2.7647 0.8731
20 0.1833 2.8333 0.7633
0 0.0500 2.8000 0.9626
helix-5 10 0.0833 2.7843 0.9187
15 0.0167 2.7667 0.8696
20 0.0333 2.7333 0.7868
0 0.2381 2.5800 0.9648
helix-6 10 0.2000 2.4500 0.9238
15 0.2167 2.3054 0.8714
20 0.2333 2.5667 0.8012

 

 

 

 

 

 

 

difference between the two distributions of the local tube lengths from helix-1 and
helix-6 can be clearly seen. Almost all the tube lengths of helix-6 are shorter than

those of helix-1 even though fluctuations exist in both.

221

Impact of Noise on the Estimated Radius

 

 

 

 

 

8 1 l 1 l l r I
7357 : Ground Truth — :

n— ----
m 6-5 ' n=10 ----- "‘
.5 6 - “=15 ......... -
E 5g ‘ n=20 ----- "
B 4.5 " '2‘ '-
N 4 1— .I . _
g 3.5 73'. '1 A .- "l \‘7. 1m": .‘. .r'y' 3,198 \ .'ﬁ.‘
[1‘3 3 "1;! "‘ 'li‘! ' ‘54:) '1’ \I u ‘5 '1 "J M‘. 1". ""
2.5 6 V 1
2 ~‘- _
1.5 - '-

1 1 1 1 1 1 1 1

O 10 20 30 40 50 6O 70

Length along object

Figure 6.20. The estimated radius for helix-2 under different noise situations.

6.5.2 Results and Evaluation on Real Data

Test Data

30 blood vessel subvolumes from MRA brain scan have been processed. Some of the
Subvolumes are visualized in Figure 6.25 using the Maximum Intensity Projection
(MIP) technique. Lacking ground truth for the data in this category, system perfor-

Inance is evaluated visually based on the results from different stages of processing.

Pel‘formance of Initial Stage

The noisy nature of MR volumes is a well-known problem and results obtained from
the initial stage of processing are often not reliable. This can be seen in Figure 6.26.
All the images shown at the left in this ﬁgure are the MIP of the original 3D volumes.
The second column is the initial global segmentation results in which each voxel is an

1 1115'631‘ior point token. The third column shows all the initial boundary points of the four

222

 

 

   
  

   

 

 

 

 

 

 

 

 

 

lmpactotNoiseontlrFigmeochnt lmpactolNoiseontheﬁgumolMetit
_ I l l T f l l l l l l _ 5 I l l l l l l l l l
:_ - 11:0 — - l4 - 0:0 - -
- : 0:10 ----: _ n=lO .
i _ ll=l5 ..... - 1’3 n=l5 .....
E .1 : n=20-: :5 1.2 r n: — -
g 3 - - z 1.1 - -
a- - " ‘ a.
o - o _ .
E I “s: E ’.~"‘\'2“-7'1."\‘. ' -
Q 5‘ N - .'-,_ . .
if. 3 '7'. ix”. . - 'V' - ..
.1 3 0.7) -
I} < 0.6 - 4
‘ l l l l l l l l l l l -‘ 05 l l I l l l 1 l l l
051015202530354045505560 05101520253035404550
lzngth along object langth along objcct

(a) (b)

Figure 6.21. Sensitivity of the ﬁgure of merit to noise (n). (a) helix-2, (b) helix-6.

input volumes. No set of initial classiﬁcation results is satisfactory. Nevertheless, they
provide partial evidence. For example, both the segmentation and boundary detection
results from sub21 represent fairly good classiﬁcation. Some type II errors (misclassify
background points as interior points) are obvious in Figure 6.26(k). In the boundary

detection result (Figure 626(1)) , some type I errors are made (classify boundary

 

Figure 6.22. Experimental results on a helix. (a) the helix without noise, (b) the
estimated helix axis from (a), (c) the helix with noise degree n = 20.0, (d) the
estimated helix axis from (c).

223

 

 

 

 

 

 

 

 

 

 

Impact ofNoiseonthertimatcd Radius Impact 01 Noiscoutlieﬁgute of MentandEstimated Radius

55 1 1 1 1 1 1 1 1 55 1 1 1 1 1 1 1 1
5- 0101deth Radius —* .a 5- GroundTruthRndius — -
4 5 . Estimated Radius --- .. a 45 . Estimated Radius ---- ..

FigmcofMerit -----

g 41- -t B 41- -
I t - r- - d "1 t " 1' I" -~ n 1 -
3 33‘": 11.11 11 1'11: . .5. ”31:11 111701. 1.11.114 7
g 25 r - g: 15 - -
g 2 .. . g 2. .
m 1.5 - '1 “5 15 - -
1 - a g 1 t- ----- --:
0.5 - a a; 05 - -

0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1
0102030405060708090 0102030405060708090

lxngth along object Length along object

(a) (b)

Figure 6.23. Estimated local tube radii with n = 0 along (a) helix-2, (b) helix-6.

points as non-boundary points). As we will see later, these missing boundary points
will eventually cause the loss of the entire thin object seen clearly in the top half of
the volume.

Another demonstration of reasonable results from initial processing is from the
third set of results from input sub5. Even though the initial segmentation is very
noisy, the initial boundary detection provides good evidence for object surface. To
see more clearly, the initially detected object surface for sub5 is displayed from four

different viewpoints and rendered as range images in Figure 6.27. Later we will see

that the initial surface detection on sub5 yields good seeds, one of which (the best

one) activates a global sweeping that recovers the entire object.

Performance of Automatic Seeding

To examine its effectiveness, the results from automatic seeding stage are evaluated in

terms of two aspects. First, the generated seeds should be located on objects instead

224

Impact of Object Curvature on Tube Length
r 1 l 1 1 1

Tube length on helix-l —-—
Tube length on helix-6 ----

 

 

lllllllllllll

 

 

 

Estimated Tube Length
0 '-‘ NUIU) #UIUIUIQ \) “MO

0 5 10 15 20 25 30
Length along object

O.)
M

Figure 6.24. The estimated tube lengths for synthetic objects helix-1 and helix-6.

of in background noise. Secondly, there should be as few as possible false alarms such
that when there is no object, no seed should be generated even when input data is
noisy.

Figure 6.28 displays some of the seedng results. The seeds detected are repre-
sented by the dark dots superimposed on the original volumes. The dark dots are
the center points c’s of the cross sections that the seeds represent. Ideally, the super-
imposed seeds should be on the axis of an object. From these three sets of seeding
results, we can see that most seeds are located fairly precisely near objects even
though there are sometimes many spurious boundary points detected. This is due to
the use of model-based seeding process which relies on the presence of both distinct
Surface features and the circular group of boundary points. Due to noise, the location
Of the seeds may not be precise. There are two possibilities. One is because of the

Size of the underlying objects. When an object is very thin, due to quantization, one

 

(01)

Figure 6.25. Volumetric blood vessel data displayed using MIP, (a) subvolume sub2,
(b) subvolume sub14, (c) subvolume sub16, (d) subvolume sub33.

position off the real axis may seem to be a big deviation. For example, some objects
in these volumes have radius about 1 to 2 voxels. In that case, it is impossible to
center the seeds. However, there are indeed some bad seeds that fall either on the
boundary of objects or fall in the background region, i.e., false alarms. For these
seeds, it is expected that they will be invalidated at the stage of local recognition
because of the explicit use of both object geometric model and the sensing model
(this will be discussed in the next section).

Because seeds are generated based on the distinct features of cylindrical surfaces,

they are generally located on essentially straight segments of objects. One example is

226

shown in Figure 628(1) where seeds are not found near the relatively curved segment

of the object even though there is good evidence.

Figure 6.30 gives several examples that illustrate the effectiveness of the seeding
process when there should be no seed detected even though many spurious surface
points are present. Figure 6.31 shows other two examples in which seeds are generated
only at positions near objects even though the initially detected surface points are
noisy. These two sets of seeding results demonstrate the effectiveness of the model-
based seeding approach we adopted.

Generating seeds from only boundary points causes some problem. Whenever
there is no boundary point, there is no seed to be found. This can be seen in Fig-
ure 6.29. Even though some boundary points are missing, the initial segmentation
actually also provides good evidence of object boundaries. Since the initial segmen-
tation result is not participating in the seeding process, seeds can not be recovered
near those regions. As a consequence, no object will be recovered which will be seen

in the section where the effectiveness of global recognition is discussed.

Performance of Local Recognition

Given a set of seeds (hypotheses), the task of local recognition is to verify and optimize
the seeds. Each seed may either be veriﬁed or be invalidated. A veriﬁed seed may
or may not activate a global sweeping because a seed may become unnecessary once
the object on which the tube that the seed represents has been recovered from a
sweeping activated by another seed. Therefore, a seed can be in one of the following
four status: valid, invalid, used (to activate a global sweeping), or unused. False
alarm seeds should be classiﬁed as invalid seeds at this stage because of the use of

the blood ﬂow model.

In order to recover objects as reliably as possible, seeds (or local tubes) are ranked

227

by their optimized ﬁgure of merit p“ and global sweepings are activated in the de—
creasing order of p'. A valid seed is one that has p“ Z 17. Figure 6.32 shows the
results of local recognition for input volume subl4. Figure 6.32(a) visualizes the set
of seeds that are detected from the previous stage (dark dots superimposed on the
original volume). Figure 6.32(b) shows the set of valid seeds as bright dots, ranked
by their p*’s (the brighter a dot is, the higher its p“ is). The ranks of the seeds in
Figure 6.32(b) agree with what we can see from the input volume. A better seed
position (on the object axis) yield higher rank. The more the surrounding data of a
seed looks like a vessel, the higher the p“ is. The set of invalidated seeds are shown
in Figure 6.32(c) as superimposed dark dots. It can be seen that the seeds that are
invalidated are either not positioned right or the surrounding data does not look like
a blood vessel. There is one seed centered at the middle of a long thin vessel. It gets
invalidated because its ,0" is just below 1]. Lastly, Figure 6.32(d) gives the set of seeds
that are actually used in initiating global sweeping operations. Visually, they seem
all to be good seeds. The unused seeds are the ones that are seen in Figure 6.32(b)
but not in Figure 6.32(d).

A similar set of results for input volume sub34 is shown in Figure 6.33. From this
set of results, we can see that some objects (the big one in Figure 6.33(a)) can be

completely recovered from one good seed (the recovered object axis is given in the

next section).

Performance of Global Recognition

The performance of the system at the stage of global recognition is evaluated based
on the quality of recovered objects, including how many objects are recovered and the
accuracy of the recovered axes. Some of the experimental results from blood vessel

VOlumes are shown in Figure 6.34 to Figure 6.36. It can be seen that the stochastic

228

model allows the ﬂexibility for each seed to grow in space and the optimization scheme
developed in Chapter 4 makes the estimated object axes follow the correct object
trajectories. Although the blood vessels in these volumes have varying shapes and
sizes, the sweeping is effective in all cases as far as there is a seed to start with. The
smallest radius that are detected so far is one voxel. Some vessels in Figure 6.34(d)
and (g), Figure 6.35(d), Figure 6.36(a) and (c) have segments of such small size.
They are all recovered. Some vessel segments have very low contrast and surrounded
by noisy data. One example is the left branch of vessel in Figure 6.34(g). Because
of the correctly located seeds, it is recovered. The estimated axis for volume sub5
(shown in Figure 6.34(j)) is superimposed on the original data which visually shows
how accurate the estimated axis is.

The right column of the ﬁrst three sets of results in Figure 6.34 illustrate that sim-
ple branch points between blood vessels can be correctly identiﬁed. Axes for different
vessels are visualized using different intensities. In tubular object token representa-
tion, the relational feature Min" describes such detected intersection relationships.
Assume we have recovered two tubular objects that are represented as tokens Tf’
and T 2", respectively. If they intersect at position x, this spatial relationship will
be recorded in their relational feature sets as follows. For token Tf', the relational

feature Mm" is:

R1121 : {Mintr, (2,CI,X)},

Ineaning that token T,“ has a intersection relationship with token T? at position x.

Similarly, for token T 2“ , its relational feature set contains:

123': {Mintr,(1,el,x)}.

229

With the explicitly intersection relationship description, other types of spatial rela-

tionship analysis among vessels may also be performed.

There are some cases that reveal the weakness of our current seeding approach
and show its effect on the overall object recovery. Figure 6.35 gives two such cases.
The second column in this ﬁgure shows the seeding results from the two volumes in
(a) and (d) of the same ﬁgure. As we pointed out in evaluating the effectiveness of
the seeding method, lacking of seeds on some vessels is because there is no surface
points detected from these vessels (see Figure 6.29). Therefore, even though the initial
segmentation results from these volumes actually provide better evidence, no seed is
generated because a seed has to be constructed from surface points. Without any
seed, the global recovery completely missed underlying objects. The thin vessel that
is on the top of a large vessel in Figure 6.29(a) did not get recovered which can be
seen from Figure 6.29(c). One possible reason for not being able to detect the needed
surface points on this small object can be the magnitude threshold mt used. It is
computed as

mtzum+2-am,

where um and cm are estimated from a unimodal Gaussian ﬁtting. When there is a
big vessel present (as in the volume in Figure 6.29(a)), such computed threshold will
likely cut off small objects. Similar situation occurred to the right branch vessel in
Figure 6.29(d). The reason for not being able to detect the surface points for this
object is likely due to the low contrast. More sophisticated integration is needed in

order to make the system more robust in such situations.

Figure 6.36(a) and (b) illustrate that the sweeping history can constrain the pos-
sible extension of object. The small object on the right of the big object has a sharp
turn near the edge of the volume. The sweeping operation stops there and reported

reason for such a stop is “due to a large curvature”. The recovered object axes for this

230

volume shows another point. When two objects intersect, if one object is much bigger
than the other, it is almost impossible to recover the intersection relation through
sweeping. The sweeping along the small object will never be able to get into the re-
gion where the big object is because the veriﬁcation will fail. Therefore, the two axes
are apart, hence no intersection can be identiﬁed. To detect this type of intersections,
more analysis is necessary.

Figure 6.36(c) and (d) gives another situation in which the tube model-based
veriﬁcation may lead to wrong recognition. On the top right corner of input volume
sub33 (shown in Figure 6.36(c)), there are two separate vessels. Because the gap
between the two objects is very small and there is a seed generated near that region,
the system sweep through the gap and connect the two objects. Therefore, even
though many objects are recovered in our test volumes, the results should still be
considered as a rough segmentation of instances of vessels.

Table 6.11 gives a summarized subjective evaluation on the system performance
on 30 real volumes. The evaluation is with respect to the quality of the input volumes
and the system performance on these volumes. The system performance is assessed
with respect to the results from initial surface detection, automatic seeding, local
recognition, global recognition, and intersection relationship detection. The scale of
the evaluation is “very good”, “good”, “fairly good”, “not very good”, “not good”,
“bad”, and “undecided”. The reason we include “undecided” is that, for some vol-
umes, it is impossible even for human eye to make a judgement about blood vessels
and their spatial relationships. As it can be seen from Table 6.11, a number of input
Volumes do not have good quality and most of it is due to the noise present in the
input. When the processing progresses, more and more results show a trend towards

promising recognition. This can be seen from the shift of bigger numbers to the left
Of the table, especially at automatic seeding and local recognition stages. The sum-

Iilarized information about intersection detection only includes thosed of the volumes

231

from which the positions of the intersection points can be roughly identiﬁed by the
human eye via some visualization tools. Due to the complex structure of blood vessels
in many test volumes, it is impossible to even roughly determine their ground truth.
Therefore, the detection results from those volumes are not evaluated here. Some of

the intersection detection results are visualized previously.

6.5.3 3D Visualization

We choose the blood vessel volume sub5 to illustrate the new proposed visualization
method. Due to space limits, Figure 6.38 displays only 25 of the cross sections on
the segment of the object marked in Figure 6.37. Nine of these 25 cross sections are
further displayed in the form of 3D plots for visualizing their real blood ﬂow patterns.
The results are shown in Figure 6.39. For each of the nine cross sections, r.- + 2.0
is used in order for the viewer to be able to visually judge the location of the blood
vessel boundaries. It is clear that the detected radii are very close to what the data
reveals. Such 3D visualization of blood ﬂow can be helpful in diagnostic purposes.
For example, the cross section plotted in Figure 6.39(h) has a blood ﬂow pattern
that seems to deviate from an expected ﬂow pattern which, if extended, may signal
some abnormality. A radiologist can view this “cross section movie” along with the

subvolume from which the vessel is extracted in order to make diagnoses in cases of

aneurysm or stenosis.

6.6 Summary

A model-based recognition scheme is used to recognize tubular objects. The 4-stage
recognition system is developed under the framework of hierarchical token grouping.
A set of grouping agents perform various vision tasks at different levels of abstraction.

This recognition system is tested on synthetic and real data. Experimental results are

232

 

8838838

.8288.“er—

 

3

85338."

Eco—U

 

a“

cosmnwonxz

88.8

 

NH

wameoom

osﬂboﬂé

 

n

n

A:

o

83.5w
HE:—

 

m

o

h

o

QED

 

 

“8286285.

 

8.3

 

eoow

80:

 

teem

.38, Ho:

 

teem

Dame.“

 

voow

 

voow

50>

 

momma

 

.8838 82 on no voceﬁeotom 83$? 2: mo dosage: Roseanne—em .26 @388

 

233

presented and assessed in this chapter using the proposed three evaluation schemes.

The explicit integration of the models that characterize different aspects of the
tubular objects yields both more reliable and more precise recognition results than
what is achievable using a single technique. The incremental reconstruction paradigm
allows a continuous reﬁnement of recognition and it has so far produced improved
recognition results. By using a parametric form of the geometric model, when objects
are recognized they are simultaneously quantiﬁed. An attractive visualization method
can be realized based on the recognition result which may beneﬁt clinical diagnosis.
The experimental results have demonstrated the effectiveness of both the recognition
System developed in Chapter 5 under the framework of hierarchical token grouping

and the model-based recognition strategy proposed in Chapter 4.

234

 

Figure 6.26. Initial classiﬁcation results from four subvolumes (sub30, sub26, sub5,
aIld sub21). The ﬁrst column ((a),(d),(g),(j)) shows the original volumes. The
Second column ((b),(e),(h),(k)) shows the initial segmentation. The third column
((c),(f),(i),(l)) displays all the initially detected surface points using MIP technique.

235

 

Figure 6.27. Result of initially detected surface points from subvolume sub5 (rendered
as range images), viewed from front, left, back, and right, respectively.

236

 

Figure 6.28. Seeding results for volumes sub16 ((a)-(c)), sub7 ((d)-(f)), sub29 ((g)-
(i)), and sub5 ((j)-(l)). The ﬁrst column: original input volumes. The second column:
initially detected surface points. The third column: seeds detected and superimposed
on input.

237

 

Figure 6.29. Seeding results for volumes sub14 ((a)-(d)) and sub21 ((e)-(i)). The
ﬁrst column: original input volumes. The second column: initially detected surface
points. The third column: initially detected interior points. The fourth column:
seeds detected and superimposed on input.

 

Figure 6.30. Effectiveness of the model-based automatic seeding approach. For all
three input volumes, even with very noisy surface detection, there is no seed detected.
The ﬁrst row: original input volumes. The second row: initially detected surface
points.

238

 

(’0

Figure 6.31. Effectiveness of the model-based automatic seeding. The initial surface
detection results ((b) and (f) for input volumes sub17 and subl3 ((a) and (e)) are
Very noisy. Seeds are generated only near the objects in these volumes ((c) and (g)).

239

 

Figure 6.32. Effectiveness of the model-based veriﬁcation of seeds from data sub14
(a) the detected seeds, (b) the seeds that are veriﬁed, the brighter the seed, the higher
the conﬁdence in the seed, (c) the seeds that are invalidated using the tube model,
((1) the seeds that actually activate global sweepings.

240

 

Figure 6.33. Effectiveness of the model-based veriﬁcation of seeds from data sub34
(a) the detected seeds, (b) the seeds that are veriﬁed, the brighter the seed, the higher
the conﬁdence in the seed, (c) the seeds that are invalidated using the tube model,
(d) the seeds that actually activate global sweepings.

241

 

Figure 6.34. Sweeping results for volumes sub26 ((a)-(c)), sub7 ((d)-(f)), sub29 ((g)-
(i)), and sub5 ( (j )—(l)). The ﬁrst column: original input volumes. The second column:
seeds detected. The third column: recovered object axes.

242

 

Figure 6.35. Sweeping results for volumes sub21 ((a)-(c)) and sub14 ((d)-(f)). The
ﬁI‘St column: original input volumes. The second column: seeds detected. The third
C01umn: recovered object axes.

243

 

(C)

 

Figure 6.36. Sweeping results for volumes sub34 and sub33; (a),(c): original input
volumes; (b),(d): recovered object axes.

 

Figure 6.37. A real volume with the portion of visualized segment marked.

244

 

 

 

 

 

Figure 6.38. Twenty ﬁve visualized cross sections (frames of a “movie”).

245

 

(9) (h) (2‘)

Figure 6.39. 3D visualization of nine of the cross sections, starting from the 14th in
the previous ﬁgure.

CHAPTER 7

Conclusions, Contributions, and

Future Work

Our study makes contributions to two different yet related areas of computer vision
research. The ﬁrst area is the study of vision problem solving formalisms. The contri-
bution is a homogeneous problem solving architecture for vision, called Hierarchical
Token Grouping. The second area is the study of integrated model-based object
recognition. Our contributions in this area are (1) a parameterized Generalized
Stochastic 'Ihbe Model that describes the geometric shape of the class of tubular
objects, (2) an integrated, model-based, and constructive approach for recog-
nizing tubular objects, and (3) a visualization method that can produce a “movie” of
the cross sections of a recognized 3D object. The work advances the capability of ex-
tracting and quantifying plant roots from images taken underground and blood vessels
from magnetic resonance images. The contributions to these two areas are linked by
applying the proposed formalism of Hierarchical Token Grouping to the development
of an integrated model-based vision system for recognizing tubular objects.

In the following sections, research conclusions are presented ﬁrst. Then the con-

tributions are discussed.

246

247

7 .1 Conclusions

7 .1.1 Hierarchical Token Grouping

Both modularity and integration are indispensable aspects in vision problem solving.
Identiﬁcation of individual modules in the human visual system has led to the isolated
study of heterogeneous modules, producing a bag of tools for a bag of problems. How-
ever, this has not resulted in an adequate overall solution to computer vision. The
issues of cooperation and competition among individual modules are crucial to build-
ing computer systems capable of vision. This is directly related to an important topic
in computer vision research: integration formalism. A major obstacle in achieving
integrated vision system behavior is the heterogeneity in modules, in data repre-
sentations, and in knowledge. Consistent and systematic formalisms for integrating
complex machine vision systems need to be established.

Motivated by the above observations, we asked following questions:

0 What homogeneous characteristics exist?
0 What do they offer and why are they useful?

0 How do we utilize them in the context of integration?

The Hierarchical Token Grouping resulted from our attempt to answer these ques-
tions.

First, we eXplored the homogeneous characteristics behind seemingly hetero-
geneous machine vision techniques. By identifying grouping to be the only operation
performed in solving the two computational problems of vision, the process of vision
problem solving can be viewed as the process of continuous grouping operations.
Such a perspective establishes the foundation for a homogeneous architecture for
vision problem solving because heterogeneous individual modules are ultimately con-

sidered as homogeneous operational units that systematically perform aggregation

248

operations.

The heterogeneity in representation is not inherent. In the context of hierarchi-
cal vision problem solving, the functional role of data representations is to provide
the interface among individual modules and to bridge the processes that are below
or above in the hierarchy. Heterogeneous representations at different levels hinder
ﬂexible interactions among modules. For vision problem solving, a homogeneous rep-
resentation scheme is not only possible but also beneﬁcial. We proposed the token
representation for the perceptual data at any level of abstraction. This generally
designed representation offers a consistent interface for the interacting modules.

The heterogeneity in modules is due to the inherent heterogeneous nature of dif-
ferent vision tasks. Grouping operations are governed by rules or grouping criteria.
Ultimately, grouping criteria determine the behavior or the functionality of grouping
processes. Due to the inherent heterogeneity among different visual tasks, grouping
criteria need also to be heterogeneous. But, grouping criteria affect only the internal
behavior of grouping processes. In order to identify a general syntax for specify-
ing ﬂexible grouping criteria, we employed the more abstract grouping principle of
homomorphism. The concept of homomorphism has been used in the literature to
deﬁne structural similarity[41, 90]. In this thesis, we extended this concept so that
both structural and non-structural similarity can be described. Such a generalization
of grouping principles makes it possible to use a uniﬁed speciﬁcation language to
instantiate ﬂexible grouping criteria.

Using formally deﬁned data representation and operational principles, a grouping
agent can also be formally speciﬁed by its domain, its internal behavior, and its range.
From such formal speciﬁcations for grouping agents, both interaction models and
data ﬂow models can be extracted.

Utilizing the homogeneity in operation, in representation, and in grouping prin-

ciple, a homogeneous problem solving architecture for vision, called Hierarchical

249

Token Grouping, was presented. In this architecture, both the role of knowledge
and the role of sensor are made explicit. Organized grouping agents systematically
perform grOupings that naturally impose a hierarchical structure on the token rep-
resentation. They act both individually and cooperatively through interacting with
each other. Information can ﬂow bottom-up, top-down, and across.

The proposed architecture has several desirable properties. It lends itself to a
constructive paradigm. It is homogeneous and it possesses systematic be-
havior. Since each grouping agent combines data structure (token) with behavior
(grouping), it is a conceptually encapsulated object. Therefore, the architecture is
object-oriented. The token representation provides a consistent interface that
facilitates the interaction and integration among different modules. The general syn-
tax for deﬁning grouping criteria offers a cohesive and concise way to specify
interaction models and data ﬂow models. Since grouping agents are activated
whenever domain data is ready, the proposed architecture supports both oppor-
tunistic problem solving and distributed computing. From a practical point of
view, Hierarchical Token Grouping presents a formal method to build complex
vision systems. Since grouping agents are encapsulated, they can be used to assem-
ble different vision systems. Such software reusability leads to quick prototyping.
Since interaction models and data ﬂow models can be extracted from the concise
speciﬁcations for grouping agents, it is conceivable that large parts of the process
of developing vision systems can be automated. From a theoretical point of View,
because a grouping agent can be modeled as an algebraic system, an entire vision
system built from a hierarchy of grouping agents can also be modeled as an (big-
ger) algebraic system. Therefore, the process of vision problem solving is formally
modeled, which makes it possible to utilize many existing mathematical formalisms

to describe dynamic behavior of vision systems.

250

We demonstrated that (1) token representation generalizes a wide range of rep-
resentation schemes, (2) token grouping abstracts most recognition methodologies,
(3) the grouping principle of homomorphism uniﬁes organizational principles across
levels of abstraction, and (4) the framework of Hierarchical Token Grouping has the
potential to be used as a general paradigm for vision problem solving.

While the proposed architecture offers a more systematic and more formal way of
constructing vision systems, there are serious issues in this proposed architecture that
have not been attacked in this thesis. One issue concerns the trade-off in computation
and space complexity between using a homogeneous representation scheme versus
using heterogeneous representations at different levels. While a homogeneous interface
beneﬁts the interaction among modules and the data ﬂow control within the system,
it may not be most appropriate for certain types of search strategy adopted within
some grouping agents. Yet another issue is the possibility of quantitative analysis of
the effectiveness of the proposed architecture. Further research is needed to study

these issues. We further address such needs below.

7 .1.2 Integrated Model-Based Object Recognition

At the initial stage of this research, our interest was to extract plant roots from
images taken underground. To solve this practical computer vision problem, a model-
based approach was taken with the intention of deriving a generalized approach for
recognizing the class of tubular objects so that different application domains can also
beneﬁt from it. The development of such a generalized model-based approach enables
us to transfer the same technique to the tubular objects from different application
domains such as plant roots, blood vessels, bacteria, and wires.

In our model-based approach, parameterized models were developed for character-
izing different aspects of the objects and integrated at multiple levels of abstraction.

The distinct shape of tubular objects was explicitly modeled. Realizing that the

251

difficulty in using conventional Generalized Cylinders for recognition is due to its
qualitative deﬁnition, we made this powerful representation scheme mathematically
concrete. A parameterized GC-based model, called the Generalized Stochastic
'Ihbe Model, was proposed to capture both the salient and the dynamic geometric
shape properties of a special class of GC. With this model, the recognition problem
was rendered as an object parameter estimation problem.

Acknowledging the indispensable role of the sensor in recognition, we established
sensor models for different imaging conditions. Speciﬁcally, both photometric imag-
ing and Magnetic Resonance imaging conditions were explicitly modeled. These mod-
els predict the expected conﬁguration of sensor measurements within object regions.
With a speciﬁed pose estimated by utilizing the geometric model of the object, the
sensor models render the recognition problem as a matching problem between an ex-
pected conﬁguration and the observed conﬁguration of sensor measurements. Such
an integration of geometric model with sensing models is realized under the
framework of matched ﬁlters.

Based on the models developed, an automatic multi-level recognition strat-
egy was designed which exploits and integrates the power of the models at different
levels of abstraction. Shape information is combined with the sensed information,
yielding the potential for a recovery of objects that is both more reliable and more
precise than what is achievable using a single feature. The incremental recon-
struction paradigm allows a continuous reﬁnement of recognition. Because of the
parametric form of the geometric model, when objects are recognized they are simul-
taneously described and quantiﬁed. Region-based and boundary-based recognition
techniques are integrated in order to achieve reliable performance under difﬁcult noise
conditions. An visualization method for 3D tubular objects is presented that al-
lows a sweeping along an object trajectory to be visualized. This has strong potential

for clinical diagnosis.

252

The proposed recognition system was developed under the paradigm of Hierar-
chical Token Grouping. We demonstrated how a computer vision problem could
be posed as a token grouping problem and how a vision system could be built under
this paradigm. To assess the system, three performance evaluation schemes were
proposed that are applicable to the vision systems with different recognition tasks.

The implemented system was tested and evaluated using both synthetic and
real data. Real data from different application domains was used that included 2D
plant root images, wire images, bacteria images, and 3D blood vessels from human
MR brain scans. In the 2D case, a total of 33 synthetic and 20 real images were
tested. In the 3D case, a total of 29 synthetic and 30 real volumes were processed.
System performance was evaluated by applying the three evaluation schemes to the
test data and the robustness of the system was assessed based on the performance
degradation under different controlled noise situations.

System performance was robust when evaluated using synthetic data with different
levels of white noise. Recognition results with both 2D and 3D real data showed
that the Generalized Stochastic Tube Model is appropriate and applicable to the
object domain we dealt with. The sensor model for both the photometric sensing
and the MRA sensing conditions worked properly. The model we developed for the
MRA sensing condition is more effective than the step function model used by other
researchers for recognizing blood vessels. The model-based automatic seeding method
is effective in terms of both false alarm invalidation and accurate seeding positions.
With noisy data such as MR volumes, having a reliable automatic seeding procedure
is crucial. Some problems in seeding remain and they are mainly caused by the poor
initial segmentation results. Further improvement is needed. The system is capable of
tracking objects with different shapes and sizes. For blood vessel data, the system can
identify vessels in MRA without human intervention and can present visual displays

of their cross sections while traversing them. The recognition results are encouraging

253

and are shown to enable useful visualization of recognized objects. Experimental
results have demonstrated the effectiveness of both the proposed recognition strategy
and its hierarchical token grouping implementation. Analyzed structure is still short

of what is seen by the trained human eye, however.

7 .2 Contributions

The contributions this thesis makes to each of the two areas of computer vision

research are emphasized in separate sections below.

7 .2.1 A Formalism for Vision Problem Solving

The contributions of this thesis toward vision problem solving result from our research

effort in answering normative questions about vision.

0 We explored, using the central theme of grouping, the homogeneous charac-
teristics behind seemingly heterogeneous machine vision techniques. Specif-
ically, with our philosophical view toward vision problem solving, we identi-
ﬁed the homogeneity in operation, in representation, and in grouping princi—

ples. This established the foundation of a homogeneous vision problem solving

paradigm.

0 We exploited such identiﬁed homogeneity and proposed a homogeneous prob-
lem solving architecture for vision, called Hierarchical Token Grouping.
Through this proposal, we established a formalism for developing complex
computer vision systems. We showed that this architecture possesses a set of
desirable properties that impact both the practical and theoretical potentials of

the architecture.

254

0 We claim that the proposed architecture generalizes a wide range of com-
putation vision techniques in representation, recognition methodologies, and
interaction frameworks. Therefore, it has the potential of being used as a

general paradigm for vision problem solving.

0 We predict that, using the proposed architecture, computer vision systems can
be developed more efﬁciently and systematically because a large portion of the
system implementation can be conceivably automated due to the homogeneous

structure of the framework and the formal method of specifying vision systems.

0 We demonstrated that the proposed vision problem solving architecture can be

applied to real-world problems.

7 .2.2 Model-Based Tubular Object Recognition Via Hier-

archical Token Grouping

The contributions of this thesis toward model-based object recognition methodologies

are listed below.

0 We developed a Generalized Stochastic 'Dnbe Model that describes both
the salient and dynamic shape properties of tubular objects. This model can
be used for a wide range of recognition problems because it is a parameterized

representation for a class of Generalized Cylinders.

0 We acknowledged the role of sensor in recognition and modeled sensing condi-

tions within our research context.

0 We integrated the models that describe different aspects of objects using the

framework of matched filters.

0 We proposed a new visualization method that may beneﬁt a number of vision

application domains.

255

0 We proposed three quantitative methods to evaluate computer vision system

performance.

0 We designed an automatic multi-level information recovery strategy for tubular

object recognition.

0 We posed the problem of recognizing tubular objects as a hierarchical token
grouping problem and developed a vision system that performs the task using

a hierarchy of grouping agents.

a We tested the system on 33 synthetic and 20 real 2D images as well as 29 syn-
thetic and 30 real BD volumes, including 2D bacteria, wires, and plant roots as
well as 3D blood vessels. The system performance and robustness was quanti-
tatively evaluated. The system built for recognizing tubular objects was shown
to be capable of automatically localizing instances of tubular objects in noisy
images and correctly tracking the objects in BD space. The estimated object pa-
rameters were accurate and the system performance degraded gracefully when

noise is increased.

7 .3 Future Research

7 .3.1 Formalisms in Vision Problem Solving

In the future, we would like to further exploit the potential of the HTG architecture
presented in Chapter 2. Theoretically, formal analysis is needed to better understand
the full potential of the HTG, including various trade-offs between a homogeneous
design and a heterogeneous design of vision systems.

Practically, an environment can be built that facilitates vision system development
under the HTG architecture. This environment should include interactive tools, in—

cluding a speciﬁcation tool that supports efforts by system designers to deﬁne and

256

specify related system components, an architecture generator tool that accepts a user’s
speciﬁcation as input and produces a HTG architecture deﬁned by the speciﬁcation,
including all of its communication channels, a visualization tool that graphically dia-
grams the interaction models, data ﬂow models, and vision system simulation model
such as Petri nets, a behavior modeling tool that describes the dynamic information
ﬂow within the architecture.

Another objective will be to develop a collection of grouping agents that perform
a set of generic vision tasks and to reuse these standard agents to assemble different

vision systems.

7 .3.2 Model-Based Object Recognition

While the system developed in this thesis produced fairly robust recognition perfor-
mance, some problems remain. For example, because the automatic seeding procedure
relies on initially detected surfaces, poor surface detection may result in missing seeds
which can have serious impact on recognition results. There are several possible ways
of improving the seeding process. One is to integrate the region information with
the surface information during the seeding process. Another possible strategy is to
iteratively perform recognition. Before each iteration, recognized objects are removed
from the scene and surrounding regions of the removed objects are processed in the
next iteration. The effectiveness of such possibilities need to be explored and tested.

The current recognition results provide a rough separation of objects from back-
ground. To precisely describe the shape of the recognized objects, further processing
is needed. In future research we will study the use of deformable models to accurately
describe the deviations of objects from the tube model in order to be able to represent

abnormalities such as aneurysms and stenoses.

257

Another issue that needs to be addressed in future research is analyzing the spa-
tial relationships among different objects. The current approach to detecting in-
tersections among objects during sweeping may be inadequate for some situations.
Methods which analyze more than one type of spatial relationship based on the axial
representation of objects are possible and need to be developed and tested.

Finally, we like to parameterize more ﬂexible GC-based models, including the
geons. Combining with some common sensor models such as photometric sensing or
range sensing, integrated model-based methods can be employed to recognize a generic
set of geons through estimating object parameters from input images. The long
term goal is to realize a Recognition-By-Component (RBC) recognition scheme via
Hierarchical Token Grouping. Individual modules that are responsible for recognizing
the geons can be developed as a set of grouping agents under the HTG paradigm which
can be reused to assemble many different recognition systems. Geons recognized by
the grouping agents can be used to assemble many different objects or to be further

grouped to form bigger objects under the HTG paradigm to realize the RBC theory.

APPENDICES

APPENDIX A

Distance Transformation:

Chamfering Technique

The technique of Chamfering was proposed to perform efﬁcient distance transforma-
tion [7]. Using this technique, only two scans (raster scan and reverse raster scan) are
needed to compute a 2D distance map in which every discrete point is assigned a value
representing the minimum distance between this point and a given set of points. In
the boundary-based evaluation method proposed in Chapter 6, we need to assess the
match between the ground truth boundary QB represented as a set of discrete points
and the corresponding boundary B detected by the vision system to be evaluated.
Such a match is described by distance distribution index ’DB established based on a
distance map d(x,g3) where x is a discrete point: In order to efﬁciently compute
d(x, 93), we employed the original Chamfering technique for 2D case and extended

it to 3D case.

A.1 2D Chamfering

Assume both QB and B are deﬁned on a 2D grid {x} = {(x,y)}, 1 S a: S M,1 S

y S N. Derive d(x, 93) by performing a distance transformation using the following

258

259

Chamfering algorithm:

2D Chamfering Algorithm[7]

 

STEP 1: Initialization:
Set d(x,gB) to zero iff x 6 QB and inﬁnite iff x ¢ QB.

STEP 2: Forward pass: Vx = (22,31) 6 [(1,1),..,(M,N)], update d((:c,y),gB) in araster

scan by:

«(w/W) = mz‘n(d((x,y),93), d<<x — 1, y + 0,98) + «12
d((x.y+1),g‘9)+ d1,d((z +1,y+ 1),gB)+ d2,

(“(13 +1,y),g8)+ d1)

STEP 3: Backward pass: Vx = (55,31) 6 [(M,N),..,(1,1)], update d(($,y),gB) in a

reverse raster scan by:

d((x,y>.93) = min(d((x,y),g3), dam — 1. M”) + d,,
dam —1,y +1),93)+ d2,d((x,y — 0.9”) + 4,,

(1((3 +1a3/ ‘1),gBl‘l' d2)-

Here, d1 = 1.0 and d2 = ﬂ = 1.4142 approximate the distance to a 4-connected
neighboring point and to a diagonal neighboring point on a discretized grid, respec-

tively.

A.2 Extended 3D Chamfering

In order to compute a distance map in a 3D discrete space, we extended the 2D

Chamfering technique to 3D case. The goal of the extension is to conserve the merit

260

of only needing two scans of the original 2D Chamfering technique.

Extended 3D Chamfering Algorithm

 

Assume both 93 and B are deﬁned on a 3D grid {x} = {(x,y,z)}, 1 S a: S
M,1 S y S N,1 S 2 S D. Derive 3D distance map d(x,gB) by performing a

distance transformation using the following Chamfering algorithm:

STEP 1: Initialization:
Set d(x,g3) to zero iff x 6 GB and inﬁnite iff x ¢ 93.

STEP 2: Forward pass: ‘v’x = (:c,y,z) E [(1,l,1),..,(M,N,D)l, update d(($,y,z),gB)

in a raster scan by:

d(($,y,z),gB) = min(d((x,y,2),93),d((x - 1,y+ 1,2 + 0,0”) + d3,
d((:r -1,y+1,z),gB)+ d2,d((x — 1,y,z+ 1),gB)+ d2,
d((x- 1,31- 1,Z+1),QB)+ ds,d(($,y- 1.2+1),GB)+ d2,
d((:r,y,z+1),gB)+ d,,d((x,y+1,z+1),gB)+ d2,
d((:c,y+1,z),gB)+ d,,d((x+1,y,z+1),gB)+ d2,
d((:c+1,y+1,z+1),93)+d3,d((x+ 1,y,z),gB)+ 4,,

d((:c+1,y+ 1,2),g3)+d2,d((2:+ 1,y-—1,z+ 1),gB)+ d3).

STEP 3: Backward pass: Vx = (x,y,z) E [(M,N,D),..,(1,1,1)], update

d((:c, y, z),QB) in a reverse raster by:

d((:r,y,z),gB) = min(d((:c,y,z),gB),d((x — 1,y+ 1,2 — 1),g3) + d3,
d(($—1,y,Z—1),gB)+ d2ad(($ _ Iay— 132—1)ag3)+ d3?
d(($ —1,y,z),gB)+ d13d(($ —1ay- 1,2),g3) "l' d2:

d((:c,y +1.2 —1),gB)+ d2,d((z,y,z —1),gB)+ d1,

261

d((~r,y-1,2-1),QB)+d2,d((x,y+ 1,2),QB)+ dz,
d((:z:+1,y,z-1),QB)+d2,d((:r+1,y+1,2— 1)ygB)+d3a

d((a:+1,y—1,z— 1),g3)+ d3,d((a:+1,y— 1,2),g3)+ d2).

Here, d1 = 1.0, d; = x/2 = 1.4142, and d3 = \/3 = 1.7321 approximate the
distance to a 4-connected neighboring point, to a 2D diagonal neighboring point, and

to a 3D diagonal neighboring point on a discretized grid, respectively.

BIBLIOGRAPHY

BIBLIOGRAPHY

[1] N. Abuja and A. L. Abbott. Active stereo: Integrating dosparity, vergence,
focus, aperture, and calibration for surface estimation. IEEE' Transactions on
Pattern Analysis and Machine Intelligence, 15(10):1007—1029, October 1993.

[2] Narendra Ahuja and Mihran Tuceryan. Extraction of early perceptual structure
in dot patterns: Integrating region, boundary, and component gestalt. Com-
puter Vision, Graphics, and Image Processing, 48(3):304—356, December 1989.

[3] M. Atiquzzaman. Multiresolution hough transform - an efﬁcient method of

detecting patterns in images. IEEE Transactions on Pattern Analysis and Ma—
chine Intelligence, 14(11):1090—1095, November 1992.

[4] Dana H. Ballard and Christopher Brown. Computer Vision. Pretice Hall, 1982.

[5] H. G. Barrow and R. J. Popplestone. Relational descriptions in picture pro-
cessing. Medical Imaging, 6, 1971.

[6] H. G. Barrow and J. M. Tenenbaum. Recovering intrinsic scene characteristics
from images. In Computer Vision Systems, pages 3—26. Academic Press, 1978.
Edited by Allen R. Hanson and Edward M. Riseman.

[7] H. G. Barrow, J. M. Tenenbaum, R. C. Bolles, and H. C. Wolf. Parametric cor-
respondence and chamfer matching: Two new techniques for image matching.
In Proceedings of the 5th International Joint Conference on Artiﬁcial Intelli-
gence, page Vol. 2, 1977.

[8] Harry G. Barrow and J. M. Tenenbaum. Retrospective on “interpreting line
drawings as three-dimensional surfaces”. Artiﬁcial Intelligence, 59:71—80, 1993.

[9] Paul Besl. Machine Vision for Three-Dimensional Scenes, chapter The Free-
Form Surface Matching Problem, pages 25—71. Academic Press Inc., 1990.

[10] Paul J. Besl and Ramesh C. Jain. Three-dimensional object recognition. Com-
puter Surveys, 17(1):75-145, March 1985.

[11] Irving Biederman. Human image understanding: Recent research and a theory.

C VGIP, pages 29—73, 1985.

[12] Irving Biederman. Recognition-by-components: A theory of human image un-
derstanding. Psychological Review, 94(2):115—147, 1987.

262

[13]

[14]

[15]

[16]

[17]

[18]

[19]

[20]

[21]

[22]

[23]

[24]

[25]

[26]

263

T. O. Binford. Visual perception by computer. In IEEE Conference on Syetems
and Control, Miami, 1971.

Harry Blum. Symposium on Models for the Perception of Speech and Visual
Form. CambridgezMIT Press, 1964.

R. M. Bolle, A. Califano, and R. Kjeldsen. A complete and extendable approach
to visual recognition. IEEE Transactions on Pattern Analysis and Machine

Intelligence, 14(5):534-548, May 1992.

K. Bowyer, D. Eggert, J. Stewman, and L. Stark. Developing the aspect graph
representation for use in image understanding. In Proceedings of Image Under-
standing Workshop, pages 831-849, 1989.

Michael Brady. Preface-the changing shape of computer vision. Artiﬁcial In-
telligence, 17:1—15, 1981.

Michael Brady and Haruo Asada. Smoothed local symmetries and their im-
plementation. The International Journal of Robotics Research, 3(3):36—61, Fall
1984.

Rodney A. Brooks. Model-based three-dimensional interpretations of two-
dimensional images. IEEE Transactions on Pattern Analysis and Machine In-

telligence, 5(2):]40—150, March 1983.

Rodney Allen Brooks. Model-based Computer Vision. UMI Research Press,
1984.

John Canny. A computational approach to edge detection. IEEE Transactions
on Pattern Analysis and Machine Intelligence, PAMI-8(6):679—698, November
1986.

J.Y. Catros and D. Mischler. An artiﬁcial intelligence approach for medical
picture analysis. Pattern Recognition, 8:123—130, September 1988.

Subhasis Chaudhuri, Shankar Chatterjee, Norman Katz, Mark Nelson, and
Michael Goldbaum. Detection of blood vessels in retinal images using two-
dimensional matched ﬁlters. IEEE Trans. on Medical Imaging, 8(3):263—269,
September 1989.

S. Chen and H. Freeman. Computing characteristic views of quadratic surfaced
solids. In Proceedings of the 10th Int. Conf. Patt. Recogn., 1990.

Roland T. Chin and Charles R. Dyer. Model-based recognition in robot vision.
Computing Surveys, 18(1):67—108, March 1986.

Chen-Chau Chu and J. K. Aggarwal. Image interpretation using multiple sens-
ing modalities. IEEE Transactions on Pattern Analysis and Machine Intelli-
gence, 14(8):840—847, August 1992.

264

[27] Steven D. Cochran and Gerard Medioni. 3-d surface description from binocu-
lar stereo. IEEE Transactions on Pattern Analysis and Machine Intelligence,

14(10):981—994, October 1992.

[28] Alan Doerr and Kenneth Levasseur. Applied Discrete Structures for Computer
Science. SRA Pergemon, 1988.

[29] J. Dolan and E. Riseman. Computing curvilinear structure by token-based
grouping. In Proc. IEEE C VPR, pages 264—270, 1992.

[30] J. Dolan and R. Weiss. Perceptual grouping of curved lines. In Proc. of DARPA
Image Understanding Workshop, pages 1135—1145, Palo Alto, CA, 1989.

[31] J. H. Duncan and T. Birkholzer. Reinforcement of linear structure using
parametrized labeling. IEEE Transactions on Pattern Analysis and Machine
Intelligence, 14(5):502—515, May 1992.

[32] J. C. Ferguson, A. J. M. Smucker, and Qian Huang. Segmentation of roots from
their soil background in minirhizotron video images by adaptive thresholding

and ridge detection algorithms. In Proceedings, ASA-CSSA-SSSA 1991 Annual

Conference, Denver, November 1991.

[33] F. J. F ierens, Van Cleynenbreugel, P. Suetens, and A. Oosterlinck. Iconic repre-
sentation of visual data and models. Pattern Recognition, 12:781-792, December

1991.

[34] Bruce E. Flinchbaugh and B. Chandrasekaran. A theory of spatio-temporal
aggregation for vision. Artiﬁcial Intelligence, 17:387—407, 1981.

[35] Patrick J. Flynn. CAD-Based Computer Vision: Modeling and Recognition
Strategies. PhD thesis, Michigan State University, 1990.

[36] H. Freeman. Computer processing of line drawing images. Computing Surveys,

6(1):57—98, March 1974.

[37] Stuart A. Friedberg. Finding axes of skewed symmetry. Computer Vision,
Graphics, and Image Processing, 34:138—155, 1986.

[38] Z. Gigus and J. Malik. Computing the aspect graph for line drawings of poly-
hedral objects. In Proceedings of IEEE Conference on Computer Vision and
Pattern Recognition, pages 654—661, 1988.

[39] Ari David Gross. Shape from a symmetric universe. Technical Report CUCS-
065-90, Columbia University Department of Computer Science, Department of
Computer Science / Columbia University / New York City, New York 10027, 1990.

[40] Allen R. Hanson and Edward Riseman. Segmentation of natural scenes. In
Computer Vision Systems, pages 129—164. Academic Press, 1978. Edited by
Allen R. Hanson and Edward M. Riseman.

265

[41] Robert M. Haralick. Scene analysis, arrangements, and homomorphisms. In
Computer Vision Systems, pages 199—212. Academic Press, 1978. Edited by
Allen R. Hanson and Edward M. Riseman.

[42] Robert M. Haralick and Linda G. Shapiro. Computer and Robot Vision.
Addison-Wesley Publishing Company, 1992.

[43] Simon Haykin. Communication Systems. John Wiley and Sons, New York,
1978.

[44] Julian E. Hochberg. Effects of the gestalt revolution: The cornell symposium
on perception. Psychological Review, 64(2):73—84, 1957.

[45] Richard L. Hoffman. Object Recognition From Range Images. PhD thesis,
Michigan State University, 1986.

[46] Berthold Klaus Paul Horn. Robot Vision. The MIT Press, McGraw-Hill Book
Company, London, New York, 1986.

[47] Qian Huang and G. C. Stockman. Generalized tube model: 3d elongated object
recognition from 2d intensity images. In Proceedings of International Conference
on Computer Vision and Pattern Recognition, New York, USA, June 1993.

[48] Qian Huang and G. C. Stockman. Model-based elongated object recognition
using invariant surface features and matched ﬁlters. In Proceedings of SPIE:
Symposium on Mathematical Methods in Medical Imaging, San Diego, USA,
July 1993.

[49] D. H. Hubel. Eye and Brain. Scientific American Library, New York, 1988.

[50] C. L. Jackins and S. L. Tanimoto. Oct-trees and their use in representing 3-d
objects. Computer Graphics and Image Processing, 14(3):249—270, 1980.

[51] Anil K. Jain. Fundamentals of Digital Image Processing. Prentice Hall, 1989.

[52] Anil K. Jain and Richard Hoffman. Evidence—based recognition of 3-d objects.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 10(6):783—
802, November 1988.

[53] Ramesh C. Jain and Thomas O. Binford. Dialogue: Ignorance, myopia, and
naivete in computer vision systems. C VGIP: Image Understanding, 53(1):112—
117, January 1991.

[54] Laveen N. Kanal. On pattern, categories, and alternate realities. Pattern Recog-
nition Letters, 14:241—255, March 1993.

[55] Koichi Kitamura, Jonathan M. Tobis, and Jack Sklansky. Estimating 3d skele-
tons and transverse areas of coronary arteries from biplane angiograms. IEEE

Trans. on Medical Imaging, 7(3):173—187, 1988.

266

[56] Greg C. Lee. Reconstruction of Line Drawing Graphs From Fused Range and
Intensity Imagery. PhD thesis, Michigan State University, 1992.

[57] Martin D. Levine. Vision in Man and Machine. McGraw-Hill Book Company,
1985.

[58] David G Lowe. Perceptual Organization and Visual Recognition. Kluwer Aca-
demic Publishers, 1985.

[59] David G. Lowe. Organization of smooth image curves at multiple scales. Inter-
national Journal of Computer Vision, 3:119—130, 1989.

[60] Y. Lu and R. C. Jain. Reasoning about edges in scale space. IEEE Transactions
on Pattern Analysis and Machine Intelligence, 14(4):450—468, April 1992.

[61] D. Marr and S. Ullman. Directional selectivity and its use in early visual
processing. Technical Report A.I. Memo No. 524, M.I.T. A.I.Lab.,, Cambridge,
MA, 1979.

[62] David Marr. Vision. McGraw-Hill Book Company, 1982.

[63] James D. McCafferty. Human and Machine Vision. Digital and Signal Process-
ing. Ellis Horwood, 1990.

[64] R. D. Merrill. Representations of contours and regions for efﬁcient computer
search. Communication ACM, 16(2):69—82, February 1974.

[65] M. Minsky. Logical versus analogical or symbolic versus connectionist or neat
ersus scruffy. Artiﬁcial Intelligence Magzine, 12(2):34-51, 1991.

[66] Olivier Monga, Nicholas Ayache, and Peter T. Sander. From voxel to intrinsic
surface features. Image and Vision Computing, 10(6):403—417, July/August
1992.

[67] Olivier Monga and Serge Benayoun. Using partial derivatives of 3d images to
extract typical surface features. Technical report, INRIA-ROCQUENCOURT,

France, F ebuary 1992.

[68] Ahmed M. Nazif and Martin D. Levine. Low level image segmentation: An
expert system. IEEE Trans. PAMI, 6(5):555-577, September 1984.

[69] R. Nevatia and T. O. Binford. Description and recognition of curved objects.
Artiﬁcial Intelligence, 8(1):77—98, 1977.

[70] Ramakant Nevatia and Thomas O. Binford. Description and recognition of
curved objects. Artiﬁcial Intelligence, 8:77—98, 1977.

[71] P. Parent and S. Zucker. Trace inference, curvature consistency, and curve

detection. IEEE Trans. PAMI, 11(8):823—839, 1989.

267

[72] Alex Pentland. Automatic extraction of deformable part models. International
Journal of Computer Vision, 4:107—126, 1990.

[73] J. Ponce, D. Chelberg, and W. B. Mann. Invariant properties of straight ho-
mogeneous generalized cylinders and their contours. IEEE Trans. on PAMI,
11(9):951—966, 1989.

[74] J. Ponce and D. J. Kriegman. Computing exact aspect graphs of curved objects:
Parametric surfaces. In Proceedings of the 8th Nat. Conf. Artiﬁcial Intell., pages
1074—1079, 1990.

[75] John M. Prager. Extracting and labeling boundary segments in natural scenes.
IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-
20(1):16—27, January 1980.

[76] Narayan S. Raja and Anil K. Jain. Recognizing geons from superquadrics ﬁtted
to range data. Image and Vision Computing, 10(3):l79—190, 1992.

[77] Kashipati Rao. Shape Description From Sparse and Imperfect Data. PhD thesis,
University of Southern California, 1988.

[78] L. G. Roberts. Optical and Electro—optical Information Processing, chapter Ma-
chine Perception of Three-dimensional Solids, pages 159—197. CambridgezMIT
Press, 1965.

[79] Hille Rom and Gerard Medioni. Hierarchical decomposition and axial shape
description. In Proceedings of International Conference on Computer Vision
and Pattern Recognition, pages 49—55, Champaign, Illinois, June 1992.

[80] A. Rosenfeld. Pyramid algorithms for perceptual organization. Behav., Res,
Methods, Instr., and Computers, 18(6):595—600, 1986.

[81] Azriel Rosenfeld. Axial representations of shape. Computer Vision, Graphics,
and Image Processing, 33:156—173, 1986.

[82] Azriel Rosenfeld and Avinash C. Kak. Digital Picture Processing. Academic
Press, 1982.

[83] James Rumbaugh, Michael Blaha, William Premerlani, Frederick Eddy, and
William Lorensen. Object-Oriented Modeling and Design. Prentice Hall, 1991.

[84] H. Samet. Region representation:quadtrees from boundary codes. Comm. ACM,
23:163—170, 1980.

[85] P. T. Sander and S. W. Zucker. Inferring surface trace and differential structure
from 3-d images. IEEE Trans. on Pattern Analysis and Machine Intelligence,
12(9):833—854, September 1990.

268

[86] S. Sarkar and K. L. Boyer. Integration, inference, and management of spatial
information using bayesian networks: Perceptual organization. IEEE Trans-
actions on Pattern Analysis and Machine Intelligence, 15(3):256—274, March
1993.

[87] E. Saund. Symbolic construction of a 2d scale space image. IEEE Trans. PAMI,
12(8):817—830, 1990.

[88] E. Saund. Labeling of curvilinear structure across scales by token grouping. In

Proc. IEEE CVPR, pages 257—263, 1992.

[89] Steven A. Shafer. Shadows and Sihouettes in Computer Vision. Robotics and
Vision. Kluwer Academic Publishers, 1985.

[90] Linda G. Shapiro and Robert M. Haralick. Structural descriptions and inexact
matching. IEEE Transactions on Pattern Analysis and Machine Intelligence,
3(5):504—519, 1981.

[91] J. Sklansky. Measuring concavity on a rectangular mosaic. IEEE Trans. Com-
puters, 21(12), December 1972.

[92] T. Sripradisvarakul and R. Jain. Generating aspect graphs for curved objects.
In Proceedings of IEEE Workshop: Interpretation 3D Scenes, pages 109—115,
1989.

[93] K. Stevens and A. Brookes. Detecting structure by symbolic constructions on

tokens. CVGIP, 37:238—260, 1987.

[94] P. Suetens, C. Smet, F. Van De Werf, and A. Oosterlinck. Recognition of
the coronary blood vessels in angiograms using hierarchical model-based iconic
search. In Proceedings of IEEE Conference on Computer Vision and Pattern

Recognition, pages 576—579, 1989.

[95] Paul Suetens, Fua Pascal, and Andrew J. Hanson. Computational strategies
for object recognition. Computing Surveys, 24(1):5-61, March 1992.

[96] Steven L. Tanimoto. Regular hierarchical image and processing structures in
machine vision. In Computer Vision Systems, pages 165—174. Academic Press,

1978. Edited by Allen R. Hanson and Edward M. Riseman.

[97] Gabriel Taubin, Ruud M. Bolle, and David B. Cooper. Representing and com-
paring shapes using shape polynomials. In Proceedings of IEEE Conference on
Computer Vision and Pattern Recognition, pages 510—516, 1989.

[98] Saeid Tehrani. Knowledge-Guided Boundary Determination in Low-Contrast
imagery: An Application to Medical Images. PhD thesis, University of Michi-
gan, 1991.

[99]

[100]

[101]

[102]

[103]

[104]

[105]

[106]

[107]

[108]

[109]

269

A. Trisman. Properties, parts, and objects. In Handbook of Perception and Hu-
man Performance. Wiley-Interscience, 1986. Edited by K.R. Boff, L. Kaufman
and J .P. Thomas.

Leonard Uhr. “recognition cones,” and some test results; the imminent arrival of
well—structured parallel-serial computers; positions, and positions on positions.
In Computer Vision Systems, pages 363—378. Academic Press, 1978. Edited by
Allen R. Hanson and Edward M. Riseman.

F. Ulupinar and R. Nevatia. Shape from contour: Straight homogeneous gen-
eralized cones. In Proceedings of ICC V’90, pages 582—586, 1990.

Faith Ulupinar and Ramakant Nevatia. Inferring shape from contour for curved
surfaces. In Proceedings of the 10th International Conference on Pattern Recog-
nition, pages 147—154, 1990.

Faith Ulupinar and Ramakant Nevatia. Recovery of 3—d objects with multi-
ple curved surfaces from 2-d contours. In Proceedings of IEEE Conference on
Computer Vision and Pattern Recognition, pages 730-733, 1992.

H. B. Voelcker and A. A. G. Requicha. Geometric modelling of mechanical
parts and processes. Computer, 10:48—57, December 1977.

Lars Westberg. Hierarchical contour-based segmentation of dynamic scenes.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(9):946—
952, September 1992.

Andrew P. Witkin and Jay M. Tenenbaum. On the role of structure in vision.
In Human and Machine Vision, pages 481—544. Academic Press, 1983. Edited
by Jacob Beck, Barbara Hope, and Azriel Rosenfeld.

S. W. Zucker and R. M. Hummel. A three-dimensional edge operator. IEEE
Trans. on Pattern Analysis and Machine Intelligence, 3(3):324—331, May 1981.

Steven W. Zucker. Vertical and horizontal processes in low level vision. In
Computer Vision Systems, pages 187—195. Academic Press, 1978. Edited by
Allen R. Hanson and Edward M. Riseman.

Steven W. Zucker, R. A. Hummel, and A. Rosenfeld. An application of relax-
ation labeling to line and curve enhancement. IEEE Transaction on Computers,
26, 1977.

"‘Tlllllllllillllllill“