This is to certify that the dissertation entitled "Obtaining Generic Parts from Range Images Using a Multi-View Representation," presented by Narayan Sriranga Raja, has been accepted towards fulfillment of the requirements for the Ph.D. degree in Computer Science.

OBTAINING GENERIC PARTS FROM RANGE IMAGES
USING A MULTI-VIEW REPRESENTATION

By

Narayan Sriranga Raja

A DISSERTATION

Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY

Computer Science Department

1992

ABSTRACT

OBTAINING GENERIC PARTS FROM RANGE IMAGES
USING A MULTI-VIEW REPRESENTATION

By

Narayan Sriranga Raja

We describe a system for obtaining a "generic" parts-based 3-D object representation. We use range image data as the input, obtaining a 3-D object representation based on twelve geon-like 3-D part primitives as the output. The 3-D parts-based representation consists of parts detected in the image, and their identities. Unlike previous work, we do not make simplifying assumptions such as the availability of perfect line drawings, perfect segmentation, or manual segmentation.

We propose a novel method of specifying "generic" 3-D parts, i.e., by means of Surface Adjacency Graphs (SAGs). Using the SAGs, we derive an extremely compact multi-view representation of the part primitives, consisting of a total of only 74 views for all 12 primitives. Based on the multi-view representation of parts, we present a method of performing part segmentation from range images, given a good surface segmentation. This method for part segmentation is more general than common approaches based on Hoffman and Richards' "principle of transversality."

We present two approaches for identifying the parts as one of the 12 3-D part primitives. The first approach applies statistical pattern classification methods using parameters estimated by superquadric fitting. Five features derived from the estimated superquadric parameters are used to distinguish between the 12 part primitives. Classification error rates are estimated for k-nearest-neighbor and binary tree classifiers, for real as well as synthetic range images. The second approach for part identification draws inferences from the distribution of angles between surface normals and the principal axis of a part.
We show that intensity data can be used to recover from some misclassifications yielded by the purely range-based methods of part identification. A simple test is applied to check the concavity or convexity of the part silhouette in the intensity image. This serves as a reliable test of whether the part axis is straight or curved. Results of part segmentation and identification are presented for real range images of several multi-part objects. Our system successfully performs part segmentation and identifies the parts.

Copyright © by Narayan Sriranga Raja 1992

To my Parents, Grandparents, and Great-aunt, and to Min-Sun and Meera

ACKNOWLEDGMENTS

Many people contributed in many ways to make my Ph.D. thesis possible. Although such a brief mention is inadequate, I gratefully acknowledge the contributions of the following people.

Above all, I am grateful to my adviser, Dr. Anil Jain: for being patient when I was choosing a problem to work on, but also for being impatient when it was time for me to move on. He provided useful references and insightful suggestions almost every week, supported me as a Research Assistant for 4 years, and, by appointing me a PRIP manager, made it possible for me to enjoy unlimited amounts of computer hacking.

I am grateful to my parents, Smt. Lakshmi Narayan and Shri S. Narayan, to my grandparents, and to my great-aunt Smt. Janakiammal, for bringing me up in such a way that I was eventually interested and able to embark upon a Ph.D. program. For the same reason, I am also grateful to my teachers at school and college in India. Min-Sun provided invaluable moral support and sympathy for several years, and my sister Meera boosted my confidence by taking it for granted—mistakenly, at times—that I was making steady progress and would soon be a Dr. Raja. My uncle, Shri S. Krishnan, contributed to my intellectual growth by keeping up a regular supply of stimulating books.

Besides Dr. Jain, I would also like to acknowledge the contributions of my other committee members—Dr. Koul, Dr. Stockman, Dr. Tüceryan, and my favourite faculty member, Dr. Enbody. It was Dr. Tüceryan, with his AI class, who got me interested in Computer Vision in the first place. I owe him special thanks. I am grateful to Dr. Stockman for conducting an excellent course on 3-D object recognition.

Every past and present member of the Pattern Recognition and Image Processing Group (PRIP) deserves my gratitude. Each in their own way, they made the PRIP lab an ideal place to work. M. D. Ramaswamy encouraged me to join the PRIP group. Pat Flynn, Joe Miller, Debby Trytten, Sally Howden, and Farshid Farrokhnia tolerated my initial bumbling and taught me how to use the lab equipment. Our full-time managers, Dave Marks and John Lees, and all the various student managers ensured that the show would go on. Jinlong Chen, Jon Courtney, Chitra Dorai, Marie-Pierre Dubuisson, Hans Dulimarta, Qian Huang, Tim Newman, Philippe Ohanian, Philippe Ballard, Sushil Bhattacharjee, Steve Walsh, Sharat Pankanti, Jian-Chang Mao, and Yao Chen were helpful friends and knowledgeable colleagues.

Greg Lee, Satisha Nadabar, Debby Trytten, and I, having taken our comprehensive examinations together, came to be known as the Gang-of-Four. Ours will be a lifelong friendship. Besides occasional technical chat, we kept one another sane by listening sympathetically to each other's problems at our weekly Gang-of-Four lunches. Towards the end, our conversations regularly turned to topics such as suicide, murder, and cannibalism!
Nevertheless, we bounced back soon after the Ph.D. defence and are already planning the First International Gang-of-Four Conference.

In my work, I used several pieces of software developed by others. Pat Flynn wrote dozens of useful HIPS and range image utilities, including a surface segmentation procedure based on earlier work by Rick Hoffman. Honda Shing developed the MSU Ph.D. thesis style file in LaTeX. I am very grateful to Prof. Ruzena Bajcsy, Franc Solina, Alok Gupta, Helen Anderson, Luca Bogoni, and others from the GRASP Lab at the University of Pennsylvania, for making available their software for superquadric fitting. Farshid Farrokhnia and Suryana Setiawan adapted and improved the software for fitting a polygonal approximation, originally written by myself and Farshid.

Carol Case, Cathy Davison, and Lora Mae Higbee took care of all kinds of necessary administrative work. I am grateful to them for helping me on innumerable occasions.

If I have forgotten to include anyone who should have been mentioned, I sincerely apologize. I hereby acknowledge their help.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
1 Introduction
1.1 Object Recognition
1.2 Object Representation and Matching
1.3 3-D Parts-based Representation
1.4 Range Images
1.5 Contributions of this Thesis
1.6 Organization of this Thesis
1.7 Summary
2 3-D Object Representation
2.1 Introduction
2.2 Viewpoint-independent vs. Viewpoint-dependent Representations
2.3 Object-intrinsic vs. Parts-based Representations
2.4 Examples of Object-intrinsic Representations
2.4.1 Feature-based Approaches
2.4.2 Aspect Graphs
2.5 Examples of Parts-based Representations
2.5.1 Generalized Cylinders or Generalized Cones
2.5.2 "Geons" and Recognition By Components
2.5.3 Superquadrics
2.5.4 Other Part Primitives
2.6 Summary
3 Parts and Part Segmentation
3.1 Choice of Part Primitives
3.1.1 Notation
3.2 Surface Adjacency Graphs
3.2.1 Surface Classification by Curvatures
3.2.2 SAGs of Part Primitives
3.3 Multi-view Representation of Parts
3.3.1 Catalog of Part Views
3.4 Part Segmentation from Surface Segmentation
3.4.1 Surface segmentation and classification
3.4.2 Using the Catalog of Part Views
3.4.3 Part Identification: Two Approaches
3.5 Summary
4 Geons from Superquadrics
4.1 Motivation
4.2 Relating Superquadrics and Geons
4.3 Experiments in Shape Discrimination
4.3.1 Choice of Features
4.3.2 Design of Experiments
4.4 Experimental Results
4.4.1 Parameter Estimation
4.4.2 Part Discrimination
4.4.3 Confusion Matrices
4.5 Summary
5 Direct Tests of Shape Attributes
5.1 Variation of Cross-section Size
5.1.1 Principal Axis Orientation
5.2 Axis — straight or bent?
5.3 Experimental Results
5.4 Summary
6 Obtaining Parts-based Representation
6.1 Test Image 1
6.1.1 Part Segmentation
6.1.2 Part Identification
6.2 Test Image 2
6.2.1 Part Segmentation
6.2.2 Part Identification
6.3 Test Image 3
6.4 Test Image 4
6.5 Test Image 5
6.6 Test Image 6
6.7 Test Image 7
6.8 Test Image 8
6.9 Test Image 9
6.10 Test Image 10
6.11 Test Image 11
6.12 Test Image 12
6.13 Summary
7 Conclusions and Future Research
7.1 Contributions of the Thesis
7.2 Shortcomings
7.3 Recommendations for Further Work
A Parameters
A.1 Surface Classification
A.2 Constructing the SAG
A.3 Tree Classifier
A.4 Direct test for Cross-section size
A.5 Polygonal Approximation
A.6 Direct test for Axis shape
BIBLIOGRAPHY

LIST OF TABLES

3.1 Notation for specifying the 12 parts
3.2 Curvature-based classification into 5 surface types
3.3 Some inferences of part shape based on surface type
4.1 Mean and standard deviation of estimates of ε₁ for the 12 parts
4.2 Mean and standard deviation of estimates of ε₂ for the 12 parts
4.3 Mean and standard deviation of estimates of k for the 12 parts
4.4 Mean and standard deviation of estimates of k for the 12 parts
4.5 Mean and standard deviation of estimates of r/a for the 12 parts
4.6 Overall classification error rates (in %) for synthetic range data
4.7 Classification error rates (in %) by range image spatial resolution
4.8 Classification error rates (in %) by model "elongation"
4.9 12-class error rates (in %) for tree classifier
4.10 Overall classification error rates (in %) for real range data from "rough" models
4.11 Overall classification error rates (in %) for real range data from "smooth" models
4.12 Confusion table for the synthetic images
4.13 Confusion table for real images ("rough" models)
4.14 Confusion table for real images ("smooth" models)
5.1 Confusion table for real images ("smooth" models)

LIST OF FIGURES

1.1 Robust system for computing a 3-D parts-based representation
1.2 One module of the robust 3-D parts-based representation system
1.3 Proposed system for obtaining a 3-D parts-based representation
2.1 Aspect graph of a Tetrahedron
2.2 Four attributes yield 36 different geons
2.3 Non-accidental properties indicate geon type
2.4 Complex objects represented using geons
2.5 Effect of the parameter ε on superquadric shape
3.1 Parts with straight edges and constant cross-section
3.2 Parts with straight edges and increasing cross-section
3.3 Parts with straight edges and incr.-and-decr. cross-section
3.4 Parts with curved edges and constant cross-section
3.5 Parts with curved edges and increasing cross-section
3.6 Parts with curved edges and incr.-and-decr. cross-section
3.7 Coding scheme for SAG edge attribute
3.8 SAG for the part s-s-co
3.9 SAG for the part b-s-co
3.10 SAG for the part s-s-t
3.11 SAG for the part b-s-t
3.12 SAG for the part s-s-id
3.13 SAG for the part b-s-id
3.14 SAG for the part s-c-co
3.15 SAG for the part b-c-co
3.16 SAG for the part s-c-t
3.17 SAG for the part b-c-t
3.18 SAG for the part s-c-id
3.19 SAG for the part b-c-id
3.20 S-aspects of a rectangular block
3.21 S-aspects of an increasing-decreasing rectangular block
3.22 S-aspects of a tapering rectangular block
3.23 S-aspects of a cylinder
3.24 S-aspects of an increasing-decreasing cylinder
3.25 S-aspects of a tapering cylinder
3.26 S-aspects of a curved-axis rectangular block
3.27 S-aspects of a curved-axis, increasing-decreasing rectangular block
3.28 S-aspects of a curved-axis, tapering rectangular block
3.29 S-aspects of a curved-axis cylinder
3.30 S-aspects of a curved-axis, increasing-decreasing cylinder
3.31 S-aspects of a curved-axis, tapering cylinder
3.32 Two parts which meet along a convex crease
3.33 Two parts which meet along a "mixed" crease
3.34 L-shaped object for which part segmentation fails
3.35 Another view of the L-shaped object
4.1 Synthetic range images: shapes with straight axes
4.2 Synthetic range images: shapes with bent axes
4.3 Hand-carved geons with "rough" surface finish
4.4 Real range images of "rough" hand-carved geons
4.5 Real range images of some "smooth" geons
5.1 Directions of principal axis and surface normals
5.2 Part with constant cross-section
5.3 Part with increasing cross-section
5.4 Part with increasing-and-decreasing cross-section
5.5 Intensity image and silhouette of a straight-axis part
5.6 Polygonal approximation of the straight-axis part
5.7 Intensity image and silhouette of a bent-axis part
5.8 Polygonal approximation of the bent-axis part
5.9 Complete system for obtaining parts and their identities
6.1 Correspondence between colors and alphabetical labels
6.2 Object made up of a cylinder and a rectangular block
6.3 Complicated object made up of 5 cylindrical parts
6.4 Scene with occlusion, containing three objects
6.5 Object made up of two cylindrical parts
6.6 Everyday object: a bulb
6.7 Funnel made up of a cone and a cylinder
6.8 Object made up of a cylinder and a rectangular block
6.9 Another view of the funnel
6.10 Complicated object made up of 5 cylindrical parts: second view
6.11 Complicated object made up of 5 cylindrical parts: third view
6.12 View of an object resembling a table-lamp
6.13 View of a coffee-mug

CHAPTER 1
Introduction

The basic goal of computer vision, posed in its most obvious form, is scene understanding. That is, given a sensed 2-D array of picture elements of different brightnesses of a 3-D scene, infer the presence, identity, position, and orientation of various "objects" in it. Animal vision is an existence proof that the vision problem can be solved; as Lowe [60] remarks, "without this proof of feasibility, it is hard to imagine that anyone would even think of attempting to interpret the array of light intensities projected from a scene onto a two-dimensional screen".

Ullman points out [91] that visual object recognition by humans is based on several distinct kinds of reasoning. Small dark blobs flying in an erratic, but characteristic way are recognized as "flies" on the basis of their motion. A door-knob, even one of a completely novel shape, can be recognized from its context and from "world knowledge". Trees can often be distinguished from one another on the basis of their coloring and textural appearance. Nevertheless, like most of the computer vision community, we will restrict ourselves to shape-based 3-D object recognition.

Computer vision research shares some fundamental assumptions with the rest of Artificial Intelligence. First, it assumes that, like other perceptual and cognitive capabilities, the visual faculty consists of "computational" processes, in some sense, or can at least be simulated by such processes. Second, it assumes that corresponding computational processes can equally be implemented on non-biological physical media (for example, using cameras and electronic hardware instead of eyes and nervous tissue). Computer vision, in its present form, is doomed to failure (in duplicating human visual capabilities) if these assumptions are invalid.

The overall paradigm which is dominant among students of object recognition, both by humans and by computers, postulates that object recognition is a two-stage process. Object recognition is assumed to proceed by first producing some internal representation of the visual input, and then matching it against representations of models known to the viewer. This is taken for granted by virtually all researchers. Consequently, we can distinguish between, or classify, object recognition systems according to (a) their representation scheme, and (b) their matching scheme.
1.1 Object Recognition

The problem of object recognition is generally equated to the process of finding that representation (of a model) in memory that "best" matches the representation of an object in the scene. Clearly, this can be viewed as an instance of a more general problem, i.e., finding a solution from a solution space. Given the solution space, i.e., the set of representations of all known models, the best solution has to be located. (Under good viewing conditions, humans can identify objects with very little uncertainty—so much so that it makes sense to talk of a "correct" solution versus "incorrect" solutions, rather than just the "best" solution from a continuum of possibilities.)

Many other problems in AI have also been formulated in terms of searching or traversing a solution space. Examples are game-playing (e.g., find a "good" next move when playing chess) and problem-solving (e.g., given the symptoms, diagnose the patient's illness(es)). These problems often deal with huge solution spaces and/or search trees. Not surprisingly, therefore, they are hard problems if thus formulated. While we do not know of a better alternative formulation of the object recognition problem at this point, it is worth remembering that the problem of adding 2 and 2 might also be difficult if our addition algorithm searched for a solution in the set of all possible integers.

Algorithms to solve the general problem of "finding a solution from a solution space" can be classified into several broad categories. Since the "matching" stage of object recognition can be viewed as an instance of this more general problem, a natural taxonomy of different "matching strategies" can be obtained by relating them to these broad categories of algorithms.

1.2 Object Representation and Matching

An attempt to roughly quantify the problem of object recognition by search is very useful in pinpointing the exact reason why it is so difficult. At present, computer-based 3-D object recognition systems can cope with a few tens of models at most. For the purpose of argument, suppose there are 1,000 possible models, and they can only appear singly or in pairs in an image. Further, suppose that occlusion is not allowed. The total number of possible combinations of models is, therefore, $\binom{1000}{1} + \binom{1000}{2} = 500{,}500$, which is approximately 500,000. Object recognition requires searching for the correct solution from among these possibilities.

But searching a mere 500,000 candidates sounds rather trivial! For example, even a PC can sequentially search through 500,000 numbers in a few seconds. Why, then, should the object recognition task posed above (1,000 models, unoccluded, either singly or in pairs) be far beyond any object recognition system built so far? Why is searching a set of numbers so easy, and searching a set of model objects so difficult?
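As a quick check on this arithmetic, the following sketch simply counts the candidate interpretations for the hypothetical scenario above (the model count and the one-or-two-objects assumption are part of the thought experiment, not properties of any real system):

```python
from math import comb

num_models = 1000  # hypothetical library of known object models

# A scene contains either one model or an unordered pair of distinct
# models, with no occlusion allowed.
single = comb(num_models, 1)   # 1,000
pairs = comb(num_models, 2)    # 499,500

print(single + pairs)          # 500,500, i.e., roughly 500,000 candidates
```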
There are two reasons for this, and they are closely related.

Firstly, consider the task of checking whether a particular candidate solution is correct or incorrect. If we are searching a set of numbers to find the first instance of, say, the number 7, we can easily determine whether a particular candidate is the correct solution or not. Here, the criterion for "correctness" of a solution is very clearly defined—equality of two numbers—and can be tested very quickly. By contrast, the analogous problem in object recognition, i.e., "verification," is neither easy to define nor to perform. Given some object, suppose we are searching through the solution space of possible models. Testing whether a particular candidate is correct or not is itself a difficult task. That is, the question "Is it model #5, i.e., the cobra head?" is a laborious one to answer given current techniques.

The reason is that answering this question could itself involve examining all the possibilities in another search tree! This is the interpretation tree [39], which essentially takes each component (e.g., 2-D or 3-D contours, or surface patches) of the object and explores matching it against every component of the model being considered. (Various heuristics are used to prune the interpretation tree, but it is still not a trivial operation.) Testing whether a particular candidate solution is correct would be much simpler if every object and model were bar-coded! The computer system in a supermarket has no difficulty in performing its own form of visual object recognition, given the laser-readable unique bar-codes attached to objects. We cannot expect every object in the universe to be conveniently labeled in such a manner. Nevertheless, this aspect of the object recognition problem has not been viewed before from this perspective. In our opinion, it would be very interesting to investigate and improve methods for checking candidate solutions. Ideally, what is desired is a module which, when asked, say, "Is this object a cobra head?" could answer "yes" or "no" very rapidly. If such a module were available, searching the solution space of models for object recognition would be as easy as searching a set of numbers.

Secondly, there is the problem of how to organize the solution space itself. Undergraduate courses in data structures and algorithms teach us that sorting and indexing are invaluable for searching a large set. Searching a set of 500,000 numbers is easy, because a key and an ordering are easily defined. The set of numbers can be organized as a sorted array or as a search tree. Binary search can then explore this data structure in logarithmic time. As usual, unfortunately, we do not have comparably effective methods to organize the set of known models in object recognition.

To recapitulate, virtually all vision researchers assume that object recognition involves producing an internal representation of the scene, and then matching it against internal representations of known model objects. This formulation of object recognition can be seen as an instance of "finding a solution from a solution space." Now, consider the standard methodologies for the general problem of finding a solution from a solution space. They include exhaustive search (with various flavors or refinements like depth-first search, breadth-first search, backtracking, alpha-beta pruning, branch-and-bound, or dynamic programming), binary search of sorted data, indexing methods, hashing schemes, the use of content-addressable memories, and various probabilistic algorithms. As expected, the common matching schemes for object recognition can be related to these general solution methodologies. Due to the lack of effective sorting and indexing schemes, analogous matching strategies for object recognition are not well-developed. Hypothesize-and-test and interpretation tree methods [40, 51] are versions of exhaustive search. The generalized Hough transform, pose clustering, and geometric hashing [18, 19, 45, 56, 85] can be viewed as a type of parameter-based hashing, combined with voting. Artificial neural networks for recognition are a kind of content-addressable associative memory. With the possible exception of the RANSAC paradigm [32], the use of probabilistic algorithms in machine vision has not been explored.
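The sorting-and-indexing contrast above is easy to make concrete. A minimal sketch follows (the numeric keys are arbitrary stand-ins; no comparable key is known for 3-D object models, which is exactly the problem):

```python
import bisect

keys = sorted([3, 7, 19, 42, 500000])  # a sorted set of numeric keys

def contains(sorted_keys, query):
    """O(log n) membership test via binary search on a sorted list."""
    i = bisect.bisect_left(sorted_keys, query)
    return i < len(sorted_keys) and sorted_keys[i] == query

print(contains(keys, 7))    # True
print(contains(keys, 11))   # False
```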
1.3 3-D Parts-based Representation

We presented a general framework within which various matching strategies for object recognition fit quite naturally. While doing so, we emphasized two major difficulties encountered during the matching step of object recognition, i.e., the excessive number of hypotheses which need to be verified, and the difficulty of verification itself. This motivates an intelligent choice of the underlying representation scheme so that these two difficulties can be minimized.

A 3-D parts-based representation is one which models objects in terms of a set of 3-D part primitives together with the relative positions and orientations of the parts. The appeal of 3-D parts-based representations arises in large measure because they offer more sophisticated, yet natural ways to organize and index into the set of object models. Geons are a set of qualitatively defined 3-D part primitives [8]. We believe that geon-like volumetric part primitives are an excellent choice for 3-D object representation because they are useful for indexing and have considerable shape discrimination capability. Geons are defined and discussed extensively in Chapter 2. Although geons were originally proposed in the context of line drawings, their utility for object recognition is not limited to one particular kind of input data. The geon theory can be equally useful for object recognition from range images. Hence the use of dense range image data to detect and identify geons is quite justifiable.

Assuming the use of a 3-D parts-based representation, computing this representation is the crucial first stage of object recognition. Over the last three decades, there has been an enormous amount of research in scene understanding by computer. There has been much success in solving specific problems under highly limiting assumptions. However, almost every system reported in the literature is fragile, i.e., performance degrades sharply if the very specific, often unrealistic, underlying assumptions are violated.

In our view, a robust system for computing a 3-D parts-based representation, capable of operating in a general setting (noisy, cluttered, and with possible occlusion), can be reliably achieved only by integrating different kinds of information. This information would include intensity data, imperfect line drawings, "intrinsic array" data such as depth and orientation, as well as hypotheses about line labelings, surface segmentation, part segmentation, and part identities. At any given time, each of these kinds of information might not be complete, but only partial, imperfect, or hypothesized.

The robust system (Figure 1.1) might consist of a number of co-operating modules, each of which, given the current hypotheses about all the above kinds of information, would output an improved version or better hypothesis about one specific kind of information. (In Figures 1.1 and 1.2, rectangular boxes represent modules for performing some computation, whereas round and oval boxes represent information.) For example, the part segmentation module might take as input the intensity and range data, current versions of the imperfect line drawing, hypothesized line labeling, hypothesized segmentation into surfaces, hypothesized segmentation into parts, and hypothesized identity of various parts. It would produce as output an improved version of the part segmentation (Figure 1.2).
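A sketch of the interface such a co-operating module might present is given below (the names and the Hypotheses fields are hypothetical illustrations; this full design is not implemented in this thesis):

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class Hypotheses:
    """Current (possibly partial or imperfect) information about the scene."""
    intensity: Any       # 2-D intensity image
    range_data: Any      # range image
    line_drawing: Any    # imperfect line drawing
    line_labels: Any     # hypothesized line labeling
    surfaces: Any        # hypothesized surface segmentation
    parts: Any           # hypothesized part segmentation
    part_ids: Any        # hypothesized part identities

def part_segmentation_module(h: Hypotheses) -> Any:
    """Given all current hypotheses, return an improved part segmentation.

    In the idealized robust system, modules like this one would be invoked
    repeatedly, each refining one kind of information, until the pool of
    hypotheses stabilizes.
    """
    improved_parts = h.parts  # placeholder: a real module would refine this
    return improved_parts
```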
Iterative refinement of the hypotheses would continue till an acceptable 3-D part representation (i.e., segmentation into parts and their identification as specific part primitives) is obtained.

To put this dissertation in perspective, the work described here can be viewed as a partial implementation of two modules—part segmentation and part identification—of the idealized, robust system described above for computing a 3-D parts-based representation. Range image data are the only input information used for part segmentation and part identification. There is no ongoing interaction between the two modules, nor are the hypothesized part segmentations or part identifications improved iteratively as in the design proposed in Figures 1.1 and 1.2.

[Figure 1.1. Robust system for computing a 3-D parts-based representation: co-operating modules iteratively refine hypotheses about the line drawing, line labeling, surface segmentation, part segmentation, and part identification, starting from intensity and range data.]

[Figure 1.2. One module of the robust 3-D parts-based representation system: the part segmentation module takes the current hypotheses as input and outputs an improved part segmentation.]

[Figure 1.3. Proposed system for obtaining a 3-D parts-based representation: range image data undergo surface segmentation, then part segmentation and part identification, yielding the 3-D parts and their identities.]

A block diagram of our work is shown in Figure 1.3. Surface segmentation is performed by the method of Flynn [33], based on the earlier work of Hoffman and Jain [50]. We have designed and implemented our own methods for part segmentation and part identification. These are discussed in subsequent chapters.

1.4 Range Images

Most researchers in computer vision have used and continue to use intensity images as the input. However, since the early '80s there has been an increasing use of range images [13]. With intensity images as the input, considerable research effort has been devoted to various shape-from-X computations such as shape from shading, shape from texture, shape from stereo, shape from motion, etc. These efforts were aimed at computing various kinds of 2.5-D information, especially the depth or range value at each pixel. With the development of various range sensors, there is an increasing tendency to assume the availability of this 2.5-D data directly in the form of range images.

The advantage of using accurate range data is that it enables researchers to bypass the difficult "lower-level" computations of 3-D depth and orientation values from intensity values, and work directly on 3-D representation and later stages of 3-D object recognition. Disadvantages are that sensors for acquiring high-quality range images tend to be slower and much more expensive than the ordinary CCD cameras used to acquire intensity images. Another recent development is the use of fused range and intensity data for various aspects of computer vision research [57, 65]. Registration, i.e., establishing the correspondence of points in two separate images, is the major problem in fusing separately acquired range and intensity data.

Flynn [33] describes 3-D sensing in more detail, while Brady et al. [13] give an overview of recent research using range imagery. We will give a brief description of our range sensor, i.e., the 100X White Scanner from Technical Arts Corporation [88]. The 100X scanner is a structured-light range sensor. It projects a plane of laser light which illuminates objects placed on the work stage. The shape of the resulting light stripe is determined by the 3-D contour of the object onto which the laser light has been projected. The light stripe is imaged by a CCD camera with a 240 × 240 sampling array. The stage can be moved past the plane of light by stepper motors for the X and Y axes. An object of known 3-D dimensions is used to calibrate the scanner so that for each point on the stripe of laser light, the X, Y, and Z values can be computed. Range data acquired by the 100X scanner are estimated to have a noise standard deviation of less than 0.01 inch in the Z value. A range image takes 2 to 3 minutes to acquire, and may be as large as 240 rows by 240 columns. Data are in the form of X, Y, and Z values at each pixel.
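The geometry underlying such a structured-light sensor is ray-plane intersection: a calibrated laser plane and a calibrated camera ray through a stripe pixel determine a unique 3-D point. A minimal sketch follows (all calibration values below are made up for illustration and are not the 100X's actual calibration):

```python
import numpy as np

# Calibrated laser plane: all illuminated points p satisfy n . p = d.
n = np.array([1.0, 0.0, 0.5])   # plane normal (hypothetical)
d = 2.0                         # plane offset (hypothetical)

# Camera viewing ray through one pixel of the imaged stripe:
# p(t) = origin + t * direction.
origin = np.array([0.0, 0.0, 0.0])        # camera center
direction = np.array([0.1, -0.05, 1.0])   # ray direction for this pixel

# Solve n . (origin + t * direction) = d for t, then evaluate the ray.
t = (d - n.dot(origin)) / n.dot(direction)
point = origin + t * direction
print(point)   # the (X, Y, Z) value recorded for this stripe pixel
```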
For two reasons, we have chosen to use range images as the input to our system for computing a 3-D parts-based representation. First, the problem of obtaining a 3-D parts-based representation in terms of a distinct, fairly large set of geon-like parts from range images has not been explored previously. Second, previous work on obtaining a geon-based 3-D representation has made certain simplifying assumptions such as the use of perfect line drawings, perfect part segmentation, and/or manual segmentation. We hope that the use of range images may enable us to avoid these unrealistic assumptions.

1.5 Contributions of this Thesis

This dissertation describes research on obtaining a "generic" parts-based 3-D object representation. By "generic" we mean that the part primitives are defined by qualitative shape properties and not by exact numerical measures. We use range image data as the input, obtaining a 3-D object representation based on a small number (12) of geon-like 3-D part primitives as the output. The 3-D parts-based representation consists of parts detected in the image, and their identities. Previous work on obtaining a 3-D object representation in terms of geons or geon-like parts [8, 4, 3, 21, 22] assumed perfect line drawings as the input, and/or performed segmentation manually. We do not make any such idealized assumptions. Real range images are used as the input, and a segmentation is computed automatically from the image.

Multiple views of the 3-D part primitives are stored in order to help segment objects into component parts and identify those parts as being specific part primitives. Large, "meaningful" surfaces of the part primitives determine the views to be stored in the multi-view representation. Two approaches are described for identifying the parts as one of the 12 3-D part primitives. The first approach applies statistical pattern classification methods using parameters estimated by superquadric fitting. The second approach draws inferences from the distribution of angles between surface normals and the principal axis of a part. Intensity image data are used to recover from some shortcomings of the method based only on range data.

The specific contributions of this dissertation are as follows:

• We describe the first attempt (to our knowledge) to obtain, from range data, a 3-D parts-based object representation in terms of a specific, but fairly large set of 12 geon-like 3-D part primitives.

• We propose a novel method of specifying "generic" 3-D parts, i.e., by means of Surface Adjacency Graphs (SAGs). Using the SAGs, we derive an extremely compact multi-view representation of the part primitives.
• Based on the multi-view representation of parts, we present a method of performing part segmentation from range images, given a good surface segmentation. This method for part segmentation is more general than common approaches based on Hoffman and Richards' "principle of transversality" [47], i.e., the observation that parts usually come together with a concave crease at the joint.

• We describe two methods to perform the part identification of isolated parts. In doing so, we attempt to relate two influential families of 3-D part primitives, superquadrics and geons. We test the shape discrimination capability of superquadric parameters under varying conditions of image resolution, model elongation, and noise. Lastly, we show that intensity data can be used to recover from some shortcomings of the purely range-image based method of part identification.

1.6 Organization of this Thesis

In order to motivate and justify our design choices, it is necessary to review the previous research on object representation and related issues. The following section describes the organization of this report. Chapter 2 classifies and discusses object representations. Chapter 3 describes our choice of 3-D part primitives. It sets forth a very compact multi-view representation of the parts, and explains how this can be used for part segmentation. Chapter 4 presents our first approach towards identification of the parts previously isolated by the segmentation algorithm. It also tests the shape discrimination capability of superquadric parameters under varying conditions of image resolution, model elongation, and noise. Chapter 5 presents a second approach towards part identification, using the multi-view representation as well as the distribution of angles of surface normals with respect to the principal axis of a part. A method for using intensity data to recover from some shortcomings of the purely range-based approach to part identification is also described. Chapter 6 gives the results of part segmentation and part identification for several multi-part objects. Finally, Chapter 7 concludes, identifying some unresolved issues and offering suggestions for future research.

1.7 Summary

Object recognition is usually posed as a two-stage problem, i.e., representation followed by matching. Object recognition systems can therefore be meaningfully classified and compared by identifying their representation schemes and matching strategies within a coherent overall framework. The excessive number of hypotheses to be verified, and the complexity of verification itself, are the main difficulties encountered during the matching stage of object recognition. This motivates the choice of powerful geon-like volumetric part primitives for 3-D object representation, in order to minimize these two difficulties.

The research described in this dissertation is aimed at obtaining a parts-based 3-D object representation, using range data as the input. We derive an extremely compact multi-view representation of the part primitives, based on their large, "important" surfaces. We use the multi-view representation to perform part segmentation, given a good surface segmentation. Further, we develop two methods to perform the part identification of isolated parts. We also attempt to relate two influential families of 3-D part primitives, superquadrics and geons. We test the shape discrimination capability of superquadric parameters under varying conditions of image resolution, model elongation, and noise.
Lastly, we show that intensity data can be used to recover from some shortcomings of the purely range-based method of part identification.

CHAPTER 2
3-D Object Representation

Besl and Jain [6], Binford [10], Chin and Dyer [17], and Stockman [86] give broad surveys of object recognition systems. Brady et al. [13] discuss the recent work in object recognition using range data. This chapter briefly surveys some influential schemes for 3-D object representation.

2.1 Introduction

Representation schemes can be classified in many different ways, of which two seem most fundamental. First, they may be either viewpoint-independent or viewpoint-dependent. Second, they may adopt either an object-intrinsic representation, or one which is parts-based. Other important differences are whether a scheme involves geometric modeling of the object, or instead uses some symbolic representation. In the case of representations performing geometric modeling of 3-D objects, one can also distinguish between volumetric (i.e., 3-D) and surface-based (i.e., 2.5-D) shape primitives.

A wide variety of representation schemes is available for 3-D object recognition. The underlying primitives or features for representation include 2-D or 3-D points, such as polyhedral vertices or the center of a sphere; 2-D or 3-D contours such as intensity edges or 3-D jump edges or crease edges; 2.5-D features such as object wings; surface patches of various types such as planar, concave, convex, etc.; parametrized surfaces such as quadrics or splines; and ultimately 3-D volumetric primitives such as generalized cylinders, geons, superquadrics, and deformable models.

Assuming that objects have "internal representations", the nature of the mapping between objects and representations is one point of divergence. Viewpoint-independent schemes may lead to a one-to-one mapping. Viewpoint-dependent representations can result in a one-to-many relationship between objects and their internal representations. A many-to-one mapping could be produced when individual objects are lumped together into a single category. For instance, we often ignore the unique shape of individual lampshades or shampoo bottles. Indeed, objects like these are often recognized by properties other than shape, such as context or function. The same glass object which looks like a lampshade when attached to the ceiling may appear like a fruit-bowl or flower-vase if placed on a table. Such categories might be represented by a generic, prototypical model of a "typical-shaped" lampshade, shampoo bottle, etc.

It would be quite reasonable to classify representation schemes according to, say, their memory requirements or their degree of psychological plausibility. Two classifications, however, seem most fundamental: whether the representation is viewpoint-independent or not, and whether it is "object-intrinsic" or parts-based. These distinctions have immediate implications for other properties such as memory requirements.

The ultimate test of a representation scheme, of course, is its usefulness and effectiveness for object recognition. This being understood, having classified various representation schemes, how can we compare them? Marr and Nishihara [62] propose three criteria for doing so. Firstly, the representation's accessibility, i.e., can the desired description be computed from an image, and can it be done relatively inexpensively? The second criterion is the representation's scope and uniqueness.
What class of shapes can the representation handle, and do the shapes in that class have unique descriptions? For example, a representation intended for polyhedral objects would have difficulty with curved surfaces. Also, to avoid the problem of deciding whether two different representations describe the same shape, it is important that shapes have unique descriptions as far as possible. Thirdly, there is the stability and sensitivity of a representation. To be useful for recognition, the similarity between two shapes must be reflected in their descriptions. At the same time it should be possible to represent even subtle differences. For instance, when using stick figures to represent animal shapes, stability might be increased by using larger sticks, while smaller sticks would improve sensitivity by revealing finer details.

Dickinson ([20], pg. 4) also proposes several criteria along which representation schemes differ. Firstly, there is the primitive complexity, ranging from the simplest 2-D points, through 3-D points and contours, to complex 3-D volumetric primitives. Next, model complexity generally decreases as the primitive complexity increases, since the same object can either be described using many simple features or a few complex features. Thirdly, the large number of simple features may lead to higher search complexity, i.e., the number of hypothesized matches between features in the model and scene representations. Fourthly, simple features have less discriminatory power and may increase the recognition procedure's reliance on verification. Model flexibility, i.e., sensitivity to changes in shape, is another important criterion. Lastly, there is the issue of ease of recovery, i.e., the difficulty or otherwise of computing the representation from an input image.

2.2 Viewpoint-independent vs. Viewpoint-dependent Representations

Marr and Nishihara [62] list some important choices to be made in the design of a representation scheme. One of these, the choice of viewer-centered or object-centered co-ordinate system, is rather similar to our distinction between viewpoint-independent and viewpoint-dependent schemes. However, the latter is more general because it does not assume that a representation must necessarily include a co-ordinate system. For instance, some representation schemes may not perform geometric modeling at all. An aspect graph representation cannot be said to use viewer-centered or object-centered co-ordinates, but still captures a viewpoint-independent topological property. In spite of this distinction, we will sometimes use terms like "viewer-centered" and "object-centered" interchangeably with "viewpoint-dependent" and "viewpoint-independent".

Representations which require geometric modeling of objects must have an associated co-ordinate system. Such schemes can be classified as viewer-centered if locations are specified relative to the viewer, or as object-centered if locations are specified in a co-ordinate system defined by the viewed object [62].

Since viewpoint-dependent properties can be obtained directly from an input image, viewer-centered representations may be easier to compute. The advantage is that, unlike in an object-centered representation, one does not have to compensate for the vantage point. However, different views of the same object may have to be treated essentially as distinct objects. This may require a very large number of views of each object—potentially, every possible view of every possible object.
The other major alternative is to use an object-centered representation. Ideally, just one canonical description, independent of the vantage point, would have to be stored for each object. This is the approach favored by Marr and Nishihara [62] and many others. The disadvantage is that the representation can be difficult to compute from the image. Not only does a unique co-ordinate system have to be defined for each object, but this has to be identified from the image before constructing the description. This can be a problem when the object has several plausible natural axes. Further, the effects of perspective projection have to be corrected for.

Other shape representation schemes have also been explored by cognitive psychologists [46]. For instance, objects might have a single viewpoint-dependent representation corresponding to a "canonical upright" position. A "mental rotation" operation would transform an input shape into the canonical orientation before memory and input are compared. Experiments suggest that shapes may be represented by a limited number of viewer-centered descriptions, corresponding to a variety of orientations at which the object is likely to be observed. Work by Hinton and Parsons [46] indicates that people may use scene-based representations in some shape comparison tasks. These are descriptions relative to some larger salient object in the scene, such as a room, table-top, blackboard, or page. When subjects were asked to decide whether two objects (at different lines of sight from the viewer) were identical or mirror images, they often physically rotated one of the objects till it had the same relationship to the table-top (and room) as the other object.

2.3 Object-intrinsic vs. Parts-based Representations

This distinction has to do with the "unitary-ness" and organization of the information in a representation scheme. It is closely related to the distinction between statistical and structural approaches in pattern recognition. Ullman [91] makes a very similar distinction between invariant properties methods and parts-decomposition methods for object recognition. An object-intrinsic representation consists of some overall invariant features or properties characteristic of the object as a whole. It does not model an object as a compound entity consisting of its "parts" arranged in some spatial configuration. Examples of object-intrinsic representations range from simplistic schemes like template-matching, through spatial Fourier-domain descriptions or representation by a suitable set of features. We would even classify a fairly recent and sophisticated approach like aspect graphs as an object-intrinsic representation scheme.

A "parts-based" representation, on the other hand, considers an object as having one or more meaningful parts which are put together in a particular spatial configuration, and which may possibly have further detailed descriptions of their own. This is quite similar to structural description theories of shape recognition. A structural description is a data structure which can be thought of as a list of propositions, whose arguments correspond to parts of the object, and whose predicates correspond to properties of the parts and spatial relationships among them [63, 73]. A structural description is a more general extension of a parts-based description, since it allows for representation of non-visual and non-spatial information as well.
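Such a structural description is straightforward to write down as a data structure. In the sketch below (the object, part names, and predicates are hypothetical, chosen only to illustrate the idea), an object is a list of propositions over its parts:

```python
# A structural description as a list of propositions: each proposition is a
# (predicate, arguments) pair. Unary predicates describe part properties;
# binary predicates describe spatial relations between parts.
table_lamp = [
    ("part",  ("base",)),
    ("part",  ("stem",)),
    ("part",  ("shade",)),
    ("shape", ("base", "cylinder")),
    ("shape", ("stem", "cylinder")),
    ("shape", ("shade", "truncated-cone")),
    ("above", ("stem", "base")),
    ("above", ("shade", "stem")),
]

# Example query: which parts lie directly above the stem?
above_stem = [args[0] for pred, args in table_lamp
              if pred == "above" and args[1] == "stem"]
print(above_stem)  # ['shade']
```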
Marr and Nishihara [62] list the organization of shape information about an object as one of the important design choices for a representation. In the simplest case, all the elements in a description have the same status, and no organization is imposed. Alternatively, primitive elements of the description could be organized into modules, in order to distinguish certain subgroupings of the primitives. Thus they can be referenced efficiently, and properties can also be associated with them. A modular organization might be particularly useful for recognition because it can make "stability" and "sensitivity" differences explicit, by arranging that all constituents of a given module lie at roughly the same level of stability and sensitivity [62].

The explicit representation of spatial relationships of parts, i.e., geometric modeling in some sense, is a key aspect of parts-based descriptions. This distinguishes them from "unitary" feature-based descriptions which may very well be based on geometrical parts (e.g., holes, corners), without modeling the spatial relationships between those features themselves. Thus, it appears that parts-based descriptions must necessarily have (one or more) associated co-ordinate systems.

By representing different parts of an object as separate modules, we may be able to break up the recognition process into simpler subprocesses [73]. We may, for instance, merely need to be able to visually recognize a small number of part and attribute primitives. Additionally, if logical and spatial predicates can coexist in the representation, structural descriptions can differentiate between parts that must be present in an object, parts that may be present with some probability, and parts that must not be present. Another advantage is that some visual information in a structural description may be useful for non-visual reasoning. Conversely, non-visual information about objects or parts (i.e., their uses, typical situations where they are found) may be used to aid segmentation and recognition in a top-down manner.

It is, indeed, generally agreed that a parts-based approach seems more natural for realistic, complicated scenes. One of the main problems with this representation, however, is the lack of a firm basis on which to decide a set of primitives to be used, and to justify why they are necessary, sufficient, or appropriate. This is even more true for the predicates (properties) than for the part primitives. The very generality of this representation lends it a certain vagueness. Another crucial issue is the accessibility of all the primitives (in the sense mentioned by Marr and Nishihara [62]) from the input image. Problems of uniqueness of the description, and of part segmentation, also naturally arise.

2.4 Examples of Object-intrinsic Representations

Among well-known examples of object-intrinsic representations, templates and Fourier representations are traditional 2-D shape representations described in cognitive psychology textbooks [73]. Feature-based approaches are more commonly used for 2-D than for 3-D object representation. Aspect graphs are a relatively new concept and have attracted considerable interest over the last few years.

2.4.1 Feature-based Approaches

An object is represented by a list or vector of numerical and symbolic values, the features [73, 86]. Typical numeric features might be the object's perimeter, moments of area, or the number of holes or right-angled corners in it.
One can also use symbolic-valued features like the color of the object, or test for the presence of specific "salient" features (say, a 60° corner formed by a circular arc and a straight line segment). The representation is simply an aggregate of such features. There is no attempt to describe the spatial relationships (e.g., relative locations) between different geometric features.

Given the feature representation of an object, a variety of standard statistical pattern recognition methods [23, 35] are available to classify it into a known category. For instance, one might use discriminant functions, a k-nearest-neighbor (knn) classifier, or a hierarchical (tree) classifier. Discriminant functions define decision surfaces, bounding different regions in the feature space. A knn classifier makes its decision based on the categories of the unknown object's k nearest neighbors in the feature space: the object is assumed to belong to the same category as the majority of its k nearest neighbors (a minimal sketch is given at the end of this subsection). With an infinite number of training samples, it is known that the knn rule approaches the optimal Bayes decision rule as the number k of nearest neighbors to be considered tends to infinity. A hierarchical classifier is basically a decision tree. It may be able to make decisions and arrive at a classification based on just a subset of the features. This could be particularly useful when some of the features are difficult or impossible to compute (e.g., due to occlusion).

The advantages of a feature representation are its extreme compactness as compared to other representations, and the existence of the large body of statistical pattern recognition theory which can be used to perform the classification and estimate the classification error rate. Further, the classifier can be "trained" from examples. It may even be possible to use clustering techniques and discover distinct categories of objects which are naturally separated in the feature space [52]. A "reject" option can easily be provided to take care of ambiguous cases.

The disadvantages of this representation are, firstly, that the object cannot be recreated or modeled accurately from the features. Secondly, it is difficult to establish a suitable set of features for general 3-D objects. Thirdly, it may be difficult to compute the feature representation of an object from a single 3-D view, or when multiple objects with occlusion are present. (A viewsphere or property-sphere approach can extend the applicability of feature representations to 3-D objects: a large number (e.g., 200 or more) of covering viewpoints is selected, and the feature vector is computed for the projection onto 2-D from each viewpoint [24, 54].) Fourthly, an object might be confused with a totally different object which has the same features, but arranged differently—e.g., an unsymmetrical object and its mirror image. Lastly, a feature representation achieves compactness at the cost of discarding a lot of potentially useful information about the object.
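As promised above, here is a minimal sketch of the k-nearest-neighbor rule (the two-dimensional feature vectors and class labels are toy values, not the features used elsewhere in this thesis):

```python
import numpy as np
from collections import Counter

def knn_classify(train_X, train_y, x, k=3):
    """Assign x the majority label among its k nearest training samples."""
    dists = np.linalg.norm(train_X - x, axis=1)   # Euclidean distances
    nearest = np.argsort(dists)[:k]               # indices of k closest
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy training set: 2-D feature vectors (e.g., perimeter, hole count).
train_X = np.array([[1.0, 0.0], [1.2, 0.1], [4.0, 2.0], [4.2, 2.1], [3.9, 1.8]])
train_y = ["washer", "washer", "bracket", "bracket", "bracket"]

print(knn_classify(train_X, train_y, np.array([4.1, 2.0])))  # 'bracket'
```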
An aspect is a "stable view" of the object as seen from some maximal connected region or cell of the viewpoint space. (It could be thought of as a specific collection of faces and edges visible from any viewpoint within that cell.) Nodes are connected by arcs indicating "visual events," where the observer transits from one such cell to another and the aspect of the object changes. "Thus the visual potential represents in a concise way any visual information an observer can obtain by looking at the object when traversing any orbit through space" ([53], p. 214). For example, Figure 2.1 shows the aspect graph of a tetrahedron.

Callahan and Weiss [15] have proposed an object representation called the viewing data or viewing graph, which is closely related to the aspect graph. The viewing data of an object is the partition of the viewing sphere (corresponding to different aspects), together with the view from each region, edge, and vertex of the partitioned sphere.

The aspect graph of an object could very well, for example, be combined with an exact 3-D model of the object, thus getting the completeness of a viewpoint-independent representation as well as the simplicity of a viewpoint-dependent one. It seems, therefore, that aspect graphs are potentially very useful for 3-D object recognition. A major problem is that complicated objects can have aspect graphs with millions of nodes. Appropriate heuristics have to be established to arrive at reasonably "stable" aspect graphs of manageable size.

Following the original paper by Koenderink and van Doorn [53], aspect graphs have in fact been the subject of very active research [12, 15, 16, 27, 29, 37, 38, 44, 51, 54, 55, 74, 82, 83, 84, 93]. Computing the aspect graph for an arbitrary object is itself an unsolved problem. Much work has been devoted to solving this problem for progressively more general classes of objects, starting with convex polyhedra. These efforts can be classified according to the way they partition the viewpoint space.

A basic distinction in computing an aspect graph is whether an orthographic or perspective projection is assumed. If orthographic projection is assumed, then the "cell" of viewing space associated with a node of the aspect graph corresponds to a surface area on a unit sphere [27, 37, 38, 44, 51, 54, 55, 74, 82]. The advantage is that an approximate aspect graph can be computed by uniformly tessellating the viewing sphere and then merging facets which see the same aspect (a sketch of this idea appears below). However, this orthographic projection aspect graph only captures changes in the aspect due to changes in the viewing orientation. Relatively less progress has been made in computing the more general perspective projection aspect graph [26, 74, 83, 84, 93], in which cells correspond to volumes of the viewing space. It should be noted that some of these algorithms for finding aspect graphs have never been implemented because of computational difficulties.

Gigus et al. [37, 38] have developed an algorithm to find the so-called "viewing data" (closely related to the orthographic projection aspect graph) for line drawings of arbitrary polyhedral objects. They also provide a full catalog of the visual events that can occur for such objects. Eggert and Bowyer [27] and Kriegman and Ponce [55] have independently developed algorithms to find the orthographic projection aspect graph for solids of revolution.
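The tessellate-and-merge approximation mentioned above is simple to prototype. The following sketch is ours, not any cited author's implementation: it assumes a caller-supplied function aspect_signature(v) that returns a hashable description of the view seen from direction v (e.g., the set of visible faces), and it collapses the facet-merging step into grouping sampled viewpoints by identical signatures.

```python
import numpy as np

def sample_viewsphere(n):
    """Quasi-uniform sample of n viewing directions on the unit sphere
    (golden-spiral layout; any uniform tessellation would do)."""
    i = np.arange(n)
    z = 1.0 - 2.0 * (i + 0.5) / n
    r = np.sqrt(1.0 - z * z)
    theta = np.pi * (3.0 - 5.0 ** 0.5) * i  # golden-angle increments
    return np.stack([r * np.cos(theta), r * np.sin(theta), z], axis=1)

def approximate_aspects(aspect_signature, n_views=2000):
    """Approximate the node set of the orthographic aspect graph by
    grouping sampled viewpoints according to the aspect they see."""
    regions = {}
    for v in sample_viewsphere(n_views):
        regions.setdefault(aspect_signature(v), []).append(v)
    return regions  # one entry per (approximate) aspect
```

The arcs of the aspect graph (the visual events) could then be recovered approximately by checking which groups contain neighboring sample points.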
Rieger [78] cataloged some of the visual events which can occur for an object consisting of smooth surfaces which meet in smooth curves (i.e., piecewise smooth, curved objects). Sripradisvarakul and Jain [82] have extended this catalog and presented an algorithm to find the aspect graph under orthographic projection for arbitrary piecewise smooth, curved objects.

Stewman and Bowyer [84], Watts [93], and Edelsbrunner et al. [26] independently developed methods to compute the perspective projection aspect graph for convex polyhedral objects. Stewman and Bowyer [83] further extended this for arbitrary polyhedra.

A number of researchers have used methods based on or similar to aspect graphs to build object recognition systems. Perhaps the earliest such work is that of Dudani et al. [25], who store rotation- and scale-invariant moment features from several different viewpoints to perform identification of aircraft types. Chakravarty and Freeman [16] use characteristic views to represent objects. They use heuristic constraints (on the orientation of the object with respect to the camera) to select a subset of views as the representation. The aspects and the partition of viewing space are determined manually. Ikeuchi [51] also uses aspect graphs, with aspects being defined by the faces detectable by photometric stereo. Hebert and Kanade [44] use the aspect graph approach for recognizing polyhedral objects in range images. Aspects are defined by the set of occluding edges in the image. Both Ikeuchi [51] and Hebert and Kanade [44] approximate the exact partition of the viewing sphere by performing a uniform tessellation and then merging neighboring regions which view the same aspect. Fekete and Davis [29] propose the use of "property spheres," which encode "geometrical and topological properties of orthographic images as a function of viewpoint." Viewpoints are obtained by approximating a spherical surface by planar patches and using the center of each patch as a viewpoint. Korn and Dyer [54] tessellate the unit sphere in a manner similar to Fekete and Davis. Facets with a similar view are then merged into larger regions. Bowyer et al. [12] have also implemented an object recognition system for convex polyhedral objects. According to them, aspect graphs can be useful in generating initial estimates and avoiding local minima of rotation and translation parameters in systems which use non-linear optimization as a strategy for recognition.

2.5 Examples of Parts-based Representations

In this section we discuss the best-known schemes for parts-based representation or recognition. Much vision research has been devoted to obtaining 2.5-D information, i.e., local surface characteristics (shape from shading, texture, stereo, etc.) in the form of image arrays. However, reliable and general procedures to compute such image arrays from the local image information are not available. In fact, according to Pentland, "these recovery techniques have been sufficiently unimpressive that many researchers now believe that it is simply not possible to compute accurate, object-centered representations from arrays of locally derived estimates of surface properties, as was originally envisioned" ([69], p. 143). Pentland also points out that even if depth maps and other maps of intrinsic properties could be reliably and densely computed, the result is still merely an array of numbers which by itself suffices only for simple applications like obstacle avoidance.
It must be segmented and interpreted if it is to be used for more sophisticated tasks ([69], p. 143). This motivates the idea that the initial task of perception is not to produce local descriptions of images, surfaces, or volumes, but rather to recognize instances of simple, generic real-world modeling primitives.

If we accept this view, it remains to discover such a set of generic primitives, describe how they combine to form objects, recognize these primitives and their combinations from the image, and use this description to interpret or recognize the image content. Hoffman and Richards propose that the visual system segments objects into constituent parts by detecting part boundaries rather than the parts themselves [48]. This decomposition depends on a regularity of nature called transversality, i.e., the observation that when two arbitrarily shaped surfaces interpenetrate, they usually meet in a contour of concave discontinuity of their tangent planes. In other words, part boundaries are marked by a concave crease at the joint. Hoffman and Richards argue that people use this part decomposition rule, along with spatial relations and part descriptions, to perform the first indexing into the database of known objects.

Generalized cylinders or cones, geons, and superquadrics are the best-known modeling primitives for 3-D shape description or recognition. They are discussed in the following subsections.

2.5.1 Generalized Cylinders or Generalized Cones

Generalized cones were introduced by Binford [9] and have been used in the ACRONYM system [14] for recognizing airplanes from aerial photographs. Generalized cylinders were proposed by Marr and Nishihara [62] as the modeling primitives in their theory of object-centered hierarchical representation and recognition. We will discuss both these methods in subsequent sections.

Marr and Nishihara's Theory of Recognition

Marr and Nishihara's theory of 3-D object representation, together with Marr's ideas about the stages by which such a representation is computed, is among the most influential bodies of ideas in vision research [61, 62]. Marr and Nishihara point out that a representation does not have to reproduce a shape's surface in order to describe it adequately for recognition. For instance, animal shapes can be portrayed quite effectively by "stick figures" (figures which show the arrangement and relative sizes of a small number of sticks).

The simplicity of this stick figure representation is due to the correspondence between the sticks and the natural or "canonical" axes of the shapes described. Marr and Nishihara propose an object-centered co-ordinate system together with the use of volumetric primitives (generalized cylinders) of various sizes, and a modular, hierarchical representation. The notion of stick-figure-like representation is not original to Marr and Nishihara. For instance, Blum [11] studied a similar classification scheme for 2-D silhouettes based on a brushfire technique, while Binford [9] proposed the use of "generalized cones" for 3-D objects. The difference is in the modular organization of information in Marr and Nishihara's scheme. In Blum's or Binford's scheme, for example, each part of a human arm would be represented by a single primitive. In Marr and Nishihara's scheme, it is possible to have both a single stick corresponding to the whole arm, as well as three smaller sticks corresponding to the upper arm, forearm, and hand.
The hand, in turn, can be represented at a finer level of detail with individual sticks for each of the fingers.

This overall representation method is reasonably effective for some classes of objects, i.e., those which have a canonical co-ordinate system (since the description is object-centered) and a natural decomposition into component axes. However, Marr and Nishihara did not develop a generally applicable way of deriving this 3-D model description from an image. They used points of deep concavity on an object contour as segmentation points. Heuristics were used to connect them with other segmentation points, giving the decomposition into parts.

The overall recognition process functions by first selecting a model from the catalogue, based on the distribution of components along the length of the principal axis. This model then provides relative orientation constraints that help to determine the absolute orientations (relative to the viewer) of the component axes in the image. This can be used to compute their relative lengths. This new information can then be used to disambiguate shapes at the next level of specificity.

Generalized Cones and the ACRONYM System

Generalized cones [9] describe 3-D volumes. A generalized cone is specified by a curve through space, called the spine, along which a 2-D shape, called the cross-section, is swept. The cross-section is kept at a constant angle to the tangent of the spine, and may be deformed according to a deformation function called the sweeping rule.

The particular domain of the ACRONYM system [14] is aerial photographs of airplanes. In ACRONYM, the generalized cones are restricted to having straight line segments or circular arcs as spines, cross-sections bounded by straight lines or circular arcs, and sweeping rules which are linear magnification functions about the spine, or linear in two orthogonal directions about the spine. The input to ACRONYM consists of aerial photographs with a number of airplanes in view. Given certain classes of geometric models (of airplanes), the goal is to identify their instances in the image, along with their location and orientation. ACRONYM must also make any possible subclass identifications, determine 3-D parameters of the objects, and determine the location and orientation of the camera, if not known a priori.

A description of the image is extracted in terms of ribbons and ellipses. Ellipses are described by the lengths of their major and minor axes. Ribbons (the 2-D analog of generalized cones) are suitable for describing images generated by the generalized cones themselves, while ellipses are suitable for describing the shapes generated by the ends of generalized cones. Even with noisy data, ACRONYM produces fairly accurate interpretations in terms of 3-D models, by the combined use of geometric and algebraic constraints. However, the top-down approach of ACRONYM, which predicts the projected appearance in the image of the parts of a specific model using complex quantitative constraints, makes it unsuitable for general 3-D object recognition with a large number of models which may or may not be present in a scene.

2.5.2 "Geons" and Recognition By Components

The fundamental assumption in Biederman's theory of "Recognition By Components" (RBC) [8] is that there exists a small number (i.e., 36) of fundamental part primitives, whose combinations can represent more complicated objects for the purposes of "primal access."
An analogy is drawn with speech perception, in which only about 55 phonemes are needed to represent virtually all the millions of words in all of human speech. Further, Biederman proposes that these 36 basic "geometric ions" or geons can be identified by using non-accidental properties of edges in an image, i.e., curvature, collinearity, symmetry, parallelism, and co-termination. Recognition occurs by detecting the identity and spatial arrangement of an object's component geons.

Like Marr and Nishihara's theory [62], Biederman's scheme also does not do detailed geometrical modeling of object surfaces. Biederman's scheme avoids the problem of computing 2.5-D information by working directly from 2-D edges. The 3-D representation is obtained from easily observable non-accidental features in the line drawing. Thus, Biederman's theory provides an elegant link between perceptual organization (recognition of "low-level" structure) and 3-D object representation (description in terms of "high-level" structure). Detection of low-level structure helps to identify the geons, and the geons with their combination identify the object.

RBC is a theory of so-called "primal access" in humans, i.e., "the first contact of a perceptual input from an isolated, unanticipated object to a representation in memory" ([8], p. 178). Typically, humans recognize the object quickly, even when it is viewed from novel orientations, with moderate levels of visual noise, when partially occluded, or when it is a new exemplar of a category. This requires a quick procedure which does not need very precise quantitative computations. RBC assumes that an object is recognized through the following subprocesses: first, an edge extraction stage provides a line drawing of the object. From this, non-accidental properties of edges are detected. The object is parsed into parts, primarily at concave regions. The non-accidental properties are used to constrain the identity of the components. (Stages up to and including the identification of components are assumed to be bottom-up.) Finally, the arrangement of components is matched against representations in memory.

The primitive components or geons hypothesized in the RBC theory are simple generalized cones without sharp concavities. They are classified into 36 basic types on the basis of 4 attributes (Figure 2.2). Three of the attributes describe characteristics of the cross-section: its shape (straight or curved), its symmetry (asymmetrical, reflection, or reflection and rotation), and its size as it is swept along the axis (constant, increasing, or increasing and decreasing). The fourth attribute describes the shape of the axis (straight or curved). The values of these four generalized cone attributes can be directly detected as differences in non-accidental properties, e.g., collinearity, curvilinearity, symmetry, parallelism, and co-termination (Figure 2.3). Further, these non-accidental properties are rapidly identifiable from a line drawing, and are quite invariant to viewpoint and noise. A set of geon relations is also proposed.
The relations between any pair of geons include verticality, i.e., whether the first geon is above, below, or to the side of the second one; relative size, i.e., whether the first geon is much larger than, much smaller than, or approximately the same size as the second; centering, i.e., whether the point of attachment on a geon's surface is centered or off-centered; and surface size at join, i.e., whether a geon is joined at a large surface or a small one. Altogether, there are almost 60 possible combinations of relations that can hold for a pair of geons.

Figure 2.2. Four attributes yield 36 different geons (based on [8])

Figure 2.3. Non-accidental properties indicate geon type (based on [8])

Biederman argues that 36 geons apparently have sufficient representational power to express humans' capacity for basic visual categorization. Figure 2.4 shows some multi-part 3-D objects which can supposedly be represented using geons. Note that although the decomposition of some objects into geons seems obvious (e.g., watering-can, table-lamp, flashlight, and knife), others are much less obvious (e.g., telephone handset, head of the elephant).

Biederman states that there are fewer than 3,000 common basic-level categories such as "chairs" and "elephants" in a typical human language (e.g., English). Liberally assuming an average of 10 distinguishable exemplars for each category, this gives an estimate of 30,000 readily discriminable objects. By contrast, the combination of 2 geons can give over 70,000 possible objects (36 x 36 possible geon combinations, multiplied by about 60 geon relations for each combination). Similarly, 3 geons can be combined in over 150 million ways.

Biederman's experiments indicated that line drawings of objects with only 2 or 3 of their components could be accurately identified from a single 100-ms exposure. These simple line drawings were identified almost as rapidly as full-color, detailed, textured slides showing all the components. It appears that neither the full complement of an object's geons, nor its color, texture, or full bounding contour need be present for rapid identification.

Bergevin and Levine [3, 4] have built a computer vision system based on geons, called PARVO (Primal Access Recognition of Visual Objects). PARVO introduces a new intermediate representation not present in the original RBC theory, i.e., faces of parts. First, the parts are identified as belonging to one of 11 generalized solid types, based on the first three attributes of geons (curved or straight edges, curved or straight axis, and constant, increasing, or increasing-and-decreasing sweeping function). The symmetry attribute is inferred later, giving the complete identity of the geon. The system was tested with a variety of complete as well as imperfect line drawings of common man-made objects.

Figure 2.4. Complex objects represented using geons (taken from [1])

PARVO was usually able to segment the line drawings into component parts, identify each part as a geon, compute a coarse spatial structural description of the object, and match it to coarse models.
Disadvantages include the use of perfect line drawings and the assumption that pairs of matched segmentation points (i.e., points of concave discontinuity) can always be found.

Recently, Dickinson et al. [20, 21, 22] have built a system called OPTICA (Object recognition using Probabilistic Three-dimensional Interpretation of Component Aspects). OPTICA computes a 3-D object representation based on 10 geon-like parts. The most significant contribution of Dickinson et al. is the idea of combining the "parts-based" and "viewpoint-dependent" ideas. They put forward the excellent idea of using a multi-view representation only for the limited number of 3-D part primitives, rather than for entire complex 3-D objects composed of these parts.

Storing characteristic views of only the part primitives has two advantages. Firstly, the number of views is much smaller. Secondly, even with a different set of part primitives, it may be possible to use the same scheme for recognition by using the multi-view representation of the new set of primitives. Constructing the multi-view representation of the chosen generic part primitives is an off-line process that has to be performed only once.

OPTICA has been tested with perfect line drawings of 3-D objects. Segmentation into parts is performed manually. Methods have been outlined for computing the line drawing from an input intensity image and for performing segmentation, as well as for recovering from segmentation errors. However, these are not completely implemented.

2.5.3 Superquadrics

Superquadrics are an extension of the basic quadric surfaces. They are a family of parametric shapes discovered by the Danish designer Piet Hein [36]. Superquadrics have been used or proposed for use as shape representation primitives for computer graphics [2] and, recently, for computer vision [71]. Depending on the parameter values, superquadrics can take different shapes (e.g., ellipsoids, cylinders, parallelepipeds, pyramids, cones, as well as round-edged shapes intermediate between these standard shapes). Thus, they are versatile part primitives, and can be deformed and "glued" together into realistic-looking models, as in Pentland's Supersketch graphics system [71]. Superquadrics are described by the following equation (using the notation cos(η) = C_η, sin(ω) = S_ω):

\[
\mathbf{X}(\eta, \omega) =
\begin{bmatrix}
a_x \, C_\eta^{\epsilon_1} \, C_\omega^{\epsilon_2} \\
a_y \, C_\eta^{\epsilon_1} \, S_\omega^{\epsilon_2} \\
a_z \, S_\eta^{\epsilon_1}
\end{bmatrix},
\qquad
-\frac{\pi}{2} \le \eta \le \frac{\pi}{2}, \quad -\pi \le \omega < \pi,
\]

where X(η, ω) is a 3-D vector that sweeps out a surface parametrized in latitude η and longitude ω in an object-centered co-ordinate system. The parameters a_x, a_y, and a_z affect the size of the superquadric along the x, y, and z axes, respectively. The relative shape parameters ε1 and ε2 affect the relative shape of the superquadric in the latitudinal (xz) and longitudinal (xy) directions, respectively.

When all these five parameters are unity, the shape of the superquadric is the unit sphere. When a relative shape parameter (ε1 or ε2) is equal to 1, the cross-section along the corresponding (i.e., latitudinal or longitudinal) direction is an ellipse. As ε approaches 0, the shape becomes progressively more rectangular; as ε approaches 2, the cross-section changes from an ellipse to a diamond shape. When ε is greater than 2, the shape becomes "pinched," approaching a cross as ε approaches infinity. The effect of ε on shape is shown in Figure 2.5, in which one of the ε parameters is held constant at 1.0, while the other is varied between 0.0 and 3.0.

Figure 2.5. Effect of the parameter ε on superquadric shape
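As a concrete illustration, the surface equation can be sampled numerically as below (a sketch of ours, not code from the cited work). One detail the compact notation hides: for non-integer exponents, the powers must be applied to the absolute values of the sines and cosines, with the sign restored afterwards, since a negative number raised to a fractional power is undefined in the reals.

```python
import numpy as np

def spow(base, p):
    """Signed power sign(base) * |base|**p, the usual convention for
    evaluating superquadric terms such as C_eta**eps1."""
    return np.sign(base) * np.abs(base) ** p

def superquadric_surface(ax, ay, az, eps1, eps2, n=64):
    """Sample X(eta, omega) over the full parameter ranges in the
    superquadric's own (object-centered) frame."""
    eta, omega = np.meshgrid(
        np.linspace(-np.pi / 2, np.pi / 2, n),
        np.linspace(-np.pi, np.pi, n, endpoint=False),
    )
    x = ax * spow(np.cos(eta), eps1) * spow(np.cos(omega), eps2)
    y = ay * spow(np.cos(eta), eps1) * spow(np.sin(omega), eps2)
    z = az * spow(np.sin(eta), eps1)
    return x, y, z

# All five parameters equal to 1 give the unit sphere, as noted above.
x, y, z = superquadric_surface(1.0, 1.0, 1.0, 1.0, 1.0)
```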
After Pentland's paper [71], which first proposed the use of superquadrics as part representation primitives for computer vision, most of the subsequent literature on superquadrics has dealt with estimation methods and error-of-fit measures for recovering the parameters from real or synthetic range data [1, 41, 42, 72, 81]. Issues of segmentation are avoided, and range data is taken from the isolated, unoccluded superquadric whose parameters are to be estimated. Only Ferrie et al. [30] report some results from range image segmentation and the actual use of superquadrics as part primitives for complex multi-part objects.

Besides the three extent parameters (a_x, a_y, a_z) and two relative shape parameters (ε1, ε2), an additional three parameters are needed to represent the displacement of the superquadric from the origin, and another three for its orientation. Hence, a total of 11 parameters have to be estimated for a superquadric at arbitrary location and orientation.

Pentland [71] originally suggested an analytic solution using the parametric equations of the position vector and normal vector at each point on the surface. Using linear regression, and starting with information from 2-D contours and shading, a best-fit estimate was to be computed. This approach turned out to be too complicated and was not fully implemented. Pentland's second approach [72] was based on search through the entire superquadric parameter space. For each 3-D range image point, the parameter space is searched for the best fit to the immediately surrounding area. This method is too expensive computationally, unless perhaps implemented on a parallel architecture.

Bajcsy and Solina [1] suggested a faster (iterative) solution technique, i.e., minimizing an error-of-fit function by the Levenberg-Marquardt method [75] for non-linear least squares. Their error measure favors the superquadric of smallest possible volume that closely fits the data points. Poisson "jitter" is introduced to escape local minima in the space of 11 parameters. Gross and Boult [41] compare the effectiveness of different error-of-fit measures for Bajcsy and Solina's technique. According to their experiments (with superquadrics centered at the origin and aligned along the co-ordinate axes), an error measure based on the Euclidean distance between each data point and the surface of the estimated superquadric performs better than the one proposed by Bajcsy and Solina. In particular, use of the Euclidean distance error measure leads to faster convergence and yields better estimates from sparse range data. It also gives better estimates when only partial range data are available (e.g., from only 1, 2, or 4 octants instead of all 8). However, Ferrie et al. [30] report that while Gross and Boult's Euclidean distance error measure generally fit the range data well, the volume of the estimated superquadric tended to become very large, especially when the surface patch was flat. According to them, the minimum-volume error measure motivated by Bajcsy and Solina [1] produced "more intuitive" results.

Solina and Bajcsy [81] have proposed a formalism for expressing deformations of superquadrics (specific kinds of tapering, bending, and cavities). They are able to estimate all the parameters by incorporating appropriate extensions in their original technique.
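With a modern non-linear least-squares routine, the iterative approach is easy to sketch. The fragment below is a simplified illustration of ours, not Bajcsy and Solina's code: it fits only the five shape parameters of an origin-centered, axis-aligned superquadric (the setting of Gross and Boult's experiments above; the full problem adds three translation and three rotation parameters, for eleven in all), and its residual is one common volume-biased variant built from the superquadric inside-outside function F, which equals 1 exactly on the model surface.

```python
import numpy as np
from scipy.optimize import least_squares

def inside_outside(p, pts):
    """Superquadric inside-outside function F(x, y, z); F = 1 on the surface."""
    ax, ay, az, e1, e2 = p
    x, y, z = pts.T
    return ((np.abs(x / ax) ** (2 / e2) + np.abs(y / ay) ** (2 / e2)) ** (e2 / e1)
            + np.abs(z / az) ** (2 / e1))

def residuals(p, pts):
    ax, ay, az, e1, e2 = p
    # The sqrt(ax*ay*az) factor biases the fit towards the smallest
    # superquadric that accounts for the data, in the spirit of the
    # minimum-volume measure discussed above.
    return np.sqrt(ax * ay * az) * (inside_outside(p, pts) ** (e1 / 2.0) - 1.0)

def fit_superquadric(pts, p0=(1.0, 1.0, 1.0, 1.0, 1.0)):
    """pts: (N, 3) range points from one pre-segmented part.
    method='lm' selects SciPy's Levenberg-Marquardt driver."""
    return least_squares(residuals, p0, args=(pts,), method="lm").x
```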
Gupta, Bogoni, and Bajcsy [42] argue that both qualitative and quantitative measures should be used to evaluate the fit of an estimated superquadric. They propose two global, quantitative measures (Bajcsy and Solina's original goodness-of-fit measure [1], and a mean-distance criterion derived from Gross and Boult's Euclidean distance measure [41]). They also introduce three local, qualitative evaluation criteria. The min-distance map is produced by mapping the magnitude of the shortest Euclidean distance of individual points from the model surface. The contour-difference map compares the apparent contour formed by the model in the viewing direction with the object's occluding contour. Lastly, the Z-distance map shows the distance between the points in the range image and the model surface in the viewing direction. These three measures complement the earlier two and, in conjunction with them, can suggest improvements in object segmentation or model deformation.

2.5.4 Other Part Primitives

Yet another part representation for man-made objects (e.g., chairs), using sticks (long, thin objects), plates (flat, wide objects), and blobs (objects with 3 significant dimensions), has been proposed by Shapiro et al. [80].

Terzopoulos et al. [90] have proposed a 3-D part model similar to generalized cylinders, with additional deformation parameters to control the elasticity of the main axis and walls of the cylinder. With human interaction to set constraints on parameters, they are able to recover the models from 2-D silhouettes. Later work by Terzopoulos and Metaxas [89] proposes a representation which combines the local degrees of freedom of splines (which makes them useful for surface reconstruction) with the global shape parameters of a superquadric (which makes them useful for shape recognition). Results of fitting are provided for both intensity and range data. Initial segmentation into parts is done manually by the user.

Pentland and Sclaroff [68] propose a very interesting 3-D deformable model representation based on the Finite Element Method (FEM) and the FEM technique of modal analysis. The model consists of a large number of finite elements, with properties such as mass, stiffness, and damping which govern their dynamic behavior when the model is deformed. The input data are treated as constraints forcing the deformable model to assume a shape that closely fits the data. The complex deformation undergone by the model is decomposed into the superposition of various simple deformations corresponding to the free vibration modes of the FEM equilibrium equation. This is similar to a Fourier representation, where a complicated waveform can be represented by the combination of independent simple (sinusoidal) waveforms of different frequencies. The vector of coefficients corresponding to the lowest-order vibration modes (say, the 30 lowest-order modes) is used to represent the fitted shape. Two shapes can be matched simply by finding the normalized dot product of the corresponding modal representations. The use of this representation for tracking motion is also described by Pentland and Horowitz [67].

2.6 Summary

We presented a brief survey of 3-D object representation schemes. These were classified as either viewpoint-dependent or viewpoint-independent, and either object-intrinsic or parts-based. A parts-based approach has the advantages of flexibility and modularity.
Selection of an appropriate set of part primitives, and part segmentation (decomposition of an image into regions corresponding to the separate parts), are the crucial issues in a parts-based approach.

We are primarily interested in somewhat "meaningful" part primitives, i.e., volumetric models with a considerable degree of capability for shape discrimination. As mentioned earlier, this choice simplifies later steps in the recognition process (representation of a 3-D object is more compact and "meaningful," fewer hypotheses need be generated for matching, and verification is much less essential). The price to be paid is that segmenting and identifying these parts from an image is more difficult.

For our 3-D object representation scheme, we intend to use a small set of 3-D volumetric parts based on geons. The choice of these primitives, as well as a method for part segmentation (i.e., decomposition into separate parts), are discussed in the following chapter.

CHAPTER 3

Parts and Part Segmentation

This chapter is organized as follows: first, we explain our choice of parts for the 3-D parts-based object representation. Next, we put forward a novel method of specifying "generic" 3-D parts, i.e., by using Surface Adjacency Graphs (SAGs). Then we derive an extremely compact multi-view representation of the part primitives, based on their SAGs. Finally, we present a method of performing part segmentation from range images, given a good surface segmentation. Some of the work described here has been presented earlier in [76].

3.1 Choice of Part Primitives

Many of the 3-D object recognition systems described in the literature share some common problems. Firstly, they cannot cope with a large number of known objects, say even 100 models. This problem is particularly acute when models are not stored hierarchically, but are put all together in a "flat" knowledge base. This makes efficient indexing much more difficult, which in turn makes the recognition strategies impractical when the number of models is large. Secondly, most of them use features or modeling primitives which are too "weak," such as line segments, angles, and surface patches. This means that every object yields a large number of such features or primitives, making for an unconvincing representation scheme which leads to too many hypotheses and too much "branchiness" of the search tree if the matching scheme is based on search. Chapter 1 discussed this point in considerable detail.

In order to resolve these issues, a good object representation scheme should have the following features:

• It should use more powerful modeling primitives, yielding a compact and "meaningful" representation scheme. Specifically, we would like to use a 3-D parts-based representation.

• The representation scheme should lend itself naturally to efficient indexing when matching object and model descriptions, so that a model-based recognition system built on top of it would be able to cope with a large number of models.

• It should facilitate efficient segmentation/matching routines, by using viewpoint-dependent representations. But the number of such views should be small, and the choice of views should be "natural" and justifiable at least to some extent.

We are interested in powerful part primitives, i.e., volumetric models with a considerable degree of capability for shape discrimination. As we saw, this choice simplifies later steps in the recognition process. It makes the representation of a 3-D object more compact, fewer hypotheses are generated for matching, and verification is much less essential.
Many 3-D parts-based representations have been proposed in the object recognition literature. The most influential 3-D part primitives are generalized cylinders, superquadrics, and geons. Geons, since they are capable of satisfying the criteria listed above, and because of the evidence (from the visual perception literature) for their plausibility as a shape indexing mechanism, are an attractive choice as 3-D part models.

Of the four geometrical properties that characterize the 36 different geons, some are more useful than others as important shape attributes. Straightness or curvedness of the axis, as well as the nature of the "sweeping function" of the cross-section along the axis, seem to be important attributes. Straightness or curvedness of the cross-sectional edge is an attribute which can be difficult to extract from images: firstly, there may be objects which do not show distinct "faces" where the cross-sectional edges are easily observable (e.g., a cucumber or a rectangular block with smoothened edges). Secondly, this distinction becomes blurred in the case of, say, an octagonal prism approximating a cylinder. Cross-section symmetry is somewhat problematic. It is harder to compute, as well as less "obviously important" than the others as a shape attribute useful for object recognition. It is worth noting that much other RBC-inspired research also ignores cross-section symmetry, and uses a smaller set of parts either as the complete set of primitives [20, 64] or as an intermediate step towards computing the entire set of 36 geons [4]. Ignoring the symmetry distinctions leads to a reduced set of only 12 geon-like 3-D parts. We select these 12 geon-like parts as the 3-D modeling primitives for 3-D object representation.

3.1.1 Notation

We will use the following notation to specify the 12 geon-like 3-D parts: axis shape is either bent (b) or straight (s); cross-section edges are either curved (c) or straight (s); cross-section size is either constant (co), increasing-&-decreasing (id), or tapered (t). Any of the 12 parts can therefore be specified by a triplet for axis shape, cross-section edge, and cross-section size. For example, b-c-co indicates a part with bent axis, curved cross-section edges, and constant cross-section size; s-s-id indicates a part with straight axis, straight cross-section edges, and increasing-and-decreasing cross-section size.

Table 3.1. Notation for specifying the 12 parts

  Notation   Axis       Cross-sectional edges   C.S. size along axis
  b-c-co     bent       curved                  constant
  b-c-id     bent       curved                  incr.-&-decr.
  b-c-t      bent       curved                  tapering
  b-s-co     bent       straight                constant
  b-s-id     bent       straight                incr.-&-decr.
  b-s-t      bent       straight                tapering
  s-c-co     straight   curved                  constant
  s-c-id     straight   curved                  incr.-&-decr.
  s-c-t      straight   curved                  tapering
  s-s-co     straight   straight                constant
  s-s-id     straight   straight                incr.-&-decr.
  s-s-t      straight   straight                tapering

Table 3.1 lists the 12 parts using this notation, and the parts themselves are shown in Figures 3.1 through 3.6.

Figure 3.1. Parts with straight edges and constant cross-section size (s-s-co, b-s-co)

3.2 Surface Adjacency Graphs

Previous work on geons is based on line drawings, i.e., it uses edges derived from intensity images as the input. Consequently, methods for identifying the geons have also used perceptual properties of edges in the line drawing.
Figure 3.2. Parts with straight edges and increasing cross-section (s-s-t, b-s-t)

Figure 3.3. Parts with straight edges and incr.-and-decr. cross-section (s-s-id, b-s-id)

Figure 3.4. Parts with curved edges and constant cross-section (s-c-co, b-c-co)

Figure 3.5. Parts with curved edges and increasing cross-section (s-c-t, b-c-t)

Figure 3.6. Parts with curved edges and incr.-and-decr. cross-section (s-c-id, b-c-id)

We pointed out earlier that ours is the first attempt to recover a specific, reasonably rich set of geon-like 3-D parts from range data. Generally speaking, when working with range data it is natural to deal with surfaces and their properties rather than with edges. Hence, we would like to specify the part primitives by using their surface properties. Also, this has to be done in a way that captures the "generic" nature of the parts, i.e., the fact that they are defined by qualitative shape properties and not by specific numerical shape measures.

We use the Surface Adjacency Graph (SAG) as a fundamental representation throughout this chapter. This representation has many advantages. It can be used to precisely define the generic parts in terms of their surfaces. Next, it yields a multi-view representation of the parts in a very natural manner. Given a good method for surface segmentation, the SAG can be easily obtained from range data. The same representation can be used for part segmentation, by comparing the SAG of the scene with our multi-view representation of parts. Gupta [43] has independently proposed the use of SAGs, and their use for segmenting a data set by breaking concave edges of the SAG. Our segmentation method is more general.

3.2.1 Surface Classification by Curvatures

Given a surface S, at each point P on the surface there is a tangent direction along which the surface curves most, as well as a tangent direction along which the surface curves least. Except in degenerate cases, these directions are orthogonal. They are called principal directions; the corresponding maximum curvature and minimum curvature are together referred to as the principal curvatures. For example, on a cylindrical surface, the curvature is zero along the tangent direction parallel to the axis of the cylinder; this is the minimum curvature. The curvature is greatest (in absolute value) along the surface tangent direction perpendicular to the axis of the cylinder; this is the maximum curvature. Planes and spheres are special cases, because the curvature on a planar or spherical surface is independent of the direction of travel along the surface. The curvature of a plane is zero in all directions. For a sphere, the curvature is the inverse of the radius and is also constant in all directions.

The actual sign of a curvature depends on the way the surface curves with respect to the positive Z axis. For our range images, the positive X axis corresponds to the direction of increasing column numbers, the positive Y axis is in the direction of decreasing row numbers, and the positive Z axis points out of the XY image plane. Hence, the range image of a convex spherical cap would show a surface curving downwards with respect to the positive Z axis, and so it would have negative principal curvatures. However, in our terminology we reverse the signs and refer to a "convex" curvature as positive. Similarly, the concave "inside" of a sphere would be referred to as having a negative curvature. Mean curvature (H) and Gaussian curvature (K) are defined as the average and the product, respectively, of the principal curvatures.
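In symbols (the κ notation is ours), writing κ_max and κ_min for the principal curvatures:

\[
H = \frac{\kappa_{\max} + \kappa_{\min}}{2}, \qquad
K = \kappa_{\max}\,\kappa_{\min}, \qquad
\kappa_{\max,\min} = H \pm \sqrt{H^2 - K},
\]

so the pair (H, K) carries the same information as the principal curvatures themselves, and a sign-based classification can be phrased in terms of either pair.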
Based on the signs (positive, zero, or negative) of H and K, we can perform a qualitative classification of surfaces into 8 possible surface types [5]. We use a qualitative classification based on the principal curvatures. Restricting ourselves to a situation where the "inside" surfaces of part primitives are never visible, some surface types never occur. Based on the signs of the principal curvatures, we can then distinguish between five kinds of surfaces, as shown in Table 3.2.

Table 3.2. Curvature-based classification into 5 surface types

  Max. curvature   Min. curvature
  +                +
  +                0
  +                -
  0                0
  -                0

We compute curvature by the method implemented by Flynn [33]. At each point on a surface, a bicubic surface is fitted to the point and its neighbors. Principal curvatures and directions are then computed from the coefficients of the fitted bicubic surface. A threshold of 0.2 is used for qualitative classification of principal curvatures as positive, zero, or negative. A curvature measure for a surface is taken to be positive if its median value is greater than 0.2. It is taken to be negative if the median value is less than -0.2. If the median value is between -0.2 and 0.2, the curvature measure is taken to be zero. (A curvature value of 0.2 is the same as that of a sphere of radius 1/0.2, i.e., 5 inches.)

3.2.2 SAGs of Part Primitives

The Surface Adjacency Graph (SAG) is a graph in which the nodes represent surfaces, while an edge between two nodes indicates that the corresponding surfaces are adjacent. Nodes as well as edges may have attributes. Node attributes can represent any property of the corresponding surface, say its type (planar, convex, concave, etc.) or area. In our case a node has only one attribute, i.e., the "type" of the corresponding surface as determined from its principal curvatures. An edge has only two attributes, both concerning the angle between the surfaces corresponding to the two nodes connected by the edge. The first attribute describes whether the angle is concave or convex. The second attribute describes whether the angle is acute, right, obtuse, or zero. The node attribute (i.e., surface type) as well as the edge attributes (i.e., convex or concave adjacency, acute/right/obtuse/zero angle) can also be "don't-care," indicating that the attribute may take any of the possible values.

Figure 3.7. Coding scheme for SAG edge attribute

Figures 3.8 through 3.19 show the SAG representations of our 12 geon-like parts. Figures 3.8 through 3.13 show SAGs for parts with straight-edged cross-sections, while Figures 3.14 through 3.19 show SAGs for parts with curved-edged cross-sections. Each node is labeled according to the signs of the principal curvatures of the corresponding surface. For our 12 parts, adjacencies between surfaces are always convex, so the convex/concave attribute is not shown for any of the edges. The other SAG edge attribute, i.e., acute/right/obtuse/zero angle, is coded as in Figure 3.7.

Figure 3.8. SAG for the part s-s-co
Figure 3.9. SAG for the part b-s-co
Figure 3.10. SAG for the part s-s-t
Figure 3.11. SAG for the part b-s-t
Figure 3.12. SAG for the part s-s-id
Figure 3.13. SAG for the part b-s-id
Figure 3.14. SAG for the part s-c-co
Figure 3.15. SAG for the part b-c-co
Figure 3.16. SAG for the part s-c-t
Figure 3.17. SAG for the part b-c-t
Figure 3.18. SAG for the part s-c-id
Figure 3.19. SAG for the part b-c-id
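Before moving on, here is a minimal sketch of the classification rule and of an attributed-SAG container, assuming the median-and-threshold scheme described above (the class layout and names are ours, not the system's actual data structures):

```python
import numpy as np

THRESH = 0.2  # curvature magnitudes below this count as zero

def curvature_sign(per_point_values):
    """Qualitative sign (+1, 0, or -1) of one curvature measure for a
    whole surface, taken from the median of the per-point estimates."""
    m = np.median(per_point_values)
    return 1 if m > THRESH else (-1 if m < -THRESH else 0)

def surface_type(max_curvs, min_curvs):
    """One of the five (max sign, min sign) pairs of Table 3.2."""
    return (curvature_sign(max_curvs), curvature_sign(min_curvs))

class SAG:
    """Attributed Surface Adjacency Graph: typed nodes, attributed edges."""
    def __init__(self):
        self.node_type = {}  # surface id -> (max sign, min sign), or None for don't-care
        self.edges = {}      # frozenset({i, j}) -> (is_convex, angle_class)

    def add_surface(self, sid, stype=None):
        self.node_type[sid] = stype

    def add_adjacency(self, i, j, is_convex=True, angle_class="dont_care"):
        self.edges[frozenset((i, j))] = (is_convex, angle_class)
```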
3.3 Multi-view Representation of Parts

We stated our intention to use viewpoint-dependent representations for part segmentation and as an aid towards part identification. In the last few years, under the name of "aspect graphs" or "characteristic views," there has been a flurry of research activity on viewpoint-dependent representations. This was briefly reviewed in the previous chapter. We will now explain the details of our own multi-view representation.

3.3.1 Catalog of Part Views

Recently, the geon-inspired literature has also dealt with the issue of a multi-view, viewpoint-dependent representation. A viewpoint-dependent representation for general 3-D objects would cause two major difficulties: firstly, for a general, complicated 3-D object, no analytical method is known for computing the aspect graph. Secondly, the number of aspects would be too large (the literature on aspect graphs finds hundreds of aspects even for simple objects).

Dickinson et al. [20, 21, 22] overcome these difficulties by combining the "parts-based" and "viewpoint-dependent" ideas. That is, there is a small number of 3-D part models. Dickinson et al. put forward the idea of using a multi-view representation only for this limited number of 3-D parts, rather than for entire complex 3-D objects composed of these parts. By using an empirical viewsphere approach and making use of part symmetries, they obtain 688 views for their entire set of 10 part primitives.

As mentioned earlier, storing characteristic views of only the part primitives has two advantages. Firstly, the number of views is much smaller. Secondly, even with a different set of part primitives, it may be possible to use the same scheme for recognition by using the multi-view representation of the new set of primitives. Constructing the multi-view representation of the chosen generic part primitives is an off-line process that has to be performed only once.

Following the idea proposed by Dickinson et al., we will use multi-view representations of the part primitives, and not of entire multi-part objects. The expectation is that a multi-view representation of parts will be useful in segmenting objects into component parts, as well as towards identifying those parts as instances of specific part primitives.

What views are "natural" to use for the viewpoint-dependent representation of the 3-D part primitives? When we look at simple objects (such as our 12 3-D part primitives), we can generally distinguish a few large, "meaningful" surfaces oriented in a particular way with respect to one another. The surfaces usually make sense, like "the top of the coke can," "the side of the monitor," etc. Usually, if we want to draw the object, we can do so unambiguously by drawing its important surfaces. We may be able to extract large, "meaningful" surfaces from an image by looking at perceptually important 2-D features such as long curves, co-terminating lines, and parallel lines. These might have been obtained from an intensity image. But these features can also be found from range images. The point is that this multi-view SAG representation (in terms of large, "meaningful" surfaces) can be easily obtained for the part primitives. It need only be computed once, and can in fact even be done manually and off-line. On the other hand, for a scene image, the SAG representation can easily be computed if a good surface segmentation is available.
In our case, the catalog of possible views of parts is based only on surface configurations. We are concerned only with the types of visible surfaces, their adjacencies, and the angles between them. We are not interested in every possible aspect, but only those in which the surface types, adjacencies, and/or angles differ. Unlike the usual definition of an aspect, which takes into account qualitative changes in the vertices, edges, and surface identities, we define a simpler kind of characteristic view, the surface aspect (s-aspect). The s-aspect of an object is the set of all possible views for which the surface types, adjacencies, and angle attributes are the same. Our multi-view representation will contain not the aspects (in the usual sense), but rather all possible s-aspects of the part primitives. So our multi-view representation of the part primitives, i.e., our catalog of views, turns out to be much smaller and simpler than the usual "aspect graph."

There are a number of ways to construct the catalog of views. Given the complete aspect graph of a part, we could obtain the multi-view representation by combining all those aspects in which the number, types, adjacencies, and angles of the surfaces are the same. A simpler alternative would be to make use of the fact that, for our purposes, a part is completely characterized or defined by its attributed SAG. Hence, all possible views of a given part can be determined by finding all possible connected subgraphs of the part's attributed SAG, and removing duplicates and impossible views. This method results in a total of 161 views for all 12 parts. However, many of these views can never be obtained in reality, e.g., a view of a rectangular block in which all 6 surfaces are visible.

It is easy to show that any subgraph that includes two oppositely oriented planar surfaces corresponds to a configuration of surfaces that can never be seen from any viewpoint. (When we say that a configuration of surfaces can be seen from a point, we mean that every surface in that configuration can be seen from that point.) Surfaces of part primitives are considered to have two sides, i.e., an "inner" side and an "outer" side. Surface normals on the outer side of a part surface point out of the volume occupied by the part. For a point P on a part surface to be visible from a point Q in space, the outward-pointing surface normal at P must make a positive projection on the vector from P to Q. Given two "oppositely oriented" planar surfaces S1 and S2 (i.e., planar surfaces whose outward-pointing surface normals are oppositely oriented), obviously there exists no vector in space such that a pair of outward-pointing surface normals, one from S1 and one from S2, both have a positive projection on it. Hence, there is no point Q in space such that from Q, the outer sides of both S1 and S2 are visible.

We showed that if a subgraph includes two oppositely oriented planar surfaces, then it corresponds to a configuration that can never be seen from any viewpoint in space. However, the converse is not true. There are many surface configurations without two oppositely oriented planar surfaces which still cannot be seen from any viewpoint in space. For example, no two surfaces of a tetrahedron are oppositely oriented. Nevertheless, there is no viewpoint from which the "outer" sides of all four surfaces of a tetrahedron are simultaneously visible.
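Under orthographic projection, the argument above reduces to asking whether some viewing direction makes a positive projection on every outward normal in the configuration. A Monte Carlo sketch of that test follows (the code is ours; an exact version would instead solve a small linear program):

```python
import numpy as np

def co_visible_direction_exists(normals, n_samples=20000, seed=0):
    """True if some sampled viewing direction sees the outer side of
    every surface, i.e., has a positive dot product with every outward
    unit normal in `normals`."""
    rng = np.random.default_rng(seed)
    dirs = rng.normal(size=(n_samples, 3))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    return bool((dirs @ np.asarray(normals).T > 0).all(axis=1).any())

# Two oppositely oriented planar surfaces are never co-visible:
print(co_visible_direction_exists([[0, 0, 1], [0, 0, -1]]))      # False
# Nor are all four faces of a regular tetrahedron: the four outward
# normals sum to zero, so no direction projects positively on all.
tet = np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]) / 3 ** 0.5
print(co_visible_direction_exists(tet))                           # False
```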
We obtained the multi-view part representation empirically because, as mentioned earlier, a large number of "impossible" views are included in the catalog found by the exact approach (based on enumerating all distinct connected subgraphs of the attributed SAG of each part). We constructed CAD models corresponding to the 12 shape classes using GEOMOD, a geometric CAD solid modeling package. GEOMOD is one component of I-DEAS (Integrated Design and Engineering Analysis System) [87]. The 12 CAD models are shown in Figures 4.1 and 4.2. We then viewed the CAD model of each of the part primitives from a large number (100) of random viewpoints on the surface of a viewsphere and noted the different aspects.

Figure 3.20. S-aspects of a rectangular block

Figure 3.20 illustrates the simplicity of our multi-view representation of a rectangular block (one of the 12 basic geons), which has just 3 s-aspects. Note that the true aspect graph of even a simple cube would have 26 aspects. Figure 3.29 shows the multi-view representation of a bent cylinder, another of the 12 basic geons. This part has only 7 s-aspects. In fact, the entire set of 12 geon-like shapes (as defined by the SAGs shown earlier) has only 74 different views or s-aspects! The surfaces in Figures 3.20 through 3.31 are labeled according to the signs of the maximum and minimum curvatures, which may be positive (+), zero (0), or negative (-).

The multi-view representation of parts contains the following information: for each characteristic view of each of the chosen generic parts, it describes the surface types, their adjacencies, and the angle attribute (concave or convex) between adjacent surfaces. This information is stored in the form of attributed SAGs, one for each view in the multi-view representation. At present, we do not take into account whether the angle between two surfaces is acute/right/obtuse.

Figure 3.21. S-aspects of an increasing-decreasing rectangular block
Figure 3.22. S-aspects of a tapering rectangular block
Figure 3.23. S-aspects of a cylinder
Figure 3.24. S-aspects of an increasing-decreasing cylinder
Figure 3.25. S-aspects of a tapering cylinder
Figure 3.26. S-aspects of a curved-axis rectangular block
Figure 3.27. S-aspects of a curved-axis, increasing-decreasing rectangular block
Figure 3.28. S-aspects of a curved-axis, tapering rectangular block
Figure 3.29. S-aspects of a curved-axis cylinder
Figure 3.30. S-aspects of a curved-axis, increasing-decreasing cylinder
Figure 3.31. S-aspects of a curved-axis, tapering cylinder

For our 12 geon-like parts, the angles between surfaces are always convex. Hence, in this case the multi-view representation of parts simply amounts to a catalog of adjacencies of attributed surfaces, with surface adjacencies always being convex. Note that if we selected a different set of generic part primitives, for which some surface adjacencies were concave (say "L"- or "T"-shaped primitives), then concave angles would appear in the catalog of part views.
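The step of "noting the different aspects" amounts to reducing each rendered view to a key that is identical for views with the same surface types, adjacencies, and convexity attributes. One way to phrase this, reusing the SAG sketch from Section 3.2.2 (again ours, and deliberately ignoring the graph-isomorphism subtleties a careful implementation would have to handle):

```python
from collections import Counter

def s_aspect_signature(sag):
    """A key that is equal for two views exactly when their multisets of
    surface types and of attributed adjacencies agree."""
    types = Counter(sag.node_type.values())
    adjs = Counter()
    for edge, (is_convex, _angle) in sag.edges.items():
        i, j = tuple(edge)
        pair = tuple(sorted((sag.node_type[i], sag.node_type[j])))
        adjs[(pair, is_convex)] += 1
    return (frozenset(types.items()), frozenset(adjs.items()))

# Distinct signatures over many random viewpoints approximate the catalog:
# catalog = {s_aspect_signature(view) for view in sampled_view_sags}
```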
3.4 Part Segmentation from Surface Segmentation

Object recognition and surface classification methods based on differential geometric measures such as curvatures are common in the literature [5, 7, 28, 34, 92]. Curvature measures (such as principal curvatures, or mean and Gaussian curvatures) have been used to distinguish between spheres, cones, cylinders, and cubes [7, 59]. Hence it seems natural to propose the use of similar measures to aid in part segmentation and identification for our somewhat larger set of part primitives. Since curvature estimates are noisy, we use only the signs (positive, zero, or negative) of the median values of the principal curvatures on each surface.

3.4.1 Surface Segmentation and Classification

First we segment the range image into surfaces of different types. Surface segmentation is performed by the method of Flynn [33]. Flynn's algorithm in turn begins with a preliminary segmentation obtained by Hoffman and Jain's "split-and-merge" method [50]. Hoffman and Jain apply the CLUSTER program to get an initial surface segmentation. A squared-error clustering procedure finds up to 16 clusters in the 6-dimensional space consisting of 6 local surface features, i.e., the three components of the normal vector at each point on the surface, along with the co-ordinates of that point. The result is usually oversegmented. Hoffman and Jain then obtain their final surface segmentation by merging adjacent surface patches which meet a "similarity" criterion. Using this as the preliminary segmentation, Flynn classifies individual surfaces into the following types: planar, spherical, cylindrical, and unknown. Appropriate geometric parameters are estimated for each surface. Finally, adjacent surface patches are merged if they are of the same type and have similar parameters. In most cases, this gives a good surface segmentation.

After surface segmentation, we begin construction of the SAG by classifying each surface according to the distributions of curvature measures on the surface. From the signs of the maximum and minimum curvatures (within appropriate thresholds), we can distinguish between 5 surface types, as shown in the first two columns of Table 3.3, and draw some inferences about the part primitive to which they belong.

Table 3.3. Some inferences of part shape based on surface type

  Max. curv.   Min. curv.   Inference
  +            +            Curved edges & bent axis, OR curved edges & c.s. size incr.-decr.
  +            0            Curved edges & straight axis, OR straight edges & curved axis
  +            -            Curved edges & bent axis
  0            0            End face of any part, OR straight edges & straight axis
  -            0            Straight edges & bent axis

3.4.2 Using the Catalog of Part Views

The main problem of parts-based representation is part segmentation, i.e., how to segment an arbitrary object into instances of the part primitives. We will use comparison of the surface types and adjacencies in the image with those stored in the multi-view representation (catalog of aspects or views) of the part primitives, for part segmentation as well as an aid towards part identification. Good surface segmentation is crucial to our method for part segmentation. After surface segmentation and classification, the image consists of a set of surfaces, with their adjacencies, types, and angles between adjacent surfaces. Like the views in the catalog, this is also represented as an attributed SAG.
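A minimal sketch of the edge-examination procedure developed in the following paragraphs, using the SAG container sketched earlier (the function names are ours): every scene-SAG edge that the catalog cannot account for is broken, and the surviving connected components become part hypotheses.

```python
def catalog_allows(catalog_views, t1, t2, is_convex):
    """True if some catalog view contains adjacent surfaces of types t1
    and t2 meeting with this convexity, i.e., the two scene surfaces
    could belong to a single part."""
    want = tuple(sorted((t1, t2)))
    for view in catalog_views:
        for edge, (cvx, _angle) in view.edges.items():
            i, j = tuple(edge)
            got = tuple(sorted((view.node_type[i], view.node_type[j])))
            if cvx == is_convex and got == want:
                return True
    return False

def segment_parts(scene, catalog_views):
    """Hypothesized part segmentation: break unexplained SAG edges and
    return connected components. Proportional to the number of SAG
    edges, as noted in the text (the catalog size is a constant)."""
    adj = {n: set() for n in scene.node_type}
    for edge, (cvx, _angle) in scene.edges.items():
        i, j = tuple(edge)
        if catalog_allows(catalog_views, scene.node_type[i], scene.node_type[j], cvx):
            adj[i].add(j)
            adj[j].add(i)
    parts, seen = [], set()
    for n in scene.node_type:  # depth-first component sweep
        if n in seen:
            continue
        comp, stack = set(), [n]
        while stack:
            u = stack.pop()
            if u not in comp:
                comp.add(u)
                seen.add(u)
                stack.extend(adj[u] - comp)
        parts.append(comp)
    return parts
```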
Surfaces of insignificantly small area are ignored after taking their connectivity into account, i.e., if such a surface connects two larger surfaces, it is removed from the graph after ensuring that the nodes corresponding to the two larger surfaces are connected to one another.

It is assumed that a single surface in the SAG belongs to exactly one part. Every edge in the SAG, therefore, is potentially an adjacency between surfaces belonging to two different parts. Hence, each edge in the SAG is examined to see whether the multi-view representation can account for two adjacent surfaces of those particular types being adjacent with those particular angle attributes. (Currently, concave/convex is the only angle attribute used.) If not, the surfaces must belong to different parts, so the edge is removed. Note that we could have chosen more complicated generic part primitives (for example a "T" or an "L"), in which some adjacent surfaces meet along a concave crease even though they belong to the same part. Every edge of the graph is examined in this manner to get a set of hypothesized part segmentations. The graph is thus divided into connected, attributed subgraphs, each consisting of surfaces hypothesized to belong to the same part. The time complexity of this part segmentation method is proportional to the number of edges in the SAG.

Each connected subgraph is then examined to verify whether it can be accounted for by one or more part views from the multi-view representation, i.e., whether there is some view of some part such that the subgraph partially matches the corresponding SAG. This is worth doing because of the current fragility of surface segmentation and curvature-based surface classification: by comparing the subgraphs with the catalog of part views, in some cases it is possible to determine immediately that there have been errors in surface segmentation and classification, because a subgraph does not partially match any view of any part in the catalog. Each subgraph for which we are able to find one or more partial matches in the catalog corresponds to a hypothesized part and one or more possible interpretations of what the part identity might be.
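The edge-examination step lends itself to a compact sketch. The following is illustrative rather than our actual implementation: it assumes the networkx graph library, and the catalog is reduced to a set of order-normalized (type, type, angle) triples that some view of some part primitive can account for.

```python
import networkx as nx  # assumed available; any graph library would do

def segment_parts(sag, catalog_pairs):
    """Split the scene SAG into hypothesized parts.

    sag:           networkx Graph; nodes carry 'type', edges carry 'angle'
    catalog_pairs: set of (type_a, type_b, angle) triples, with the two
                   types sorted, drawn from the catalog of part views
    Returns the connected subgraphs, one per hypothesized part.
    """
    pruned = sag.copy()
    for a, b, attrs in sag.edges(data=True):
        ta, tb = sorted((sag.nodes[a]['type'], sag.nodes[b]['type']))
        # If no catalog view explains this adjacency, the two surfaces
        # must belong to different parts: cut the link.
        if (ta, tb, attrs['angle']) not in catalog_pairs:
            pruned.remove_edge(a, b)
    # Each remaining connected component is one hypothesized part;
    # the cost is linear in the number of SAG edges, as noted above.
    return [pruned.subgraph(c).copy() for c in nx.connected_components(pruned)]
```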
This method of part segmentation is in some ways more general than common approaches which, in one way or another, make use of Hoffman and Richards' "principle of transversality" [47]. This principle states that when two arbitrarily shaped surfaces interpenetrate, they usually meet in a contour of concave discontinuity of their tangent planes. However, Figures 3.32 and 3.33 show intersecting parts which do not meet along concave creases. Part segmentation techniques which depend solely on concavity would fail in these cases. Our method would still work because, irrespective of the angle between the surfaces, our catalog of part views cannot account for two adjacent cylindrical surfaces belonging to the same part.

Figure 3.32. Two parts which meet along a convex crease
Figure 3.33. Two parts which meet along a "mixed" crease

Our part segmentation method also has some major limitations. Firstly, it is totally dependent on reliable surface segmentation and reliable curvature-based surface classification; in reality, both are often incorrect. Secondly, the method does not use any information about the overall shape of the object, relying instead solely on surface properties. Hence, it fails when two parts come together in such a way that the adjacency of the corresponding surfaces is not directly visible, even though in many such cases the intensity or occluding-contour data clearly indicates the joining of two parts. For example, consider a view of an L-shaped object in which only a single surface, i.e., the L-shaped surface, is visible (Figure 3.34). The occluding contour clearly suggests the presence of two parts. Nevertheless, our method cannot segment this object into two parts from such a viewpoint: the SAG consists of a single node, and there are no SAG links to be broken.

Figure 3.34. L-shaped object for which part segmentation fails

Additionally, for this example (the L-shaped object), even if two surfaces belonging to different parts had been visible (Figure 3.35), part segmentation would have been incomplete. The reason is that the single L-shaped surface is itself produced by the coming together of two parts, and our part segmentation method has no way to break up a single surface into pieces belonging to different parts. Many other such examples can be found.

Figure 3.35. Another view of the L-shaped object

3.4.3 Part Identification: Two Approaches

We used the catalog of part views to perform part segmentation by breaking SAG links of the given scene image. The following chapters discuss two methods for identification of the isolated parts thus obtained. The first method does not make any further use of the catalog of part views; the isolated parts are classified based on features derived from the estimated parameters of the best-fitting superquadric. This approach is described in Chapter 4. The second method uses the catalog of part views to hypothesize which parts could possibly account for each of the subgraphs isolated by part segmentation. A direct test of one geon attribute (cross-section size) is then applied to eliminate some of the possibilities and determine the part identity. A direct test of another geon attribute (axis shape) using intensity image data is also developed, in order to partially recover from some shortcomings of the method. Further details are given in Chapter 5.

3.5 Summary

In this chapter, we first explained our choice of parts for the 3-D parts-based object representation. A set of 12 geon-like "generic" parts was arrived at by ignoring the cross-sectional symmetry attribute of geons. Next, we introduced the notion of Surface Adjacency Graphs (SAGs), i.e., attributed graphs in which nodes represent attributed surfaces and edges represent attributed surface adjacencies. We put forward a novel method of specifying "generic" 3-D parts, i.e., by specifying their SAGs. After a brief overview of aspect graph (i.e., viewpoint-dependent representation) research, we derived an extremely compact multi-view representation of the part primitives, based on their SAGs. We presented a method of performing part segmentation from range images, given a good surface segmentation. Lastly, we pointed out that the catalog of part views can also be used to detect errors in surface segmentation and curvature-based surface classification.

After part segmentation, the next step is part identification, i.e., the parts that have been isolated must be identified as instances of the 12 part primitives. The following two chapters examine two approaches towards part identification.
CHAPTER 4

Geons from Superquadrics

In the previous chapter, we presented a method for part segmentation, given surface segmentation. The next step is to classify the isolated parts into one of the 12 geon-like shape classes. The research described below has two goals. Firstly, it attempts to perform this identification of the isolated parts. Secondly, in doing so, it tests the shape discrimination capability of superquadric parameters under varying conditions of image resolution, model elongation, and noise.

We will now attempt to relate two influential families of 3-D shape primitives, superquadrics and geons. As mentioned previously, superquadric surfaces have attracted considerable interest recently as part models for the representation of 3-D objects. Here, we explore their utility for a different purpose, i.e., for shape discrimination. We investigate how well the estimated superquadric parameters can reflect certain intuitive geometric attributes of elongated objects, such as axis shape (straight or bent), type of cross-sectional edges (straight or curved), and variation of cross-section size along the axis (constant, tapered, or increasing-and-decreasing). Parameters of the best-fitting superquadric are estimated for real as well as synthetic range images, obtained from a large number of viewpoints, of models belonging to 12 shape classes based on the above geometric attributes.

Five features derived from the estimated superquadric parameters are used to distinguish between these 12 shape classes. Classification error rates are estimated for k-nearest-neighbor and binary tree classifiers. The effects of varying the range image resolution, noise level, and model elongation are also investigated. The results indicate that existing methods of superquadric parameter estimation are quite sensitive to noise in range values, and also do not work well for "rough," coarse-surfaced objects. However, for low-noise real range images of objects with "smooth" surfaces, numerical features derived from estimated superquadric parameters can be used with a binary tree classifier to infer qualitative shape properties with almost 80% reliability. Some of the work described here has been presented earlier in [77].

4.1 Motivation

As mentioned earlier, superquadric surfaces, as well as geons, are recent additions to the set of 3-D part primitives proposed for object modeling. Aside from this fact, however, the two differ in almost every possible way. For instance, superquadrics are precisely defined parametric surfaces, whereas geons are specified by qualitative geometric properties; superquadrics are usually estimated from range images, while geons are obtained from line drawings; superquadrics are "fitted" by minimizing some error-of-fit criterion, whereas geons are recognized from their non-accidental geometric properties; and so on. Nevertheless, superquadrics and geons have strengths which mutually complement one another, making it advantageous to relate the two representations.

While it seems clear that superquadrics provide a compact, versatile representation for modeling and visualization (CAD/graphics), their use for 3-D object recognition is almost unexplored. Since obtaining reasonable estimates of superquadric parameters is computationally laborious, using the superquadric representation seems worthwhile only if it can offer significant advantages for object recognition (e.g., by serving as a good basis for indexing into the object database).
The above reference to indexing suggests a natural connection with another influential class of shape primitives, i.e., geons, which are supposed to discriminate between classes of 3-D shapes. It seems, therefore, that a good test of superquadrics for shape discrimination (as opposed to just representation) would be to see whether they can distinguish between geons.

Establishing such a link would be valuable because the superquadric and geon representations have some mutually complementary strengths. For example, the superquadric representation by itself provides no advantage from the point of view of indexing into a large database of objects. However, if superquadrics could be classified into shape categories similar to geons, they would have all the strengths of the geon representation for indexing into a database. Similarly, a weakness of the "pure" geon representation is that it does not perform precise geometric modeling, e.g., it does not store size information which would allow it to distinguish between, say, a human being and a doll. On the other hand, a superquadric representation would capture these "numerical" shape differences and produce a precise geometric model.

The superquadric representation by itself cannot distinguish between all the symmetry attributes relevant to geons. This is because undeformed superquadrics have a cross-section with at least reflection symmetry, and deformations such as tapering or bending do not distort the cross-section. Hence, we have tried to discriminate between a set of only 12 geon classes (ignoring symmetry). This leaves 3 geometric properties to be detected: edge shape, axis shape, and type of cross-section sweeping function.

This chapter does not deal with part segmentation, which is a separate issue [31, 70] and has been discussed in Chapter 3. Rather, it focuses on a possible subsequent step, i.e., classifying the individual parts into one of the 12 geon-like shape classes. We are concerned with the shape sensitivity of the estimated superquadric parameters, i.e., their capability to reliably discriminate between the shape properties of the isolated parts. The existing literature on superquadrics does not address this issue, and it is not at all obvious that such shape discrimination can in fact be done. For instance, estimation methods may yield superquadric parameters which minimize some error criterion (e.g., least squares), and which may appear even visually to be a "good" overall fit. Yet, the sensitivity of one or more of the estimated parameters to the qualitative shape properties may be too low (or too high) for their successful use as features for reliable shape classification. Besides attempting shape classification based on the estimated parameters, this is also the first study which performs an analysis of superquadric parameter estimation over a large number of range images taken from many different viewpoints, while varying three other relevant factors (noise level, object elongation, and image spatial resolution).

The remaining parts of this chapter are organized as follows: Section 4.2 discusses the relationship between superquadrics and geons. Section 4.3 justifies the choice of features to be used by the statistical classifiers, and explains the design and methodology of the experiments performed. Section 4.4 presents the results of our classification experiments, and Section 4.5 discusses conclusions which can be drawn from the experimental results and summarizes this chapter.
4.2 Relating Superquadrics and Geons

As argued earlier, it would be advantageous to relate superquadrics and geons. But do the superquadric parameters contain enough information, at least theoretically, to distinguish between the shape attributes relevant to geons? The answer, in general, is negative, because not all geons can be modeled by superquadrics (with possible bending and tapering). All elongated, convex superquadrics are geons, but not all geons are superquadrics. (Non-convex superquadrics, with shape parameters ε > 2, would more naturally be modeled as a Boolean combination of simpler geons.) Both superquadrics and geons are classes of generalized cylinders, but the specification of a geon is less restrictive. Superquadrics are precisely defined mathematical shapes, whereas the description of a geon, by intention, specifies only 4 qualitative (non-numerical) shape attributes and is indifferent to other details. For example, a geon with an unsymmetrical cross-section cannot be modeled by a superquadric.

Cross-sections of undeformed superquadrics are necessarily symmetric at least with respect to reflection. If their x and y dimensions (a_x and a_y) are equal, they are also symmetric with respect to 90° rotation. Bending and linear tapering are two deformations commonly applied to superquadrics, which affect the symmetry attributes of the object as a whole. However, these operations do not affect the symmetry attributes of the cross-section. We could try to formalize more complex deformations, such as warping, shearing, etc., in order to distort the cross-sections themselves, and thus to model a larger variety of geons. But this would increase the number of parameters which need to be estimated from range data to fully specify the deformed superquadric.

Since every elongated, convex superquadric is a geon, whereas not every geon can be modeled by a superquadric, it is clear that elongated, convex superquadrics are a subset of geons. Consequently, in general, the superquadric parameters will not suffice to model or discriminate between all geons. Nevertheless, deformed superquadrics can model most of the geons that are commonly encountered in industrial and household objects. For this reason, it is still worth exploring the utility of superquadric parameters for shape discrimination.

4.3 Experiments in Shape Discrimination

The objective in these experiments is, primarily, to evaluate the utility of the superquadric representation for discriminating the 12 classes of 3-D parts shown in Figures 3.1 through 3.6. We attempted to do this by first estimating the parameters of the superquadric which best fits the object in the given range image, deriving a set of features from the estimated parameters, and then using standard statistical pattern recognition methods (k-nearest-neighbor and binary tree classifiers) to classify the given shape into one of the known categories. We also tested the effects of (a) using range images of different resolutions to estimate the superquadric parameters, and (b) using objects which are "elongated" to different extents. Our initial experiments with real range images used hand-carved objects with rough, "noisy" surfaces. In order to understand and explain the results of these experiments, we also studied the effect of (c) adding varying levels of noise to the range images.
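Before turning to the choice of features, it may help to recall the superquadric surface itself. The following is the standard undeformed inside-outside function from the superquadric literature (stated here for reference in the notation of this chapter; the bending and tapering deformations are applied on top of it):

F(x, y, z) = \left[ \left( \frac{x}{a_x} \right)^{2/\varepsilon_2} + \left( \frac{y}{a_y} \right)^{2/\varepsilon_2} \right]^{\varepsilon_2 / \varepsilon_1} + \left( \frac{z}{a_z} \right)^{2/\varepsilon_1}

A point (x, y, z) lies on the surface when F = 1 (inside when F < 1, outside when F > 1). The exponent ε1 shapes the profile in longitudinal planes and ε2 the profile in cross-sectional planes, which is why these two parameters figure so prominently among the features chosen next.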
4.3.1 Choice of Features

Following the formalism of Solina and Bajcsy [81], 15 parameters must be estimated from a range image of a bent, tapered superquadric at an arbitrary location and orientation. Of these, only 8 are relevant for the purposes of shape discrimination: the dimensions along the x, y, and z axes, a_x, a_y, and a_z; the shape parameters ε1 and ε2; the coefficients for linear tapering, k_x and k_y; and the radius of bending, r. Of the remaining 7 parameters, 6 specify the estimated position and orientation of the superquadric, which is irrelevant to its shape. The seventh unused parameter estimates the angle of bend in the case of superquadrics deformed by bending. However, we can detect bending using other parameters, as explained below.

By convention, the object is oriented with its longest dimension along the z-axis. It is clear that the ratio r/a_z indicates the extent to which the superquadric is bent. The parameter ε2 determines whether the cross-sectional edges are straight or curved. The nature of the cross-section sweeping function is more complicated, depending on ε1, k_x, and k_y. In the absence of tapering, the object has an almost "constant" cross-section size if ε1 is close to zero. As ε1 increases, the cross-section size becomes "increasing-and-decreasing". The cross-section size is "increasing" (or "decreasing") if ε1 is zero and at least one of the tapering coefficients is non-zero. Hence, a set of 5 features is selected from the superquadric parameters for our 12-class shape discrimination problem: r/a_z, ε1, ε2, k_x, and k_y.

The superquadric representation is non-unique in the sense that the same shape may be represented by more than one set of parameters. For example, a "square" cross-section (ε = 0) can also be interpreted as a "diamond-shaped" cross-section (ε = 2), rotated by 45°. Such ambiguities do not arise here because the estimation procedure always yields ε parameters within the range [0, 1].

4.3.2 Design of Experiments

Experiments were performed using synthetic as well as real range images. The synthetic images are useful because they are taken from objects designed with a CAD modeling system, allowing exact control over the elongation and other shape attributes. Ignoring the symmetry attribute, there are 12 shape classes corresponding to the "collapsed" set of geons.

Each classification experiment was repeated with 1-nearest-neighbor (1-nn), 3-nn, 5-nn, and binary tree classifiers. K-nearest-neighbor classifiers determine the k training patterns that lie closest in the feature space to a test pattern, and assign it to the category with the largest number of representatives among those neighbors [23]. The binary tree classifier is based on the method proposed by Sethi and Sarvarayudu [79]. It hierarchically partitions the feature space, at each stage using hyperplanes parallel to the feature axes. A single feature is used at each nonterminal node of the decision tree. The feature to be used at each node, and the corresponding threshold, are determined by maximizing an information-theoretic measure, the "average mutual information gain."

Error rates for the k-nearest-neighbor as well as binary tree classifiers were obtained using the "holdout" method [23], in which part of the available patterns (90% in our case) is used for training, and the remaining part is held back for testing the classifier. This method is known to yield a pessimistic estimate of classification error.
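The evaluation protocol is simple to state in code. The sketch below is a minimal illustration under stated assumptions (numpy; integer class labels 0..11; Euclidean distance), not the actual experimental software:

```python
import numpy as np

def knn_holdout_error(X, y, k=3, train_frac=0.9, seed=0):
    """Estimate k-nn classification error by the holdout method:
    train on a random 90% of the patterns, test on the held-out 10%.
    X: (n_patterns, n_features) array; y: integer labels in 0..n_classes-1."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    cut = int(train_frac * len(X))
    tr, te = idx[:cut], idx[cut:]
    errors = 0
    for i in te:
        # Euclidean distances from the test pattern to all training patterns
        d = np.linalg.norm(X[tr] - X[i], axis=1)
        nearest = y[tr][np.argsort(d)[:k]]
        # Majority vote among the k nearest neighbors
        vote = np.bincount(nearest).argmax()
        errors += (vote != y[i])
    return errors / len(te)
```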
Experiments with Synthetic Range Images

We constructed models corresponding to the 12 shape classes using GEOMOD, a geometric CAD solid modeling package. GEOMOD is one component of I-DEAS (Integrated Design and Engineering Analysis System) [87], which, besides object modeling, also has finite element analysis, database manipulation, and drafting capabilities. Figures 4.1 and 4.2 show synthetic range images of the 3-D shapes modeled using I-DEAS. For ease of viewing, the range images are rendered as pseudo-intensity images. Three models of different "elongations" were constructed for each of these 12 shape classes. The "axis bent" shapes are actually piecewise linear, as seen in Figure 4.2. By "elongation", we mean the ratio of the longest axial dimension (by convention, oriented along the z-axis) to the larger of the dimensions along the other two axes (x or y) at the mid-point. The three models for each shape class have elongations of 1:2, 1:4, and 1:6. 90 synthetic range images are generated per model, corresponding to viewing each model from 30 different viewpoints. (For each viewpoint, range images are taken at resolutions of 64 x 64, 128 x 128, and 256 x 256.) This gives a total of 270 viewpoints for each shape class, 90 from each of the 3 models. One can also categorize this data as 90 images from each of the three resolutions.

Figure 4.1. Synthetic range images: shapes with straight axes
Figure 4.2. Synthetic range images: shapes with bent axes

Given these 270 synthetic range images for each of the 12 shape classes, we first estimated the superquadric parameters for each image using Solina and Bajcsy's method. In practice, this procedure does not always converge quickly. In such cases, we used the current estimate after 100 iterations. We then performed classification experiments as follows:

1. Overall shape discrimination: Using all the 270 pattern vectors for the 12 shape classes, attempt to discriminate between: (a) parts with straight and curved axes; (b) parts with straight and curved cross-sectional edges; (c) parts with constant, increasing, and increasing-decreasing cross-sections; (d) all twelve parts.

2. Effect of range image resolution: Perform the same four experiments as in (1), separately for the patterns derived from 64 x 64, 128 x 128, and 256 x 256 range images (90 patterns for each resolution are available for each of the 12 parts).

3. Effect of object elongation: Perform the same four experiments as in (1), separately for the patterns derived from parts with elongations of 1:2, 1:4, and 1:6 (90 patterns for each elongation are available for each of the 12 parts).

Preliminary experiments with real range images led us to realize that surface imperfections ("roughness") have a very significant effect on the quality of superquadric parameter estimates. Consequently, we attempted to study the effect of surface imperfections by modeling them as "noise." The experiments described earlier all made use of noise-free synthetic images. We then studied the mean and standard deviation of the estimated parameters under two different levels of zero-mean additive Gaussian noise. Noise was added to the z-component of the synthetic images. The standard deviations of the two noise levels were 2.5 and 5.0 (about 1.0% and 2.0%, respectively, on a scale of 255; range values in the images were quantized between 0 and 255).
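The noise model just described amounts to the following (a minimal sketch; numpy is assumed, and the handling of missing pixels is ignored):

```python
import numpy as np

def add_range_noise(z, sigma, rng=None):
    """Add zero-mean Gaussian noise to the z-component of a synthetic
    range image and re-quantize to the 0..255 range used in our images.
    sigma = 2.5 and 5.0 correspond to roughly 1% and 2% of the scale."""
    rng = rng or np.random.default_rng()
    noisy = z.astype(float) + rng.normal(0.0, sigma, size=z.shape)
    return np.clip(np.rint(noisy), 0, 255).astype(np.uint8)
```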
These experiments with noisy images were done only for the "best" condition of the other factors, i.e., for objects of elongation 1:6 and an image resolution of 256 x 256.

Experiments with Real Range Images

Each of the 12 parts was constructed using easily carvable material such as potatoes and sweet potatoes (Figure 4.3). No attempt was made to achieve a very smooth and even surface finish, since geons are not supposed to be sensitive to such details. Range images were then obtained by viewing each of these 12 parts from 20 arbitrary viewpoints using a Technical Arts 100X White scanner [88]. Figure 4.4 shows typical range images of these parts, obtained with the White scanner. Since the White scanner obtains depth maps by the principle of triangulation, it cannot compute the depth at points where the laser stripe is invisible to the camera, whether due to occlusion or due to the intensity of the reflected laser stripe being insufficient at some surface orientations. Such points on the object are displayed as black pixels in Figure 4.4. From some viewpoints, entire surfaces of the models may thus be missing, reducing the amount of depth data available for estimating the superquadric parameters. These real range images are of resolution 240 x 240. Parameters were estimated as before for the best-fitting superquadrics, and classification experiments were performed to discriminate between: (1) parts with straight and curved axes; (2) parts with straight and curved cross-sectional edges; (3) parts with constant, increasing, and increasing-decreasing cross-sections; (4) all 12 parts.

Figure 4.3. Hand-carved geons with "rough" surface finish
Figure 4.4. Real range images of "rough" hand-carved geons

As mentioned earlier, the surfaces of these hand-carved objects were not of smooth finish. Further, over the 2 days when imaging was in progress, the surface quality became somewhat rougher due to drying out of the vegetable material, resulting in uneven surface deformations. Based on the poor results of these initial experiments with real range data, we surmised that surface coarseness or roughness, as well as "missing" data (due to the reflected laser stripe being insufficiently intense at some orientations), substantially reduced the accuracy of the estimated superquadric parameters. Hence, the above experiments were also repeated with real range images taken from models with relatively smooth, even surface quality. Some of these were "manufactured" objects, while others were carved by hand from soap bars. The advantage of using this material was that surfaces could be made smooth and even by washing the finished model in water. Range images were then obtained by viewing each of these 12 models from 10 arbitrary viewpoints using the White scanner. Figure 4.5 shows typical range images of these models.

Figure 4.5. Real range images of some "smooth" geons

4.4 Experimental Results

We report the results of superquadric parameter estimation, as well as of shape classification based on features extracted from the estimated parameters.

4.4.1 Parameter Estimation

As described earlier, 5 features were chosen with the expectation that their values would reflect the three shape attributes of interest, i.e., axis shape, type of cross-section edges, and variation of cross-section size along the principal axis of a part. In this subsection, for each of the 12 parts, we will discuss the "ideal" values of the 5 features, and present the values actually obtained by superquadric parameter estimation.
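One practical detail in all of the real-image experiments is that pixels where the scanner returned no depth (the black pixels mentioned above) must be excluded before fitting. A minimal sketch of this preprocessing follows; the sentinel value 0 for missing depth is an assumption for illustration, as the actual encoding is scanner-dependent:

```python
import numpy as np

def valid_range_points(range_img, invalid_value=0):
    """Collect (x, y, z) points from a range image, skipping pixels where
    the laser stripe was not seen (assumed to be stored as 0 here)."""
    ys, xs = np.nonzero(range_img != invalid_value)
    zs = range_img[ys, xs].astype(float)
    return np.column_stack([xs, ys, zs])  # input to superquadric fitting
```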
"Proper" Values for Features

It would be useful to indicate the "proper" values for each of the five features derived from the superquadric parameter estimates. The parameter ε1 governs the shape of the superquadric along longitudinal planes, i.e., for planes passing through the principal axis. The primary effect of this parameter, therefore, is on the variation of cross-section size along the principal axis. For constant cross-section objects, ε1 should be close to 0.0, indicating a rectangular profile in longitudinal planes. At the same time, k_x and k_y should also be close to 0.0. For objects with a tapering cross-section, ε1 should be close to 0.0, but either k_x or k_y, or both, should be significantly higher than 0.0. By experience, we have observed that estimates of ε1 which are less than 0.3 can be considered "close to 0.0," i.e., they correspond to a more or less rectangular profile in longitudinal planes. "High" estimates of ε1 (closer to 1.0) indicate an increasing-and-decreasing profile in longitudinal planes, corresponding to an increasing-and-decreasing cross-section size along the principal axis. Estimates of k_x and k_y less than about 0.25 or 0.3 can be considered "close to 0.0," i.e., they correspond to non-tapering longitudinal profiles.

The parameter ε2 governs the shape of the superquadric along latitudinal planes, i.e., for "cross-sectional" planes perpendicular to the principal axis. This parameter primarily affects the nature of the cross-sectional edges, i.e., a straight-edged or curved-edged cross-section. Again by experience, we have observed that estimates of ε2 which are less than 0.3 can be considered "close to 0.0," i.e., they correspond to a more or less straight-edged profile in cross-sectional planes.

The feature r/a_z governs the straightness of the principal axis. By experience, we have observed that estimates of r/a_z less than about 6.5 correspond to bent-axis objects.

Actual Values of Estimated Features

For each of the 12 parts, listed using the above notation, Tables 4.1 through 4.5 show the mean and standard deviation of the estimates of the five features ε1, ε2, k_x, k_y, and r/a_z, respectively. These statistics are presented for real range images as well as for synthetic range images with varying levels of noise. It is useful to view these tables in conjunction with the confusion matrices presented in Tables 4.12 through 4.14, which indicate the nature of the misclassifications for each of the 12 parts. "Unexpectedly" low or high values of the parameter estimates (shown in Tables 4.1 through 4.5) explain several of the striking kinds of misclassifications.

Table 4.1 shows the mean and standard deviation of the estimates of the superquadric parameter ε1. The first three columns give statistics from synthetic range images with additive Gaussian noise of standard deviation (σ) 0.0, 2.5, and 5.0, respectively. The last two columns show statistics from real range images of the hand-carved and "smooth"-surfaced objects, respectively. Tables 4.2 to 4.5 show similar statistics for the other four features, i.e., ε2, k_x, k_y, and r/a_z.

Table 4.1. Mean and standard deviation of estimates of ε1 for the 12 parts
(columns 2-4: synthetic data at noise level σ; columns 5-6: real data; entries are Mean, S.D.)

Shape class   σ = 0.0      σ = 2.5      σ = 5.0      Hand-carved   "Smooth"
b-c-co        0.10, 0.02   0.13, 0.07   0.19, 0.20   0.16, 0.08    0.12, 0.03
b-c-id        0.49, 0.42   0.66, 0.43   0.23, 0.29   0.33, 0.23    0.86, 0.10
b-c-t         0.17, 0.12   0.17, 0.11   0.28, 0.30   0.14, 0.13    0.20, 0.03
b-s-co        0.11, 0.04   0.10, 0.00   0.16, 0.19   0.17, 0.13    0.10, 0.00
b-s-id        0.35, 0.26   0.23, 0.17   0.27, 0.29   0.49, 0.33    0.32, 0.03
b-s-t         0.19, 0.11   0.16, 0.14   0.26, 0.29   0.34, 0.29    0.21, 0.04
s-c-co        0.10, 0.00   0.11, 0.02   0.13, 0.05   0.21, 0.20    0.10, 0.00
s-c-id        1.00, 0.00   0.97, 0.07   0.21, 0.22   0.77, 0.25    1.00, 0.00
s-c-t         0.10, 0.02   0.12, 0.10   0.14, 0.07   0.39, 0.28    0.27, 0.17
s-s-co        0.11, 0.07   0.10, 0.01   0.12, 0.05   0.23, 0.24    0.10, 0.00
s-s-id        1.00, 0.00   0.96, 0.16   0.28, 0.35   0.70, 0.27    0.33, 0.28
s-s-t         0.11, 0.04   0.13, 0.11   0.12, 0.04   0.29, 0.27    0.13, 0.05

In general, the superquadric parameter estimates for synthetic images with low noise σ (0.0 and 2.5) are similar, and close to the values we would expect. The standard deviations are quite large, indicating that parameter estimates vary considerably depending on the viewpoint. For example, in column 2 of Table 4.1 (σ = 0.0), the standard deviations for the shape classes b-c-id and b-s-id are quite large. This is because from certain viewpoints, these shapes (with increasing-and-decreasing cross-section size) are mistaken for shapes with a bent axis and constant cross-section size. On the other hand, in the same column, the standard deviation is almost zero for objects with a straight axis.
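The empirical thresholds quoted under "Proper" Values above can be collected into a single qualitative decision rule. The sketch below is illustrative only; our actual classifiers are the k-nn and tree classifiers of Section 4.3.2, which learn their own thresholds from training data:

```python
def qualitative_attributes(eps1, eps2, kx, ky, r_over_az):
    """Map the five features to the three shape attributes, using the
    empirical thresholds quoted above (0.3 for eps1 and eps2, about
    0.25 for the tapering coefficients, 6.5 for r/a_z)."""
    axis = 'bent' if r_over_az < 6.5 else 'straight'
    edges = 'straight' if eps2 < 0.3 else 'curved'
    if eps1 >= 0.3:
        sweep = 'increasing-and-decreasing'
    elif max(kx, ky) >= 0.25:
        sweep = 'tapering'
    else:
        sweep = 'constant'
    return axis, edges, sweep
```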
The superquadric parameter estimates are poorer for the synthetic images with higher noise σ (5.0), as well as for the real range images of the "rough" objects. This gives a forewarning that part identification based on these parameter estimates will be poor for the higher-noise synthetic images as well as for the real images of hand-carved objects with "rough" surfaces. In fact, classification error rates for the higher-noise (σ = 5.0) synthetic range images are similar to those for real range images of the "rough"-surfaced objects. For most of the parts, parameter estimates from real range images of their "smooth" models are generally similar to those from the low-noise synthetic images.

Ideally, the parameter ε1 should be close to zero for objects whose cross-section size is constant or tapering along the axis, and larger for objects with increasing-and-decreasing cross-section size. The mean values of ε1 for the low-noise synthetic images generally conform to this expectation, as can be seen from Table 4.1. Empirically, if ε1 > 0.3, the superquadric cross-section has a distinctly "increasing-and-decreasing" appearance.

The parameter ε2 should ideally be close to zero for objects with "straight" cross-section edges, and close to 1.0 for objects with "curved" cross-section edges. As can be seen from columns 2 and 3 of Table 4.2, this expectation is borne out quite well for synthetic images with low or no noise.
Table 4.2. Mean and standard deviation of estimates of ε2 for the 12 parts
(columns 2-4: synthetic data at noise level σ; columns 5-6: real data; entries are Mean, S.D.)

Shape class   σ = 0.0      σ = 2.5      σ = 5.0      Hand-carved   "Smooth"
b-c-co        0.72, 0.39   0.71, 0.36   0.33, 0.18   0.20, 0.18    0.28, 0.36
b-c-id        0.91, 0.16   0.84, 0.08   0.37, 0.21   0.45, 0.35    0.95, 0.05
b-c-t         0.57, 0.39   0.68, 0.31   0.31, 0.20   0.50, 0.28    0.94, 0.05
b-s-co        0.10, 0.00   0.10, 0.00   0.19, 0.16   0.21, 0.21    0.10, 0.00
b-s-id        0.10, 0.00   0.10, 0.00   0.17, 0.17   0.47, 0.40    0.14, 0.02
b-s-t         0.11, 0.03   0.13, 0.10   0.16, 0.11   0.27, 0.28    0.19, 0.09
s-c-co        1.00, 0.00   0.92, 0.07   0.42, 0.15   0.84, 0.14    0.98, 0.03
s-c-id        0.99, 0.05   0.83, 0.10   0.37, 0.21   0.83, 0.13    0.84, 0.08
s-c-t         0.98, 0.06   0.85, 0.18   0.39, 0.17   0.78, 0.18    0.93, 0.07
s-s-co        0.10, 0.00   0.10, 0.00   0.26, 0.30   0.28, 0.27    0.10, 0.00
s-s-id        0.10, 0.00   0.10, 0.00   0.21, 0.24   0.60, 0.39    0.15, 0.03
s-s-t         0.13, 0.16   0.10, 0.00   0.26, 0.32   0.30, 0.35    0.18, 0.07

Estimates of k_x and k_y should ideally be close to zero for non-tapering parts, and distinctly higher for tapering parts. The deterioration in the estimates of these two parameters is particularly striking as the noise level increases (columns 4 and 5 of Tables 4.3 and 4.4). For the feature r/a_z, low values indicate a bent-axis part, while high values indicate a straight axis. While estimates of this feature are best for the low-noise images, it is in fact estimated quite well even for the higher-noise synthetic images and the real images of the hand-carved models.

Table 4.3. Mean and standard deviation of estimates of k_x for the 12 parts
(columns 2-4: synthetic data at noise level σ; columns 5-6: real data; entries are Mean, S.D.)

Shape class   σ = 0.0      σ = 2.5      σ = 5.0      Hand-carved   "Smooth"
b-c-co        0.07, 0.09   0.06, 0.08   0.27, 0.16   0.32, 0.35    0.17, 0.14
b-c-id        0.13, 0.11   0.11, 0.11   0.38, 0.30   0.26, 0.20    0.06, 0.04
b-c-t         0.18, 0.09   0.21, 0.11   0.30, 0.21   0.28, 0.21    0.29, 0.13
b-s-co        0.04, 0.07   0.09, 0.13   0.22, 0.18   0.25, 0.32    0.04, 0.03
b-s-id        0.08, 0.08   0.10, 0.08   0.25, 0.19   0.43, 0.31    0.06, 0.06
b-s-t         0.23, 0.26   0.33, 0.25   0.35, 0.29   0.36, 0.36    0.35, 0.24
s-c-co        0.07, 0.06   0.04, 0.07   0.24, 0.20   0.24, 0.22    0.29, 0.17
s-c-id        0.30, 0.21   0.22, 0.17   0.60, 0.34   0.27, 0.10    0.55, 0.27
s-c-t         0.51, 0.05   0.39, 0.10   0.31, 0.20   0.45, 0.21    0.32, 0.09
s-s-co        0.03, 0.05   0.06, 0.15   0.30, 0.20   0.30, 0.33    0.05, 0.02
s-s-id        0.30, 0.25   0.26, 0.21   0.60, 0.28   0.41, 0.26    0.15, 0.08
s-s-t         0.46, 0.09   0.37, 0.18   0.32, 0.21   0.49, 0.29    0.23, 0.24

Table 4.4. Mean and standard deviation of estimates of k_y for the 12 parts
(columns 2-4: synthetic data at noise level σ; columns 5-6: real data; entries are Mean, S.D.)

Shape class   σ = 0.0      σ = 2.5      σ = 5.0      Hand-carved   "Smooth"
b-c-co        0.07, 0.11   0.09, 0.12   0.36, 0.21   0.35, 0.28    0.19, 0.15
b-c-id        0.13, 0.10   0.11, 0.11   0.35, 0.24   0.22, 0.25    0.15, 0.08
b-c-t         0.20, 0.10   0.20, 0.10   0.26, 0.20   0.37, 0.30    0.27, 0.13
b-s-co        0.05, 0.08   0.07, 0.11   0.25, 0.20   0.35, 0.34    0.02, 0.01
b-s-id        0.10, 0.12   0.09, 0.09   0.31, 0.24   0.36, 0.30    0.14, 0.10
b-s-t         0.35, 0.33   0.27, 0.27   0.40, 0.36   0.30, 0.30    0.17, 0.17
s-c-co        0.12, 0.08   0.04, 0.05   0.39, 0.19   0.26, 0.22    0.21, 0.19
s-c-id        0.33, 0.19   0.24, 0.18   0.58, 0.25   0.15, 0.10    0.46, 0.31
s-c-t         0.49, 0.06   0.34, 0.10   0.29, 0.18   0.39, 0.23    0.27, 0.13
s-s-co        0.05, 0.13   0.02, 0.02   0.33, 0.19   0.15, 0.21    0.06, 0.04
s-s-id        0.28, 0.24   0.21, 0.21   0.49, 0.35   0.30, 0.24    0.24, 0.23
s-s-t         0.45, 0.09   0.42, 0.20   0.29, 0.16   0.27, 0.26    0.41, 0.21

Table 4.5. Mean and standard deviation of estimates of r/a_z for the 12 parts
[The individual entries of this table are not legible in the source scan.]

4.4.2 Part Discrimination

Using all 5 features together, the 12-category classification results using the binary tree classifier and the k-nearest-neighbor (k-nn) classifiers were strikingly different. In general, the binary tree classifier performed much better. This is expected, since it discriminates the 12 classes hierarchically instead of performing a one-shot classification like the k-nn classifiers. Classification error rates for synthetic data (models constructed using I-DEAS) were estimated using the holdout method. Error rates were higher (or comparable) for the 5-nn classifier, and are therefore not reported. Table 4.6 shows the overall error rates using all the 270 patterns for each of the 12 parts. It can be seen that axis shape (straight or curved) is by far the most reliably discriminated shape attribute. 12-category classification using the tree classifier is quite good (86.2% correct), and is far better than the performance of the k-nn classifiers.

Table 4.6. Overall classification error rates (in %) for synthetic range data

Experiment                                      1-nn   3-nn   Tree
Axis - straight or curved?                       7.1    6.8    6.8
C.S. edges - straight or curved?                23.1   24.4    7.4
C.S. size - const., tapered, or incr.-&-decr.?  33.6   38.0   10.0
12-class                                        49.1   53.1   13.8

The effect of range image spatial "resolution" on classification error is shown in Table 4.7. We would expect the features derived from lower-resolution images to have a higher error rate, both because less range data are available for estimating the best-fitting superquadric, and because the range values themselves are less precise.

Table 4.7. Classification error rates (in %) by range image spatial resolution
(for each resolution: 1-nn, 3-nn, Tree)

Experiment   64 x 64            128 x 128          256 x 256
Axis         17.6  13.9   7.9    4.6   5.6   7.6    1.9   1.9   5.1
CS edges     29.6  28.7   8.8   19.4  17.6   7.5   20.4  22.2   4.3
CS sweep     39.8  43.5  11.6   41.7  38.9   9.5   38.9  44.4  10.6
12-class     60.2  66.7  15.9   50.0  55.6  13.2   50.9  50.9  13.7

The results do generally conform to this expectation. Estimates of the parameters k_x and k_y seem to be particularly sensitive to the image spatial resolution. Overall, the results are not very different for images of resolution 128 x 128 and 256 x 256. Both of these cases are noticeably better than for resolution 64 x 64.
As the elongation of the parts increases, we would expect to obtain a better fit of superquadric surfaces to the "increasing-and-decreasing" class of parts. Hence, error rates should decrease with increasing elongation. Further, the axis shape (straight or bent) should also be more reliably estimated. These predictions are borne out, on the whole, by the data in Table 4.8. The results are quite similar for models with elongations of 4 and 6 units.

Table 4.8. Classification error rates (in %) by model "elongation"
(for each elongation: 1-nn, 3-nn, Tree)

Experiment   1 : 2              1 : 4              1 : 6
Axis         13.9   8.3   7.5    6.5   5.6   6.2    6.5   0.9   3.5
CS edges     19.4  19.4   7.2   32.4  28.7   6.8   25.9  27.8   8.1
CS size      30.6  29.6  10.3   43.5  46.3  10.9   38.0  34.3   9.8
12-class     48.1  46.3  14.8   60.2  62.0  13.5   51.9  52.8  13.3

It is clear that the tree classifier performs much better than the k-nn classifiers, and that error rates decrease with increasing image resolution and increasing elongation of the models. Hence, Table 4.9 shows only the overall (12-category classification) results for the tree classifier, using only the data from images of resolutions 128 x 128 and 256 x 256, and model elongations 1:4 and 1:6. Even with our rather simple choice of features, correct classification of about 87% can be achieved for the 12-class problem.

Table 4.9. 12-class error rates (in %) for the tree classifier

                  128 x 128   256 x 256
Elongation 1:4    12.5        12.8
Elongation 1:6    12.8        12.2

For the real range images, Table 4.10 shows the error rates for classification of the rough-surfaced objects. Table 4.11 gives the corresponding results for the "smooth-surfaced" objects. The real data itself was used both for training and testing, and the error rate is estimated by the "holdout" method, as before. The error rate is much higher for the rough-surfaced objects than for the synthetic data. Classification error is considerably less for the smooth-surfaced objects, though still higher than for the synthetic images.

Table 4.10. Overall classification error rates (in %) for real range data from "rough" models

Experiment                                      1-nn   3-nn   Tree
Axis - straight or curved?                      15.4   12.5   12.5
C.S. edges - straight or curved?                33.8   32.1   33.3
C.S. size - const., tapered, or incr.-&-decr.?  53.3   58.3   50.0
12-class                                        64.2   67.1   70.8

Table 4.11. Overall classification error rates (in %) for real range data from "smooth" models

Experiment                                      1-nn   3-nn   Tree
Axis - straight or curved?                       8.3   13.3   15.0
C.S. edges - straight or curved?                28.3   36.7   11.7
C.S. size - const., tapered, or incr.-&-decr.?  28.3   31.7   20.0
12-class                                        43.3   55.0   23.3

4.4.3 Confusion Matrices

Table 4.12 gives the confusion matrix for 12-category classification of the synthetic images. For each row in the confusion matrix, the leftmost column shows the true part identity. The remaining 12 columns show the percentage of cases where that part was classified as each of the 12 parts. The most striking fact is that classification is much better for parts with a straight axis. In fact, we get 100% correct classification for 4 of the 6 straight-axis objects. For the bent-axis parts, most of the misclassifications are with respect to variation of cross-section size along the axis. The other two shape attributes are much less often misclassified.
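For completeness, the row-normalized percentages in these tables are tabulated in the obvious way; a minimal sketch (numpy assumed; not the original tabulation code):

```python
import numpy as np

def confusion_matrix_pct(true_labels, predicted_labels, n_classes=12):
    """Row-normalized confusion matrix: entry (i, j) is the percentage of
    class-i test patterns that the classifier assigned to class j."""
    counts = np.zeros((n_classes, n_classes))
    for t, p in zip(true_labels, predicted_labels):
        counts[t, p] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    return 100.0 * counts / np.maximum(row_sums, 1)
```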
Table 4.12. Confusion matrix (in %) for the synthetic images
[The 12 x 12 table of percentages is not legible in the source scan.]

Table 4.13 shows the confusion matrix for real range images of the rough-surfaced models. As in the case of synthetic images, classification is distinctly better for the straight-axis objects. Much of the misclassification still tends to be with respect to variation of cross-section size, though this effect is not so pronounced as for the synthetic images.

Table 4.13. Confusion matrix (in %) for real range images of the "rough" models
[The 12 x 12 table of percentages is not legible in the source scan.]

Table 4.14 shows the same statistics for range images of the "smooth-surfaced" models. Some strange effects can be noticed here. The b-c-co part is very poorly classified; most of the misclassification is in the straight-edged/curved-edged cross-section attribute. For this part, whose visual appearance is unmistakably curved-edged, Table 4.2 does indeed show a very low estimate of the parameter ε2. In fact, the b-c-co part accounts for most of the error in the entire 12-category classification for all the images. If we eliminate this part, 11-category classification with real data can be achieved with about 89% accuracy. The other striking fact is that two straight-axis parts, s-c-id and s-s-id, tend to be mistaken for their curved-axis counterparts. We can see from Table 4.5 that the mean values of the parameter r/a_z are very low for these two parts, explaining the misclassification.

Table 4.14. Confusion matrix (in %) for real range images of the "smooth" models
[The 12 x 12 table of percentages is not legible in the source scan.]

4.5 Summary

The research presented in this chapter had two goals: firstly, to develop a method to identify the parts previously isolated by part segmentation; and secondly, to test the utility of superquadric parameters for part discrimination. Superquadrics, as well as geons, have recently been proposed as 3-D part primitives for object representation. In the case of superquadrics, most of the literature has been devoted to formalizing various deformations, estimating the parameters reliably, and formulating measures of goodness-of-fit.
The amount of computational effort required to estimate the superquadric parameters leads us to desire that they should be useful not merely for the accurate and concise representation of 3-D objects, but also towards their recognition. As a possible step in this direction, we have tried to use superquadrics for shape discrimination. Specifically, we represented 3-D parts by their best-fitting superquadrics, and attempted to use the superquadric parameters to classify each part into one of 12 broad classes of shapes similar to geons. Relating superquadrics to geons is advantageous because of the complementary strengths of these two representations.

The experimental results indicate that qualitative shape attributes can be inferred from superquadric parameters quite reliably (about 87% accuracy for synthetic images; about 77% for real images) using a hierarchical (binary tree) classifier, even with a rather simple choice of features. In fact, 11-category classification of the "smooth-surfaced" real models (with only 11 shape types) could be done with about 89% accuracy! The worst-case time complexity of this classifier is a constant, depending on the depth of the decision tree constructed from the training patterns. Estimates of superquadric parameters are highly dependent on the viewpoint. Estimates may be very poor for parts with "rough" surfaces, or if the range values contain even a small amount of noise.

The future of superquadrics as part primitives for 3-D object representation depends on their effectiveness in facilitating object recognition or indexing. We have suggested a possible method of using superquadric parameters for this purpose. Building a system which performs generic classification of objects using a parts-based superquadric representation would be an interesting goal for future research. In the next chapter we will present another method for performing part identification.

CHAPTER 5

Direct Tests of Shape Attributes

Chapter 3 described a method to perform part segmentation by using the catalog of part views, and Chapter 4 presented one approach for classifying isolated parts as one of the 12 part primitives. Here we will examine an alternative method for part identification. Having constructed the attributed SAG of a sensed image and performed part segmentation by referring to the catalog of part views, we are left with a number of hypothesized parts as well as hypotheses about their identity. Hypotheses about part identity are obtained when a subgraph of the scene SAG partially matches any view of any part in the catalog. We can eliminate the incorrect hypotheses about part identity by directly testing shape attributes of the part. It would therefore be very useful to develop tests to directly check the shape attributes which define our 12 geon-like generic parts.
We saw that features derived from estimated superquadric parameters can be used with fairly high accuracy to determine the three shape attributes of our 12 geon-like parts. Nevertheless, the confusion matrices in Chapter 4 indicate some consistent types of misclassification of part identity. For example, the b-c-co part is consistently misclassified as b-s-co. Similarly, the s-s-id and s-c-id parts tend to be misclassified as their curved-axis counterparts. For this reason we would like to develop direct tests of shape attributes which are independent, or at least partially independent, of the superquadric-based method.

We have developed two such tests, which directly check two of the three shape attributes defining our set of 12 geon-like parts, i.e., axis shape and cross-section size along the axis. The third shape attribute, i.e., the type of cross-sectional edges, is obtained implicitly by determining the other two shape attributes and accordingly ruling out hypotheses about part identity.

5.1 Variation of Cross-section Size

To test cross-section size variation along the axis, we compute the histogram of the angles of the surface normals with respect to the axis direction. The angle values range from 0° to 180°. In the case of constant cross-section, this distribution should be concentrated near 90°, since normal directions should be mostly perpendicular to the axis. In the case of objects with increasing cross-section, the angles should be concentrated some distance away from 90° (either greater than or less than 90°). Finally, for objects of increasing-decreasing cross-section, the angles should be spread along both sides of 90°. The rationale behind this heuristic is illustrated in Figure 5.1.

Figure 5.1. Directions of principal axis and surface normals

Figures 5.2 through 5.4 show three parts with constant, tapering, and increasing-decreasing cross-sections, respectively. From Figure 5.2, it is visually apparent that normal directions on the side surfaces of this constant cross-section part are close to 90° with respect to the principal axis (the direction along which the part is elongated). Similarly, the tapering part in Figure 5.3 shows one entire side surface for which normal directions are clearly at a non-perpendicular angle with respect to the principal axis. Lastly, Figure 5.4 shows that the part with increasing-and-decreasing cross-section has one entire side along which the normal directions spread from less than 90° to greater than 90° with respect to the principal axis.

Figure 5.2. Part with constant cross-section. (a) Range image of the part; (b) normal directions on surfaces of the part.
Figure 5.3. Part with increasing cross-section. (a) Range image of the part; (b) normal directions on surfaces of the part.
Figure 5.4. Part with increasing-and-decreasing cross-section. (a) Range image of the part; (b) normal directions on surfaces of the part.

These observations about angle directions are not valid for the "end" surfaces of parts. Hence, it is important that only the side surfaces (not the end surfaces) be considered for the histogramming of surface normal directions described above. We therefore use a heuristic which ignores normal directions less than 45° or greater than 135° with respect to the axis, since they may belong to end surfaces.

The following thresholds are used: if at least 80% of the angles are either between 45° and 90°, or between 90° and 135° (i.e., they are concentrated on one side, either less than or greater than 90°), the cross-section is assumed to be increasing. Else, if at least 80% of the angles lie within 80° and 100° (i.e., they are concentrated near 90°), the cross-section is assumed to be constant. Else, the cross-section is assumed to be increasing-and-decreasing. The percentages mentioned are computed over only those points whose normals are not ignored due to the heuristic.
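Putting the heuristic and the thresholds together, the test can be sketched as follows (a minimal illustration, not our implementation; numpy is assumed, normals is an N x 3 array of unit surface normals, and axis_dir is the estimated principal axis direction, obtained as described next in Section 5.1.1):

```python
import numpy as np

def cross_section_sweep(normals, axis_dir):
    """Classify cross-section size variation from the angles between
    surface normals and the principal axis, using the thresholds above."""
    axis_dir = axis_dir / np.linalg.norm(axis_dir)
    cosines = normals @ axis_dir
    angles = np.degrees(np.arccos(np.clip(cosines, -1.0, 1.0)))
    # Heuristic: drop angles < 45 or > 135 degrees (likely end surfaces)
    side = angles[(angles >= 45.0) & (angles <= 135.0)]
    if side.size == 0:
        return 'increasing-and-decreasing'  # degenerate: no side-surface data
    lo = np.mean((side >= 45.0) & (side < 90.0))    # fraction below 90
    hi = np.mean((side > 90.0) & (side <= 135.0))   # fraction above 90
    near = np.mean((side >= 80.0) & (side <= 100.0))
    if max(lo, hi) >= 0.8:
        return 'increasing'          # concentrated on one side of 90 degrees
    if near >= 0.8:
        return 'constant'            # concentrated near 90 degrees
    return 'increasing-and-decreasing'
```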
5.1.1 Principal Axis Orientation

As described above, our direct test of cross-section size variation requires first identifying the principal axis of the part. We find the principal axis of a part by estimating the best-fitting superquadric model for it. The advantage of finding the axis by this method is that it can be applied to bent-axis as well as straight-axis parts; hence, the principal axis can be estimated even if it is not straight. The disadvantage is that a superquadric fit is usually not very reliable if only a few small surface patches are present. A reliable superquadric fit requires a general viewpoint as well as the visibility of a large fraction of the object being fitted.

Once the best-fitting superquadric is estimated, the orientation of the principal axis can be found from the estimated orientation parameters, i.e., the Euler angles φ, ω, and ψ. The superquadric estimation software uses Z-Y-Z Euler angles. The Z-Y-Z Euler angles relating two co-ordinate frames A and B with the same origin are defined as follows: we start with the two co-ordinate frames A and B coincident. Then, we first rotate B about Z_B by the angle φ, then rotate the new B about the new Y_B by the angle ω, and finally rotate the new B about the new Z_B by the angle ψ. Using the notation cos(φ) = C_φ, sin(φ) = S_φ, cos(ω) = C_ω, sin(ω) = S_ω, cos(ψ) = C_ψ, sin(ψ) = S_ψ, the co-ordinates of a point in frame B, P_B, can be related to the co-ordinates of that point in frame A, P_A, by the transformation P_B = T \cdot P_A, where T is defined by:

T = \begin{bmatrix}
C_\phi C_\omega C_\psi - S_\phi S_\psi & S_\phi C_\omega C_\psi + C_\phi S_\psi & -S_\omega C_\psi \\
-C_\phi C_\omega S_\psi - S_\phi C_\psi & -S_\phi C_\omega S_\psi + C_\phi C_\psi & S_\omega S_\psi \\
C_\phi S_\omega & S_\phi S_\omega & C_\omega
\end{bmatrix}

The superquadric estimation software aligns the principal axis with the Z direction. Hence, the principal axis is parallel to the new Z axis. We are not concerned with its exact location in space but only with its orientation, since we use it only to compute the angle between the surface normals and the principal axis. The cross-section variation attribute of the part primitives (cross-section size along the axis: constant, increasing, or increasing-decreasing) can then be determined from the angle of the normal direction at each point on the side surfaces, with respect to the axis direction.

For straight-axis parts, this angle can be computed directly by taking the dot product of a unit normal vector with a unit vector oriented along the principal axis. However, for bent-axis parts the axis direction is different at different points along the axis. Having computed the normal direction at a given point on the surface, we would need to know the principal axis direction at that location along the axis. This is difficult to compute.
However, for bent-axis parts the axis direction is different at different points along the axis. Having computed the normal direction at a given point on the surface, we would need to know the principal axis direction at the corresponding location along the axis, which is difficult to compute. Instead, we first "straighten" the bent-axis parts by inverting the bending transformation [81] estimated during the superquadric fit. If the co-ordinates of a point are (X, Y, Z), its co-ordinates after applying the inverse bending transformation are given by:

$$x = X - \cos(\alpha)\,(R - r), \qquad y = Y - \sin(\alpha)\,(R - r), \qquad z = \gamma / k,$$

where $\alpha$ and $k$ are bending parameters of the fitted superquadric, and $R$, $r$, and $\gamma$ are quantities computed from the bending parameters and the initial co-ordinates (X, Y, Z). After applying this inverse bending transformation, the angle computation for bent-axis parts is the same as for straight-axis parts. The decision as to whether a part has a straight or bent axis is made by applying the hierarchical tree classifier with the set of five features described in Chapter 4.

5.2 Axis: Straight or Bent?

We pointed out that, using features derived from superquadric parameters, the s-s-id and s-c-id parts are often confused with their curved-axis counterparts. Since we use superquadric fitting to determine the principal axis, the axis estimate also becomes erroneous, because the fitting procedure finds "bent" axes for these two parts. We have therefore explored the use of intensity data to decide whether the axis is indeed straight or bent.

Lee and Stockman [58] have developed a method for capturing fused range and intensity images using the 100X White Scanner. A pair of range and intensity images is sensed separately and the two are brought into registration by applying a transformation matrix. The transformation is estimated by initially calibrating the system using a known object and manually entering the correspondence of several pairs of image points. For our experiments we captured 5 pairs of range and intensity images of the "smooth" models of each of the 4 parts involved in the misclassification of axis shape, i.e., s-s-id and s-c-id, and the corresponding bent-axis parts, b-s-id and b-c-id. However, we did not apply the fusion procedure to put the range and intensity images into registration, because we do not use fused data at any time, i.e., we never simultaneously need registered range and intensity values in the scene.

The intensity images alone are used to determine whether the part axis is straight or bent, as follows: first the intensity image is thresholded to get its silhouette. Edges are detected from the thresholded image, and line segments are fitted to the silhouette to get a polygonal approximation of the part contour. The silhouette is broken into pieces at points where the change in direction between adjacent line segments is "large" (greater than 20°). This breaks the silhouette into several components. The following simple test for "concavity" of the silhouette is then applied: we join the end-points of each component and check whether the mid-point of the line joining the end-points lies inside or outside the object silhouette. If the mid-point lies outside the silhouette, that component is taken to be concave. If more than 10% of the total length of the silhouette is thus classified as concave, the entire silhouette is classified as concave, i.e., the part has a bent axis. Otherwise, the part is assumed to have a straight axis.

Figure 5.5 shows a straight-axis part and the extracted silhouette. Figure 5.6 shows the polygonal approximation of the silhouette. Similarly, Figure 5.7 shows a bent-axis part and the extracted silhouette, while Figure 5.8 shows the corresponding polygonal approximation.

Figure 5.5. Intensity image and silhouette of a straight-axis part. (a) Intensity image of the part. (b) Extracted silhouette of the part.

Figure 5.6. Polygonal approximation of the straight-axis part.

Figure 5.7. Intensity image and silhouette of a bent-axis part. (a) Intensity image of the part. (b) Extracted silhouette of the part.

Figure 5.8. Polygonal approximation of the bent-axis part.
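A minimal sketch of this concavity test follows, assuming the silhouette is available as a boolean image mask and each component is a list of (row, column) contour points produced by the polygonal approximation described next; all names are ours.

```python
import numpy as np

def component_is_concave(component, silhouette):
    """Join the end-points of a component; the component is concave if
    the mid-point of that chord falls outside the object silhouette."""
    p0 = np.asarray(component[0], dtype=float)
    p1 = np.asarray(component[-1], dtype=float)
    r, c = np.rint((p0 + p1) / 2.0).astype(int)
    return not silhouette[r, c]

def contour_length(component):
    """Total length of the polyline through the component's points."""
    pts = np.asarray(component, dtype=float)
    return float(np.sum(np.linalg.norm(np.diff(pts, axis=0), axis=1)))

def axis_is_bent(components, silhouette, concave_frac=0.10):
    """Declare the part axis bent if concave components account for
    more than 10% of the total silhouette length."""
    total = sum(contour_length(comp) for comp in components)
    concave = sum(contour_length(comp) for comp in components
                  if component_is_concave(comp, silhouette))
    return concave > concave_frac * total
```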
We now briefly explain the polygonal approximation algorithm described by Pavlidis [66], which we adapted in our implementation. This suboptimal algorithm divides the data points into groups and approximates each group by a side of a polygon. For simplicity, the algorithm draws the side of the polygon between the end-points of each group rather than searching for the optimal approximation. The algorithm is essentially of the split-and-merge type. Each time, the algorithm examines a small group of points of size $k_0$, called the hop distance. The algorithm keeps track of a current line $L_1$ and a new line $L_2$, fitted to two groups of points. If the angle between $L_1$ and $L_2$ is "small," the two groups are merged and a line is fitted to the new, larger group. On the other hand, a group of points is divided (split) into smaller groups if the maximum deviation, measured as the distance from the fitted line divided by the length of the line, is "large." The tolerances for angle and deviation are specified by parameters $\Delta\theta$ and $\Delta d$, respectively. We used $k_0 = 7$, $\Delta\theta = 20°$, and $\Delta d = 0.2$ in our implementation. These values were determined by trying different values for the three parameters and visually evaluating the polygonal approximations for selected images.

Each part description consists of a set of line segments with the following attributes: $(x, y, \alpha, l)$, where $x$ and $y$ are the coordinates of the midpoint, $\alpha$ is the angle with respect to the positive x-axis, and $l$ is the length of the segment. We break the contour into separate components at points where the direction change between successive line segments is greater than 20°.
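Under simplifying assumptions, the scheme just described might be sketched as follows (names are ours; each "line" is taken as the chord between a group's end-points, and the merge/split decisions are made in a single forward pass rather than by full split-and-merge iteration):

```python
import numpy as np

def chord_angle(pts):
    """Direction (degrees) of the chord joining a group's end-points."""
    d = pts[-1] - pts[0]
    return np.degrees(np.arctan2(d[1], d[0]))

def normalized_deviation(pts):
    """Maximum point-to-chord distance divided by the chord length."""
    p0, p1 = pts[0], pts[-1]
    chord = p1 - p0
    n = np.linalg.norm(chord)
    if n == 0.0:
        return 0.0
    d = pts - p0
    dists = np.abs(chord[0] * d[:, 1] - chord[1] * d[:, 0]) / n
    return float(dists.max() / n)

def polygonal_approximation(contour, k0=7, angle_tol=20.0, dev_tol=0.2):
    """Extend the current group one hop (k0 points) at a time while the
    hop's direction stays within angle_tol of the group's direction and
    the group's normalized deviation stays within dev_tol; otherwise
    cut (split) the group at the current point."""
    pts = np.asarray(contour, dtype=float)
    breaks, start, i = [0], 0, 0
    while i + k0 < len(pts):
        group = pts[start:i + k0 + 1]
        hop = pts[i:i + k0 + 1]
        diff = abs(chord_angle(group) - chord_angle(hop)) % 360.0
        diff = min(diff, 360.0 - diff)
        if (diff > angle_tol or normalized_deviation(group) > dev_tol) and i > start:
            breaks.append(i)              # split: the current group ends here
            start = i
        i += k0
    breaks.append(len(pts) - 1)
    # Each polygon side is the chord between consecutive break points.
    return [(pts[a], pts[b]) for a, b in zip(breaks, breaks[1:])]
```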
5.3 Experimental Results

Initial hypotheses about part identity arise when subgraphs of the scene SAG partially match part views in the catalog of views. Incorrect hypotheses are then eliminated by directly testing the variation of cross-section size of our geon-like parts. We have tested our method using 10 real range images of each of the "smooth"-surfaced parts described in the previous chapter.

To get the initial hypotheses about part identity, we need to construct the attributed SAG for each image of the "smooth" parts. However, we already pointed out that surface segmentation and curvature-based surface classification are not robust. In many cases, the surface segmentation is so poor as to be totally unusable. Hence, in order to test our method of part identification, we assumed perfect surface segmentation and surface classification of the smooth models, giving a set of hypothesized part identities for each image. This is assumed as the input data for the experiments. Thereafter we tested the cross-section variation as described in Section 5.1.

Table 5.1 shows the confusion matrix for 10 range images of each of the 12 parts. In general, except for the s-c-id and s-s-id parts, the overall classification performance is comparable to that of the superquadric-based approach described in Chapter 4. However, parts with increasing-and-decreasing cross-section size are sometimes mistaken for tapering parts. This is probably because of self-occlusion or shadowing, where only some portion of the side surfaces is visible. A striking improvement over the hierarchical tree classifier approach to part identification is that the b-c-co part is identified much more accurately. This is true even without assuming perfect surface segmentation and surface classification, because for this part, surface segmentation and classification are performed correctly in most cases. Identification is noticeably more accurate for straight-axis parts than for bent-axis parts.

The s-s-id and s-c-id parts have a very high misclassification rate. This is because the superquadric estimation method confuses them with their bent-axis counterparts, leading to error in the estimation of the principal axis. We obtained 5 range and intensity image pairs for each of these two parts and their bent-axis counterparts. The views were chosen so that for bent-axis parts the silhouette was distinctly concave. Using the method described above, we then attempted to determine from the intensity image whether the part axis is really bent or straight. This could be done correctly in all 20 cases. While this by itself does not suffice to determine the actual part identity, it does provide an indication when the superquadric estimation routine mistakes straight-axis parts for bent-axis ones.

Table 5.1. Confusion matrix (entries in percent) for 10 range images of each of the 12 part primitives.

5.4 Summary

We described an alternative method for determining part identities by directly testing the variation of cross-section size along the principal axis. This method succeeds in identifying 10 of the 12 parts. The main cases of misclassification occur when parts with a straight axis and increasing-and-decreasing cross-section are mistaken for bent-axis parts by the superquadric fitting routine. For the other 10 parts, classification performance is comparable to the method based on the use of superquadric parameters in conjunction with a hierarchical tree classifier. We are able to use intensity image data successfully to decide whether the axis is really bent or straight.

Chapters 4 and 5 presented two alternative methods of determining part identities. Both methods were unreliable for the parts s-s-id and s-c-id, because the superquadric estimation confused these two parts with the corresponding bent-axis parts. However, the confusion matrix shown in Table 4.14 reveals that a simple heuristic can improve the reliability of the first method (which is based on the use of superquadric parameters in conjunction with a hierarchical tree classifier). If a given part is classified as b-c-id or b-s-id, we can use the intensity image to check whether the axis is really bent, as described earlier in this chapter. If in fact the axis is not bent, we can classify the part as s-c-id or s-s-id, respectively. No such simple heuristic is available for the second method.
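In code, this correction heuristic amounts to a small lookup; the sketch below is ours, with `silhouette_is_concave` standing for the outcome of the intensity-based test described earlier, and the label strings following the part-naming convention used throughout.

```python
STRAIGHT_COUNTERPART = {"b-c-id": "s-c-id", "b-s-id": "s-s-id"}

def corrected_identity(tree_label, silhouette_is_concave):
    """Override the tree classifier when it reports a bent-axis part
    but the intensity-based silhouette test finds the axis straight."""
    if tree_label in STRAIGHT_COUNTERPART and not silhouette_is_concave:
        return STRAIGHT_COUNTERPART[tree_label]
    return tree_label
```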
Hence, we will use the first method (which is based on the use of a hierarchical tree classifier), together with the above heuristic, as the procedure for part identification. The flowchart of our complete system for obtaining a 3-D parts-based object representation is shown in Figure 5.9.

Figure 5.9. Complete system for obtaining parts and their identities. Surface segmentation: compute normals from the range image and merge "similar" patches. Part segmentation: construct the SAG, delete "small" surfaces, classify surfaces, and examine SAG links against the catalog of part views. Part identification: fit a superquadric and use the tree classifier, with the intensity image used for the axis check.
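In outline, the system of Figure 5.9 can be written down as a three-stage skeleton; the interface below is ours and is shown only to make the structure explicit, with each callable standing in for a module from Chapters 3 through 5.

```python
def parts_based_representation(range_image, intensity_image, stages):
    """Skeleton of the Figure 5.9 pipeline. `stages` maps stage names to
    callables, e.g. {"surface_segmentation": ..., "part_segmentation": ...,
    "part_identification": ...} (a hypothetical interface)."""
    surfaces = stages["surface_segmentation"](range_image)       # Chapter 3
    parts = stages["part_segmentation"](surfaces)                # SAG + view catalog
    identities = [stages["part_identification"](part, range_image, intensity_image)
                  for part in parts]                             # Chapters 4-5
    return parts, identities
```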
The next chapter presents the results obtained by our system when tested with several real range images.

CHAPTER 6

Obtaining Parts-based Representation

The previous chapters presented methods for part segmentation and part identification. We pointed out that part segmentation relies on good surface segmentation, and fails if the surface segmentation is poor. It also fails in cases where two parts come together in such a way that the adjacency of the corresponding surfaces is not directly visible. Similarly, for part identification we presented confusion tables indicating some consistent types of misclassification.

This chapter shows the results of part segmentation as well as part identification for 12 real range images of various 3-D objects. The images include isolated geon-like objects as well as complicated multi-part objects with occlusion. The choice of objects was dictated by two considerations: firstly, major surfaces should be separated by distinct creases (in order to get a good surface segmentation and subsequent part segmentation); secondly, the individual parts should be fairly elongated and not too thin (in order to obtain a good superquadric fit).

In all the examples, results of surface segmentation (from which the initial surface adjacency graph is derived) are shown in color. Nodes in the SAG are labeled "A," "B," "C," and so on. Throughout all the examples, these alphabetical labels always correspond to the same color shown in the surface segmentation. The correspondence between colors and the alphabetical labels is shown in Figure 6.1.

Figure 6.1. Correspondence between colors in the surface segmentation and alphabetical labels in the Surface Adjacency Graph.

Different views of the most complicated object for which we have successfully performed part segmentation and identification are shown in Figures 6.3a, 6.10a, and 6.11a. This is a manufactured object consisting of five cylindrical parts. One major part is attached to the main body at an oblique angle, and an additional cylindrical part is attached to each of the three remaining ends of the two major parts. This object yielded three different 3-D parts-based representations, corresponding to the three different part segmentations obtained.

Part segmentation was performed by the method described in Chapter 3. Part identification was performed by two different methods. The first method used a hierarchical tree classifier with five features derived from estimated superquadric parameters, as discussed in Chapter 4. The second method applied tests for directly checking axis shape and cross-section variation along the axis, as described in Chapter 5.

6.1 Test Image 1

Figure 6.2a shows the range image of an object made up of a cylinder and a rectangular block.

6.1.1 Part Segmentation

Figure 6.2b shows the surface segmentation, and Figures 6.2c and 6.2d show the initial and final surface adjacency graphs, respectively. As can be seen from Figure 6.2d, the surface adjacency graph of Figure 6.2c is correctly broken into two subgraphs, corresponding to the two parts, i.e., the rectangular block and the cylinder.

Figure 6.2. Object made up of a cylinder and a rectangular block. (a) Range image of the object. (b) Surface segmentation showing 6 surfaces. (c) Surface adjacency graph (initial). (d) Surface adjacency graph (final). The small surface B has been removed, and the edge between the side of the cylinder and the top of the block (D-C) broken.

6.1.2 Part Identification

We fit superquadrics to these parts and extract the five statistical features. The hierarchical tree classifier correctly identifies the rectangular block as part s-s-co and the cylinder as s-c-co.

Part identification can also be performed successfully by the second method. The subgraph corresponding to the rectangular block can be accounted for (from the multi-view representation) as a straight-axis, straight-edged part of either constant or increasing cross-section. The other part can be accounted for either as a straight-axis, curved-edge part of constant or increasing cross-section, or as a straight-axis, straight-edged part of increasing-decreasing cross-section.

The principal axis of the rectangular block is estimated by fitting a superquadric, and the histogram of angles of surface normals with respect to the axis direction is constructed. The 80% threshold is exceeded neither in the interval between 45° and 90°, nor in the interval between 90° and 135°. Hence the part is not inferred to have an increasing cross-section. Next, it is found that for 87% of the side-surface points the angle between the surface normal and the axis direction is between 80° and 100°. Since this percentage is above the threshold of 80%, it is inferred that the part has a constant cross-section along the axis. From the surface types and the multi-view representation, this part was already known to have a straight axis and straight edges. Hence the part identity is known unambiguously, because all three shape attributes are known.

Similarly, by fitting a superquadric to the other part (the cylinder), we estimate its principal axis and construct the histogram of the angles of surface normals with respect to the axis direction. Once again, the part does not exceed the thresholds for increasing cross-section. However, 82% of the side-surface points have a normal between 80° and 100° with respect to the axis. Hence this part is also inferred to have a constant cross-section. This information eliminates two of the three possibilities obtained earlier for the part identity: it must be a part with straight axis, curved edges, and constant cross-section.

6.2 Test Image 2

Figure 6.3a shows the range image of a complicated object made up of five cylindrical parts. This is the most complicated object for which we have so far performed part segmentation and part identification successfully.

6.2.1 Part Segmentation

Figure 6.3b shows the surface segmentation, and Figures 6.3c and 6.3d show the initial and final surface adjacency graphs, respectively.
The surface segmentation is fairly good but not perfect, as can be seen from Figure 6.3b. The initial SAG has as many as four small surfaces, which arise due to imperfect surface segmentation. These can be removed while retaining the connectivity of the SAG. The remaining 5 "large" surfaces are all classified as having positive maximum curvature and zero minimum curvature. The catalog of part views shows no example of two adjacent surfaces of this type. Hence, interestingly, the corresponding SAG links can be removed without even referring to the angles between these surfaces. This gives a correct part segmentation into 5 parts.

Figure 6.3. Complicated object made up of 5 cylindrical parts. (a) Range image of the object. (b) Surface segmentation showing 9 surfaces. (c) Surface adjacency graph (initial). (d) Surface adjacency graph (final). Four small surfaces (C, E, H, I) have been removed, and the edges between cylindrical surfaces broken.

6.2.2 Part Identification

By fitting superquadrics to the five cylindrical parts and deriving the same 5 features, we are able to identify the parts as being of type s-c-co by using the hierarchical tree classifier. Similarly, we are also able to identify the 5 parts correctly by using the distribution of normal directions with respect to the principal axis. Each of the parts can be accounted for (from the catalog of part views) either as a straight-axis, curved-edge part of constant or increasing cross-section, or as a straight-axis, straight-edged part of increasing-decreasing cross-section. The distribution of normal directions with respect to the principal axis eliminates the other possibilities, and we conclude that all 5 parts are of type s-c-co.

6.3 Test Image 3

Figure 6.4a is a scene with three separate objects which partially occlude one another. Surface segmentation (Figure 6.4b) results in an initial SAG with 7 nodes (Figure 6.4c). Note that the SAG is already unconnected because, from the range values, the cube and cylinder are recognized as separate, non-touching objects. The small object below the cube, as well as a small surface on the cylinder, is discarded, since each of them is below 5% of the total surface area. This gives the final part segmentation into two objects (Figure 6.4d). Part identification using either of the two methods is also successful. The cube is identified as s-s-co, and the cylinder as s-c-co.

Figure 6.4. Scene with occlusion, containing three objects. (a) Range image of the scene, showing a cube, a cylindrical object, and (at top left) a small object used to support the cube. (b) Surface segmentation showing 7 surfaces. (c) Surface adjacency graph (initial). (d) Surface adjacency graph (final). Two small surfaces (C, E) have been removed.

6.4 Test Image 4

The object consists of two cylindrical parts. The surface segmentation of this object produces two small spurious surfaces, which are discarded. Once again the part segmentation can be performed without referring to the angle between the two cylindrical surfaces, because the catalog of views has no example of two adjacent cylindrical surfaces in any single part. Both part identification methods are also successful in determining that the parts are of type s-c-co.

Figure 6.5. Object made up of two cylindrical parts. (a) Range image of the object. (b) Surface segmentation showing 5 surfaces. (c) Surface adjacency graph (initial). (d) Surface adjacency graph (final). Two small surfaces (C, E) have been removed.

6.5 Test Image 5

Figure 6.6a shows the range image from a pair of registered range and intensity images of a common household object, a bulb [58]. We simply use the range image. Surface segmentation gives just one surface (Figure 6.6b). The initial and final SAGs are identical (Figures 6.6c and 6.6d). Both methods of part identification lead to the conclusion that the part is s-c-id.

Figure 6.6. Everyday object: a bulb. (a) Range image of the object. (b) Surface segmentation showing 1 surface. (c) Surface adjacency graph (initial). (d) Surface adjacency graph (final).

6.6 Test Image 6

The funnel shown in Figure 6.7a is also the range image of a registered range-intensity pair.
Surface segmentation (Figure 6.7b) leads to a correct part segmentation into two parts (Figures 6.7c and 6.7d). Part identification using either of the two methods is also performed correctly for both parts. The cylinder is identified as s-c-co, and the cone as s-c-t.

Figure 6.7. Funnel made up of a cone and a cylinder. (a) Range image of the object. (b) Surface segmentation showing 2 surfaces. (c) Surface adjacency graph (initial). (d) Surface adjacency graph (final). The edge between the conical and cylindrical parts has been removed.

6.7 Test Image 7

Figure 6.8a shows another object made up of a rectangular block and a cylinder. Surface segmentation gives 4 surfaces (Figure 6.8b), of which one (the long, thin side surface of the rectangular block) is misclassified as a kind of saddle surface, with positive maximum curvature and negative minimum curvature. Since the catalog of part views has no such surface in any view of any part, the corresponding SAG edge is left unbroken. Part identification is successful for both parts, using either method. The rectangular block is identified as s-s-co, and the cylinder as s-c-co.

Figure 6.8. Object made up of a cylinder and a rectangular block. (a) Range image of the object. (b) Surface segmentation showing 4 surfaces. (c) Surface adjacency graph (initial). (d) Surface adjacency graph (final). The edge between the rectangular block and the cylinder has been removed.

6.8 Test Image 8

Figure 6.9a shows another view of a funnel. In this case, surface segmentation (Figure 6.9b) completely eliminates the small cylindrical part. The part segmentation therefore finds just one part (the cone). This part is identified successfully as s-c-t by both methods.

Figure 6.9. Another view of the funnel. (a) Range image of the object. (b) Surface segmentation showing 1 surface. (c) Surface adjacency graph (initial). (d) Surface adjacency graph (final).

6.9 Test Image 9

Figure 6.10a shows a second view of the complicated object discussed in Section 6.2. Shadowing caused the non-availability of range data between the leftmost part and the main body of the object. The leftmost part is therefore treated as a separate object. Surface segmentation gives 5 surfaces, as seen in Figure 6.10b, and, as mentioned earlier, the initial SAG (Figure 6.10c) is itself unconnected because of the non-availability of range information. The final SAG (Figure 6.10d) shows three unconnected nodes.
Note that the same object that was earlier interpreted as having 5 parts (in Test Image 2) is now interpreted as two different objects, one of which has 2 parts. All three parts are correctly identified as s-c-co by either of the two methods.

Figure 6.10. Complicated object made up of 5 cylindrical parts: second view. (a) Range image of the object. (b) Surface segmentation showing 5 surfaces. (c) Surface adjacency graph (initial). (d) Surface adjacency graph (final). One small surface has been removed, and the edges between cylindrical surfaces broken.

6.10 Test Image 10

Yet another view of the same complicated object is shown in Figure 6.11a. Surface segmentation finds 7 surfaces (Figure 6.11b). Of these, 4 are small surfaces. Figure 6.11c shows the initial SAG, and the final SAG (Figure 6.11d) segments the object into 3 parts. Either of the two methods is able to correctly identify each of the three parts as s-c-co.

Figure 6.11. Complicated object made up of 5 cylindrical parts: third view. (a) Range image of the object. (b) Surface segmentation showing 7 surfaces. (c) Surface adjacency graph (initial). (d) Surface adjacency graph (final). Four small surfaces have been removed, and the edges between cylindrical surfaces broken.

6.11 Test Image 11

An object resembling a table-lamp is shown in Figure 6.12a. Surface segmentation finds 4 surfaces (Figure 6.12b). Figures 6.12c and 6.12d show the initial and final SAGs, respectively. The three parts are disjoint even in the initial SAG. Both methods are able to correctly identify each of the three parts. The three parts are identified as s-s-co, s-c-co, and s-c-t.

Figure 6.12. View of an object resembling a table-lamp. (a) Range image of the object. (b) Surface segmentation showing 4 surfaces. (c) Surface adjacency graph (initial). (d) Surface adjacency graph (final). The initial and final SAGs are identical.

6.12 Test Image 12

Figure 6.13a shows a view of a coffee-mug. Surface segmentation finds 2 surfaces (Figure 6.13b). Figure 6.13c shows the initial SAG. The surface corresponding to the curved handle is very small. It is removed in the final SAG, which is shown in Figure 6.13d. Both methods are able to correctly identify the one remaining part as a cylinder (s-c-co).

Figure 6.13. View of a coffee-mug. (a) Range image of the object. (b) Surface segmentation showing 2 surfaces. (c) Surface adjacency graph (initial). (d) Surface adjacency graph (final). One small surface (corresponding to the handle) has been removed.

6.13 Summary

Results of part segmentation and part identification for several real range images showed that our methods are quite successful in cases where occlusion by the same or a different object is not too severe. In general, we have observed that part segmentation is successful whenever a good surface segmentation, together with a view of the adjacent surfaces where different parts join, is available. Part identification is the most reliable part of our system. Two methods were developed for part identification, and their performance is comparable. Assuming a "good" view in which several large surfaces are visible, 10 of the 12 parts can be identified with high accuracy (close to 90% for real range images). A heuristic was also developed for recovering from misclassification of the axis shape attribute by the binary tree classifier.

Two parts (s-c-id and s-s-id) have straight principal axes, but are often mistaken for their curved-axis counterparts by the superquadric estimation routines. We developed an intensity-based method for checking whether the principal axis is in fact straight or curved. This heuristic can be used to correct misclassification of the axis shape attribute: if the intensity-based heuristic indicates that the axis is straight, we assume that the axis is indeed straight even if the binary tree classifier decides that it is a bent-axis part.
CHAPTER 7

Conclusions and Future Research

In this dissertation, we described our work on obtaining a "generic" parts-based 3-D object representation. We used range data as the input, obtaining as the output a 3-D object representation based on 12 geon-like 3-D part primitives.

Multiple views of the 3-D part primitives were stored in order to help segment objects into component parts and identify those parts as instances of specific primitives. The major surfaces of the part primitives determined a small number of views to be stored in the multi-view representation. The shape discrimination capability of superquadric parameters was tested under varying conditions of image resolution, model elongation, and noise. Two approaches were presented for identifying the parts subsequent to part segmentation. The first approach applied a hierarchical tree classifier using features derived from estimated superquadric parameters. The second approach drew direct inferences about shape attributes of the geon-like part primitives, using properties computed from the range image or the associated intensity image.

Results of part segmentation and identification were presented for several images. Generally, given a good surface segmentation, part segmentation is successful in cases where the adjacency of surfaces from two parts coming together is visible. Part identification is quite accurate for 10 of the 12 geon-like parts. The two remaining parts have straight axes but are often mistaken for their bent-axis counterparts. We developed a heuristic to correct such misclassifications by directly checking from the intensity image whether the axis is straight or curved.

Other research on obtaining a 3-D representation using geon-like parts [3, 22] has made simplifying assumptions such as the use of perfect line drawings, perfect part segmentation, and/or manual part segmentation. Our work, on the other hand, uses real range images as the input and computes the part segmentation. Part segmentation is often successful even if surface segmentation is imperfect; further, part identification is often successful even if part segmentation is imperfect.

One of the real objects for which we performed part segmentation and identification (the object made up of 5 cylindrical parts) is more "difficult" than any object for which Bergevin et al. [3] or Dickinson et al. [22] have presented results. Conversely, given a good view, we believe that our method would be successful for most objects dealt with by Bergevin et al. or Dickinson et al.

7.1 Contributions of the Thesis

The contributions of this thesis are as follows:

• We presented the first attempt (to our knowledge) to obtain, from range data, a 3-D parts-based object representation in terms of a specific, but fairly large, set of geon-like 3-D parts.

• We proposed a method of specifying "generic" 3-D parts by means of Surface Adjacency Graphs (SAGs). Based on their SAGs, we derived a very compact multi-view representation of the part primitives.
• We presented a method of performing part segmentation from range images, given a good surface segmentation. This method for part segmentation has some advantages over approaches that, in one form or another, make use solely of Hoffman and Richards' "principle of transversality" [47]. It can perform part segmentation in some cases where two parts coming together meet at a convex or "mixed" crease rather than a concave one.

• We proposed two methods to perform the part identification of isolated parts. We tested the shape discrimination capability of superquadric parameters under varying conditions of image resolution, model elongation, and noise. Lastly, we showed that the associated intensity image can be used to recover from some shortcomings of the purely range-image-based method of part identification.

7.2 Shortcomings

• Part segmentation is not robust. Good surface segmentation is crucial for the success of our part segmentation method. But the surface segmentation (that of Flynn, based in turn on the earlier work of Hoffman) [33, 49] is purely data-driven, and often gives a surface segmentation so far from ideal that our subsequent methods fail completely. Part segmentation is indeed successful even in the presence of minor surface segmentation errors, but no mechanism exists for recovering from an unacceptably poor surface segmentation. Further, even if a perfect surface segmentation were available, part segmentation would fail in some cases, as pointed out below.

• Our part segmentation method does not use intensity data. Nor does it use information about the shape of the occluding contour. Hence, it fails when two parts come together in such a way that the adjacency of the corresponding surfaces is not directly visible, even though in many such cases the intensity or occluding-contour data clearly indicate the joining of two parts. In Chapter 3, we gave the example of an L-shaped object in which only a single surface, i.e., the L-shaped surface, was visible. The intensity image or occluding contour clearly suggested the presence of two parts. Nevertheless, our method cannot segment this object into two parts from such a viewpoint. Additionally, for this object, as mentioned in Chapter 3, even if two surfaces belonging to different parts had been visible, part segmentation would have been incomplete. The reason is that the single L-shaped surface is itself produced by the coming together of two parts, and our part segmentation method has no way to break up a single surface into pieces belonging to different objects. Many other such examples can be found.

• The part identification method gives only part identities and nothing more. It does not give complete details of the identified parts and their adjacency, e.g., "the small end of the tapered cylinder is attached to the side surface of the rectangular block," and so on.

7.3 Recommendations for Further Work

In our opinion, the most useful as well as most interesting problem for future work would be to develop top-down (part-model-based) methods for recovering from imperfect surface segmentations, and to successively improve the surface segmentation. Another crucial contribution would be to enhance our part segmentation algorithm to use overall shape or contour information instead of relying solely on surface adjacencies. Flynn [33], as well as Hoffman [49], have taken an almost entirely bottom-up, data-driven approach to surface segmentation using range data alone.
However, the present availability of fused range and intensity data adds a valuable new dimension of local information. Further, the recent work on MRF-based edge detection [65] and wing detection [57] provides additional low-level model-based evidence regarding surface types and the correct locations of boundaries between surfaces. It should be interesting to combine this with the earlier, purely bottom-up approach, with the aim of getting a more reliable surface segmentation.

An even higher-level model-based approach towards obtaining better surface segmentation might be to use the surface properties of initially detected surface patches to hypothesize about the part primitive to which they belong. We saw that surface type provides strong clues regarding part identity. This might be used to generate hypotheses about parts which may be present in the scene. Moreover, curvature values and curvature directions on surface patches give strong evidence about the likely orientations and parameters of parts, e.g., the radius of curvature and the directions of the principal axes. Hypotheses about parts might in turn be used to predict which surface patches belong to the same surface of the same part, and might hence be good candidates for merging.

An essential improvement to our part segmentation method would be to make use of intensity, or at least occluding-contour shape, information. Range data has been useful in developing a method for part segmentation based on surface and angle attributes at surface adjacencies. However, as explained earlier, surface adjacency by itself does not suffice to detect the joining of two parts in some cases where it would be obvious from the shape, e.g., from the intensity image or occluding contour.

We presented two different methods for part identification. Tests were applied to directly check some shape attributes, such as axis shape and variation of cross-section size along the part axis. Integrating these semi-independent judgements about part identity and about specific shape attributes would be very useful in improving the final reliability of part identification.

Lastly, building an object recognition system for range images using an enhanced version of this 3-D parts-based object representation module does seem practicable. The use of shape information as a criterion for part segmentation, rather than surface adjacencies alone, would be a particularly valuable enhancement.

APPENDICES

APPENDIX A

Parameters

A number of empirically determined parameters are used in various modules and algorithms described in the thesis. We list these parameters in the order in which they are mentioned in the text.

A.1 Surface Classification

This topic is discussed in Section 3.2.1. A threshold value of 0.2 is used to determine whether the median principal curvatures are "close to zero." If the median curvature is less than -0.2, it is taken to be negative; if it is greater than 0.2, it is taken to be positive; otherwise it is taken to be zero.

A.2 Constructing the SAG

Surfaces whose area is less than 5% of the total surface area are ignored, after taking their connectivity into account.

A.3 Tree Classifier

The binary tree classifier is described in Section 4.3.2. A maximum error rate of 2.0% is used when constructing the decision tree using training data.

A.4 Direct Test for Cross-section Size

This material is presented in Section 5.1.
If the angle between a surface normal and the principal axis is less than 45° or greater than 135°, it is not taken into account for finding the distribution of angles. If at least 80% of the angles are either between 45° and 90°, or between 90° and 135°, the cross-section is assumed to be increasing. Else, if at least 80% of the angles lie between 80° and 100°, the cross-section is assumed to be constant. Else, the cross-section is assumed to be increasing-and-decreasing. The percentages mentioned are computed over only those points whose normals are not ignored due to the heuristic.

A.5 Polygonal Approximation

Details of the polygonal approximation algorithm are given in Section 5.2. The following parameters are used for constructing the polygonal approximation: hop distance $k_0 = 7$, tolerance for distance deviation $\Delta d = 0.2$, tolerance for angle deviation $\Delta\theta = 20°$.

A.6 Direct Test for Axis Shape

This topic is discussed in Section 5.2. The silhouette is broken into separate components at locations where the angle change between successive line segments is greater than 20°. The line joining the end-points is computed for each separate component. If the mid-point of this line lies outside the silhouette, the component is classified as concave. If more than 10% of the total length of the polygonal approximation belongs to components classified as "concave," the entire silhouette is taken to be concave, i.e., the part has a bent axis.
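For reference, the parameters of this appendix can be collected in one place; the constant names below are ours, while the values are those given in the text.

```python
# Empirically determined parameters of Appendix A, gathered as a sketch
# of a configuration module (constant names are ours).

CURVATURE_ZERO_THRESHOLD = 0.2      # |median curvature| <= 0.2 counts as zero (A.1)
MIN_SURFACE_AREA_FRACTION = 0.05    # surfaces below 5% of total area are ignored (A.2)
MAX_TREE_ERROR_RATE = 0.02          # error rate allowed when building the tree (A.3)

END_SURFACE_ANGLE_RANGE = (45.0, 135.0)   # normals outside this range are ignored (A.4)
ONE_SIDED_FRACTION = 0.80                 # threshold for "increasing"/"constant" (A.4)
NEAR_PERPENDICULAR_RANGE = (80.0, 100.0)  # band around 90 degrees for "constant" (A.4)

HOP_DISTANCE_K0 = 7                 # polygonal approximation hop distance (A.5)
DISTANCE_TOLERANCE = 0.2            # normalized deviation tolerance (A.5)
ANGLE_TOLERANCE_DEG = 20.0          # angle tolerance; also silhouette break angle (A.5, A.6)
CONCAVE_LENGTH_FRACTION = 0.10      # concave length fraction implying a bent axis (A.6)
```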
BIBLIOGRAPHY

[1] R. Bajcsy and F. Solina. Three-Dimensional Object Representation Revisited. In Proc. First IEEE International Conference on Computer Vision, pages 231-240, London, 1987.
[2] A. H. Barr. Superquadrics and Angle-preserving Transformations. IEEE Computer Graphics and Applications, 1:11-23, 1981.
[3] R. Bergevin and M. D. Levine. Extraction of Line Drawing Features for Object Recognition. Pattern Recognition, 25(3):319-334, March 1992.
[4] Robert Bergevin and Martin D. Levine. Generic Object Recognition: Building Coarse 3D Descriptions from Line Drawings. In Proc. IEEE Workshop on Interpretation of 3D Scenes, pages 68-74, Austin, November 1989.
[5] Paul J. Besl and Ramesh C. Jain. Segmentation Through Variable-Order Surface Fitting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 10(2):167-192, 1988.
[6] P.J. Besl and R.C. Jain. Three-Dimensional Object Recognition. Computing Surveys, 17:75-145, 1985.
[7] B. Bhanu and L. Nuttall. Recognition of 3-D Objects in Range Images Using a Butterfly Multiprocessor. Pattern Recognition, 22(1):49-64, 1989.
[8] Irving Biederman. Recognition-by-Components: A Theory of Human Image Understanding. Psychological Review, 94(2):115-147, 1987.
[9] T. O. Binford. Visual Perception by Computer. In Proc. IEEE Conference on Systems and Control, Miami, 1971.
[10] Thomas O. Binford. Survey of Model-Based Image Analysis Systems. International Journal of Robotics Research, 1(1):18-64, Spring 1982.
[11] H. Blum. Biological Shape and Visual Science. J. Theoretical Biology, 38:205-287, 1973.
[12] K. Bowyer, D. Eggert, J. Stewman, and L. Stark. Developing the Aspect Graph Representation for Use in Image Understanding. In Proc. DARPA Image Understanding Workshop, pages 831-849, Palo Alto, 1989.
[13] J.P. Brady, N. Nandhakumar, and J.K. Aggarwal. Recent progress in the recognition of objects from range data. In Proc. 9th Int. Conf. on Pattern Recognition, pages 85-92, Rome, November 1988.
[14] Rodney A. Brooks. Model-Based Three-Dimensional Interpretations of Two-Dimensional Images. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-5(2):140-150, March 1983.
[15] J. Callahan and R. Weiss. A Model for Describing Surface Shape. In Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 240-245, San Francisco, 1985.
[16] I. Chakravarty and H. Freeman. Characteristic Views as a Basis for Three-Dimensional Object Recognition. In Proc. SPIE Conf. on Robot Vision, volume 336, pages 37-45, May 1982.
[17] R.T. Chin and C.R. Dyer. Model-Based Recognition in Robot Vision. Computing Surveys, 18:67-108, 1986.
[18] M.F. Costabile and G.G. Pieroni. Detecting Shape Correspondences by Using the Generalized Hough Transform. In Proc. Eighth International Conference on Pattern Recognition, pages 589-591, Paris, 1986.
[19] E.R. Davies. Improved Localization in a Generalized Hough Scheme for the Detection of Straight Edges. Image and Vision Computing, 5:279-286, 1987.
[20] Sven Dickinson. The Recovery and Recognition of Three-Dimensional Objects Using Part-Based Aspect Matching. Technical Report CAR-TR-572, Computer Vision Laboratory, University of Maryland, August 1991.
[21] Sven J. Dickinson, Alex P. Pentland, and Azriel Rosenfeld. Qualitative 3-D Shape Reconstruction using Distributed Aspect Graph Matching. In Proc. 3rd IEEE International Conference on Computer Vision, Osaka, December 1990.
[22] Sven J. Dickinson, Alex P. Pentland, and Azriel Rosenfeld. 3-D Shape Recovery Using Distributed Aspect Matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(2):174-198, February 1992.
[23] R. O. Duda and P. E. Hart. Pattern Classification and Scene Analysis. Wiley, 1973.
[24] R. O. Duda and P. E. Hart. Pattern Classification and Scene Analysis. Wiley, 1973.
[25] Sahibsingh A. Dudani, Kenneth J. Breeding, and Robert B. McGhee. Aircraft Identification by Moment Invariants. IEEE Transactions on Computers, C-26(1):39-46, January 1977.
[26] H. Edelsbrunner, J. O'Rourke, and R. Seidel. Constructing Arrangements of Lines and Hyperplanes with Applications. SIAM Journal on Computing, 15:341-363, 1986.
[27] D. Eggert and K. Bowyer. Computing the Orthographic Projection Aspect Graph of Solids of Revolution. In Proc. IEEE Workshop on Interpretation of 3D Scenes, pages 102-108, Austin, 1989.
[28] T.J. Fan, G. Medioni, and R. Nevatia. Surface Segmentation and Description from Curvature Features. In Proc. DARPA Image Understanding Workshop, pages 351-359, 1987.
[29] G. Fekete and L.S. Davis. Property Spheres: a New Representation for 3-D Object Recognition. In Proc. IEEE Workshop on Computer Vision: Representation and Control, pages 192-201, Annapolis, 1984.
[30] F.P. Ferrie, J. Lagarde, and P. Whaite. Darboux Frames, Snakes, and Super-Quadrics: Geometry From the Bottom Up. In Proc. IEEE Workshop on Interpretation of 3D Scenes, pages 170-176, Austin, 1989.
[31] F.P. Ferrie, J. Lagarde, and P. Whaite. Darboux frames, snakes, and super-quadrics: Geometry from the bottom-up. In Proc. IEEE Workshop on Interpretation of 3D Scenes, pages 170-176, Austin, November 1989.
[32] M. Fischler and R. Bolles. Random Sample Consensus: A Paradigm for Model Fitting with Applications in Image Analysis and Automated Cartography. Communications of the ACM, 24:381-395, 1981.
[33] Patrick J. Flynn. CAD-Based Computer Vision: Modeling and Recognition Strategies. PhD thesis, Department of Computer Science, Michigan State University, 1990.
[34] Patrick J. Flynn and Anil K. Jain. On Reliable Curvature Estimation. In Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 110-116, San Diego, June 1989.
[35] K. Fukunaga. Introduction to Statistical Pattern Recognition. Academic Press, 1972.
[36] M. Gardner. The superellipse: A curve that lies between the ellipse and the rectangle. Scientific American, 213:222-234, 1965.
[37] Z. Gigus, J. Canny, and R. Seidel. Efficiently Computing and Representing Aspect Graphs of Polyhedral Objects. In Proc. Second IEEE International Conference on Computer Vision, pages 30-39, Tarpon Springs, 1988.
[38] Z. Gigus and J. Malik. Computing the Aspect Graph for Line Drawings of Polyhedral Objects. In Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 654-661, Ann Arbor, 1988.
[39] W. Eric L. Grimson and Tomas Lozano-Pérez. Model-Based Recognition and Localization from Sparse Range or Tactile Data. International Journal of Robotics Research, 3(3):3-35, Fall 1984.
[40] W.E.L. Grimson and T. Lozano-Pérez. Localizing Overlapping Parts by Searching the Interpretation Tree. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-9(4):469-482, July 1987.
[41] A.D. Gross and T.E. Boult. Error of Fit Measures for Recovering Parametric Solids. In Proc. Second IEEE International Conference on Computer Vision, pages 690-694, Tarpon Springs, 1988.
[42] A. Gupta, L. Bogoni, and R. Bajcsy. Quantitative and Qualitative Measures for the Evaluation of the Superquadric Models. In Proc. IEEE Workshop on Interpretation of 3D Scenes, pages 162-169, Austin, 1989.
[43] Alok Gupta. Surface and Volumetric Segmentation of Complex 3-D Objects Using Parametric Shape Models. PhD thesis, Department of Computer and Information Science, University of Pennsylvania, June 1991.
[44] M. Hebert and T. Kanade. The 3-D Profile Method for Object Recognition. In Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 458-463, San Francisco, 1985.
[45] Yaron C. Hecker and Ruud M. Bolle. Is Geometric Hashing a Hough Transform? Technical Report RC 15202 (#67767), IBM Research Division, T.J. Watson Research Center, 1989.
[46] G. E. Hinton and L. M. Parsons. Scene-based and Viewer-centered Representations for Comparing Shapes. Cognition, 30:1-35, 1988.
[47] D. Hoffman and W. Richards. Parts of recognition. In Stephen Pinker, editor, Visual Cognition. MIT Press, 1985.
[48] D. Hoffman and W. Richards. Parts of Recognition. Cognition, 18:65-96, 1985.
[49] Richard L. Hoffman. Object Recognition from Range Images. PhD thesis, Department of Computer Science, Michigan State University, 1986.
[50] Richard L. Hoffman and Anil K. Jain. Segmentation and Classification of Range Images. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-9(5):608-620, September 1987.
[51] K. Ikeuchi. Generating an Interpretation Tree from a CAD Model for 3D-Object Recognition in Bin-Picking Tasks. International Journal of Computer Vision, 1:145-165, 1987.
[52] A.K. Jain and R. Hoffman. Evidence-Based Recognition of 3-D Objects. IEEE Transactions on Pattern Analysis and Machine Intelligence, 10:783-802, 1988.
[53] J. J. Koenderink and A. J. van Doorn. Internal Representation of Solid Shape with Respect to Vision. Biological Cybernetics, 32(4):211-216, 1979.
[54] M.R. Korn and C.R. Dyer. 3-D Multiview Object Representations for Model-Based Object Recognition. Pattern Recognition, 20:91-103, 1987.
[55] D.J. Kriegman and J. Ponce. Computing Exact Aspect Graphs of Curved Objects: Solids of Revolution.
In Proc. IEEE Workshop on Interpretation of 3D Scenes, pages 116-122, Austin, 1989.
[56] Y. Lamdan and H.J. Wolfson. Geometric Hashing: A General and Efficient Model-Based Recognition Scheme. In Proc. Second IEEE International Conference on Computer Vision, pages 238-249, Tarpon Springs, 1988.
[57] Greg C. Lee. Reconstruction of Line Drawing Graphs From Fused Range and Intensity Imagery. PhD thesis, Michigan State University, East Lansing, Michigan, 1992.
[58] Greg C. Lee and George C. Stockman. Obtaining Registered Range and Intensity Images using the Technical Arts White Scanner. Technical report, PRIP Lab., Department of Computer Science, Michigan State University, 1991.
[59] Wei-Chung Lin and Tsu-Wang Chen. CSG-Based Object Recognition Using Range Images. In Proc. 9th Int. Conf. on Pattern Recognition, pages 99-103, Rome, November 1988.
[60] D.G. Lowe. Perceptual Organization and Visual Recognition. Kluwer, Boston, 1984.
[61] D. Marr. Vision. Freeman, 1982.
[62] D. Marr and H. K. Nishihara. Representation and Recognition of the Spatial Organization of Three-Dimensional Shapes. In Proc. Royal Society, London, ser. B, volume 200, pages 269-294, 1978.
[63] M. Minsky. A Framework for Representing Knowledge. In P. H. Winston, editor, The Psychology of Computer Vision. McGraw-Hill, 1975.
[64] Roger Munck-Fairwood. Recognition of Geometric Primitives Using Logic-Program and Probabilistic-Network Reasoning Methods. In SPIE Vol. 1708, Applications of AI X, pages 589-600, Orlando, April 1992.
[65] Sateesha G. Nadabar. Markov Random Field Contextual Models in Computer Vision. PhD thesis, Michigan State University, East Lansing, Michigan, 1992.
[66] T. Pavlidis. Algorithms for Graphics and Image Processing. Computer Science Press, 1982.
[67] A. Pentland and B. Horowitz. Recovery of Nonrigid Motion and Structure. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(7):730-742, July 1991.
[68] A. Pentland and S. Sclaroff. Closed-Form Solutions for Physically Based Modeling and Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(7):715-729, July 1991.
[69] A. P. Pentland. From Pixels to Predicates. Ablex, 1986.
[70] Alex P. Pentland. Automatic extraction of deformable part models. International Journal of Computer Vision, 4:107-126, 1990.
[71] A.P. Pentland. Perceptual Organization and the Representation of Natural Form. Artificial Intelligence, 28:293-331, 1986.
[72] A.P. Pentland. Recognition by Parts. In Proc. First IEEE International Conference on Computer Vision, pages 612-620, London, 1987.
[73] S. Pinker. Visual Cognition: An Introduction. In Visual Cognition. MIT Press, 1984.
[74] Harry Plantinga and Charles R. Dyer. Visibility, Occlusion, and the Aspect Graph. Technical Report 736, Computer Science Department, University of Wisconsin-Madison, December 1987.
[75] William H. Press, Brian P. Flannery, Saul A. Teukolsky, and William T. Vetterling. Numerical Recipes: The Art of Scientific Computing. Cambridge University Press, Cambridge, England, 1986.
[76] Narayan S. Raja and Anil K. Jain. Obtaining Generic Parts from Range Data Using a Multi-view Representation. In SPIE Vol. 1708, Applications of AI X, pages 602-613, Orlando, April 1992.
[77] Narayan S. Raja and Anil K. Jain. Recognizing Geons from Superquadrics Fitted to Range Data. Image and Vision Computing, 10(3):179-190, special issue on Range Image Understanding, April 1992.
[78] J.H. Rieger. On the Classification of Views of Piecewise Smooth Objects.
Image and Vision Computing, 5:91-97, 1987.
[79] I. K. Sethi and G. P. R. Sarvarayudu. Hierarchical Classifier Design using Mutual Information. IEEE Transactions on Pattern Analysis and Machine Intelligence, 4(4):441-445, July 1982.
[80] L.G. Shapiro, J.D. Moriarty, R.M. Haralick, and P.G. Mulgaonkar. Matching Three-Dimensional Objects Using a Relational Paradigm. Pattern Recognition, 17:385-405, 1984.
[81] F. Solina and R. Bajcsy. Recovery of Parametric Models from Range Images: The Case for Superquadrics with Global Deformations. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(2):131-147, February 1990.
[82] T. Sripradisvarakul and R. Jain. Generating Aspect Graphs for Curved Objects. In Proc. IEEE Workshop on Interpretation of 3D Scenes, pages 109-115, Austin, 1989.
[83] J. Stewman and K. Bowyer. Creating the perspective projection aspect graph of polyhedral objects. In Proc. Second IEEE International Conference on Computer Vision, pages 494-500, Tarpon Springs, 1988.
[84] J.H. Stewman and K.W. Bowyer. Aspect Graphs for Convex Planar-Face Objects. In Proc. IEEE Workshop on Computer Vision, pages 123-130, Miami Beach, 1987.
[85] G. C. Stockman. Object Recognition and Localization Via Pose Clustering. Computer Vision, Graphics and Image Processing, 40:361-387, 1987.
[86] George Stockman. Object Recognition. In Ramesh C. Jain and Anil K. Jain, editors, Analysis and Interpretation of Range Images, pages 225-253. Springer-Verlag, 1990.
[87] Structural Dynamics Research Corporation. I-DEAS User's Guide. Milford, Ohio.
[88] Technical Arts Corporation. 100X 3-Dimensional Scanner: User's Manual and Application Programming Guide. Redmond, Washington.
[89] D. Terzopoulos and D. Metaxas. Dynamic 3D Models with Local and Global Deformations: Deformable Superquadrics. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(7):703-714, July 1991.
[90] D. Terzopoulos, A. Witkin, and M. Kass. Symmetry-seeking Models for 3-D Object Reconstruction. Int. J. Comput. Vision, 1(3), 1988.
[91] S. Ullman. An Approach to Object Recognition: Aligning Pictorial Descriptions. Technical Report A.I. Memo No. 931, M.I.T. AI Laboratory, 1986.
[92] B.C. Vemuri, A. Mitiche, and J.K. Aggarwal. Curvature-Based Representation of Objects from Range Data. Image and Vision Computing, 4(2):107-114, May 1986.
[93] N.A. Watts. Calculating the Principal Views of a Polyhedron. In Proc. Ninth International Conference on Pattern Recognition, pages 316-322, Rome, 1988.