This is to certify that the dissertation entitled Three-Dimensional Scene Representation using Structured Light, presented by Gongzhu Hu, has been accepted towards fulfillment of the requirements for the Ph.D. degree in Computer Science.

Major professor
Date: 20 May 1988

THREE-DIMENSIONAL SCENE REPRESENTATION USING STRUCTURED LIGHT

By

Gongzhu Hu

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Computer Science

1988

ABSTRACT

3-D SCENE REPRESENTATION USING STRUCTURED LIGHT

By Gongzhu Hu

This thesis explores general use of structured lighting for surface and object representation. Using structured light, image features are better defined and easier to detect because surface geometry is made explicit through the light stripe networks. Because of its effectiveness in shape inference and its speed and cost potential, the structured lighting approach provides a promising and economical alternative for 3-D machine vision, especially for industrial applications where a controlled environment is assumed. Previous work using structured light has focused on 3-D sensing, but few projects have gone further to include structured light in shape inference, surface representation processes, and scene analysis. This thesis investigates problems in using structured light from a general point of view. Not only is structured light used in 3-D sensing, but also surface shapes are inferred via both 3-D data fitting and 2-D textures created from the structured lighting. In addition, a surface representation is constructed based on the light-striped surface patches and their associated boundary lines. This type of 2½-D "wing" representation is suitable for object recognition through model matching.
3-D sensing is accomplished by computing 3-D surface solutions at the sparse stripe grid points via triangulation. The stripe identification problem is partially solved using a set of geometric and topological constraints. Surface shapes are inferred by two independent methods: a) surface fitting of the computed 3-D data followed by a curvature computation and classification, and b) classification via features of 2-D stripe textures. Experiments show that both methods are feasible. A fusion scheme that combines features obtained from a striped image and features extracted from an intensity image of the same scene is employed to construct a 2½-D representation. The features used in the representation include surface patches (from the striped image), edges (primarily from the intensity image), and their spatial relationships. A set of inference rules is developed that labels these representation primitives. The 2½-D representation will be delivered to the matching procedure for recognition, which is not studied in this thesis. Experiments with real images of multiple object scenes provide encouraging results. More robust feature extraction, segmentation of general objects, and modeling and matching approaches are currently under investigation.

ACKNOWLEDGEMENTS

For the completion of this thesis, I would like to thank Professor George Stockman, my thesis advisor, for his valuable guidance and great patience in allowing me to pursue this research. Without his invaluable suggestions, great inspiration and help, this work would have been impossible. Members of my thesis committee, Professor Anil Jain, Professor Carl Page, Professor Eric Goodman, and Professor Richard Hall, provided me with continual encouragement and many excellent comments. My thanks also go to Professor Richard Dubes, Professor M. Tuceryan and all the members in the Pattern Recognition and Image Processing (PRIP) lab at Michigan State University for their help during my stay in the PRIP lab. The Department of Computer Science, and many friends, showed great support and concern. I also wish to acknowledge the financial support I received during my doctoral studies from NSF Grants MCS-8304011, DCR-8600371 and IRI-8743697, and the Department of Computer Science at Michigan State University. My greatest gratitude is for my parents, Huoxien Hu and Renzhang Yang, for providing me with the best fundamental education, for my wife Shu for her patience and encouragement, and for my daughter Po for being wonderful.

Table of Contents

Table of Figures ........................................................................ viii
Table of Tables .......................................................................... x
Chapter 1 Introduction ............................................................... 1
1.1. 3-D Scene Analysis .............................................................. 2
1.2. Problems Addressed in the Dissertation .............................. 5
Chapter 2 3-D Interpretation Using Light Striping .................... 8
2.1. Three Components of Visual Processing ............................. 9
2.1.1. Light Source ..................................................................... 9
2.1.2. The Scene ......................................................................... 10
2.1.3.
The Viewer ............................................................................................ 10 2.1.4. Tri-relationships ..................................................................................... 11 2.2. 3-D Sensing ............................................................................................... 11 2.3. Structured Lighting ................................................................................... 15 2.3.1. Is It A Good Approach? ......................................................................... 15 2.3.2. Sparse vs. Dense : Assumptions about the Scene .................................. 18 2.3.3. The Correspondence Problem in Light Striping .................................... 19 2.3.4. Stripe Texture ........................................................................................ 23 2.3.5. Fusion of Light Striping and Intensity ................................................... 24 2.4. Object Representation ............................................................................... 25 Chapter 3 Surface Solution — Geometric Computation and Constraint Propagation ...................... 29 3.1. Oculomotor Cues and Calibration ............................................................ 30 3.1.1. Oculomotor Cues — Depth Information From Within The Eyes ......... 30 3.1.2. Camera and Projector Calibration ......................................................... 31 3.2. Grid Point Computation in 3-D Space ...................................................... 36 3.3. Constraints ................................................................................................ 38 3.3.1. Basic Constraints ................................................................................... 38 3.3.2. Geometric Rules .................................................................................... 39 3.3.3. Topological Rules .................................................................................. 42 3.4. Algorithms ................................................................................................ 46 3.4.1. Notations ................................................................................................ 46 3.4.2. Algorithm 1 — Stripe Extraction and Network Construction ............... 47 3.4.3. Algorithm 2 —— 3-D Computation for a Single Point ............................ 48 3.4.4. Algorithm 3 — Single Network Solution via Constraint Propagation .. 54 3.4.5. Algorithm 4 — Network Boundary Extraction ..................................... 55 3.4.6. Algorithm 5 — Solution for the Scene .................................................. 58 3.5. A Complete Example ................................................................................ 60 3.6. Quantitative Analysis on the Degree of Ambiguity ................................. 68 3.6.1. Orthographic Projection ......................................................................... 68 3.6.1.1. Assumptions ........................................................................................ 68 3.6.1.2. Basic Formulae ................................................................................... 71 3.6.1.3. Computation in 3-Space ...................................................................... 71 3.6.1.4. Computation in xy-Plane .................................................................... 73 3.6.2. Perspective Projection ............................................................................ 75 3.6.2.1. 
Assumptions ........................................................................................ 75 3.6.2.2. Computation ........................................................................................ 76 3.7. Conclusions ............................................................................................... 78 Chapter 4 Inference of Surface Shape .................................................................. 79 4.1. Surface Fitting and Local Curvature ......................................................... 80 4.1.1. Differential Geometry — Local Surface Theory ................................... 80 4.1.2. Computation —-— Surface Fitting Using B-Spline ................................... 84 4.1.3. Examples...........................: .................................................................... 87 4.1.4. Occlusion Relationships between Virtual Edges ................................... 92 4.1.5. Summary ................................................................................................ 92 4.2. Surface Shape from Stripe Texture ........................................................... 94 4.2.1. Relations of Features in 2-D and in 3-D ................................................ 94 4.2.2. Stripe Texture ........................................................................................ 97 4.2.3. Stripe Texture Classification .................................................................. 99 vi 4.2.4. Experimental Results ............................................................................. 101 4.3. Virtual edges ............................................................................................. 107 4.4. Discussion and Conclusions ..................................................................... 109 Chapter 5 Multi-Channel Vision — Fusion of Light Striping and Intensity ................................................ 111 5.1. Object Description .................................................................................... 112 5.2. Light Striping is Not Enough .................................................................... 113 5.3. Comparison of the Two Information Channels ........................................ 114 5.4. Stripe Extension ........................................................................................ 119 5.5. Interpretation of Contours ......................................................................... 124 5.5.1. Apparent Curvature of Visual Contour and Local Surface Geometry .. 125 5.5.2. Combination of Image Contours and Stripes ......................................... 127 5.5.3. Terminology and Definitions ................................................................. 128 5.5.4. Inference Rules ...................................................................................... 131 5.5.5. Some Mathematical Arguments about the Inference Rules .................. 138 5.5.6. Implementation of Inference Rules ....................................................... 141 5.5.7. Examples ................................................................................................ 143 5.5.8. Experiments ........................................................................................... 147 5.6. Conclusions ............................................................................................... 148 Chapter 6 Concluding Discussion and Future Work .......................................... 151 6.1. 
Summary .................................................................................... 151
6.1.1. Computing 3-D Surface Solutions .................................... 153
6.1.2. Classification of Surface Patches ...................................... 154
6.1.3. Fusion of Intensity and Stripe Image Data ....................... 155
6.2. Conclusion ........................................................................... 156
6.3. Future Work ........................................................................ 158
Appendix .................................................................................... 159
Bibliography ............................................................................... 170

List of Figures

Figure 1.1 - 3-D scene analysis scheme ......................................
Figure 1.2 - System diagram ......................................................
Figure 2.1 - Depth cues .............................................................. 12
Figure 2.2 - Examples of structured light images ....................... 17
Figure 3.1 - Geometry of triangulation ....................................... 31
Figure 3.2 - Sensing environment and coordinate systems ......... 33
Figure 3.3 - Epipolar geometry ................................................... 41
Figure 3.4 - Topological rule T1 ................................................. 43
Figure 3.5 - Two X-stripes cannot intersect in the image (Lemma 3.1) ........ 44
Figure 3.6 - Distinguishing X- and Y-stripes at an intersection point (Lemma 3.3) ........ 52
Figure 3.7 - X- and Y-stripes distinguished with respect to R_i ........ 53
Figure 3.8 - Algorithm 4 (Turning-Right algorithm) ................... 57
Figure 3.9 - A striped image of a block, a Coke can, and a piston ........ 61
Figure 3.10 - Detected grid points in the jumble image .............. 62
Figure 3.11 - Candidate labels for some grid points of the upper surface ........ 63
Figure 3.12 - Valid and invalid combinations of surface solutions ........ 66
Figure 3.13 - Ambiguity of location in 3-space .......................... 69
Figure 3.14 - Parallel projection model ...................................... 70
Figure 4.1 - B-spline surface patch using 16 control points ........ 86
Figure 4.2 - Histogram of curvatures of a planar surface patch ........ 88
Figure 4.3 - Histogram of curvatures of a cylindrical surface patch ........ 90
Figure 4.4 - Histogram of curvatures of an elliptical surface patch ........ 91
Figure 4.5 - Light stripe networks in a striped image ................. 95
Figure 4.6 - Stripe texture primitives .......................................... 97
Figure 4.7 - Histogram on "(Ad) of stripe curves ....................... 103
Figure 4.8 - Histogram on pm of stripe curves ........................... 104
Figure 5.1 - Fusion of intensity and light striping — example 1 ........ 116
Figure 5.2 - Fusion of intensity and light striping — example 2 ........ 117
Figure 5.3 - Striped image of an orange and the result of global thresholding ........ 120
Figure 5.4 - Local thresholding results ....................................... 121
Figure 5.5 - Segmentation using both stripes and intensity gradient ........ 123
Figure 5.6 - Types of image contours ......................................... 124
Figure 5.7 - Radial curvature and transverse curvature ............... 126
Figure 5.8 - Intensity gradient image of a sculpture, an orange and a block ........ 127
Figure 5.9 - Four types of P-edge ............................................... 130
Figure 5.10 - Inference rules 1-5 ................................................ 136
Figure 5.11 - Inference rules 6-13 .............................................. 137
Figure 5.12 - Lemma 5.1 ............................................................ 139
Figure 5.13 - Lemma 5.2 ............................................................ 140
Figure 5.14 - Fusion example 1 — image of a solid block .......... 143
Figure 5.15 - Fusion example 2 — image of a sculpture, an orange and a block ........ 145

List of Tables

Table 3-1 - Two Solutions for the Upper Surface of the Block ........ 64
Table 3-2 - Summary of Surface Solutions .................................. 67
Table 3-3 - Number of Possible Locations in 3-D under Parallel Projection ........ 75
Table 3-4 - Number of Possible Locations in 3-D under Perspective Projection ........ 77
Table 4-1 - Surface Patches, Texture Primitives, and Classification ........ 102
Table 4-2 - Average Gradient Values on 4 Features for 7 Surface Patches ........ 106
Table 4-3 - Shadow and Shadow-Making Virtual Edges .............. 108
Table 5-1 - Comparison of Two Information Channels ............... 118
Table 5-2 - Summary of Labeling Results Using Inference Rules ........ 148

CHAPTER 1

Introduction

The task of a general 3-D machine vision system is to recognize the three-dimensional objects in the field of view of the system's visual sensors, deduce the spatial relationships among the objects, and understand the configuration of the scene. This very complex task can be broken down into three basic subtasks: (1) extraction of useful information in two dimensions and in three dimensions, (2) construction of a description of the objects based on the information extracted, and (3) matching of the constructed description against the descriptions of the object models. These three subtasks are themselves complex and difficult enough that no general solution exists without various assumptions that considerably simplify the problem.
The difficulties arise from the fact that the sensed images are merely two-dimensional projections of the three-dimensional objects, and therefore a great deal of information is lost. Furthermore, object models are themselves hard to build automatically, and there are infinitely many of them.

This thesis deals with problems in the first two subtasks, i.e. feature extraction and construction of object description (representation). In particular, a structured lighting approach will be presented that

a) recovers the 3-D data of some surface points,
b) deduces surface shapes using the 3-D data and the 2-D image features,
c) develops a surface & boundary representation for later matching processes.

As with any other technique for machine vision, the approach presented in this thesis imposes certain assumptions on the environment of the vision system. But these assumptions are quite general; they are not specially tuned for particular applications. Some basic assumptions are the following.

(1) A controlled environment is assumed in which structured light is projected on the scene to create a striped image, and a diffuse light is used for obtaining an intensity image.
(2) Multiple objects are likely to be in the workspace so that occlusions to both the projector and the camera are expected. Objects are assumed to have "arbitrary" but relatively smooth shapes. Objects are solid, opaque and static.
(3) Imaging and projection use general positions.

Our experiments using structured lighting showed that this approach is promising not only in 3-D sensing, but also in feature extraction and shape inference. Relatively inexpensive equipment (a black-and-white camera, an image digitizer, and an ordinary 35mm projector) is sufficient to build the 3-D sensing system. This technique may find its value in many applications, especially in industrial environments such as automatic assembly control and robotics.

1.1 3-D Scene Analysis

A 3-D vision system is expected to recognize the objects in the scene and where they are. To accomplish this, it is often necessary to first obtain spatial information about the objects, the depth of some surface points, for example. The 3-D information obtained, as well as feature information extracted from the 2-D images, is then analyzed to draw inferences about the shapes of the objects. A proper description of the shape information obtained is then developed in order to finally understand the scene. Surveys of 3-D scene analysis can be found in [Brady 82, Bajcsy 80], and the process is outlined in Figure 1.1.

Figure 1.1 3-D scene analysis scheme. The diagram's blocks: 2-D input; depth cues; 3-D data acquisition; 3-D data; geometric processing; surface or volumetric representation; object models; matcher; description.

The first module in the diagram, 3-D data acquisition, is also called 3-D sensing, range sensing, or depth measurement. Many techniques have been developed, from time-of-flight direct sensing to stereo disparity measurement. Jarvis [Jarvis 83] and Strand [Strand 85] have provided overviews of various approaches in this topic. I will discuss these in more detail in later chapters.

Secondly, object shapes should be correctly inferred. Since there are so many different shapes, only a handful can be described in simple terms such as "planar", "cylindrical", "spherical", or "waved", etc. The majority are too complicated to describe in any easy way, such as a tree, a shirt, a telephone set, and the like. This difficulty leads to the problem of representation of the objects.
A commonly used representation is to "describe" shape locally by surface normals in addition to position in 3-space. This representation gives surface orientation at every point of interest, and shows how the orientation changes from point to point. Marr and Nishihara called this type of representation the 2½-D sketch [Marr and Nishihara 78]. Depending on how such an orientation map is obtained, approaches developed include shape from texture [Bajcsy and Lieberman 76, Kender 80], from shading [Horn 76], from contour [Stevens 79], from stereo [Marr and Poggio 76], from motion [Ullman 79], and so on.

After surface shapes have been obtained, their relationships must be established in order to understand the scene. Objects may be partially occluded, in the shadow of others, touching each other, and at various distances. The relationships between surfaces (or surface patches), as well as the surfaces themselves and their associated edges, should then be properly described using some representation, which is then fed to a model-matching module for recognition of individual objects and understanding of the entire scene.

There are other topics in 3-D scene analysis, but I shall discuss issues mainly in the two topics stated above, namely "3-D data acquisition" and "surface representation".

1.2 Problems Addressed in This Dissertation

In the Pattern Recognition and Image Processing (PRIP) lab of the Computer Science Department at Michigan State University, the components of a 3-D machine vision system have been developed consisting of 2-D processing for feature extraction, 3-D sensing, generation of a 2½-D representation, and model matching for recognition. It uses a light striping approach to get sparse 3-D data and a rough surface shape description, and refines it by using a gray-scale image of the same scene fused with the striped image. Integration of information from both the striped image and the gray-scale image is feasible in a controlled environment such as for automatic assembly and bin-picking for robotics. The overall system is depicted in Figure 1.2. The problems that I have been investigating and will be discussing in this thesis include the following.

(1). To obtain the 3-D surface data through the use of structured lighting, which projects a grid of light to illuminate the scene. This process is a triangulating depth sensing with the camera and projector forming a stereo pair. The major difficulty, namely the line identification problem, is partially solved using a set of general constraints. Further breakdown of topics includes:
camera and projector calibration
stripe extraction from the 2-D striped image
3-D computation via triangulation
3-D surface solution using general constraint propagation

(2). To deduce the rough surface shapes from the sparse 3-D data and from light stripe texture, which refers to the stripe patterns seen in the image. This problem involves topics in surface fitting (B-spline), curvature computation (the Gaussian curvature and the two principal curvatures at surface points, as defined in classical differential geometry), and texture analysis (placement of the distorted stripe patterns in the image).

(3). To develop a surface representation suitable for later model matching. The surface representation is a variation of Marr's 2½-D sketch, including descriptions of edges, surfaces, and their relationships.

(4). To integrate information obtained from both the light striped image and the gray-tone image of the same scene.
Multi-channel fusion helps to complement the two, and to reach a better description of the scene.

Figure 1.2. System diagram. The diagram's blocks: projector and diffuse illumination; striped image and intensity image; 2-D processing (threshold, thin, and track the stripe network; gradient, shading, and edge image); 3-D geometry and grouping; 2-D textured regions, 3-D surface solutions, and 2-D edge segments; 3-D feature extraction; model matching; scene analysis.

CHAPTER 2

3-D Interpretation Using Light Striping

Structured lighting has been used in machine vision for about two decades, and now is regarded as a standard technique in machine vision. Most work using structured light has concentrated on 3-D sensing. In the MSU PRIP Lab, we have been using a structured lighting approach in 3-D surface sensing and in constructing surface representation as part of our 3-D vision system. Using light striping to create artificial light patterns on object surfaces, which are easy to detect and analyze, not only can we develop 3-D surface data at the selected surface points (on the light stripes), but also we can infer the shapes of surface patches directly from the light stripe texture seen in the 2-D image. This chapter will outline the three phases in surface representation using structured light: (1) 3-D sensing method using a grid of light, and how the "line labeling" problem may be solved; (2) inferring surface shape via sensed 3-D data and stripe textures; and (3) constructing surface representation through combination of information from light striping and intensity. First, we briefly discuss the relationships between the light source, the scene and the viewer (camera).

2.1 Three Components of Visual Processing

A visual perception is the result of interaction between three participants: the light source, the scene, and the viewer. Each of the three is indispensable to a normal perception, and so are their relationships. The human visual system is capable of adjusting itself to adapt to the changing environment, so the three components of vision usually function in a nicely cooperative manner and result in a successful perception. But a machine viewer (machine vision system) may lack this capability. A proper arrangement of the other two components and their relationships is often necessary for the perception to be successful.

2.1.1 light source

In order to be able to see things, there must be at least a certain amount of light reflected (or originated) from an object to the retina. This constitutes the input end of the visual chain. We don't need to worry too much about the physics of the light itself, whether it's waves or quanta. We simply accept the laws of optics such as "light travels straight". But the physical properties of the illuminating light may considerably affect the visual perception, since change of the properties will alter the input to the processing chain. We want the light source to be neither too bright nor too dim and at a proper position such that the image sensed by the camera carries essential detectable features of the visible surfaces of objects in the scene, such as occluding boundaries, surface textures etc. Structured light, such as light stripes, is one such light source that makes surface features very salient and easy to detect.
By properly adjusting the structured lighting equipment (a slide projector, for example), we can easily make the light stripes on sur- faces to be the only detectable features, so that any other inherent surface textures or marks will not interfere with the imposed stripes. 10 2.1.2 the scene Reflectance and other physical properties of the object surfaces play an important role in perception that enable us to discriminate one surface from the others through the change of the physical properties. For example, a piece of metal is distinguished from a piece of wood by their different surface reflectance that makes one darker or more tex- tured than the other. It is this contrast that contributes to recognition. If there were no contrast, there would be no vision. But too slow or too rapid change of reflectance across the field of view (spatial frequency) prevents us from seeing the contrast [Campell and Robson 68]. In machine vision, the contrast of reflectance creates what are called "features" in an image. Feature extraction is one of the first modules in early vision; all further processing is based on whether the first module did a good job or not. Obviously, feature-less images provide no useful input to the processing chain. The objects in the scene must offer sufficient reflectance contrast at positions important to recognition. On the other hand, detectable contrast at irrelevant positions (noise) tends to destroy the structure of the object patterns and makes recognition harder. Images under structured lighting always provide surface features that are those imposed bright light stripes. In short, the light source and the scene compose the source from which a good qual- ity or a poor quality image might be produced. The success of the entire processing heavily depends on the quality of the image. 2.1.3 the viewer Perception is the product of the viewer. Impairment of any part of the human visual system will greatly affect the final perception, regardless of whether the damage is in the eyeball, in the lateral geniculate nucleus, or in the visual cortex. A severe defect in any functional module in a computer vision system will lead to poor object recognition and hence poor understanding of the scene, no matter whether it occurs in the sensor, in 11 feature extraction, in representation, or in model matching. Because each module is stu- died and developed somewhat in isolation from the others, we often assume no significant defects (although there may be some small ones) inherited from preceding modules which produce input to the later modules. 2.1.4 tri-relationships When looking at objects in front of the light source, we see a silhouette, whereas when the objects are lit from the front or side, their details look clearer. It is obvious that the relative position and orientation, or spatial relationships, of the light source, the objects, and the viewer, is critical to perception. Our aim is to infer their spatial relations, especially the surface orientation relative to the viewer from 2-dimensional images. This inference is a reverse procedure of imaging. Techniques that make such inference have been widely studied, designated by the name "shape from x", where x can be shading, texture, contour, stereo, motion, etc. Since the three components of vision interact in a very complicated way, we often want to set up certain conditions or assumptions that simplify the interaction of the three components. 
For example, the light source may be assumed to be a distant point source generating parallel illumination; the object surfaces may be assumed Lambertian so that light is reflected equally in all directions; the objects in the scene are usually assumed to be solid, opaque and relatively smooth. Under these conditions, various approaches to certain vision problems can be thoroughly studied.

2.2 3-D Sensing

It is remarkable that human beings are able to make distance judgements at all, let alone do so automatically and effortlessly. The retinal images from which depth information is extracted are two-dimensional and, therefore, inherently without depth. To extract information about the third dimension, the brain utilizes a variety of different depth cues, providing a 3-D visual world.
It doesn’t really establish 3-D data for surface points, rather it provides only information about depth. In this sense, this method should be called third-dimension sensing instead of 3-D sensing. (2). (3). (4). 14 Shape From x. As mentioned earlier, x may be texture, shading, contour etc. Methods in this category develop 3-D surface shape in terms of local surface nor- mals, i.e. local surface orientation with respect to the camera, provided that the camera viewing direction (optical axis) is known. They do not really provide abso- lute distances to objects but only relative distances or orientations. It is good enough in situations where only object shapes are important, but does need extra information to determine the absolute distance for, say, robot motion control pur- poses. In situations where absolute distances are not critical to our tasks and we are only interested in surface shapes or orientations, shape-from-x may be a good approach to achieve our goal. Stereo. Emulating human binocular vision, the stereo approach to 3-D sensing employs two cameras that have a known relative displacement and a known con- vergence angle of their optical axes. By measuring the disparity between the two images, depth can be calculated via triangulation. As has been emphasized in the literature, the major difficulty is the "correspondence problem" — given an image feature in one image, to identify the image feature in the other image such that the two are created from the same physical surface feature. Structured Lighting. In illuminating the scene, natural ambient light is replaced by an artificial light source, which can be of any structure (pattern) that is con- venient for the task. Using structured light by itself is not an independent approach to 3-D sensing; the underlying means is still a monocular (shape-from) or binocu- lar (stereo) method, but under different illumination conditions and hence facing re-phrased problems. If a single light beam or a single light plane is used as the light source, the underlying method is direct sensing through triangulation. If a uni- form grid of light is the light source, the underlying method is stereo and analysis of textures and contours. The major advantage of using structured light over ambient light is that features in the images are better defined. Image features are 15 easier to detect; their relationships are more regular (parallel, equi-spaced, etc.) following the property of the generating pattern of the light source, they prom- inently reveal the surface geometry that humans can readily use to interpret the 806116. 2.3 Structured Lighting Because of the potential advantages, we use the structured light approach not only for 3-D sensing, but also for surface shape inference from 2-D stripe textures. In this sec- tion, I shall discuss some issues in using structured light: stripe labeling (the correspon- dence problem), comparison with the stereo approach, previous work using structured light, and how light striping can help to infer surface shapes. 2.3.1 Is it a good approach? Often, the illumination is ambient, using diffused light Every point (up to the reso- lution granularity of the viewer’s sensor) in the field of view that is illuminated is seen by the viewer, and provides a piece of information about that point. The total input to the processing system consists of millions of such pieces of information that are to be resolved. For a machine vision system, how to pick useful ones from the millions is often a very difficult task. 
2.3 Structured Lighting

Because of the potential advantages, we use the structured light approach not only for 3-D sensing, but also for surface shape inference from 2-D stripe textures. In this section, I shall discuss some issues in using structured light: stripe labeling (the correspondence problem), comparison with the stereo approach, previous work using structured light, and how light striping can help to infer surface shapes.

2.3.1 Is it a good approach?

Often, the illumination is ambient, using diffused light. Every point (up to the resolution granularity of the viewer's sensor) in the field of view that is illuminated is seen by the viewer, and provides a piece of information about that point. The total input to the processing system consists of millions of such pieces of information that are to be resolved. For a machine vision system, how to pick useful ones from the millions is often a very difficult task. Edge detection, for example, is commonly considered the first step in the process; it finds features, which are the significant changes of input information from point to point. Because of the inevitable noise introduced under ambient illumination, even the best edge detection methods followed or preceded by a noise removal process may never work well on a complex noisy image. If the first processing step is deficient, how could we expect the later steps down the processing chain to perform well when they are based on the output of the first processing module? One way to solve this problem is to get "good" input in the first place, which provides just as much useful information as necessary for the recognition purpose, and as little information as possible that may interfere. Structured lighting is one of the alternatives that may achieve this goal.

Structured lighting is an active approach that projects onto the scene a pattern that has a geometrically regular distribution which is known a priori. A laser beam generating device or a slide projector can be used to generate the structured light patterns. The most often used structured light patterns include a single light ray that generates a single bright point in the scene, a single light sheet that produces a bright line or curve on the object surfaces, parallel light sheets that cause parallel bright lines or curves, or a grid of light that casts a grid pattern on object surfaces. Figure 2.2 shows some example images taken under illumination of a grid of light.

Figure 2.2 Examples of structured light images

Structured light is good for the following reasons. First, it changes the image features by imposing artificial features on the object surfaces. These artificial features are very easy to detect. Images thus obtained are directly "enhanced" by the structured light rather than by applying image enhancement algorithms. Second, false features caused by noise under ambient illumination are greatly reduced, in the sense that the noise would not be present in a light striped image where only bright light lines or grids are visible to the camera and everything else is assumed totally dark. Thus a binary image with no false parts due to noise is easily created. Third, unlike passive methods, a structured lighting system processes only a small portion of the 2-D image (only bright pixels on light stripes are of interest), resulting in considerable savings of storage and processing time. Last but not least, the distortion of the projected grid pattern is a function of surface shape; the 2-D light stripe networks alone convey a great deal of information about object shapes, which would not be directly available otherwise. In addition, an exactly registered intensity image of the same scene is easy to obtain by simply turning off the projector and turning on another light which may be placed at the position of the camera; hence the intensity image can be fused with the striped image to provide combined information that would not be available in either of the two alone.

2.3.2 Sparse vs. dense: assumptions about the scene

The structured lighting approach deals with sparse surface points and hence gains storage and time savings in computation. The trade-off is that the information about many other surface points (not on bright stripes) is lost. What are the effects of this loss on our goal of object recognition?
In other words, we want the sampled surface points to be dense enough to correctly represent the surface, and sparse enough to keep as little superfluous information as possible. 19 How densely the surface points should be sampled depends on the spatial charac- teristics of the surface and the granularity we want to describe the surface. For example, if we knew a surface is an ideal planar surface, it would suffice that only three non-linear points be computed in order to determine the plane, and a few boundary points for describing the surface. Any information more than that is superfluous. On the other hand, if the surface undulates in a complicated fashion, we need to collect information from many more points or we will obtain a seriously defective surface description. The prob- lem is that we do not know a priori what kind of surfaces we are about to handle. On the contrary, to determine the surface characteristics is our goal. Hence, the density of sam- pled points, or equivalently the stripe spacing in structured lighting environments, can only be determined under certain assumptions about the object surfaces. In other words, the sparse light striping method would work, under these assumptions: (1) Surfaces are smooth and are of "low order" in the sense that the spatial frequency of surfaces is less than the stripe frequency; i.e. there is small spatial change on the surfaces between consecutive stripes. (2) Surfaces are "much" larger than stripe spacing so that a surface patch is covered by a multi—stripe network. Thus it is assumed that light stripes capture the essential features of the surfaces and that the features missing from striping are insignificant. The class of objects this approach may work on includes objects of smooth surfaces that are not too small relative to the stripe spacing. This class of objects is often encountered in industrial applications, such as bin-picking, where the possible objects are known a priori. 2.3.3 The correspondence problem in light striping Binocular visual information : stereopsis 20 In humans, the two eyes look at much the same region of the visual space, only near the margins of the visual field do the two eyes provide exclusive monocular coverage. You may ask, "Why did nature position our eyes in such a way as to provide two views of the world, thereby duplicating a significant portion of these views?" The answer may be that the slight differences, or disparities, between the view seen by the left eye and the view seen by the right eye enable us to make exceedingly fine depth judgements that are simply impossible when using just one eye. Psychologists have shown that we can tell a 1 millimeter difference from a distance of 1 meter, which corresponds to a judgement accuracy of one-tenth of 1 percent. In other words, the resulting disparity between the two eyes’ views is less than four ten-thousands of a millimeter, many times smaller than the diameter of a single visual receptor in your eye [Sekuler and Blake 85]! Because of this extraordinary resolving power, stereopsis has been extensively studied throughout the literature. The correspondence problem In order to use stereo as a depth sensing model in machine vision, we have to first solve the correspondence problem, i.e. to identify the two matching points, one in each image, between which the disparity occurs. The difficulties of the correspondence prob- lem arise due to the following reasons. 
(1) There must be at least some feature points (edge points, say) in the images; images of homogeneous properties make the matching impossible.
(2) The overlapping portion of the field of view of the two images must be such that a feature should appear in both images; features present in one image but missing from the other will considerably complicate the matching process.
(3) The matching process is computationally prohibitive due to combinatorial explosion if the number of feature points is large.

Julesz [Julesz 71] developed the random-dot stereogram, both halves of which consist of nothing more than a random array of black and white dots. The two halves are identical except that in one of the two halves of the stereogram a central subset of dots has been shifted laterally by several rows. This lateral displacement creates a retinal disparity between the two halves, and yields patterns separated in depth when viewed through a stereoscope. Since each of the two halves is made up of numerous random dots, it is hard to imagine how the "feature matching" is undertaken, because there are so many tiny dots in one image that would match up with any single dot in the other image.

The grid line labeling problem in light striping

A structured lighting system for depth sensing is essentially a stereo system, in which the camera forms one view and the structured light generating device (a slide projector, say) is considered as the second view. The only difference is that the "imaging" direction at the second view is reversed. As far as the geometry in 3-D computation is concerned, imaging or reverse imaging makes no difference. Pretend the light is "reflected" from the object surfaces to the projector. The projector would "see" a regular structured pattern on its "image plane" which is exactly the pattern on the slide. Hence the camera image and the projector "image" compose a stereo pair. Since the features of interest are the light stripes, the feature matching between the two images is to tell which grid line in the projector image generates a particular stripe in the camera image. Thus, the correspondence problem becomes the grid line labeling, or grid line identification, problem. Notice that in the camera-projector stereo the grid lines are complete in one image of the stereo pair (the projector image), hence a matching will always succeed although some stripes may be missing in the camera image due to occlusion.

Let's now discuss how the grid lines might be identified. First, if only one single line is projected, there is one stripe in the image. This stripe may be broken into pieces because of occlusion or discontinuity of object surfaces, but all these pieces correspond to the only projected line. The line identification problem is immediately solved. But the projected line needs to be swept across the scene in order to obtain the depth measurements at various points that cover the entire scene. Scanning usually is slow and requires a special device which ought to ensure the accuracy of the calibrated setup during the sweeping. Shirai and Suwa used this simple stripe lighting scheme for polyhedron recognition [Shirai and Suwa 71]. The range finder they developed employed a rotating slit projector which projects a light beam on the objects. While the projector is rotated, pictures are sampled by a TV camera at predetermined times and the information related to each picture and the corresponding slit beam angle is stored. The 3-D position of each slit is calculated via trigonometry.
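The geometry behind this kind of single-stripe range finder is the intersection of a viewing ray with a known plane of light. The sketch below illustrates that computation under simplified assumptions: a pinhole camera at the origin of the coordinate frame, a known focal length, and a light-sheet plane already calibrated in camera coordinates. The function and the numbers are illustrative only and are not taken from the systems cited above.

import numpy as np

def triangulate_stripe_point(pixel, focal_len, plane_n, plane_d):
    """Intersect the viewing ray through an image pixel with a light-sheet plane.

    pixel     : (u, v) image coordinates with the principal point at (0, 0)
    focal_len : focal length in the same units as u and v
    plane_n   : unit normal of the light plane, in camera coordinates
    plane_d   : plane offset, so the plane is the set of X with plane_n . X = plane_d

    Returns the 3-D point on the illuminated surface, in camera coordinates.
    """
    u, v = pixel
    ray = np.array([u, v, focal_len], dtype=float)   # direction of the viewing ray
    denom = plane_n @ ray
    if abs(denom) < 1e-9:
        raise ValueError("viewing ray is parallel to the light plane")
    t = plane_d / denom                              # ray parameter at the intersection
    return t * ray                                   # the point t * ray lies on the plane

# Illustrative numbers only: a light sheet tilted 30 degrees about the x-axis.
n = np.array([0.0, np.sin(np.radians(30.0)), np.cos(np.radians(30.0))])
print(triangulate_stripe_point((12.5, -3.0), focal_len=50.0, plane_n=n, plane_d=800.0))

In the scanning systems described here, the plane (or beam) parameters come from the projector or mirror angle recorded at each sampling instant; the essential computation remains this ray and plane intersection.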
Agin and Binford [Agin and Binford 71], instead of using a rotating projector, employed a rotating mirror, which reflected the light beam onto the scene. The 3-D positions on that stripe were calculated. The mirror then rotated a small angle and the procedure repeated. After a sweep was finished, the orientation of the light lines was rotated by 90°, and a second mirror scan sweep was taken. This resulted in data for an overlapping grid of laser lines covering the scene.

In cases where multiple lines are projected at once, it would not be trivial to distinguish one line from the others provided that all lines are identical. If each of the projected lines bears a unique property, such as color [Yang, Boyer and Kak 84], thickness [Le Moigne and Waxman], spatial code [Posdamer and Altschuler 82], etc., the problem will be easier to attack. But the desired unique properties may themselves be hard to extract from the image and sometimes require special equipment to generate (color, for example). In using a spatial code, identical lines are projected, each of which is assigned a unique spatial code in advance. A sequence of images is taken, each with certain projected lines on or off according to the spatial codes assigned to each line. For example, four projected lines are assigned codes 1, 2, 3 and 4, i.e. 001, 010, 011, 100 in binary. Three successive images are needed to encode the lines. If a line is seen only in the first image it is line number 1; if it is seen in the first and the second images, it must be line number 3, and so on. If N lines are projected, the spatial code length is log2(N)+1. That is, log2(N)+1 images are required. This method makes line labeling simpler, but requires multiple frames and, more importantly, requires very precise alignment of these frames.

It appears desirable to use a "snapshot" scheme — one single image of multiple lines, which requires neither special devices nor multiple images. The trade-off is that the line identification will become more complicated. If the projected lines can be identified, their 3-D locations can be calculated via triangulation. Our approach for identifying the projected lines is to find a set of labels a line may possibly be assigned, compute the 3-D locations for each candidate label, and get rid of those that violate certain constraints. The constraints are very general ones that are available in the real world. A few or hopefully a unique line label may survive the constraint tests. The line labeling method using constraints for solving for 3-D surface points will be discussed in detail in Chapter 3.
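To make the spatial-coding idea concrete, the toy decoder below recovers a stripe's label from the on/off pattern it shows across the image sequence, using the same convention as the numeric example above (the first image carries the least significant bit). The function name and the calling convention are illustrative, not part of the cited systems.

def decode_stripe_label(seen_in_frame):
    """Recover a stripe's label from its on/off pattern across the coded frames.
    seen_in_frame[k] is True if the stripe is lit in the k-th image; frame 0 is the
    least significant bit of the label."""
    label = 0
    for k, seen in enumerate(seen_in_frame):
        if seen:
            label |= 1 << k
    return label

# The 4-line example from the text: codes 001, 010, 011, 100 over three images.
assert decode_stripe_label([True, False, False]) == 1
assert decode_stripe_label([True, True, False]) == 3
assert decode_stripe_label([False, False, True]) == 4

A real system must first decide, for each stripe segment, whether it is lit in each frame, which is exactly where the frame-alignment requirement mentioned above becomes critical.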
2.3.4 Stripe texture

Using projection of a grid of light, not only can we obtain data of surface points in 3-space, but also the stripe patterns (or the texture of the stripe networks) cast on the surfaces provide clues about surface shape. From the striped image examples in Figure 2.2, we perceive 3-D shapes through the 2-D stripe texture alone. This is because the distortion of the stripe pattern from its generating pattern (a regular grid) directly relates to the surface shape. Research shows that surface orientation (slant and tilt) can be resolved from surface texture, texture gradient in particular [Gibson 50]. Shape-from-texture methods have been used to recover shapes [Bajcsy and Lieberman 76, Ikeuchi 84, Kender 78] for many years.

In the case of light striping, the texture elements (texels) are those "quadrilaterals" externally imposed on the object surfaces in the form of a light grid, rather than an intrinsic property of the surface as most natural textures are. The good thing about stripe texture is that it is more regular and more structured than most natural textures, and easier to extract and analyze. Moreover, because the stripes are the intersections of 3-D planes (planes of light) and the object surfaces, they reveal surface shapes in a more deterministic manner, since the geometry of the 3-D planes is known in advance. Assuming that the relative displacement of the projector and the camera is fixed, as it is in our experiments, the ambiguity of the relation between a 2-D image stripe and its corresponding 3-D surface stripe is very much constrained by the fact that the 3-D surface stripe is also a planar curve (in the plane of a projected light sheet) that may not freely flex in 3-space and still image to the same observed 2-D stripe. This property does not hold for natural textures. In Chapter 4, we shall discuss the issues of how the surface shapes can be deduced from 2-D stripe textures.

2.3.5 Fusion of light striping and intensity

Although light striping is a promising approach in machine vision, it may not be useful for probing information about surfaces that have no stripes due to occlusion of the projector (shadows). Also, the stripe networks may not give an accurate segmentation of the image, due to the stripe fading effect on curved objects. To overcome these weaknesses, among others, we might need information from some sources other than light striping. An intensity image of the same scene provides such a source, on which classical image analysis such as edge detection can be applied. Edges are regarded as primary information for object recognition, but they are subject to a high percentage of errors (noise), provide little information about surfaces, and need a significant amount of time to process.† Combination of information from the two sources — intensity and light striping — is needed. In short, light striping provides surface information, while intensity provides boundary information. The fusion of these two provides a surface-boundary representation of the objects in the scene, which will be the subject of Chapter 5.

† Actually, human ability at determining shape from line drawings — even corrupted ones — is uncanny [Biederman 87]. General shape inference by machine using corrupted boundary data remains an unsolved problem.

2.4 3-D object representations

Object recognition is done by comparing or matching features extracted from the image with precompiled object models. To carry out the matching, the observable (image features) and the target (model) must be described with the same representation scheme. The choice of a representation depends not only on the object domain, but also on the matching scheme. There are three categories of representation on the basis of their dimensionality of spatial description. These three categories are 2-D, 2½-D, and 3-D representations. The details of these representations are surveyed in [Chin and Dyer 86] and [Bhanu and Ho 87].

In 2-D representation, objects are described using a set of one or more distinct views. Each view is treated independently. The 3-D problem is reduced to 2-D using 2-D image features and their relations as primitives [Lieberman 79].
A commonly used approach is to organize global features of an object into a feature vector, or geometric property list. The global features include area, perimeter, compactness, elongatedness, center of gravity, moments of inertia, number of holes, number of corners, etc. The selection of features can be determined by training [Gleason and Agin 79]. Matching schemes using feature vector models usually involve statistical pattern recognition techniques, such as the Bayes classifier, nearest neighbor classifier, and decision-tree classifier [Agin and Duda 75]. In the feature-vector approach, if features are easy to compute, the object representation is compact and the matching is fast. But this approach requires a separate model for each possible 2-D view of an object, and it is unable to handle occlusion. Researchers have also developed various other methods using 2-D representations, such as the line and arc boundary segment model [Perkins 78], Fourier descriptors [Persoon and Fu 77], the hierarchical feature model [Shirai 78], local evidence accumulation via clustering [Stockman et al 82], and template matching in generalized Hough transform space [Ballard 81], among others.

In 2½-D representation, features are defined in "surface space" with respect to the view coordinate system. Local surface properties such as orientation, depth, discontinuities, reflectance, etc., as well as 2-D features (points, lines), are used in the description of objects. Representations of this kind are the intrinsic images [Barrow and Tenenbaum 78], the 2½-D sketch [Marr 78], the needle map [Horn 79], the parameter map [Ballard 81], and the surface-orientation map [Brady 82]. In industrial parts-recognition applications, the use of range maps and local surface-orientation maps is of particular interest. A great deal of effort has been made in recent years on segmentation of range data into surface patches [Bolles 81, Bolles and Fischler 81, Henderson 82, Henderson and Bhanu 82, Milgram and Bjorklund 80, Sugihara 79, Besl and R. Jain 85, Hoffman and A. Jain 87]. Many shape-from techniques that aim to derive the surface-orientation map have been proposed [Horn 75, Bajcsy and Lieberman 76, Stevens 81, Witkin 81, Woodham 78].

Using 2½-D geometric features to represent rigid objects, matching can be done by discovering a linear transformation (translation, rotation and scaling) of the iconic object model that puts all salient features into correspondence. The sensed data are then considered an instance of the model through the transformation. The transformation is found through clustering in the "transformation space" [Chen and Stockman 86, Stockman 87]. This method can handle multiple "general" objects and partial occlusion; it allows imperfect sensed data, and features can be added or deleted during the process. But for multiple-object scenes, especially for objects of arbitrary shape, the computational cost may be rather high. Interpretation tree search [Grimson and Lozano-Perez 84, Chen 86] is also a feasible approach to matching. This backtracking approach keeps track of the mutually consistent pairs of sensed feature and model feature along the path searched in the tree. There are many ways to incorporate heuristics to speed up the search, for example by considering features in some special order or by using quick constraint checking; the skeleton of such a search is sketched below.
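The following is a generic illustration of that backtracking search, not the specific algorithm of [Grimson and Lozano-Perez 84]: features are reduced to 2-D points, and the only consistency test is a hypothetical pairwise-distance check with an arbitrary tolerance.

```python
from itertools import combinations
from math import dist   # Euclidean distance between coordinate tuples

TOL = 2.0   # hypothetical tolerance on pairwise distances, in feature units

def consistent(pairs):
    """A partial interpretation survives only if every pair of assignments
    (sensed_i -> model_i, sensed_j -> model_j) preserves the inter-feature
    distance to within TOL -- a stand-in for 'quick constraint checking'."""
    return all(abs(dist(si, sj) - dist(mi, mj)) <= TOL
               for (si, mi), (sj, mj) in combinations(pairs, 2))

def interpretations(sensed, model, partial=()):
    """Depth-first search of the interpretation tree: extend the current path
    by pairing the next sensed feature with each model feature, and backtrack
    as soon as the partial pairing becomes inconsistent."""
    if len(partial) == len(sensed):
        yield partial
        return
    s = sensed[len(partial)]
    for m in model:
        candidate = partial + ((s, m),)
        if consistent(candidate):
            yield from interpretations(sensed, model, candidate)

# Example: two sensed corner points matched against a three-point model.
model_pts  = [(0.0, 0.0), (10.0, 0.0), (10.0, 5.0)]
sensed_pts = [(2.0, 3.0), (12.0, 3.1)]            # roughly 10 units apart
print(list(interpretations(sensed_pts, model_pts)))
```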
Another popular approach using 2½-D representation is the relational-feature graph [Oshima and Shirai 83, Fan and Medioni 87, Hoffman and A. Jain 87]. In a relational-feature graph, each node represents a surface patch with labels (planar, convex, concave, ellipsoid, cone, cylinder, etc.), and an edge represents the relationship between two adjacent surface patches. Each edge is associated with a set of properties, such as the angle between the two surfaces, the relative positions of the centroids, etc. Matching is performed by comparing an observed graph with a set of graphs, one for each viewpoint of each object modeled.

3-D representation is a viewpoint-independent representation using an object-centered coordinate system. It describes objects in "object space". Objects are represented using surface patches [Baumgart 72, Shneier 81], spines and sweeping rules, volume descriptors such as generalized cylinders and generalized cones [Binford 71, Brooks 83, Nevatia and Binford 77], or multi-view features [Koenderink and vanDoorn 79, Goad 83]. 3-D models allow the most general and complete descriptions of objects, but require the most extensive and sophisticated processing.

In our work, we construct a 2½-D representation using the features extracted from the striped image and the gray-tone image. The representation is in the 2½-D category, since the features are obtained from a single viewpoint and they contain geometric surface properties such as 3-D location, orientation (normals) and shape, as well as 2-D features (edges). Because each surface patch is defined together with its boundary edges, we also call the representation a wing representation. In particular, the representation consists of the following basic elements (a data-structure sketch follows the list).

1. surfaces
   (a) qualitative: type (planar, convex, concave, cylindrical, etc.),
   (b) quantitative: boundary, 3-D location, orientation;
2. edges
   (a) qualitative: type (extremum, blade, fold, mark, shadow),
   (b) quantitative: location, length;
3. surface-surface relationships
   (a) occlusion,
   (b) adjacency,
   (c) belonging to the same or different objects;
4. surface-edge relationships
   (a) surface boundaries,
   (b) surface marks,
   (c) shadow edges,
   (d) intersection (common border) of two surfaces.

This representation is a surface-boundary representation similar in purpose to Marr's 2½-D sketch [Marr 82]. Upon construction of the image representation, image features that correspond to object shapes are made explicit: they can be described symbolically. This representation is to be matched against object models that are constructed independently. We are currently investigating modeling and matching processes based on the wing representation [Stockman and Chen 87, Stockman et al 87]. The ultimate goal is to recognize objects and understand the scene.
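A minimal sketch of how these elements might be organized as data structures; the class and field names are illustrative and are not the data structures used in the thesis software.

```python
from dataclasses import dataclass, field

@dataclass
class Surface:
    id: int
    kind: str                      # 'planar', 'convex', 'concave', 'cylindrical', ...
    centroid: tuple[float, float, float]
    normal: tuple[float, float, float]
    boundary: list[int] = field(default_factory=list)   # ids of bounding edges

@dataclass
class Edge:
    id: int
    kind: str                      # 'extremum', 'blade', 'fold', 'mark', 'shadow'
    endpoints: tuple[tuple[float, float], tuple[float, float]]  # 2-D image coords
    length: float = 0.0

@dataclass
class Wing:
    """One surface patch together with its bounding edges and its relations
    to other patches (occlusion, adjacency, same-object membership)."""
    surface: Surface
    edges: list[Edge] = field(default_factory=list)
    occludes: list[int] = field(default_factory=list)      # surface ids it occludes
    adjacent_to: list[int] = field(default_factory=list)   # adjacent surface ids
    same_object_as: list[int] = field(default_factory=list)
```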
CHAPTER 3

Surface Solution — Geometric Computation and Constraint Propagation

It is often desirable for a machine vision system to collect 3-D surface data as the first processing step (3-D sensing), because many geometric inferences about the object surfaces can be made from the 3-D surface data. 3-D surface data may be expressed in terms of some coordinate system or as ranges (distances from the surface points to the viewer). As discussed in Chapter 2, 3-D sensing techniques are divided into four categories: direct sensing, shape-from-X, stereo, and structured light. This chapter describes a method to compute surface data using light striping, in which the stripe identification problem is partially solved using general constraints. These constraints are based on the assumption that objects in the scene are solid, opaque and static. Once the stripes are identified, the surface solution is calculated through triangulation. Experimental results show that this approach often produces surface solutions with a small degree of ambiguity, and sometimes a unique solution is reached. In the case of multiple solutions, any of the candidate locations in 3-space can be used to infer the surface shape.

3.1 Oculomotor Cues and Calibration

This section deals with the camera calibration process. In order to obtain depth information using triangulation, the parameters of the sensors, such as the focal length and the direction of the optical axis, should be fixed and known; or, equivalently, the 3-D to 2-D projection should be fixed and known. In human vision these parameters are regarded as oculomotor cues; in machine vision they are obtained through a calibration process.

3.1.1 Oculomotor cues — depth information from within the eyes

Among the various cues to depth perception, as outlined in section 2.2, the oculomotor cues are the ones that relate the information directly to the distance from the viewer to the objects [Wallach and Floor 71, Sekuler and Blake 85]. Oculomotor cues refer to accommodation of the eye lens and the angle of convergence of the two eyes. The former is a measure of how much the lens is strained when the eyes fixate on an object; the latter provides information on the directions of the optical axes of the two eyes. Suppose we were able to attach tiny gauges to the two sets of muscles that control the lens and the eyes' inward-outward movement. The readings on the gauges, or the degree of muscular contraction, are transferred to somewhere in the brain that can work out the two values: the angle of convergence and the amount of accommodation. The distance would then be computed by solving the "trigonometry equation" that relates distance to the two values. One amazing thing about this is that the whole process is done in milliseconds, and the depth judgement is made continuously as the fixation point moves from one place to another. But both accommodation and convergence operate over a limited range of distance. When you focus on an object at a distance of 20 feet or beyond, the muscle controlling accommodation assumes its most relaxed state; at the same time, the convergence angle vanishes to zero. So the oculomotor cues can be relied on only within a region of space immediately in front of the viewer; outside that region the resulting distance measure is not reliable.

The geometric principle of the distance computation is illustrated in Figure 3.1. If the displacement b of the two viewing points (the two eyes) and the two viewing angles θ1 and θ2 are known, the triangle can be easily solved. It is obvious that if the two optical axes are almost parallel, or equivalently the two angles θ1 and θ2 are close to 90°, a very small inaccuracy in the measurements will introduce a large error in the computed distance. The distance from the baseline to the fixation point is

$$d = \frac{b\,\sin\theta_1\,\sin\theta_2}{\sin(180^\circ - \theta_1 - \theta_2)}.$$

Figure 3.1 Geometry of triangulation
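A small numeric sketch of this relation (the baseline and angles below are arbitrary values, not measurements):

```python
from math import sin, radians

def range_from_angles(b: float, theta1_deg: float, theta2_deg: float) -> float:
    """Distance from the baseline to the fixation point,
    d = b*sin(t1)*sin(t2) / sin(180 - t1 - t2), as in Figure 3.1."""
    t1, t2 = radians(theta1_deg), radians(theta2_deg)
    return b * sin(t1) * sin(t2) / sin(radians(180.0 - theta1_deg - theta2_deg))

# 65 mm baseline, both viewing directions at 80 degrees to it:
print(range_from_angles(65.0, 80.0, 80.0))   # ~184 mm
# Nearly parallel axes: the denominator shrinks and small errors blow up.
print(range_from_angles(65.0, 89.0, 89.0))   # ~1.86 m
```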
3.1.2 Camera and projector calibration

In order for triangulation to be applicable to depth computation in a system with a camera-projector setup, we must have some "oculomotor" information available. The procedure that obtains this information is known as calibration. A calibration procedure estimates internal parameters of the camera, such as the focal length and the direction of the optical axis, embedded in a transformation that geometrically relates a point in 3-space to its projection in the image plane. The transformation is often expressed as a 3x4 matrix that transforms points in one coordinate system into points in another coordinate system, with rotation, translation and scaling [Faig 75, Hall et al 82, Yakimovsky and Cunningham 78, Tsai 86]. We first describe our coordinate systems.

Sensing environment and coordinate systems

Figure 3.2 is a sketch of the sensing environment used to create images such as the one in Figure 2.2. Three coordinate systems are defined in this environment: the global or world coordinate system, the camera coordinate system, and the projector coordinate system. The global world coordinate system is fixed on the worktable with its xy plane coincident with the table plane and its z axis pointing up; all output surface and edge solutions are in terms of the global coordinates. The camera coordinate system has its origin at the center of the camera lens with the z axis toward the focal point. The front image plane is at -f on the z axis, where f is the focal length of the camera lens. The image is digitized at 512x512 resolution; the 2-D digitized picture counts its rows (x axis) from top to bottom and its columns (y axis) from left to right. The projector coordinate system is similar to the camera coordinate system, except that the unit is grid line number rather than pixel. MxN grid lines are used (in the experiments M and N ranged from 14 to 21). The spacing of the lines is about 15mm on the worktable, and the working field of view is about 250mm square.

We also define the two terms up and down as direction indicators in the image plane. The optical axis of the projector is considered directed from the focal point to the worktable. The projection of the optical axis in the image plane defines the up-to-down direction in the image. Sometimes we also use the terms above and below. These terms will be used in Chapters 4 and 5.

Figure 3.2 Sensing environment and coordinate systems (the projector and the camera above the worktable, each with its optical axis directed at the table; the world frame x_w, y_w, z_w lies on the worktable).

3-D to 2-D transformation

We use (x_w, y_w, z_w), (x_i, y_i, z_i) and (x_g, y_g, z_g) to denote a world point in 3-space, an image point in the camera coordinate system, and a grid point in the projector coordinate system, respectively. The subscripts w, i, and g stand for world, image, and grid. Since the image plane is fixed at z_i = constant, all image points have the same z coordinate, and we need only (x_i, y_i) to represent an image point in the image plane. Similarly, we use (x_g, y_g) for a grid point in the projector plane.

The transformation from one coordinate system to another is commonly defined by a homogeneous 4x4 matrix M. Given a point (X, Y, Z) in one coordinate system, its coordinates (x, y, z) in the transformed coordinate system are given by

$$\begin{bmatrix} tx \\ ty \\ tz \\ t \end{bmatrix} = M \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}.$$

Since in our case the image plane and the projector plane have constant z values in their own coordinate systems, the transformation from the world coordinate system to the camera coordinate system has the form

$$\begin{bmatrix} t\,x_i \\ t\,y_i \\ t \end{bmatrix} = M_c \begin{bmatrix} x_w \\ y_w \\ z_w \\ 1 \end{bmatrix} \qquad (3.1)$$

and the transformation from world to projector has the form

$$\begin{bmatrix} s\,x_g \\ s\,y_g \\ s \end{bmatrix} = M_p \begin{bmatrix} x_w \\ y_w \\ z_w \\ 1 \end{bmatrix} \qquad (3.2)$$

where t and s are scaling factors and M_c, M_p are 3x4 matrices. We also call M_c the camera calibration matrix, or camera matrix for short, and M_p the projector calibration matrix, or projector matrix.
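As an illustration of equation (3.1), a minimal sketch that projects a world point through a 3x4 matrix and divides out the scale factor t; the matrix entries below are invented for the example, not calibration results.

```python
import numpy as np

def project(M: np.ndarray, world_pt) -> tuple[float, float]:
    """Apply a 3x4 calibration matrix to a world point (x_w, y_w, z_w) and
    return the 2-D coordinates after dividing out the scale factor."""
    xw, yw, zw = world_pt
    tx, ty, t = M @ np.array([xw, yw, zw, 1.0])
    return tx / t, ty / t

# A made-up camera matrix (third row scaled so its last entry is 1, as in the text).
M_c = np.array([[ 1.2,  0.1, -0.3, 250.0],
                [-0.2,  1.1,  0.8, 180.0],
                [1e-4, 2e-4, 1e-3,   1.0]])

print(project(M_c, (150.0, 65.0, 80.0)))   # pixel coordinates of a world point
```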
Calibration

Calibration determines the homogeneous transformation matrices M_c and M_p, each of which has 3x4 = 12 entries. Because one of the 12 entries can be set to unity (matrix scaling), there are 11 entries to be estimated for each matrix. Take the camera matrix as an example. Let

$$M_c = \begin{bmatrix} c_{11} & c_{12} & c_{13} & c_{14} \\ c_{21} & c_{22} & c_{23} & c_{24} \\ c_{31} & c_{32} & c_{33} & 1 \end{bmatrix}.$$

Equation (3.1) becomes

$$\begin{bmatrix} t\,x_i \\ t\,y_i \\ t \end{bmatrix} = \begin{bmatrix} c_{11} & c_{12} & c_{13} & c_{14} \\ c_{21} & c_{22} & c_{23} & c_{24} \\ c_{31} & c_{32} & c_{33} & 1 \end{bmatrix} \begin{bmatrix} x_w \\ y_w \\ z_w \\ 1 \end{bmatrix}.$$

That is,

$$\begin{aligned}
x_w c_{11} + y_w c_{12} + z_w c_{13} + c_{14} &= t\,x_i \qquad (3.3)\\
x_w c_{21} + y_w c_{22} + z_w c_{23} + c_{24} &= t\,y_i \qquad (3.4)\\
x_w c_{31} + y_w c_{32} + z_w c_{33} + 1 &= t \qquad\;\;\, (3.5)
\end{aligned}$$

Substituting for t in (3.3) and (3.4) by the left-hand side of (3.5), we get two linear equations

$$\begin{aligned}
x_w c_{11} + y_w c_{12} + z_w c_{13} + c_{14} - x_i x_w c_{31} - x_i y_w c_{32} - x_i z_w c_{33} &= x_i \qquad (3.6)\\
x_w c_{21} + y_w c_{22} + z_w c_{23} + c_{24} - y_i x_w c_{31} - y_i y_w c_{32} - y_i z_w c_{33} &= y_i \qquad (3.7)
\end{aligned}$$

in which the 11 c_ij's are unknown. To solve for the 11 unknowns we need at least 11 equations. Since each 3-D point (x_w, y_w, z_w) and its corresponding image point (x_i, y_i) contribute two equations, we need measurements of no fewer than 6 points, which give 12 equations. In the experiments reported here, 25 points were used to overdetermine the 11 c_ij's using a least-squares solution. Usually 20 to 30 points are sufficient to guarantee that enough "good" points are included and are widely scattered over the field of view of the camera.

The same procedure applies for projector calibration. Let M_p = (p_ij), a 3x4 matrix, be the projector matrix; we have the two equations

$$\begin{aligned}
x_w p_{11} + y_w p_{12} + z_w p_{13} + p_{14} - x_g x_w p_{31} - x_g y_w p_{32} - x_g z_w p_{33} &= x_g \qquad (3.8)\\
x_w p_{21} + y_w p_{22} + z_w p_{23} + p_{24} - y_g x_w p_{31} - y_g y_w p_{32} - y_g z_w p_{33} &= y_g \qquad (3.9)
\end{aligned}$$

By measuring enough light stripe intersection points in 3-D and knowing their grid labels x_g, y_g, we solve for the 11 p_ij's using a least-squares method. The procedure described was adapted from [Ballard and Brown 85]. More information about calibration can be found in [Tsai 86], where many calibration techniques are surveyed and a high-precision method is proposed.

3.2 Grid Point Computation in 3-D Space

After calibration is done, M_c and M_p are fixed. Our next task is to determine the locations in 3-space of points observed in the image. We therefore rewrite (3.6) to (3.9) as

$$\begin{aligned}
(c_{11} - x_i c_{31})x_w + (c_{12} - x_i c_{32})y_w + (c_{13} - x_i c_{33})z_w &= x_i - c_{14} \qquad (3.10)\\
(c_{21} - y_i c_{31})x_w + (c_{22} - y_i c_{32})y_w + (c_{23} - y_i c_{33})z_w &= y_i - c_{24} \qquad (3.11)\\
(p_{11} - x_g p_{31})x_w + (p_{12} - x_g p_{32})y_w + (p_{13} - x_g p_{33})z_w &= x_g - p_{14} \qquad (3.12)\\
(p_{21} - y_g p_{31})x_w + (p_{22} - y_g p_{32})y_w + (p_{23} - y_g p_{33})z_w &= y_g - p_{24} \qquad (3.13)
\end{aligned}$$

with x_w, y_w, z_w as the unknowns (see [Hall et al 82]).

The calibration procedure established a model under which the geometry of the projection process and the imaging process is fixed through equations (3.1) and (3.2), or equivalently (3.10) to (3.13). From the equations we see that, given a 3-D point p_w = (x_w, y_w, z_w), its corresponding points p_i = (x_i, y_i) in the image plane and p_g = (x_g, y_g) in the projector plane are well defined. On the other hand, given an image point p_i, there exist infinitely many 3-D points that may be imaged at that same point: they are the points that lie on the same imaging ray. Mathematically speaking, for a given p_i = (x_i, y_i) there are two equations, namely (3.10) and (3.11), in 3 unknowns x_w, y_w, z_w, so there are infinitely many solutions. Geometrically, each of the equations (3.10) and (3.11) defines a plane in 3-space; the intersection of the two planes is a 3-D line, or imaging ray, that defines the solution space. Any point on that line satisfies (3.10) and (3.11). By the same token, a projector ray emanating from a given grid point p_g defines a ray of points in 3-space (equations (3.12) and (3.13)).
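A sketch of this back-projection: from a 3x4 camera matrix and an image point, equations (3.10) and (3.11) give two planes whose intersection is the imaging ray. The matrix below is made up purely for illustration.

```python
import numpy as np

def camera_ray(M_c: np.ndarray, x_i: float, y_i: float):
    """Return (point, unit direction) of the imaging ray defined by eqs
    (3.10)-(3.11): two planes n1.X = d1 and n2.X = d2 whose intersection
    is the ray of world points imaging at (x_i, y_i)."""
    c = M_c
    n1 = c[0, :3] - x_i * c[2, :3];  d1 = x_i - c[0, 3]
    n2 = c[1, :3] - y_i * c[2, :3];  d2 = y_i - c[1, 3]
    direction = np.cross(n1, n2)                 # lies in both planes
    # Any particular solution of the 2x3 system serves as a point on the ray.
    point, *_ = np.linalg.lstsq(np.vstack([n1, n2]),
                                np.array([d1, d2]), rcond=None)
    return point, direction / np.linalg.norm(direction)

# Made-up camera matrix (third row ends in 1, as in the text).
M_c = np.array([[ 1.2,  0.1, -0.3, 250.0],
                [-0.2,  1.1,  0.8, 180.0],
                [1e-4, 2e-4, 1e-3,   1.0]])
p0, d = camera_ray(M_c, 372.3, 257.7)
print(p0, d)   # every point p0 + s*d images at (372.3, 257.7)
```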
Now let us turn to the "triangulation" computation for (x_w, y_w, z_w). Suppose we have identified the correspondence between an image point p_i and a grid point p_g; that is, we have determined that p_i is the image of a 3-D point (the unknown (x_w, y_w, z_w)) that was created by projecting p_g onto the scene. The imaging point and the projection point each contribute two equations, so we end up with 4 equations in 3 unknowns, and the 3-D point (x_w, y_w, z_w) can be determined. Theoretically, it is the intersection of the imaging ray and the projection ray. In practice, because of inaccuracy in calibration and discrete imaging, the two rays will not exactly intersect in 3-space. The solution is then taken to be the point in 3-space that has the least distance from both rays.
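A minimal sketch of that least-distance computation, taking the midpoint of the shortest segment joining the two rays (which is also the point minimizing the sum of squared distances to them); the numbers in the example are made up.

```python
import numpy as np

def closest_point(p1, d1, p2, d2):
    """Given two rays, each as a point p and a direction d, return the point
    midway along their common perpendicular and the gap between the rays."""
    p1, d1, p2, d2 = map(np.asarray, (p1, d1, p2, d2))
    # Normal equations for parameters (s, u) minimizing |p1 + s*d1 - p2 - u*d2|^2.
    A = np.array([[d1 @ d1, -d1 @ d2],
                  [d1 @ d2, -d2 @ d2]])
    b = np.array([(p2 - p1) @ d1, (p2 - p1) @ d2])
    s, u = np.linalg.solve(A, b)
    q1, q2 = p1 + s * d1, p2 + u * d2        # feet of the common perpendicular
    return (q1 + q2) / 2.0, float(np.linalg.norm(q1 - q2))

# Two nearly intersecting rays (made-up coordinates, mm):
pw, gap = closest_point([0, 0, 500], [0.3, 0.2, -1.0],
                        [200, 0, 600], [-0.1, 0.25, -1.0])
print(pw, gap)
```

The returned gap is the quantity that is later compared against a tolerance when deciding whether a camera ray and a projector ray "intersect".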
We return to the assumption that the correspondence has been established. But how? When we see a bright spot in the image, its coordinates (x_i, y_i) in the image plane are known. But from which grid point (all grid points look alike) in the projector slide was this image point created? If the grid point (x_g, y_g) is unknown, the above computation does not work: 4 equations with 5 unknowns. This is where the constraints come in.

3.3 Constraints

The problem we face is to determine which of the grid points in the projector plane is the one that creates the image point we are looking at. Once the right one is determined, (3.10)-(3.13) suffice to solve for (x_w, y_w, z_w). The first step is to discard the grid points that cannot possibly match the image point (x_i, y_i), using a set of general constraints that hold in the real world. The grid points that survive the constraint test are considered "candidates" for matching the image point.

3.3.1 Basic constraints

First, we assume that the scene to be analyzed is from the real world, that the objects in the scene are solid, static and opaque, and that their surfaces undulate slowly at the scale of the grid spacing.

Two constraints on physical world objects were identified by Marr and Poggio [Marr and Poggio 76] in their stereo disparity computation: (C1) a given point on a physical surface has a unique position in space at any time instant, and (C2) the surfaces of objects are generally smooth compared to their distances from the viewer, and matter, divided into objects, is cohesive. They are called the uniqueness constraint and the continuity constraint.

Since each object has some geometric form that occupies a certain volume and objects do not overlap in space, not only is a single surface point uniquely determined in space, but a cohesive set of object points is also uniquely determined in space. Stated in other words, if a given space volume is occupied by one object, it cannot be occupied by any other object at the same time. Thus (C1) is extended to a volumetric uniqueness constraint: a given solid object has a unique volume in 3-D space at any time instant. Under the assumption that objects are opaque, a rule corresponding to (C1) applicable in surface computation may be stated as follows: two physical objects have non-overlapping 2-D images in any viewing (projecting) direction. This rule is very useful in reducing the ambiguity of surface solutions.

The continuity constraint (C2) asserts that if a continuous curve segment is present in the 2-D image and is assumed to lie on one object surface in 3-D space, then points on that curve must have continuous 3-D positions. Thus, neighboring points on a surface constrain each other, or in Marr and Poggio's terms, they inhibit or excite one another depending on their computed 3-D coordinates. This rule restricts the possible grid-line label assignments of neighboring points, and hence is also useful in removing ambiguity of surface solutions.

We now refine the uniqueness constraint and the continuity constraint into a set of geometric and topological rules useful in surface computation using structured light. The algorithms based on these rules are very general, since the rules come from the real world, with no oversimplification or application-specific assumptions beyond the general assumptions already stated.

3.3.2 Geometric rules

From the uniqueness constraint C1, which says that a given solid object has a unique volume in 3-D space at any time instant, the first three geometric rules follow. We add general position and work volume rules.

(G1) (Epipolar line — limit of degree of ambiguity) For each pixel in the image there exists a small set of projector rays that could possibly image there.
(G2) (Uniqueness of single points) Any image point may be the projection of at most one projector ray.
(G3) (Uniqueness of solid objects) Two stripe networks in an image correspond to non-overlapping regions in the projector slide plane.
(G4) (General position) If two stripe endpoints lie on the same projected ray in 3-D, then their stripe numbers are the same and one lies on a surface between the other and the projector.
(G5) (Work volume) The (X,Y,Z) coordinates of any world point must lie within the field of view of both the camera and the projector, and be above the support plane (Z = 0).

These geometric rules are explained as follows.

For (G1): because a surface point (a stripe intersection, say) has a unique position in 3-D space, the projector ray that creates that bright spot and the camera ray that images the point are also uniquely determined in 3-D space. The chance of other projector rays intersecting the same camera ray in space, and so imaging at the same pixel, is slim. Although in practice we do not require the rays to intersect exactly, the number of projector rays that pass close to the camera ray is still small, provided a general viewing angle. The size of this set of projector rays determines the degree of ambiguity of the 3-D solutions for the surface point, and depends on the number of stripes available, the stripe spacing, the camera distance from the scene, and the error model. The ambiguity analysis is given in section 3.6.

Let us now find these possible projector rays. Consider a true but unknown 3-D point p_w that is projected from a grid point p_g in the projector plane and imaged at p_i in the image plane. Figure 3.3 illustrates the situation. The straight line connecting p_w (or p_i) and the camera focal point is the imaging ray; the straight line connecting p_w (or p_g) and the projector focal point is the projecting ray. Because the two rays are in general not collinear and they meet at p_w, they determine a plane that intersects the projector plane in a line ℓ, which is called the epipolar line of the imaging ray [Faugeras 86]. Since all 3-D points along the imaging ray are candidates for the unknown p_w, and each of them is sourced from a point on the epipolar line ℓ, all possible grid points must be sought on ℓ.

Figure 3.3 Epipolar geometry. Point p_i in the left image may correspond only to points on the epipolar line ℓ in the right image.

Given an image point, its epipolar line can be computed by computing the imaging ray using (3.10) and (3.11), projecting two points (not too close to each other) on the imaging ray onto the projector plane using (3.12) and (3.13), and then forming the epipolar line through the two projected points. The epipolar line runs across the grid lines on the projector slide, and its intersections with them are the candidate grid points. A detailed algorithm will be given in section 3.4.
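A sketch of that construction: take two points on the imaging ray, map each into the projector plane with M_p (dividing out the scale factor as in (3.2)), and form the line through them. The matrix and coordinates below are invented for illustration.

```python
import numpy as np

def project_plane(M: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Apply a 3x4 calibration matrix and divide out the scale (as in eq. 3.2)."""
    h = M @ np.append(X, 1.0)
    return h[:2] / h[2]

def epipolar_line(M_p: np.ndarray, ray_point, ray_dir, span: float = 100.0):
    """Project two well-separated points of the imaging ray into the projector
    plane; the line through their images is the epipolar line (point + unit
    direction in grid-plane coordinates)."""
    p0 = np.asarray(ray_point, dtype=float)
    d = np.asarray(ray_dir, dtype=float)
    a = project_plane(M_p, p0)
    b = project_plane(M_p, p0 + span * d)
    return a, (b - a) / np.linalg.norm(b - a)

# Made-up projector matrix and imaging ray:
M_p = np.array([[0.05, 0.0, -0.02, 5.0],
                [0.0, 0.05, -0.03, 3.0],
                [1e-4, 1e-4, 1e-3, 1.0]])
print(epipolar_line(M_p, [150.0, 65.0, 0.0], [0.2, 0.1, 1.0]))
# Candidate grid points are then the integer grid lines crossed by this line.
```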
(G2) and (G3) follow directly from the uniqueness constraint (C1). By (G2), a grid point observed in the image may be assigned only a single stripe number at a time. (G3) is useful for checking whether two computed 3-D surface patches are valid: their positions in 3-space must be such that no overlap occurs when they are projected in either the projector-projection direction or the imaging direction.

Rule (G4) applies to occluding-occluded relationships. Because objects have unique volumes in 3-D, their 2-D projections on the projector slide plane do not overlap. In cases where the two regions in the slide plane touch, a point on the touching boundary corresponds to two 3-D points lying on two 3-D surfaces that differ in depth, one occluding the other. (G5) is obvious.

3.3.3 Topological rules

The continuity constraint C2 says that object surfaces are generally smooth (compared to their distances from the viewer) and that points on object surfaces have continuous positions in three-space. No matter what kind of objects are in the scene, the 3-D stripes on the object surfaces preserve their topological relationships. This allows the following topological constraints:

(T1) If a grid intersection is assigned the grid-line number pair (x_g, y_g), the only possibilities for its neighbors in the connected 2-D stripe network, at most four, are (x_g-1, y_g), (x_g+1, y_g), (x_g, y_g-1) and (x_g, y_g+1).
(T2) In 3-D, X stripe-curves (Y stripe-curves) do not intersect other X stripe-curves (Y stripe-curves).
(T3) Any two stripe curves can intersect at most once in the image.
(T4) A continuous (smooth) 2-D curve indicates a continuous (smooth) 3-D curve.
(T5) A continuous (smooth) 2-D net indicates a continuous (smooth) 3-D surface.

These topological rules are further described below.

For (T1): because the grid lines on the slide are ordered, from 1, 2, ..., M in the x direction and from 1, 2, ..., N in the y direction, their projections on the object surfaces (3-D stripes) are ordered in exactly the same way. These stripes are imaged in the image plane with their order preserved as well, as long as no viewing accident occurs. See Figure 3.4.

Figure 3.4 Topological rule T1. Labels of neighboring grid points in the image must differ by one unit of either x_g or y_g.

T2 claims that no X-stripes can cross in 3-space. Do we have the same rule in the 2-D image? The answer is yes, provided that the stripes have zero thickness (mathematical curves) and the surfaces are smooth. This is shown in the following lemma.

Lemma 3.1. Let S be a surface and p ∈ S a point on S. Let their projections in the image be S_i and p_i. If S is smooth at p, then there is at most one X-stripe in the image that passes through p_i.

Proof. We prove that if an X-stripe passes through p_i in the image, no other X-stripe can pass through p_i.
Referring to Figure 3.5, let the focal point of the camera be C, let v_p be a vector on the line of sight pC in the direction from p to C, and let n_p be the surface normal at p. Since p is visible, n_p · v_p > 0. Because S is smooth at p, the surface normal is continuous at p, so there exists a region δ_p ⊂ S such that for all points q ∈ δ_p, n_q · v_q > 0; i.e., the entire surface patch δ_p is visible to the camera. Hence there is a "general cone" determined by (C, δ_p), which is divided by δ_p into two parts: the visible part (from C to δ_p) and the invisible part (occluded by δ_p).

Figure 3.5 Two X-stripes cannot intersect in the image (Lemma 3.1)

Since the stripe spacing d is a constant (> 0), δ_p can be chosen so that its "diameter" is smaller than d. In other words, no point r on any other X-stripe in 3-D space falls in the surface patch δ_p. Thus, for any such point r there are 3 cases: 1) n_r · v_r ≤ 0: r is invisible; 2) n_r · v_r > 0 and r falls inside the occluded volume of the cone: it is invisible; 3) n_r · v_r > 0 and r falls outside the occluded volume of the cone: it is visible, but its projection r_i in the image plane lies outside the image of δ_p, and hence cannot coincide with p_i. So, as long as p is visible, no point on any other X-stripe can pass through p_i in the image. Since p is an arbitrary point on an X-stripe, no two X-stripes can intersect in the image. ∎

Note: in practice, since images are discretely quantized, stripes have non-zero thickness (at least 1 pixel wide) and the image of the small region δ_p may not be smaller than one pixel, so two X-stripes may meet at a common image point in this discrete case; but this occurs infrequently.

T3 is an extension of T2. It is proved in Lemma 3.2.

Lemma 3.2. Let S be a smooth surface. Any two stripes on S can intersect at most once in the image.

Proof. By Lemma 3.1, we need only prove that an X-stripe X and a Y-stripe Y intersect at most once in the image (subscript i denotes the image plane; no subscript denotes 3-space). Assume their images X_i and Y_i intersect at p_i. For any points q_i ∈ Y_i and r_i ∈ X_i with q_i ≠ p_i, where q_i and r_i are the projections of 3-D points q ∈ Y and r ∈ X, there is a distance ("spacing" > 0) between q and r. Since S is smooth at r (r may or may not be the same point p), the X-stripe X is visible at r, and by the same argument as in the proof of Lemma 3.1, q_i, the projection of q in the image, must lie outside a δ_ri area in the image. Hence Y_i cannot intersect X_i at any q_i different from p_i. ∎

In practice, due to blooming effects, surface discontinuities, and viewing angle, T2 and T3 may not hold, but the situations in which they break down are infrequent. Thus we still consider T2 and T3 to be valid in 2-D, with the footnote that they hold "most of the time". By the same argument, the topological rules T4 and T5 can be established. These constraints are similar to some of those of Lowe and Binford [Lowe and Binford 85].

3.4 Algorithms

Having discussed the geometry of the 3-D computation and the constraints, we are now ready to develop the algorithms that find surface solutions. Given a striped image as input, these algorithms produce surface locations in 3-space with as little ambiguity as possible. Before these algorithms are applied, the camera and the projector are calibrated, so the two calibration matrices are known. Notice that the calibration process needs to be done only once for successive images, as long as the camera-projector setup remains fixed.
3.4.1 Notations and Data Structure

As we have already been doing, we use subscript "i" to indicate items in the image plane, "g" for items in the grid plane (projector plane), and "w" for items in the 3-D world, or no subscript where no confusion can occur.

image-point : p_i = (id, x_i, y_i, degree), where id is unique to each p_i, (x_i, y_i) are image coordinates, and degree is the number of its neighbors in the image network.
grid-point : p_g = (x_g, y_g), the coordinates in the projector plane.
3D-point : p_w = (x_w, y_w, z_w), the 3-D coordinates of p_w.
igw-point : p = (p_i, p_g, p_w), a 3D-point p_w created by grid-point p_g and seen in the image as image-point p_i.
network : n = (V, E), where V = {p_ij, j = 1, ..., |V|} is a set of image-points and E = {(p_ij, p_ik) : p_ij, p_ik ∈ V} is the "neighboring" relation on V. Considering the network n as a planar graph, V is the set of vertices and E is the set of edges.
stripe-curve : a 3-tuple (stripe#, XorY, P), where stripe# is the id of the stripe, XorY indicates whether the stripe is an X-stripe or a Y-stripe, and P is an ordered set of igw-points, ordered on x_g or y_g depending on the value of XorY.
surface : a 3-tuple (id, type, W), where type is a classification of the surface (planar, convex, concave, spherical, etc.) and W is an ordered set of stripe-curves sorted on stripe#, X-stripes first followed by Y-stripes.
scene : a set of surfaces (and later, their relationships).

3.4.2 Algorithm 1 — stripe extraction and network construction

The purpose is to segment the 2-D image into a number of visible surfaces as indicated by connected components of stripes. We call the connected components of stripes networks.

input : A striped image.
output : Set of 2-D networks, each a set of connected stripes in the image.
method : We give only a brief outline, omitting many technical details.
(1) Threshold the striped image to create a binary image in which pixels on bright stripes are one and other pixels are zero.
(2) Apply a thinning algorithm to make the stripes one pixel wide. Clean up small spurs caused by thinning.
(3) For each bright pixel, compute its degree (of value 1, 2, 3, or 4) by counting the number of bright pixels in its 3x3 neighborhood. Pixels of degree 1 are stripe end-points. Pixels of degree 3 or 4 are stripe intersection-points. Pixels of degree 2 are connecting points on stripes. Since the thinning operation often generates degree-3 points (as well as degree-4 points) at the crossing of two stripes, a simple clustering algorithm is applied to the detected degree-3 and degree-4 points with a cluster radius of about 5 to 8 pixels, depending on the width of the stripes before thinning. The cluster centers are taken as the crossing points.
(4) Extract a network n_j by tracking the stripe points. V_j ← {end-points and intersection-points tracked}. E_j ← {(p_ia, p_ib) : p_ia, p_ib ∈ V_j and connected by pixels of degree 2}. n_j ← (V_j, E_j). Reset all stripe points tracked to zero.
(5) Repeat (4) until all pixels in the image are zero.

Since steps (1), (2) and part of (3) are performed by hardware (in our case on a VICOM image processor), the time complexity is mainly due to steps (4) and (5). The clustering step in (3) examines only the neighborhoods of degree-3 and degree-4 points, so it takes time linear in the number of such points. For (4) and (5), stripe points occupy only a small fraction of the whole image, and each stripe point is examined once. Hence the time complexity is linear in the number of stripe pixels.
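A sketch of the degree computation in step (3) on a thinned binary image; the array layout and the toy test pattern are illustrative.

```python
import numpy as np

def pixel_degrees(thinned: np.ndarray) -> np.ndarray:
    """For each bright pixel of a thinned binary image (values 0/1), count the
    bright pixels in its 3x3 neighborhood, excluding the pixel itself."""
    padded = np.pad(thinned, 1)
    neigh = sum(np.roll(np.roll(padded, dy, 0), dx, 1)
                for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                if (dy, dx) != (0, 0))
    return (neigh[1:-1, 1:-1] * thinned).astype(int)   # 0 on background pixels

img = np.zeros((7, 7), dtype=int)
img[3, :] = 1          # a horizontal stripe
img[:, 3] = 1          # a vertical stripe crossing it
deg = pixel_degrees(img)
print(deg[3, 0], deg[3, 1], deg[3, 3])   # 1 (end point), 2 (connector), 4 (crossing)
```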
3.4.3 Algorithm 2 — 3-D computation for a single point

The purpose of this algorithm is to find, for each imaged grid point, the set of possible surface points that may have created it.

input : Calibration matrices M_c (camera) and M_p (projector), and an image-point p_i = (id, x_i, y_i, degree).
output : Set of 3D-points, each of which satisfies the projection equations (2.1) and (2.2). In other words, this algorithm finds 3D-points possibly created by grid-points and imaged at p_i. In addition, the XY-stripe-ness of p_i and its neighboring image-points is determined.
method :
(1) Obtain the camera ray R_c (a 3-D line) using M_c and (x_i, y_i). This takes O(1) time.
(2) If the degree of p_i ≠ 1 (an intersection point) then
  (i) Form the epipolar line: project the camera ray R_c onto the projector plane using M_p to get the epipolar line ℓ of R_c. Time cost O(1). See Figure 3.3 for the epipolar geometry.
  (ii) Select the grid-line intersections closest to ℓ (one for each X-grid-line). For each selected p_g = (x_g, y_g), form a projector ray R_g and compute the distance d between R_c and R_g. If d < threshold, R_c and R_g are considered to intersect in 3-space and p_g = (x_g, y_g) is one candidate for creating the image point p_i, according to the geometric constraint G1; the corresponding 3D-point p_w is calculated, and the igw-point p = (p_i, p_g, p_w) is included in the output. Since each X- or Y-stripe is examined once, this takes O(max(Nx, Ny)) time, where Nx and Ny are the numbers of X-grid-lines and Y-grid-lines, respectively.
  (iii) Once a grid point (x_g, y_g) is obtained in (ii), the image R_gi of the projector ray R_g can be computed using matrix M_c. The XY-stripe-ness of p_i and its neighbors is then determined with respect to R_gi, as described in Lemma 3.3 below. This part of the computation needs O(1) time.
  Else (a stripe end point): compute the 3D-point p_w as the intersection of the camera ray R_c and each X-light-plane (x_g, 0) or Y-light-plane (0, y_g), depending on whether the image-point p_i lies on an X-stripe or a Y-stripe, and then output p = (p_i, p_g, p_w). Time cost O(max(Nx, Ny)).

The time complexity of this algorithm is O(max(Nx, Ny)): linear in the number of grid lines.

Lemma 3.3. X-stripes and Y-stripes in an image can be distinguished.

Proof. Referring to Figure 3.6, we define the following notation.

P_X : light plane that creates stripe X_s in 3-D,
P_Y : light plane that creates stripe Y_s in 3-D,
P_i : image plane,
R : intersection of P_X and P_Y, a projector ray that produces the intersection point A of stripes X_s and Y_s,
C : focal point of the camera,
X_si, Y_si : projections of stripes X_s and Y_s in the image plane P_i,
A_i : image of A in P_i,
R_i : image of R in P_i, used to indicate the up-down sense in the image,
P_e : epipolar plane of R, the plane determined by the point C and the line R.

Notice that R_i is the intersection of the image plane P_i and the epipolar plane P_e. Under the "general viewing position" assumption, P_e is not coincident with either P_X or P_Y. Because the stripe X_s, as a space curve z = f(x, y), is single-valued along the curve, and P_e divides P_X into two half-planes, X_s is also divided into two half-stripes lying on different sides of P_e. At the same time, P_e divides the image plane P_i such that the right half-space of P_e maps to the right half-plane of P_i and the left half-space maps to the left half-plane of P_i. So the two half-stripes of X_s are projected to different sides of R_i in the image plane P_i; that is to say, X_si must cross R_i exactly once. A similar statement holds for Y_si.
Take two points v_i and w_i in the right half-plane of P_i (on the right side of R_i) such that v_i ∈ Y_si, w_i ∈ X_si and they have the same image x-coordinate. Let their counterparts in 3-D be v ∈ Y_s and w ∈ X_s. Because X_s and Y_s lie on object surfaces and objects are assumed opaque, points "below" the stripes in the P_X and P_Y planes are invisible. Referring to Figure 3.6, let the line Cv intersect the light plane P_X at v'. For both v and w to be visible, v' must lie on the visible ("above") side of X_s in P_X. Thus the projection of v in the image plane P_i must be "above" that of w. Because the pair (v_i, w_i) was arbitrarily chosen, every point on Y_si must be "above" its corresponding point (with the same image x-coordinate) on X_si, if there is one. So the entire half-stripe Y_si (on the right side of R_i) is "above" the half-stripe X_si. The same reasoning applies to the left half-plane of P_i, except that "above" is changed to "below".

We thus conclude that if an X-stripe and a Y-stripe intersect in the image, they can be distinguished with respect to R_i, the image of the projector ray that produces the intersection point. ∎

Figure 3.6 Distinguishing X- and Y-stripes at an intersection point (Lemma 3.3)

The procedure for making that decision at an intersection point A_i is the following:
(a) Project the projector ray R onto the image plane P_i to obtain R_i.
(b) Starting from a point s at A_i on R_i and moving clockwise, we encounter stripes in the order Y-X-Y-X (or X-Y-X-Y, depending on the camera position), alternately. See Figure 3.7.

Figure 3.7 X- and Y-stripes distinguished with respect to R_i

3.4.4 Algorithm 3 — single network solution via constraint propagation

The purpose of this algorithm is to provide the possible 3-D surface solutions for a single 2-D stripe network.

input : Network n = (V, E) and, for each image-point p ∈ V, the set G_p of igw-points resulting from algorithm 2.
output : Set of surfaces that correspond to the network n (ambiguous surface solutions).
method : This algorithm is a DFS (depth-first search) procedure. During the graph (network) traversal, consistency between neighboring points in the network is evaluated; the consistency criteria are the constraint rules T1-T3. In this algorithm, s, u, v without subscript i are image-points, and p_s, p_u, p_v denote their corresponding igw-points. A decision must be made as to whether a consistent interpretation for all network points is required, or whether some points may be omitted to allow for errors or accidents.
(1) Pick as the starting point an image-point s ∈ V whose associated set G_s has the smallest size. All points in V and edges in E are labeled "undeleted". O(|V|) time is needed. Initialize the "output list" L ← ∅.
(2) Select an unused igw-point p_s from the set G_s of s. If such a p_s exists, put p_s in L and set u ← s; else stop.
(3) Select an unexamined edge (u, v) ∈ E where v is not marked "deleted". An igw-point p_v ∈ G_v is selected so that p_v is compatible with p_u and with the other neighbors of v according to the constraints T1-T3. If such a p_v is found, L ← L ∪ {p_v} and u ← v; else mark v "deleted" in V and mark all edges involving v "deleted" in E.
(4) If u ≠ s go to (3), else go to (5).
(5) Reorganize the "output list" L (a set of igw-points) into X-stripes and Y-stripes using the x_g, y_g of each p ∈ L, and sort these stripes on stripe#. Output L as a resulting surface.
(6) Unmark all "deleted" flags; go to (2).

In step (3), the "deleted" marking is used to obtain partial surface solutions that involve only a subset of V.
If solutions involving all points in V are required, no marking is needed; in that case, if the propagation is blocked, the algorithm simply discards the current partial result in L and goes back to step (2).

For one surface solution, each image-point in network n is visited once, hence O(|V|) time is needed. The total number of surface solutions for n is no more than |G_s|. So the algorithm takes O(|V|·|G_s|) time, where |V| is the number of image-points in the network and |G_s| is the size of the smallest set of igw-points associated with an image-point in the network.

3.4.5 Algorithm 4 (Turning-Right algorithm) — network boundary extraction

The purpose of this algorithm is to locate the extremities of each imaged surface element. It does so by a common technique of border following.

input : Network n = (V, E).
output : Ordered set of image-points B = {p_ij, j = 1, ..., |B|} that forms the boundary of the network in the image.
method : Since all points of interest in this algorithm are in the image plane, we drop the subscript "i". In a stripe network, points of degree 1 (stripe end points) and degree 4 (stripe intersections) are easy to handle, because degree-1 points are always on network borders, while degree-4 points are always interior to the stripe networks. We only need to worry about points of degree 2 and 3. They may be on the boundary, but it is difficult to get their order correct on the boundary, as illustrated in Figure 3.8(a). To overcome this problem, we extend degree-2 and degree-3 points to degree 4, as in steps (1) and (2). Then all points are of degree 1 or degree 4 and are handled in a uniform way.

Figure 3.8 Algorithm 4 (Turning-Right Algorithm). (a) The order of points on the boundary (dashed lines) may not be correct at points of degree 2 and 3. (b) Extend a point of degree 2 to degree 4; v is the "mother" of v1 and v2. (c) Extend a point of degree 3 to degree 4; v is the "mother" of v1. (d) Find the boundary of a network using the "turning right" algorithm.

(1) For each image-point v ∈ V of degree 2 (where an X-stripe and a Y-stripe meet but do not cross, i.e. a corner), "extend" it to degree 4 by adding two degree-1 points v1 and v2 to V, and two edges (v, v1) and (v, v2) to E, such that v1 and v2 are only 1 pixel off v and in the directions extended by the original two edges of v. Call v the "mother" of v1 and v2. See Figure 3.8(b).
(2) For each v ∈ V of degree 3, "extend" it to degree 4 by adding a degree-1 point v1 to V and an edge (v, v1) to E such that v1 is only 1 pixel off v and in the direction extended by one of the original three edges of v, determined by Lemma 3.3. Call v the "mother" of v1. See Figure 3.8(c). Steps (1) and (2) examine only the points of degree 2 and 3, which are a small fraction of the total number of points in V. The time needed is O(|V|).
(3) Initialize B ← ∅. Arbitrarily pick an image-point u ∈ V of degree 1 as the starting point. B ← B ∪ {u}. Let u's neighbor be v.
(4) If v is of degree 1: if v has "mother" v' then B ← B ∪ {v'}, else B ← B ∪ {v}. If v is of degree 4, let w be the rightmost neighbor of v with respect to the direction from u to v, and set u ← v, v ← w. See Figure 3.8(d). This takes O(1) time.
(5) If v = starting point, stop; else go to (4).

The loop (4)-(5) takes O(|V|) time, since each point in V is visited at most once. The time complexity of this algorithm is O(|V|). For large stripe networks, the time cost is much less, since many interior grid points are never visited.
3.4.6 Algorithm 5 — solution for the scene

This algorithm eliminates incompatible sets of individual surface solutions by enforcing the non-overlapping rule G3.

input : Set of networks N = {n_j, j = 1, ..., |N|}, each n_j associated with a set S_j of surfaces obtained from algorithm 3, the boundary of each surface resulting from algorithm 4, and the projector calibration matrix M_p.
output : Set of scenes, i.e. set of compatible combinations of surfaces, each surface being associated with one network n_j.
method : This algorithm combines surfaces (one surface for each network) to form a scene. A combination is valid only if all surfaces in the combination have non-overlapping projections in the projector plane, according to the geometric constraint G3 and the topological constraints T4 and T5 stated in section 3.3.
(1) For each network n_j, project the boundary of each surface in the set associated with n_j onto the projector plane, using the projector calibration matrix M_p. We then have for each network n_j a set of regions in the projector plane R_j = {r_jk, k = 1, ..., |R_j|}.
(2) Select one region r_jk from each set R_j at a time to form a combination C = {r_jk, j = 1, ..., |N|, 1 ≤ k ≤ |R_j|}. C is declared invalid and discarded as soon as an overlap between regions in C is found (violation of the constraint G3, under the smoothness and continuity assumptions T4 and T5). If C is valid and completely constructed, output C as one possible scene. Note that the complete set C need not be constructed if it is found invalid during its construction. In the worst case the time cost is T = O(Σ|R_j|·|r_jk|).
(3) Repeat step (2) until all combinations are exhausted. The worst-case time complexity is O(T·Π|R_j|), where T is the worst-case time cost of step (2). But since we can terminate the expansion of a combination C at an early stage when C is found invalid, just as in tree pruning, the average time needed is less.

3.5 A Complete Example

A series of experiments has been done in the PRIP lab; the results show that the above algorithms do produce good surface solutions with a small degree of ambiguity. Here by "good" we mean accuracy: an average of about 1mm (worst 3-4mm) error in 3-D, with the camera stand-off a little less than 1 meter. This performance is almost as good as that of human eyes. Because only sparse data are processed, the algorithms should run reasonably fast: for a 512x512 image, usually only a couple of hundred, or up to a thousand, points need to be processed rather than a quarter million pixels. Because we did not put much effort into the efficiency of the algorithms, they run a little slower than they could. For a 512x512 image, the algorithms produced surface solutions in about 5-10 minutes on a VICOM image processing system and a VAX/8600, including reading and writing files between algorithms.

Let us go through the details, using an example, of how the surface solutions are developed. A striped image of a scene consisting of a block, a Coke can and a diesel piston is shown in Figure 3.9 (a reprint of Figure 2.2). After applying algorithm 1, image points are detected and stripe networks are extracted. Each image point is assigned a unique id. The detected network points of the image are shown in Figure 3.10. Notice that the stripes on the piston were not clearly seen, due to degraded illumination in the area of the piston, and the stripe network on the piston surface is fragmented into small pieces.
This was because we were not very careful with the illumination conditions when we ran the experiment. The results reported here therefore include only those for the block and the Coke can.

Algorithm 2 computes a set of candidate grid-points for each image point. Most of the sets have 2 to 7 candidate points; a few sets have more or fewer candidates. The result for the upper surface of the block is shown in Figure 3.11. The constraint propagation algorithm (algorithm 3) computed network-consistent surface solutions for each network in the image. Two possible solutions of the upper surface of the block resulted, marked A and B in Figure 3.11. The detailed 3-D results are given in Table 3.1. There are also two network-consistent solutions for the lower surface of the block, and two for the Coke can.

Figure 3.9 A striped image of a block, a Coke can, and a piston

Figure 3.10 Detected grid points in the jumble image

Figure 3.11 Candidate labels for some grid points of the upper surface of the block. Results for some stripe end-points are not shown. Only two network-consistent solutions (A and B) for this surface are feasible, as the result of algorithm 3. (A: first solution; B: second solution.)

Table 3.1 Two solutions for the upper surface of the block
[The original table lists, for each network point of this surface, the assigned grid labels (x_g, y_g) and the computed world coordinates (X, Y, Z) under solution A and under solution B. Solution A places every point near Z ≈ 80mm; solution B assigns grid labels shifted by one stripe and places every point near Z ≈ 100-104mm.]
Note: From the model and pose, it is known that surface points (X, Y, Z) should satisfy the planar equation Z = 80mm.
The network boundaries are extracted by the "Turning Right" algorithm (algorithm 4). Algorithm 5 then checks the validity of combinations of surface solutions. Since there are two solutions for each of the three networks in the image (the upper surface and the lower surface of the block, and the Coke can), there are eight possible combinations. The non-overlapping rule culls six of the eight possibilities and leaves two as the final results. The upper panel of Figure 3.12 shows a valid and an invalid combination of surface solutions of the jumble.

Several other images were also processed: (1) a clay sculpture of a cobra, (2) a diesel piston, and (3) seven potatoes. We have seen these images in Figure 2.2. The backprojection of two combined solutions for the potato image is shown in the lower panel of Figure 3.12. Table 3.2 summarizes the results for these images. From the experiments we see that (1) large image networks have fewer surface solutions, sometimes a unique solution (as in the case of the cobra sculpture), because of the tightened constraints caused by a large number of image points; (2) the uniqueness constraint is very powerful on images of multiple networks, as in the case of the potatoes, where only 5 combinations of surface solutions survived out of a total of 36 possibilities (an 86% cut); (3) ambiguity still remains in the final results, but to a much smaller degree, and it can be sorted out by a little additional knowledge, such as further assumptions about the real world (for example, in the 5 solutions of the potato image, four have potatoes floating in the air), by information from a different viewing position, or by use of object models.

Figure 3.12 Backprojection of combined solutions. Upper panel: backprojection of some combined solutions for the jumble image, (a) a valid combination, (b) an invalid combination. Lower panel: backprojection of some combined solutions for the potato image.

Table 3.2 Summary of surface solutions for the jumble, cobra, piston and potato images

  image     surface    number of possible   number of combinations after
            patch      surface solutions    applying uniqueness constraint
  jumble    upper      2 (A,B)
            lower      2 (A,B)              2 (AAA, BBA)*
            Coke can   2 (A,B)
  cobra     head       3 (A,B,C)            1 (BA)
            body       1 (A)
  potatoes  P1         1 (A)                5 solutions:
            P2         4 (A,B,C,D)          (ABAACAA, ABACCAA,
            P3         1 (A)                ACABCAA, ACAACAA,
            P4         3 (A,B,C)            ACACCAA)
            P5         3 (A,B,C)
            P6         1 (A)
            P7         1 (A)

* Solution BBA represents the combination of solution B of the upper surface, solution B of the lower surface, and solution A of the Coke can.
3.6 Quantitative Analysis of the Degree of Ambiguity

The degree of ambiguity refers to the number of surface solutions in 3-space that survive the constraint test. We want the number of surface solutions to be small; if the number is one, we get a unique solution. The question is, without any knowledge about the objects in the scene except the projection and imaging geometry, how many solutions may be possible? Figure 3.13 illustrates the situation.

Figure 3.13 Ambiguity of location in 3-space (several possible object locations between the projector, the camera, and the worktable).

We see that the number of possible surface solutions depends on several factors, for instance the light stripe spacing, the number of stripes, and the distance from the camera to the scene. In addition, a camera ray and a projector ray (both lines in 3-D) may not intersect in space even though the projector ray does create that very point in the 2-D image, due to measurement and computation inaccuracy. We treat a camera ray and a projector ray as if they did intersect in space if the distance between the two rays is less than a predetermined tolerance T. Taking this factor into account, the degree of ambiguity is also an increasing function of T. Can the degree of ambiguity be arbitrarily large if infinitely many projected light stripes are assumed? We shall show quantitatively that this is usually not the case. The analysis is based on two projection models: orthographic and perspective.

3.6.1 Orthographic projection

3.6.1.1 Assumptions

Under an orthographic, or parallel, projection model the projector projection is parallel. That is, all X-stripes are parallel to each other, and all Y-stripes are parallel to each other. Camera imaging is assumed perspective. We examine how many locations are possible in 3-space for a square-shaped surface patch on the xy plane (worktable) formed by stripes whose stripe numbers are x_g, x_g+1, y_g and y_g+1. The projection model is illustrated in Figure 3.14.

Figure 3.14 Parallel projection model (parallel projection rays, the square patch with vertices (x_g t, y_g t), ((x_g+1)t, y_g t), ((x_g+1)t, (y_g+1)t), (x_g t, (y_g+1)t) on the ground plane, and the camera optical axis).

3.6.1.2 Basic formulas

The equation of the line through two fixed points (x_1, y_1, z_1) and (x_2, y_2, z_2) is

$$\frac{x - x_1}{x_2 - x_1} = \frac{y - y_1}{y_2 - y_1} = \frac{z - z_1}{z_2 - z_1}. \qquad (1)$$

The direction numbers l, m, n of the line

$$Ax + By + Cz + D = 0, \qquad ax + by + cz + d = 0$$

are given by

$$l = \begin{vmatrix} B & C \\ b & c \end{vmatrix}, \qquad m = \begin{vmatrix} C & A \\ c & a \end{vmatrix}, \qquad n = \begin{vmatrix} A & B \\ a & b \end{vmatrix}.$$

The distance between two lines is

$$d = \frac{\left|(x_1 - x_2)L + (y_1 - y_2)M + (z_1 - z_2)N\right|}{\sqrt{L^2 + M^2 + N^2}} \qquad (2)$$

where (x_1, y_1, z_1) is an arbitrary point on one line, (x_2, y_2, z_2) is an arbitrary point on the other line, and

$$L = \begin{vmatrix} m_1 & n_1 \\ m_2 & n_2 \end{vmatrix}, \qquad M = \begin{vmatrix} n_1 & l_1 \\ n_2 & l_2 \end{vmatrix}, \qquad N = \begin{vmatrix} l_1 & m_1 \\ l_2 & m_2 \end{vmatrix}$$

with l_1, m_1, n_1 the direction numbers of the first line and l_2, m_2, n_2 the direction numbers of the second line.

3.6.1.3 Computation in 3-space

Let the stripe spacing be t and the four vertices of the surface square be (x_g t, y_g t, 0), ((x_g+1)t, y_g t, 0), ((x_g+1)t, (y_g+1)t, 0) and (x_g t, (y_g+1)t, 0). Let the optical axis of the camera run from the camera (0, y_c, z_c) to the center of the surface square; i.e., the coordinates of the camera are

$$\Bigl( 0,\; (y_g + \tfrac{1}{2})t,\; (x_g + \tfrac{1}{2})t \tan\theta \Bigr),$$

where θ is the angle of the optical axis above the worktable. The camera ray L_1 through the vertex (x_g t, y_g t, 0) has the equation
    L_1:  x + 2 x_g y - (2 y_g + 1) x_g t = 0,    (2 x_g + 1) y - \tan\theta \, z - (2 x_g + 1) y_g t = 0

Thus, the direction numbers for L_1 are obtained as

    l_1 = \begin{vmatrix} 2x_g & 0 \\ 2x_g+1 & -\tan\theta \end{vmatrix} = -2 x_g \tan\theta,
    m_1 = \begin{vmatrix} 0 & 1 \\ -\tan\theta & 0 \end{vmatrix} = \tan\theta,
    n_1 = \begin{vmatrix} 1 & 2x_g \\ 0 & 2x_g+1 \end{vmatrix} = 2 x_g + 1

Suppose k solutions exist, i.e. the k-th solution is valid, obtained by shifting the square surface patch towards the camera along its optical axis by k X-stripes. The corresponding projector ray is then given by

    L_2:  x - (x_g - k)t = 0,    y - y_g t = 0

with direction numbers (l_2, m_2, n_2) = (0, 0, 1). Hence, the distance between the camera ray L_1 and the projector ray L_2 is computed according to formula (2):

    d = \frac{\tan\theta \, \bigl(x_g t - (x_g - k)t\bigr) + 0 + 0}{\sqrt{\tan^2\theta + 4 x_g^2 \tan^2\theta}} = \frac{k t}{\sqrt{1 + 4 x_g^2}}

Making this distance less than a preassigned tolerance T, we have

    k \le \frac{\sqrt{1 + 4 x_g^2}}{t} \, T

Since L_1 is the camera ray (among the 4 camera rays) that has the greatest distance from the projector ray, we do not need to consider the other 3 rays. Hence, we conclude that the number k of possible locations of a surface patch in 3-space is proportional to the camera standoff (determined by x_g) and the tolerance T, and inversely proportional to the stripe spacing t.

3.6.1.4 Computation in the xy-plane

To make things simpler, we can look only at the projections of these rays onto the xy-plane. Because the projector projection is parallel, the distance between a camera ray and a projector ray is the same as the distance between their xy-plane projections.

Suppose that the camera is at (0, y_c), and the four vertices of the square-shaped surface patch are (x_g t, 0), ((x_g+1)t, 0), ((x_g+1)t, t), and (x_g t, t). The 4 camera rays are

    l_1:  y_c x + x_g t \, y - x_g t \, y_c = 0
    l_2:  y_c x + (x_g+1)t \, y - (x_g+1)t \, y_c = 0
    l_3:  (y_c - t) x + (x_g+1)t \, y - (x_g+1)t \, y_c = 0
    l_4:  (y_c - t) x + x_g t \, y - x_g t \, y_c = 0

The distance from a point (u, v) to the line Ax + By + C = 0 is

    \frac{|Au + Bv + C|}{\sqrt{A^2 + B^2}}

The i-th possible location of the surface patch in 3-space (moving the patch towards the camera by i X-stripes) would have the 4 vertices ((x_g - i)t, jt), ((x_g - i + 1)t, jt), ((x_g - i + 1)t, (j+1)t), and ((x_g - i)t, (j+1)t), where j is the number of Y-stripes the surface patch must move towards the camera in order to obtain the i-th location. These 4 points have the following distances from l_k, k = 1, 2, 3, 4:

    d_1 = \frac{t \, |x_g t j - y_c i|}{\sqrt{y_c^2 + (x_g t)^2}}
    d_2 = \frac{t \, |(x_g+1)t j - y_c i|}{\sqrt{y_c^2 + ((x_g+1)t)^2}}
    d_3 = \frac{t \, |(x_g+1)t j - y_c i + it|}{\sqrt{(y_c - t)^2 + ((x_g+1)t)^2}}
    d_4 = \frac{t \, |x_g t j - y_c i + it|}{\sqrt{(y_c - t)^2 + (x_g t)^2}}

In order for the i-th solution to be valid, all of d_k (k = 1, 2, 3, 4) must be ≤ the tolerance T at the same time. Let us consider some special cases where the camera viewing direction varies.

(1) θ = π/4. In this case y_c = x_g t and i = j. Here d_3 is the largest:

    d_3 = \frac{\sqrt{2} \, i t}{\sqrt{x_g^2 + 1}} \approx \frac{\sqrt{2} \, i t}{x_g}   ⇒   i \le \frac{x_g T}{\sqrt{2} \, t}

(2) θ = 0. Here the optical axis is parallel to the Y-stripes, i.e. y_c = 0 and j = 0. Computation shows that d_4 = max(d_k), k = 1, 2, 3, 4:

    d_4 \approx \frac{i t}{x_g}   ⇒   i \le \frac{x_g T}{t}

(3) tan θ = 1/2, i.e. θ ≈ 27°, or y_c = 0.5 x_g t, and i = 2j. In this case, all candidate solutions obtained by shifting the surface patch towards the camera by an odd number of X-stripes are invalid, since the distance calculated is about half of the stripe spacing, which is usually too large. By shifting the patch towards the camera by an even number of X-stripes, say 2n, we get the largest distance as

    d_3 = \frac{6 j t}{\sqrt{5 x_g^2 + 4 x_g + 8}}

Making d_3 ≤ T, we get

    i \le \frac{\sqrt{5 x_g^2 + 4 x_g + 8}}{3 t} \, T

where i = 2n = 2j.

According to the above analysis, we obtain Table 3.3 for the number of possible locations in 3-space of a square-shaped surface patch on the ground plane, with a tolerance T = 3.5 mm.
Table 3.3 Number of possible locations in 3-D for a striped square, assuming parallel projection (tolerance T = 3.5 mm)

camera angle   camera stand-off (mm)   stripe spacing (mm):   5    10   15   20   25   30
0°             300                                            14    7    5    3    2    2
0°             450                                            21   10    7    5    4    3
0°             600                                            28   14    9    7    5    4
27°            300                                             5    2    1    1    1    1
27°            450                                             8    4    2    2    1    1
27°            600                                            10    5    3    2    2    1
45°            300                                            10    5    3    2    2    1
45°            450                                            15    7    5    3    2    1
45°            600                                            20   10    6    5    4    3

In our experiments the stripe spacing is 15 mm and the camera is at about 450-600 mm from the scene with a viewing angle of about 45°, so the number of possible solutions for a single square is 5-6. For a larger network (more constrained), which is usually the case, the number of possible solutions will be smaller.

3.6.2 Perspective Projection

3.6.2.1 Assumption

Under the perspective model, both camera imaging and projector projection are perspective. We expect larger ambiguity than in the parallel model, since both camera rays and projector rays taper and get closer and closer to each other. Thus there is a good chance for a camera ray to "intersect" a projector ray in 3-space: the distance between the two rays will easily be smaller than a predetermined tolerance T.

3.6.2.2 Computation

We use a set-up similar to that of the previous section, with the same coordinate system. The camera position is at (0, y_c, z_c), and the projector is at (x_g t, t, z_p). That is, the optical axis of the projector is vertical, from the center of the projector down to the point (x_g t, t), which is assumed to be one of the vertices of the square patch on the worktable. The surface patch has the 4 vertices (x_g t, 0), ((x_g+1)t, 0), ((x_g+1)t, t), and (x_g t, t), as shown in Figure 3.14.

Using the same formulas as in the orthographic projection case, the analytical expressions turn out to be too messy to be useful. Instead of obtaining an explicit analytical form for the ray-distance, we simply run a computer program to compute it numerically for different sets of parameters. The algorithm is easier to read in vector notation. Let a, b be two points on line l_1, and c, d be two points on line l_2. Then u = a - b and v = c - d are vectors along l_1 and l_2. The unit common normal n = (u × v)/|u × v| is perpendicular to both u and v. Take a vector from a point on l_1 to a point on l_2, say w = c - a; the projection of this vector onto the common normal is the distance between l_1 and l_2. That is, d = |⟨w, n⟩|. This algorithm yields Table 3.4.

Table 3.4 Number of possible locations in 3-D for a striped square, assuming perspective projection

camera angle   camera-to-scene distance (mm)     stripe spacing (mm):   10   15   20
               horizontal      vertical
32.5°          550             700                                      37   11    4
45°            550             700                                      42   19    8
32°            400             700                                      23    9    4
45°            400             700                                      30   13    3
31°            250             700                                      13    5    1
45°            250             700                                      18    8    3
23°            350             700                                      18    7    1
45°            350             700                                      26    7    5
63.4°          100             650                                      11    5    1
45°            200             650                                      13    4    1
26.6°          100             650                                       4    2    1
45°            100             650                                       5    3    0

From Table 3.4 we can see that if the camera is viewing at 45° there can be many possible surface solutions, since the camera rays will intersect the projector rays at every shift. If the camera is far away from the surface patch, the number of possible surface locations is also large, because the tapering camera rays and projector rays get very close to each other, so the ray-distance is ≤ T. Also, the number of possible surface locations clearly decreases as the stripe spacing increases.

The above analysis is based on a single quadrilateral on the z = 0 plane. It is conceivable that for a scene of multiple objects, each much larger than a single quadrilateral, the resulting surface solutions will be much less ambiguous. This is confirmed in our experiments.
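The ray-distance computation described above in vector notation is short enough to write out in full. The following sketch (Python with NumPy; the function name and the parallel-ray fallback are my additions) computes the distance between two 3-D lines, each given by two points on it.

    import numpy as np

    def line_distance(a, b, c, d):
        # Distance between line l1 (through points a, b) and line l2 (through
        # points c, d): project a vector spanning the two lines onto their
        # unit common normal, as described in Section 3.6.2.2.
        a, b, c, d = (np.asarray(p, dtype=float) for p in (a, b, c, d))
        u = a - b                      # direction of l1
        v = c - d                      # direction of l2
        n = np.cross(u, v)             # common normal
        norm = np.linalg.norm(n)
        if norm < 1e-12:               # (nearly) parallel rays: point-to-line distance
            return np.linalg.norm(np.cross(c - a, u)) / np.linalg.norm(u)
        w = c - a                      # vector from l1 to l2
        return abs(np.dot(w, n / norm))

A camera ray and a projector ray are then treated as intersecting whenever this distance falls below the tolerance T (3.5 mm in the experiments).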
3.7 Conclusions

This chapter dealt with 3-dimensional sensing problems using a projected grid. The geometric computation for the surface points follows a stereo model, the camera-projector pair. The difficult correspondence problem of the stereo model is simplified to the line identification problem in the structured lighting environment, which can be partially solved through the propagation of certain general constraints. In controlled environments, such as those of industrial robots, this approach may provide an economical solution to 3-D sensing. The discussion and experimental results in this chapter can be outlined as follows.

(1) The 3-D location of a point is computed by intersecting the projecting ray and the imaging ray in 3-space, a typical triangulation method. In practice the exact intersection may not be found; instead, it is approximated by the least squares solution over-determined by the four equations (3.10) ~ (3.13).

(2) Because the projecting ray is not uniquely determined, there may be a set of possible solutions, each associated with one possible projecting ray (or grid point). All candidate grid points lie within a narrow band along the epipolar line of the imaging ray.

(3) The uniqueness and continuity constraints come from the real world. The geometric and topological rules deduced from them are very powerful in developing surface solutions and reducing ambiguity. Although the result of constraint propagation is still not unique in general, it usually has only a small degree of ambiguity, which may be further reduced using knowledge from other sources.

(4) The ambiguity in a surface solution represents several possible locations of a surface patch in 3-space; each of them is equally useful for inferring surface shape and other geometric properties.

CHAPTER 4

Inference of Surface Shape

3-D measurement by itself is not the goal of a vision system. The goal is to understand the scene: to tell what and where the objects are in the scene, i.e. their locations, orientations, and shapes, as well as their spatial relationships. The collected 3-D data are useful only for this purpose. The 3-D data obtained from light striping are sparse but not individually isolated; they are spatially related through the stripe networks. How surface shape is inferred from the 3-D grid point data and the stripe networks is the focus of this chapter. The objective is to attach symbolic labels to surface data.

In Section 4.1, a surface shape classification method is discussed which is based on the intrinsic Gaussian curvature at various points on the surface. In order to compute the Gaussian curvature, or more specifically, to compute the partial derivatives of the surface function, a discrete B-spline fitting is applied to the 3-D grid data. Experiments show that this approach can work on our striped data, as it has worked on other types of data, range data for example [Besl and Jain 85]. Section 4.2 is concerned with the problems of inferring surface shapes from the 2-D stripe networks in the image. The 2-D stripe networks are considered as stripe textures, to which a structural texture approach is applied.

4.1 Surface fitting and local curvatures

Given a set of points in n-dimensional space, it is often desirable to estimate the positions of other points in the space in general, or to fit curves (n = 2) or surfaces (n = 3) in particular. In our case, we have certain points in 3-space computed (the grid points), and we want to infer the shape of the surface on which these 3-D points lie.
An analytical form of a surface is usually hard to obtain except for surfaces that are planar, spherical, cylindrical and the like. A common way to describe a surface is to do so locally, in terms of surface normals or curvatures. Curvature properties, especially the intrinsic Gaussian curvature, have been studied for many years and used in surface segmentation and classification for both intensity and range images [Ittner and A. K. Jain 85, Besl and R. Jain 85, Medioni and Nevatia 84, Laffey, Haralick and Watson 82, Ponce and Brady 85, Yang and Kak 86, Besl and R. Jain 86]. In the case of striped data, B-spline surface fitting followed by Gaussian curvature computation is feasible because the available 3-D grid points are exactly the control points needed in the fitting. First we briefly review some local surface theory from differential geometry [Millman and Parker 77, Lipschutz 69]. Then we describe the approach that computes the Gaussian curvature and the two principal curvatures at surface points based on B-spline surfaces [Yang and Kak 86]. Finally, surface shape is determined based on the clusters of the calculated curvatures.

4.1.1 Differential geometry - local surface theory

Surface tangent plane and normal

A surface S ⊂ R³ can be expressed in parametric form as

    x(u_1, u_2) = (x(u_1, u_2), y(u_1, u_2), z(u_1, u_2))

where u_1, u_2 are two parameters and x ∈ S can be viewed as a vector from the origin to a point on S. The partial derivatives of x with respect to u_1 and u_2, denoted x_1 and x_2, are

    x_i = \frac{\partial x}{\partial u_i}

and the second order partial derivatives are

    x_{ij} = \frac{\partial^2 x}{\partial u_i \partial u_j}

{x_1, x_2} is a basis of the tangent plane T_xS at point x, i.e. x_1 and x_2 span the tangent plane. The surface normal at point x is then

    n = \frac{x_1 \times x_2}{|x_1 \times x_2|}

First fundamental form

Since at a point p of the surface S the two partial derivatives x_1 and x_2 span the tangent plane T_pS, any vector X in the tangent plane is a linear combination of x_1 and x_2:

    X = a_1 x_1 + a_2 x_2 = \sum a_i x_i

The first fundamental form on surface S is defined as a bilinear form on T_pS for each p ∈ S, given by

    I(X, Y) = \sum_{i,j} a_i b_j \langle x_i, x_j \rangle = \sum_{i,j} a_i b_j \, g_{ij}

where X, Y are in the tangent plane T_pS such that X = a_1 x_1 + a_2 x_2, Y = b_1 x_1 + b_2 x_2, and g_{ij} = ⟨x_i, x_j⟩, the inner products of x_i and x_j, are called the metric coefficients or the coefficients of the Riemannian metric. The first fundamental form can be written explicitly as

    I(X, Y) = (a_1 \; a_2) \begin{pmatrix} g_{11} & g_{12} \\ g_{21} & g_{22} \end{pmatrix} \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}

The matrix (g_{ij}) assigns to any X, Y ∈ T_pS their inner product with respect to the basis {x_1, x_2}. Notice that g_{12} = g_{21}.

Second fundamental form

The second fundamental form on surface S is a bilinear form on T_pS for each p ∈ S, given by

    II(X, Y) = \sum_{i,j} a_i b_j \langle x_{ij}, n \rangle = \sum_{i,j} a_i b_j \, L_{ij}

where X, Y ∈ T_pS are as before, and L_{ij} = ⟨x_{ij}, n⟩, the inner products of x_{ij} and n, are called the coefficients of the second fundamental form. We can also write the second fundamental form as

    II(X, Y) = (a_1 \; a_2) \begin{pmatrix} L_{11} & L_{12} \\ L_{21} & L_{22} \end{pmatrix} \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}

The Weingarten map is the linear transformation W : T_pS → T_pS given by W(X) = -∇_X n, where X ∈ T_pS and ∇_X n is the directional derivative of the surface normal n in the direction of X. The matrix representation of W with respect to the basis {x_1, x_2} is W = (w_{il}), where

    w_{il} = \sum_k L_{ik} \, g^{kl}

and g^{kl} is the (k, l)-entry of the inverse of the matrix (g_{ij}), which is very easy to compute (for a 2×2 matrix!). More specifically, the matrix of the Weingarten map can be expressed as

    \begin{pmatrix} w_{11} & w_{12} \\ w_{21} & w_{22} \end{pmatrix} = \begin{pmatrix} L_{11} & L_{12} \\ L_{21} & L_{22} \end{pmatrix} \begin{pmatrix} g^{11} & g^{12} \\ g^{21} & g^{22} \end{pmatrix}

Curvatures

To measure how a surface S is curving at a point p, one way is through the normal curvature of curves on S that pass through p. This is aesthetically unsatisfactory because it forces us to break up a surface into infinitely many curves.
Rather than deal with the infinitely many curves, an alternative is to find the maximum and minimum values of the normal curvature, which represent the infinitely many values and so reveal the geometric essence of S at p. A theorem in differential geometry says [Millman and Parker 77]:

"At each point of a surface there are two orthogonal directions such that the normal curvature takes its maximum value in one direction and its minimum along the other."

The two orthogonal directions are called the principal directions at p; the maximum and minimum values of the normal curvature are called the principal curvatures of S at p. It turns out that the two principal curvatures are the two eigenvalues of the Weingarten map W, and the two principal directions are the directions of the two corresponding eigenvectors. Since W is a linear transformation, there are two associated numerical invariants: the determinant of W, which is the product of the two eigenvalues κ_1 and κ_2, called the Gaussian curvature K; and one-half of the trace of W, which is one-half of the sum of the two eigenvalues, called the mean curvature H:

    K = κ_1 κ_2 = det(W)
    H = (κ_1 + κ_2)/2 = trace(W)/2

Surface classification based on Gaussian curvature

The surface at a point p is categorized into the following types according to the Gaussian curvature K at p:

    K > 0 : elliptic (convex or concave)
    K < 0 : hyperbolic (saddle point)
    K = 0 : parabolic if κ_1 ≠ 0 or κ_2 ≠ 0
    K = 0 : planar if κ_1 = κ_2 = 0

4.1.2 Computation - surface fitting using B-splines

The local surface theory outlined above suggests the following simple computational procedure for obtaining curvatures and surface type.

1. Find the partial derivatives x_1, x_2, and the second order derivatives x_{ij}.
2. Compute the normal n = (x_1 × x_2)/|x_1 × x_2|.
3. Compute the coefficients of the first fundamental form, g_{ij} = ⟨x_i, x_j⟩.
4. Find the inverse matrix (g^{ij}) = (g_{ij})^{-1}.
5. Compute the coefficients of the second fundamental form, L_{ij} = ⟨x_{ij}, n⟩.
6. Compute the Weingarten map W = (L_{ij})(g^{ij}).
7. Compute the eigenvalues κ_1, κ_2 and the eigenvectors of W.
8. Compute the Gaussian curvature K and the mean curvature H.
9. Classify the surface type at point p according to K.

The mean curvature H is not used in classifying surface type, but it is useful for quantitatively describing the extent to which the surface is curving. In fact, H is exactly the average of the normal curvatures of the infinitely many curves at point p over all directions:

    H = \frac{1}{2\pi} \int_0^{2\pi} \kappa_n(\theta) \, d\theta

where κ_n(θ) is the normal curvature along direction θ.

The eigenvectors are not used in surface type classification either, but they provide orientation information that may be useful in further analysis. For example, if a surface is determined to be cylindrical because most of the points on the surface are classified so, the orientation of the cylinder (its major axis) would be the average direction of the eigenvectors whose eigenvalues are 0 at the various points (the surface does not curve along this direction).

In the computation procedure, steps 2 to 9 are very straightforward. The problem is in step 1: how are these partial derivatives obtained from a set of discrete 3-D points on the surface, which is the only information we have so far? A common approach is to fit an analytic surface locally to a set of points and then take derivatives.
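Before turning to the fitting itself, steps 2 to 9 can be written down directly. The sketch below (Python with NumPy; the function name and the zero tolerance eps are illustrative choices of mine, not the thesis code) takes the partial derivatives of step 1 as given and returns the curvatures and a qualitative surface label.

    import numpy as np

    def classify_surface_point(xu, xv, xuu, xuv, xvv, eps=1e-3):
        # Steps 2-9: from the first and second partial derivative vectors at a
        # point, compute the Weingarten map, the principal curvatures, the
        # Gaussian and mean curvatures, and the surface type label.
        n = np.cross(xu, xv)
        n = n / np.linalg.norm(n)                  # step 2: unit normal
        G = np.array([[xu @ xu, xu @ xv],
                      [xv @ xu, xv @ xv]])         # step 3: first fundamental form (g_ij)
        L = np.array([[xuu @ n, xuv @ n],
                      [xuv @ n, xvv @ n]])         # step 5: second fundamental form (L_ij)
        W = L @ np.linalg.inv(G)                   # steps 4 and 6: Weingarten map
        K = np.linalg.det(W)                       # step 8: Gaussian curvature K = k1*k2
        H = 0.5 * np.trace(W)                      #         mean curvature H = (k1+k2)/2
        disc = max(H * H - K, 0.0)                 # step 7: principal curvatures
        k1, k2 = H + np.sqrt(disc), H - np.sqrt(disc)
        if K > eps:                                # step 9: classification by K
            label = "elliptic"
        elif K < -eps:
            label = "hyperbolic"
        elif abs(k1) > eps or abs(k2) > eps:
            label = "parabolic"
        else:
            label = "planar"
        return K, H, (k1, k2), label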
B-spline surface patch

Given 16 points p_{ij}, i, j = 1, 2, 3, 4, on a 4×4 equally spaced grid on surface S, such as the ones in Figure 4.1, a B-spline surface patch that fits the middle 4 points is given by

    x(u, v) = u B P B^T v^T        (4.1)

where

    u = (u^3 \; u^2 \; u \; 1),    v = (v^3 \; v^2 \; v \; 1),    P = (p_{ij}), i, j = 1, ..., 4

    B = \frac{1}{6} \begin{pmatrix} -1 & 3 & -3 & 1 \\ 3 & -6 & 3 & 0 \\ -3 & 0 & 3 & 0 \\ 1 & 4 & 1 & 0 \end{pmatrix}

with 0 ≤ u, v ≤ 1 being the parameters [Foley and Van Dam 82, Yang and Kak 86]. Point p_22 is at u = v = 0; p_23 is at u = 0 and v = 1; p_32 is at u = 1 and v = 0; and point p_33 is at u = v = 1. The B-spline patch defines every point in the middle square. For example, the parametric center point would be at u = v = 0.5.

Figure 4.1 B-spline surface patch using 16 control points

Partial derivatives

From the B-spline surface patch (4.1), the partial derivatives at any point in the patch are easy to compute as follows:

    x_u = u' B P B^T v^T
    x_v = u B P B^T v'^T
    x_{uu} = u'' B P B^T v^T
    x_{uv} = u' B P B^T v'^T
    x_{vv} = u B P B^T v''^T

Since in our experiments the 3-D grid points are given in the world coordinate system, a surface patch is considered as a Monge patch z = z(x, y). Hence we take x and y as parameters in place of u and v, and the surface patch can be written as x(u, v) = (u, v, z(u, v)). Applying the above partial derivatives to z(u, v), we obtain

    x_u = (1, 0, z_u),  x_v = (0, 1, z_v),  x_{uu} = (0, 0, z_{uu}),  x_{uv} = (0, 0, z_{uv}),  x_{vv} = (0, 0, z_{vv})

The partial derivatives thus calculated are in exactly the form required for the curvature computation.
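As an illustration of (4.1) and the derivative formulas, the following sketch evaluates the derivatives of a bicubic uniform B-spline Monge patch from a 4×4 grid of z-values; its output feeds directly into the curvature routine sketched earlier. The function name and data layout are my assumptions, and the derivatives are taken with respect to the cell parameters u, v, i.e. in units of the stripe spacing, as in the thesis examples.

    import numpy as np

    B = np.array([[-1,  3, -3, 1],
                  [ 3, -6,  3, 0],
                  [-3,  0,  3, 0],
                  [ 1,  4,  1, 0]], dtype=float) / 6.0

    def monge_patch_derivatives(Z, u, v):
        # Z: 4x4 array of z-values at equally spaced grid points (Monge patch
        # z = z(u, v)); 0 <= u, v <= 1 covers the middle quadrilateral.
        U   = np.array([u**3, u**2, u, 1.0])
        V   = np.array([v**3, v**2, v, 1.0])
        dU  = np.array([3*u**2, 2*u, 1.0, 0.0])
        dV  = np.array([3*v**2, 2*v, 1.0, 0.0])
        ddU = np.array([6*u, 2.0, 0.0, 0.0])
        ddV = np.array([6*v, 2.0, 0.0, 0.0])

        def z_form(a, b):            # a B Z B^T b^T, as in equation (4.1)
            return a @ B @ Z @ B.T @ b

        zu, zv   = z_form(dU, V),  z_form(U, dV)
        zuu, zvv = z_form(ddU, V), z_form(U, ddV)
        zuv      = z_form(dU, dV)
        return (np.array([1, 0, zu]), np.array([0, 1, zv]),
                np.array([0, 0, zuu]), np.array([0, 0, zuv]), np.array([0, 0, zvv]))

Evaluating this at, say, a 5×5 or 10×10 grid of (u, v) values inside each quadrilateral reproduces the 25 to 100 interior points used in the examples of the next subsection.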
So, the surface is not only “(2 parabolic, but also cylindrical with radius 2 units. The unit here is the stripe spacing on the Z=0 plane, which is about 15m. Hence the cylinder has radius about 30m. As a matter of fact, the radius of the Coke can is indeed 32mm. Corresponding to the principal curvature K1 z 0, the computed eigenvectors also show a cluster tendency with the cluster center being roughly the direction of the major axis of the cylinder. (3) A potato in the 7-potatoes image A first glance at Figure 4.4 shows that K > 0 and K1, K2 < 0. According to the classification theory, these curvatures define an elliptic surface patch. From the histo- grams, we see the two principal curvatures K1 and K2 being well separated, each spreads over a relatively wide range. We conclude that the surface curves downward in both prin- cipal directions (K1 < 0, K2 < 0), but more rapidly in one than in the other — a surface of an ellipsoid-like object, a reasonable description of a potato. 90 80 60 Number of 40 points 20 IIh'I h.|| l .. . izigiiiim Iii its": I I —0.5—0.4-0.3.—02—01 0 0.10.2 0.3 0.4 0.5 Curvature principal curvature K1 principal curvature K2 .............. Gaussian curvature K Figure 4.3 Histogram of curvatures on a cylindrical surface patch (Coke can) 91 30 Number 20 of points 10 ll la, . l l! . iii =5 I! w". I II II E ii .Hiu's 2222222 l22222i2222222 —0.5-0..4-03—0..2-01 0 0.10.2 0.3 0.4 0.5 Curvature principal curvature K1 principal curvature K2 .............. Gaussian curvature K Figure 4.4 Histogram of curvatures on an elliptical surface patch (potato) 92 4.1.4 Occlusion relationships between virtual edges The above curvature computation uses 3-D grid points interior to the stripe net- works. But the stripe end-points are also very important in geometrical reasoning about the spatial relationships of the objects, especially about occlusions. If one surface is par- tially occluded by another surface with respect to the projector, the occluding surface will create a "shadow" on the occluded surface. In this case if a light stripe runs across the two surfaces, it will be interrupted at the place of the shadow. The 3-D data of the stripe end-points can be used to obtain this information. The computation is simple. 4.1.5 Summary In this section, we have discussed an approach to inferring surface shape from the 3-D grid data. A stripe network is considered to indicate a surface patch, on which the 3- D data are available at equally-spaced grid points. Curvatures (K1, K2, K) at other surface points interior to the stripe network are computed based on a B-spline fitting to the sparse 3-D grid data. Clusters of the computed curvatures are sought, and surface shape is then classified using the curvature of the cluster centers. This technique is satisfactory for those objects that are of regular shapes, e.g. planar, cylindrical, spherical, etc. This work just shows how techniques used on dense range data [Ittner and A. K. Jain 85, Yang and Kak 86] can be applied to sparse 3-D data from stripes.Two problems remain : (1) What can be said if the curvatures are scattered all over the place with no cluster tendency? Are they of "arbitrary" shapes? (2) What about the points at the boundary areas of a surface patch: the above computation is valid only for the quadrilaterals interior to the stripe net— work? The first problem is indeed not easy to solve, since the differential geometry theory does not provide classification of surfaces that have nondeterministic curvatures. 
In many cases the surface has to be looked at locally, one small piece at a time. But for certain types of surfaces, such as a sine wave, the spread of the curvature measures may itself be useful for surface shape inference. The second problem requires more information about the surface patches at the "critical" places, the edges and boundaries. It is natural to probe other sources for more information. Stripe texture is one such source.

4.2 Surface Shape from Stripe Texture

So far we have computed surface solutions (Chapter 3), which are in fact the locations of some isolated points on the surface, and we have done some surface shape inference based on the computed 3-D points (Section 4.1). Now, let us set the 3-D points aside for a while and concentrate on what information we can pull out of the 2-D stripes.

The stripes seen by the camera weave stripe networks in the image. This kind of wire-frame network, or stripe texture, is often seen in computer graphics for 3-D display. Figure 4.5 shows such a stripe-network image. In this image only the bright stripes provide information about the objects, while the other parts of the image, objects or background, provide little information, if any: they are homogeneously black. Through the light stripes in the flat image, human eyes perceive something three-dimensional. We interpret 2-D stripes three-dimensionally because we have built-in knowledge for reasoning about 3-D scenes from 2-dimensional retinal images. For example, we know that a curved contour in the image never comes from a straight line in 3-D. Our task is to build this kind of knowledge into our machine vision system and use it to interpret the images.

Figure 4.5 Light stripe networks in a striped image

4.2.1 Relations of features in 2-D and in 3-D

Binford and Lowe [Lowe 84, Lowe and Binford 85] listed a set of ten relations between 2-D features and their corresponding 3-D inferences. The inferences are derived from very general assumptions about the image formation process, in particular that the camera and light-source positions are independent of the objects in the scene. Under these assumptions, many types of alignments in an image are unlikely to arise by accident. More intuitively, the assumption is analogous to the so-called general position assumption that the image is taken from a representative viewpoint; i.e. a slight displacement of the objects in the scene would lead to essentially (qualitatively) the same image. Accidental alignment refers to cases where certain image features arise from placing the camera in a special attitude. For example, a straight line segment in the image arises from a straight line in 3-D (general position), or from the camera viewing a planar curve from a direction that lies in the same plane (accidental alignment).

To infer three-dimensional shapes from the light stripe textures, which are 2-D features, a set of inference rules similar to Lowe and Binford's must be established. The following are the inference rules that I used in stripe texture analysis, under the general position assumption.

(1) A straight stripe in the image is cast by a straight line in 3-D. A curved stripe in the image originates from a curved stripe in 3-D, and the 3-D curved stripe lies on a curved surface. In general, a curve in 3-D need not lie on a curved surface; it may be a planar curve. But light stripes are intersections of projected light sheets (3-D planes) and object surfaces.
If a surface is planar, the 3-D stripes on that surface must be straight. So a curved stripe in an image indicates a curve in 3-D, which in turn indicates that the surface on which the curved stripe lies is a curved surface. Note that a straight 3-D line does not imply a planar surface; it may be a line of least curvature on a cylindrical surface, for example.

(2) A continuous 2-D stripe indicates a continuous 3-D stripe. Two disconnected 3-D stripes will be seen as connected in the image (looking like one single stripe) only when an accidental viewing position is taken to view the two 3-D stripes, which violates our "general position" assumption.

(3) Intersecting stripes (a stripe network) in an image correspond to 3-D stripes on the same object surface. By rule (2) each 2-D stripe corresponds to a 3-D stripe on an object surface, so a 2-D stripe network (continuous stripes) indicates a single surface. But, because a camera viewing position that is not "accidental" for some stripes in the network is not necessarily a general position for other stripes of the same network, alignment of stripes belonging to different surfaces may occur. Although infrequent, this type of accident may be observed in an image, especially when the 2-D stripes are thick due to blooming. Accidental stripe connections in an image can be broken using other information, such as finding no network-consistent surface solution for a false stripe network.

(4) Convexity and concavity of object surfaces can be determined from the 2-D stripe networks. From the spatial relation between the camera, the projector and the scene in our experiments, we know that an image stripe curving upward (downward) toward its ends indicates a concave (convex) 3-D stripe on a surface. "Up" and "down" were defined in Section 3.1.2. Convexity and concavity of the object surfaces are then determined by the 3-D stripes.

4.2.2 Stripe texture

If the camera were placed at exactly the same position as the projector, the light stripes in the image would form a regular grid (perhaps with a little lens distortion), and we would not be able to extract any useful information. In this case all the stripes on the object surfaces are accidentally aligned with the viewing direction and are seen as straight lines in the image. By increasing the displacement between the camera and the projector, the light stripes seen in the image are distorted as a function of the object surfaces. It is the distortion of the stripes that provides the information we may use to discover the inverse function, i.e. to recover the object surfaces.

Stripe textures in an image, which are distorted grids of tessellations, are useful for analyzing surface characteristics. Two kinds of texture primitives are available: 4-sided tessellar cells and stripe curves.

Figure 4.6 Stripe texture primitives

Tessellar cells

A tessellar cell is a tile in a stripe network, composed of the four vertices and four edges of two adjacent X-stripes and two adjacent Y-stripes. In Figure 4.6, BCGF is a tessellar cell. A set of features is defined on the tessellar cell primitive.

(1) Standard deviation of side length, σ_l. This feature measures the degree of distortion of the cell from a rhombus shape.
(2) Differences of the two pairs of opposite sides, denoted Δl_1 and Δl_2. A cell on a planar surface tends to have opposite sides of equal length, but a cell from a curved surface does not.
(3) Standard deviation of angle, σ_α. This feature also measures the degree of distortion.
(4) Differences of the two pairs of opposite angles, Δα_1 and Δα_2. The differences will be close to zero if the cell is on a planar surface. They will be significantly different from zero if the tessellar cell is on a non-planar surface.
(5) Average k-curvature of the sides of the cell, μ_κ. The k-curvature of a side is defined as the mean of the k-curvatures at the points on the side, where the k-curvature at a point p is defined as the angle change between the lines ap and pb, with a and b being k pixels away from p in opposite directions (k = 5 in our experiments). Note that the "curvature" measured here is not the same as that in Section 4.1. Here the "curvature" is an approximation of the curvature of a planar curve (in the image plane), whereas Section 4.1 dealt with curvatures of 3-D surfaces.
(6) Compactness, defined as p²/A, where p is the perimeter of the cell and A is its area.
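A sketch of how most of these cell features might be computed from the four vertices of a cell follows (Python with NumPy). The function name and the side/angle ordering are my own conventions, not the thesis implementation, and feature (5), the k-curvature, is omitted because it needs the full pixel chains of the sides rather than just the corners.

    import numpy as np

    def cell_features(corners):
        # corners: 4x2 array of cell vertices in image coordinates, ordered
        # around the cell (e.g. B, C, G, F).  Returns sigma_l, (dl1, dl2),
        # sigma_a, (da1, da2), and compactness.
        c = np.asarray(corners, dtype=float)
        sides = np.roll(c, -1, axis=0) - c              # four side vectors
        lengths = np.linalg.norm(sides, axis=1)

        angles = []                                     # interior angle at each vertex
        for i in range(4):
            a, b = -sides[i - 1], sides[i]
            cosang = (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
            angles.append(np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0))))
        angles = np.array(angles)

        sigma_l = lengths.std()                          # (1) spread of side lengths
        dl1, dl2 = (abs(lengths[0] - lengths[2]),        # (2) opposite-side differences
                    abs(lengths[1] - lengths[3]))
        sigma_a = angles.std()                           # (3) spread of angles
        da1, da2 = (abs(angles[0] - angles[2]),          # (4) opposite-angle differences
                    abs(angles[1] - angles[3]))
        perimeter = lengths.sum()
        area = 0.5 * abs(np.dot(c[:, 0], np.roll(c[:, 1], -1))
                         - np.dot(c[:, 1], np.roll(c[:, 0], -1)))   # shoelace formula
        compactness = perimeter ** 2 / area              # (6) p^2 / A
        return sigma_l, (dl1, dl2), sigma_a, (da1, da2), compactness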
Stripe curves

A stripe curve in an image is the image of the projection of a single grid line. ABCD in Figure 4.6 is an example of a stripe curve. Stripe curves reveal the planarity of object surfaces, since 2-D straight lines generally imply 3-D straight lines, as in inference rule (1) of Section 4.2.1. Features of a stripe curve primitive include:

(1) Average value of curvature, μ_cur, and standard deviation of curvature, σ_cur. Curvature here is defined as the angle between two consecutive stripe segments on the curve.
(2) Average difference of lengths of consecutive stripe segments on the stripe curve, μ(Δd). If a stripe is a straight line, its vertices are nearly equally spaced, so the differences of lengths are close to zero. Here we assume the effect of the viewing angle on the spacing is small.

Stripe curves are detected based on the 2-D grid networks and certain topological constraints. Refer to [Stockman and Hu 85] for details.

4.2.3 Stripe texture classification

Texture classification is done in a hierarchical fashion: (1) planar vs. nonplanar, (2) convex vs. concave vs. irregular, and (3) spherical vs. cylindrical.

Planar surfaces are distinguished from nonplanar ones based on features of tessellar cells and stripe curves. A 1-NN classifier is designed to identify planar and nonplanar cells based on the six features defined earlier, with Euclidean distance in the feature space used to determine near neighbors. Notice that this method only classifies individual tessellar cells, not the texture itself. But a texture may indicate a planar (nonplanar) surface if a majority of its cells are classified as planar (nonplanar). This can be used as supplementary evidence for decisions based on properties of stripe curves.

Histograms of μ_cur and μ(Δd) of the stripe curves are constructed. Since stripes on a plane are always straight, their curvatures are clustered at zero. And, since equally spaced grid points produce nearly equally spaced stripes on planar surfaces, the μ(Δd) values are also concentrated at zero. Stripe curves on nonplanar surfaces, however, have nonzero curvatures and are not equally spaced in general. Thus, a peak near zero in each of the feature histograms indicates straight stripes. If the straight stripes constitute an absolute majority of the stripes of a texture, then that texture is assumed to indicate a planar surface.

Once a texture is determined to be nonplanar, we come to the next layer in the classification hierarchy: determining whether it represents a convex or a concave surface. This is done via the average curvature (μ_cur) and standard deviation of curvature (σ_cur) of the stripe curves. For nonplanar surface patches, μ_cur and σ_cur of the X-stripes and the Y-stripes are computed separately. From inference rule (4) of Section 4.2.1, we have the following criteria for determining convexity and concavity of a surface patch:

    (μ_cur of X-stripes < 0)  and  (μ_cur of Y-stripes ≥ 0)  ⇒  convex
    (μ_cur of X-stripes > 0)  and  (μ_cur of Y-stripes ≤ 0)  ⇒  concave

where each boolean subexpression is based on the majority of stripe curves in that direction. If the above two conditions are not satisfied, the surface patch is said to be of irregular shape (neither convex nor concave). The value of σ_cur is also used to adjust the decision made above: if a surface has been classified as convex (concave) but the stripe curves on the surface have large σ_cur values, the surface is reclassified as irregular even though it is basically convex (concave). Note that a convex or concave surface (cylindrical, for example) has stripe curves with near-zero σ_cur.
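This decision layer can be sketched as follows (Python with NumPy). The threshold sigma_max for a "large" σ_cur and the function names are illustrative assumptions, not the thesis values; the inputs are one μ_cur and σ_cur value per stripe curve, grouped by stripe direction.

    import numpy as np

    def convexity_label(x_mu, y_mu, x_sigma, y_sigma, sigma_max=10.0):
        # x_mu, y_mu: signed average curvatures (mu_cur) of the X-stripe and
        # Y-stripe curves of one nonplanar texture; x_sigma, y_sigma: the
        # corresponding curvature standard deviations (sigma_cur).
        def majority_negative(mus):
            mus = np.asarray(mus, dtype=float)
            return (mus < 0).sum() * 2 > len(mus)

        def majority_positive(mus):
            mus = np.asarray(mus, dtype=float)
            return (mus > 0).sum() * 2 > len(mus)

        if majority_negative(x_mu) and not majority_negative(y_mu):
            label = "convex"
        elif majority_positive(x_mu) and not majority_positive(y_mu):
            label = "concave"
        else:
            return "irregular"

        # a large curvature spread overrides the convex/concave call
        if np.mean(np.concatenate([np.ravel(x_sigma), np.ravel(y_sigma)])) > sigma_max:
            return "irregular"
        return label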
The texture gradient based on tessellar cell primitives is also useful in distinguishing cylindrical from spherical surfaces. We compute gradients along four directions (the X-stripe direction, the Y-stripe direction, and two diagonal directions) on four features (σ_l, σ_α, Δα_2, and compactness). A cylindrical surface has a relatively small value in one direction (along its principal axis), whereas spherical surfaces have relatively large values in all four directions.

4.2.4 Experimental Results

Fifteen striped images were analyzed containing 27 surface patches, including 10 planar surfaces, 6 convex cylindrical surfaces, 5 convex spherical surfaces, 3 concave spherical surfaces, and 3 irregularly shaped surfaces. The objects were viewed in a variety of orientations. There were 792 tessellar cells and 332 stripe curves in total.

For the 1-NN classifier, 106 randomly selected cells from the total population of 792 were used as the training set. The remaining 686 cells formed the testing set. The results are listed in Table 4.1. Using a majority decision rule on individual cells, 2 of the 27 surface patches were misclassified (those with a small number of cells) and another two patches remained undecided (planar-nonplanar tie).

Table 4.1. Surface patches, texture primitives, and classification decisions

No.   surface patch    # of tessellar cells   decision using tessellar   # of stripe   decision using stripe
                       in testing set         cell features              curves        curve features
 1    jar              35                     N                          17            convex
 2    ball             64                     N                          22            convex
 3    book             48                     P                          15            planar
 4    notebook         28                     P                          17            planar
 5    roll of paper    35                     N                          22            convex
 6    board eraser      3                     P                           3            convex *
 7    metal block       8                     P                           8            planar
 8    notebook          6                     P                           7            planar
 9    board eraser      3                     N †                         4            planar
10    piece of wood    14                     ‡                          11            planar
11    book              9                     N †                         9            planar
12    block            14                     ‡                          11            planar
13    bowl             28                     N                          14            concave
14    worktable        27                     P                          13            planar
15    cylinder         13                     N                          10            convex
16    cylinder         11                     N                           9            convex
17    roll of paper    45                     N                           9            convex
18    grapefruit       13                     N                          11            convex
19    bowl             26                     N                          15            concave
20    bowl             18                     N                          14            concave
21    apple             5                     N                           6            convex
22    apple             3                     N                           6            convex
23    tennis ball       5                     N                           6            convex
24    waved paper      21                     N                          18            irregular
25    waved paper      22                     N                          20            irregular
26    cloth            59                     N                          19            irregular
27    garbage can     123                     N                          30            planar *

P - planar; N - nonplanar. † - misclassification using the majority decision rule. ‡ - undecided using the majority decision rule (P-N tie). * - misclassification using stripe curve features.

Figure 4.7 Histogram of μ(Δd) of stripe curves
Figure 4.8 Histogram of μ_cur of stripe curves

Histograms of the two stripe features, μ(Δd) and μ_cur, were constructed from the data of all 27 surface patches, as shown in Figure 4.7 and Figure 4.8. We observe a peak near zero in both histograms, which indicates the presence of planar textures. In the histogram of curvature, two valleys were found around the peak. Using these valleys (marked by arrows) as thresholds, the stripe curves were well classified as planar or nonplanar.

For the nonplanar textures, concavity and convexity were determined using the algorithm described in Section 4.2.3. The result is shown in Table 4.1. Among the 27 surface patches only 2 (surfaces #6 and #27) were misclassified. The misclassification was due to (1) too few texture primitives (#6), and (2) the similarity between a planar surface and a cylindrical surface of large radius (#27).

To distinguish cylindrical and spherical surfaces, a gradient operator was applied to the textures. Since all tessellar cells were organized according to their spatial relationships and a texture was represented as a two-dimensional array in the detection phase, the texture gradient is computed in the same way as on a gray tone image. Gradients along four directions (the X-stripe direction, the Y-stripe direction, and two diagonal directions) were computed. This gradient operation was performed on surfaces that had been determined to be either convex or concave. Results for 7 such surfaces are listed in Table 4.2. We can see from Table 4.2 that cylindrical surfaces have average gradient magnitudes that are relatively small in one of the four directions (indicated by *). That direction gives roughly the orientation of the principal axis of a cylindrical surface.

Table 4.2. Average gradient values on 4 features along 4 directions for 7 surface patches. Cylindrical surfaces have values small in one of the 4 directions (indicated by *) that are assumed to be their principal axes. Surface patch numbers refer to Table 4.1.

surface patch   direction     compactness   σ_l     σ_α     Δα_2
1. cylinder     X             0.77          2.91    6.36    8.40
                Y             0.83          3.19    5.88    9.67
                diagonal 1    0.92          4.06    7.62    7.25
                *diagonal 2   0.72          2.37    4.88    5.42
2. sphere       X             1.39          1.84    7.43    7.69
                Y             1.07          1.98    4.64    7.32
                diagonal 1    1.92          1.63   11.55    3.94
                diagonal 2    1.01          3.21    4.39    4.53
5. cylinder     X             0.60          2.88    4.49    7.99
                Y             0.71          2.54    3.93    7.08
                *diagonal 1   0.51          0.60    3.12    3.63
                diagonal 2    0.80          6.81    6.95   10.05
13. sphere      X             0.93          2.38    6.36    7.48
                Y             0.89          2.35    6.35    7.17
                diagonal 1    0.64          3.44    3.36    8.24
                diagonal 2    1.02          3.18    8.37    6.66
16. cylinder    *X            0.15          0.52    1.25    1.37
                Y             1.83          2.87   15.17    1.97
                diagonal 1    1.78          2.50   16.76    1.91
                diagonal 2    1.81          3.21   16.27    1.54
17. cylinder    X             0.95          3.30   11.09    2.95
                *Y            0.26          0.60    1.11    3.00
                diagonal 1    1.00          3.38   11.74    2.38
                diagonal 2    0.92          3.01   10.96    2.02
18. sphere      X             0.64          2.16   10.40    5.87
                Y             0.68          2.49   10.24    6.00
                diagonal 1    1.05          3.86    6.39    7.75
                diagonal 2    0.64          1.40   14.11    8.39

4.3 Virtual Edges

In addition to the surface shapes and locations of objects in 3-space, the spatial relationships among the objects are also important for scene understanding. One source from which the occluding-shadow relations with respect to the projector may be derived is the set of virtual edges [Marr 82]. The virtual edges in striped images are the "edges" that form the boundaries of the stripe networks. The network boundaries are constructed by the "turning right" algorithm of Section 3.4.5 using the stripe end-points.
The geometric constraint (G4) of Section 3.3.2 says: "If two stripe end-points lie on the same projector ray in 3-D then their stripe numbers are the same and one lies on a surface between the other and the projector." Using this constraint, we can find the occluding-shadow relations of surfaces with respect to the projector by checking whether stripe end-points with the same stripe number lie on the same projector ray. The algorithm is summarized as follows.

Algorithm (virtual edges). For each pair of stripe networks n1 and n2, find a stripe pair (s1, s2) such that the two stripes have the same stripe number (x_g or y_g), with s1 ∈ n1 and s2 ∈ n2. From the 3-D coordinates (x_w, y_w, z_w) of the end-points of the two stripes we can use the projector calibration matrix M_p to find their projections (x_g, y_g) on the slide plane. If end-points e1 ∈ s1 and e2 ∈ s2 are very close in the slide plane, we know that they lie on the same projector ray. The one that is closer to the projector (which can be determined from their 3-D positions) occludes the other. The above procedure is applied to all stripe pairs of n1 and n2.

After the occluding-shadow relations of individual stripe end-points have been found, the virtual edge relations can be established, since a virtual edge is simply a sequence of individual stripe end-points with a well-defined order.

Referring to the jumble image examined in Chapter 3, whose stripe networks are shown in Figure 3.10, the virtual edge algorithm produced Table 4.3.

Table 4.3 Shadow and shadow-making virtual edges with respect to the projector

Occluding point              Shadow point
id     x_g      y_g          id     x_g      y_g
122    11.00    4.71         141    11.00    4.73
199    11.95    2.04         132    11.95    2.05
199    11.95    2.04         132    11.95    2.05
116    11.49    3.00         134    11.61    3.00
121    11.08    4.00         139    11.18    4.00
123    10.63    5.00         140    10.71    5.00
124    10.20    6.00         149    10.32    6.00
 37    10.89    7.11         148    11.00    6.64

Notice that points 199 and 132 appear twice in the table because each is both an end-point of an X-stripe and an end-point of a Y-stripe. For points 37 and 148, the occluding-shadow relation bears low confidence, since their y_g values are not very close. For the virtual edges, the order in which the end-points are extracted is 199, 116, 121, 122, 123, 124 for the upper network, and 132, 134, 139, 141, 140, 149 for the lower network. It is seen that there is a perfect match between the order of the end-points in the two networks and the occluding-shadow relationship. Hence we have established the spatial relation between the two surface patches in terms of their relative positions with respect to the projector. In general there may not be a perfect match, but the virtual edge relations can still be established if most end-points on the two virtual edges have occluding-shadow relations.
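A sketch of the end-point test at the heart of this algorithm follows (Python with NumPy). The 3×4 homogeneous form assumed for the projector calibration matrix, the explicit projector center, the tolerance, and the function names are assumptions for illustration, not the thesis code.

    import numpy as np

    def slide_coords(Mp, pw):
        # Project a 3-D world point onto the projector slide plane using a
        # 3x4 projector calibration matrix Mp (homogeneous projection assumed).
        x = Mp @ np.append(np.asarray(pw, dtype=float), 1.0)
        return x[:2] / x[2]

    def occlusion_relation(e1_world, e2_world, proj_center, Mp, tol=0.1):
        # Decide which of two same-numbered stripe end-points occludes the
        # other with respect to the projector, or return None if they do not
        # lie on (approximately) the same projector ray.
        e1 = np.asarray(e1_world, dtype=float)
        e2 = np.asarray(e2_world, dtype=float)
        g1, g2 = slide_coords(Mp, e1), slide_coords(Mp, e2)
        if np.linalg.norm(g1 - g2) > tol:
            return None                                  # not on the same projector ray
        d1 = np.linalg.norm(e1 - np.asarray(proj_center, dtype=float))
        d2 = np.linalg.norm(e2 - np.asarray(proj_center, dtype=float))
        return "e1 occludes e2" if d1 < d2 else "e2 occludes e1"

Running this test over all same-numbered stripe pairs of two networks and collecting the ordered end-points yields occluding and shadow virtual edges such as those in Table 4.3.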
4.4 Discussion and Conclusions

In this chapter we have discussed the inference of surface shapes from light striping through curvature computation (3-D) and stripe textures (2-D). The results obtained are rough shape descriptions, both qualitative (planar, concave, convex, etc.) and quantitative (3-D locations of the boundary points of stripe networks, radii of spherical or cylindrical surfaces, and so forth). Experiments showed that the surface shapes of certain smoothly curved objects could be deduced reasonably well.

Using the computed 3-D points to fit a surface patch and finding local curvatures at various points on the fitted patch to infer shape is a standard way to analyze a surface [Ittner and A. K. Jain 85, Besl and R. Jain 85, Medioni and Nevatia 84, Laffey, Haralick and Watson 82, Ponce and Brady 85, Yang and Kak 86, Besl and R. Jain 86]. But in some cases we need to analyze surface shapes using only 2-D information (when 3-D data are not available, for example), and in other cases 2-D information is itself adequate. Stripe patterns in the image, or stripe textures, are a good source from which the desired information can be obtained. In a 2-D image, the stripe textures provide information about the distortion of the stripes, which is a function of the object surfaces. The experiments on stripe texture analysis showed very convincingly that the 2-D stripes can provide essential information about object shapes. On the other hand, 2-D stripes reveal surface shapes only qualitatively (planar, convex, etc.), not quantitatively. Hence both 3-D and 2-D inferences are important and useful.

Related work [Shrikhande and Stockman 87] showed that surface normals at stripe grid points can be derived directly with good accuracy, without computing the locations of 3-D surface points. This information is also useful and can be combined with the fitted surface in shape inference.

In shape inference using 3-D data and 2-D stripe textures, several issues remain that require special attention:

(1) Gaussian curvature is computed only for points interior to stripe networks. This is because surface fitting, like any other kind of data fitting, is not good at points on the boundaries. To solve this problem, we need methods specially designed for handling boundaries. Fusion of light striping with intensity is one such method, and is the topic of Chapter 5.
(2) Qualitative description of surface shape is made locally unless the surface is regular of some sort. An arbitrarily curved surface should be broken down into pieces, each having its own "global" description.
(3) Surface patches must be "large" enough in order to be described through stripes. As stated in Chapter 2, this is one of our basic assumptions about surfaces. If a surface patch has too few stripes it will not be described properly, or will simply be ignored by the processing procedures.

CHAPTER 5

Multi-channel Vision - Fusion of Light Striping and Intensity

It is seen from the previous chapters that the structured lighting technique can be used effectively for 3-D sensing and shape inference. But it is also clear that using structured lighting alone we may not be able to obtain all the information necessary for developing a proper object representation for recognition. An obvious example is an object surface that is visible to the sensor (camera) but invisible to the projector. No light stripes shine on such a surface and hence no information about it is available. A natural way to overcome this difficulty is to look for information from other sources, such as intensity images of the same scene. Multi-channel information is then fused to yield a description of the scene. A fusion scheme is presented in this chapter that combines information from light striping and the gray tone image to develop symbolic descriptions for the surface patches and intensity edges according to some "inference rules". The symbolic descriptions may be combined in higher level analysis to segment the image into objects.

5.1 Object Description

The goal of 3-D scene analysis is to recognize the objects in the scene.
In order to accomplish this task, object models, which are descriptions of the physical objects, must be available somewhere in the system. The features extracted from the images should also be described in the same language as that used in the models, so that the representation of the sensed structure can be matched against the representation of the models for recognition. As stated in Chapter 2, we use the surface-boundary representation, which consists of the following basic elements (repeated from Chapter 2).

1. surfaces
   (a) qualitative: type (planar, convex, concave, cylindrical, etc.),
   (b) quantitative: boundary, 3-D location, orientation;
2. edges
   (a) qualitative: type (extremum, blade, fold, mark, shadow),
   (b) quantitative: location, length (in chain-code form);
3. surface-surface relationships
   (a) occlusion,
   (b) adjacency,
   (c) belonging to the same or different objects;
4. surface-edge relationships
   (a) surface boundaries,
   (b) surface marks,
   (c) shadow edges,
   (d) intersection (common border) of two surfaces.

From the discussion in Chapters 3 and 4, we see that light striping can provide elements 1 and 3(a) of the above representation. The question now is, "Can the other elements of the representation be obtained from light striping?"

5.2 Light striping is not enough

We have discussed the light striping technique for 3-D sensing and surface shape analysis, and it has been shown to be promising. Besides providing locations of surface points in 3-space, the sensed 3-D data have the following properties.

(1) They are sparse compared with the dense data obtained from direct range sensors. Because of the sparseness, the computational burden often unavoidable in processing dense data is somewhat relieved.
(2) Although sparse, they are good representatives of the dense 3-D population, because they cover the major part of the visible surface due to the way the feature points are created. The spatial relationships between the 3-D points are well defined in terms of the stripes that connect them. Hence, a good surface fitting can be constructed from these representative 3-D points.
(3) They are ambiguous in the sense that a surface patch may be at one of several locations in 3-space. But as far as surface shape is concerned, each of the several surface locations is equally good for surface shape inference.
(4) They are available only in the areas visible to both the projector (where there are stripes) and the camera (where the stripes appear in the image). For areas where no stripe appears due to occlusion or shadowing, no information can be obtained.

Besides the weakness stated in (4) above, some problems introduced in the 2-D processing may also arise. On a smooth surface the stripes gradually fade out as the surface turns away from the viewer. These stripes may not reach the real object boundaries before they die out. A stripe extraction procedure (brightness thresholding, say) will fail to pick up the fading tails of the stripes, but the tails, and especially the end-points of the tails, are very important features for 3-D analysis. For the same reason, namely that
Fortunately, an intensity image of the same scene viewed at the same position and in the same direction is easily obtained by simply switching off the projector. Exact registration of the two images (striped and intensity) is guaranteed since neither the camera nor the objects in the scene have been touched. The fusion of information from striped and intensity images will be the topic of this chapter. 5.3 Comparison of the two information channels A striped image and its intensity counterpart each has its strengths and weaknesses. To better understand why the combination of information from these two different sources is advantageous, let’s look at some examples. In Figure 5.1, a gray-scale image and its corresponding striped image are shown in (a) and (b), respectively. (c) is the result of a gradient operation on the gray-scale image followed by a thresholding, and the overlay of the edge map and striped regions is in (d). From the striped image alone, we can compute 3-D positions of the grid points, determine the rough surface shapes (planar in this case), detect blades and folds, and segment the image according to the stripe net- works (two segments in the example). But it provides no information in the areas where light stripes are occluded, and no clues about whether the two surfaces belong to the same object. On the other hand, the gray-scale image may not catch discontinuities of surface normal (the concave edge is missing, for instance), and may have difficulties in segmenting surfaces. But it offers fine boundaries and texture. In the example, the gray- scale image suggests that the two striped surfaces belong to the same object because of well-connected outer boundaries. Also the clear edge of the triangular-shaped surface is useful in breaking down the larger striped surface into two sub-regions, which will be 115 confirmed by the bending of the stripes. In section 5.5, we will find that this particular edge is a fold, and the two sub-regions are two surfaces meeting at the fold. A second example of multiple objects is shown in Figure 5.2. Notice that the striped regions do not exactly agree with the contours, due to the continuous change of surface normal and stripe fading. But they provide a good preliminary segmentation, which is not easy to get from the intensities alone. A more accurate segmentation can be obtained by extending the striped regions with the gradient contours as boundary condi- fions 116 Figure 5.1 Fusion of intensity and light striping — example 1 (a) intensity image (b) striped image (c) edge map ((1) overlay of edges and striped regions 117 (C) (d) Figure 5.2 Fusion of intensity and light stripin g — example 2 (a) intensity image (b) striped image (0) edge map (d) overlay of edges and striped regions 118 Table 5.1 gives a comparison of the two information channels. Combination of the two channels will certainly facilitate the strengths of each, and is necessary for generat- ing an object representation that will be suitable for the next step of processing up the visual chain, e. g. model matching and recognition. 
Table 5.1 Comparison of two information channels Striped Image Intensity Image Information gross surface shape, location, local surface albedo and boundaries and segmentation; orientation (under controlled detect folds (crease edges) in illumination); little shadow- surfaces and some blades & ing; fine boundaries and tex- extrema tures; can see into some of surfaces in shadow of pro- jected light Factored Out albedo; fine texture no direct 3-D info (no knowledge of projector) Problems shadows; occasional segmentation into objects; accidents; some ambiguity in difficult to get gross shape; 3-D solution; will miss can miss discontinuities of illumination or albedo surface normal (folds) discontinuities 119 5.4 Stripe extension Since light stripes in an image are much brighter than other part of the image, a glo- bal thresholding would be a proper way to extract the stripes. But on the other hand, because the intensity received in an image is a decreasing function of the angle between the viewing direction and the surface normal, the intensity value on a light stripe that runs across a smooth curved surface will fade out as the surface turns away from the camera (or projector), and will vanish when the surface normal is at 90° to the imaging direction (projection direction). Hence, a global thresholding may cut off some stripes in the fading areas. To recover the thresholded part of stripes, a local thresholding may be useful. Local thresholding Figure 5.3 (a) is a striped image of an orange, which is part of the image in Figure 2.2 (c). Figure 5.3 (b) is the result of a global thresholding on (a). It is seen that the tails of some stripes were cut off, with detected stripe end-points at somewhere toward the middle of the stripes. To get better end-points, we apply local thresholding on the regions that are in the vicinity of the detected stripe end-points. To find a proper tines- hold value for thresholding a subimage, the common approach is to construct the histo— gram of intensity in the subimage, and find a valley in the histogram. But, identifying valleys in a histogram is itself a non-trivial problem when there are no clearly defined peaks and valleys, which is the case in our situation where the intensities of both stripe pixels and background pixels scatter over a wide range. Instead of working on histo- grams, we adopted a heuristic approach. The heuristic is that the stripe tails, although weak in intensity, are in general still brighter than the near-by background pixels, and the majority pixels in the local region belong to the background. That is to say, the stripe pix- els occupy a certain portion at the high end of the intensity range. We need to locate a cut point at the right place. Two methods can be used to determine the cut point. 120 (a) (b) Figure 5.3 (a) striped image of an orange (b) result of global thresholding method A. Suppose the local region is an rtxn subimage. The stripe tail may run across the region so it will be 11 pixels long; or it may terminate in the rrriddle of the region and hence will be shoner. For a stripe tail 3 pixels width, it has at most 3n pixels; that is 3'21 = % of the region. For n = 15, for example, no more than the top 20% high ’1 intensity pixels may be on the stripe. The results of thresholding using 15% as the cut point is shown in Figure 5.4 (a). method B. Because most pixels in the local area are background ones, the average intensity would be closer to that of the background than to the stripe. 
Those pixels having higher-than-average intensities that deviate very much from the mean are classified as stripe pixels. Using 0.9σ as the threshold above the mean, where σ is the standard deviation of intensity, the result is shown in Figure 5.4 (b).

Figure 5.4 Local thresholding results using a threshold at (a) the top 15% of intensity (b) 0.9σ above the mean (σ is the standard deviation of intensity)

Notice that there may be other stripes (besides the one we are working on) running into the local region. We do not want those bright stripes to interfere with the processing. A simple way to avoid this is to consider not all pixels in the region but only those whose intensities are lower than or equal to that of the detected end-point p, and to ignore the higher-intensity pixels because they have already been taken care of by the global thresholding.

Although local thresholding can to some extent recover the faded stripe tails, which are important in relating surface patches (indicated by stripes) and boundaries, it may not be applicable in general, because the choice of the parameters (size of the local region, threshold values, etc.) is ad hoc, and the post-processing work needed to relate the locally thresholded regions to the original stripes may be too complicated and not very reliable.

Stripe extension

Local thresholding may improve the accuracy in locating the real stripe end-points, which are important for determining surface boundaries. But often the boundaries of the striped regions do not agree with the boundaries of object surfaces, as shown in the potato image in Figure 5.2 (d). The striped regions give an initial segmentation; each of the regions corresponds to a surface. On the other hand, the contours such as the one in Figure 5.2 (c) represent surface boundaries but are not easy to segment, because the contour image may still be quite noisy and incomplete. This suggests that a better segmentation may be achieved by "growing" the initial striped segments using the contours as the termination condition. One way to do this is to extend the stripes to the contours.

For each stripe we can fit a curve using the grid points on the stripe. Points external to each end of the stripe are calculated using the fitted curve, or we simply extend the stripe linearly, until one of the following occurs: (1) a gradient contour is encountered; (2) a pixel in some other segment is hit; (3) the extension runs into a place that is likely to belong to the background (very low intensity, say). The third condition is necessary since gradient points usually do not form closed contours; stripe extension may escape through the gaps between gradient contours.

Figure 5.5 illustrates the effect of stripe extension. (a) shows the initial striped segments of the potato image along with the contour map obtained by the gradient operation. Actually, in this example the gradient operation was not applied directly to the original intensity image; rather, the intensity image was preprocessed by a logarithm operation that rescaled the intensities according to an appropriate logarithm curve. In (b) the stripes are extended, and (c) shows the resulting segments.

Figure 5.5 Segmentation using both stripes and intensity gradient: (a) initial segmentation with gradient contours (b) extended stripes (c) resulting segmentation
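The cut-point heuristics (methods A and B) and the extension loop just described can be summarized in a short sketch. The function and parameter names below (cut_point_method_a, background_level, and so on) are illustrative rather than taken from the thesis software, and the array-based formulation is an assumption:

import numpy as np

def cut_point_method_a(window, fraction=0.15):
    # Method A: in an n x n window a stripe tail at most 3 pixels wide covers
    # at most 3n of the n*n pixels, i.e. 3/n of the window (20% for n = 15);
    # 15% is used here as a slightly conservative cut point.
    return np.quantile(window, 1.0 - fraction)

def cut_point_method_b(window, k=0.9):
    # Method B: most window pixels are background, so the window mean is close
    # to the background level; pixels more than k standard deviations above
    # the mean are taken to be stripe pixels.
    return window.mean() + k * window.std()

def extend_stripe(end_point, direction, edge_map, labels, image,
                  own_label, background_level=10, max_steps=200):
    # Extend a stripe linearly from a detected end-point, stopping when
    # (1) a gradient contour is met, (2) another segment is entered, or
    # (3) the intensity drops to background level.
    (y, x), (dy, dx) = end_point, direction
    added = []
    for _ in range(max_steps):
        y, x = y + dy, x + dx
        r, c = int(round(y)), int(round(x))
        if not (0 <= r < image.shape[0] and 0 <= c < image.shape[1]):
            break
        if edge_map[r, c]:                       # (1) gradient contour reached
            break
        if labels[r, c] not in (0, own_label):   # (2) ran into another segment
            break
        if image[r, c] < background_level:       # (3) probably background
            break
        added.append((r, c))
    return added

The numeric defaults (15%, 0.9σ, the background level) are only the kind of values discussed above; in practice they would be tuned to the camera and projector setup.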
5.5 Interpretation of contours

What does a contour map, such as the one in Figure 5.2 (c), tell us about object shapes? Obtained by applying a gradient operation on an intensity image followed by contour tracing and noise removal, image contours occur at places where intensity changes abruptly, such as boundaries of object surfaces, shadows, and marks. There are five types of image contours caused by the intrinsic structure of the 3-D view and occlusion of one object by another. As suggested in [Charniak and McDermott 85], and illustrated in Figure 5.6, these types are:

extremum: self-occlusion by a curved object, the surface gradually turning away from the viewer; the surface orientation is perpendicular to the viewing direction at the contour.
blade: two surfaces meet, just one visible; surface orientation is discontinuous.
fold: two surfaces meet, both visible; surface orientation is discontinuous.
shadow: an abrupt change in illumination.
mark: an abrupt change in reflectance or texture due to surface characteristics.

Figure 5.6 Types of image contours

Our goal is to classify the image contours and to infer the surface shapes in the vicinity of the contours.

5.5.1 Apparent curvature of visual contour and local surface geometry

Koenderink [Koenderink 84] proposed a theory that relates the apparent curvature of the occluding contour of a visual shape to the intrinsic curvature of the surface and the radial curvature. According to his theory, convexities, concavities, or inflections of contours in the image allow us to draw inferences about local surface geometry with certainty, assuming objects are smoothly curved (e.g. no folds). Two curvatures were defined as follows:

radial curvature: κ_r = 1/r_r
transverse curvature: κ_t = 1/r_t

where r_r is the radius of the curve in which the normal plane cuts the surface, and r_t is the radius of the curve along which the imaging rays cut the projection plane. The normal plane is defined by the normal to the surface at a rim point and the imaging direction, and the projection plane is perpendicular to the normal plane at the rim point. See Figure 5.7. The rim is the boundary between the visible and invisible parts of the object. Note that the rim is not necessarily planar for a general smooth object. A remarkably simple result is

κ_r κ_t = K

where K denotes the Gaussian curvature of the surface, although κ_r and κ_t are not necessarily the two principal curvatures. If the projection is from a finite distance d, the apparent curvature of the contour is

κ_app = dK/κ_r = dκ_t

The explanation of the scaling factor can be found in [Koenderink 84].

Figure 5.7 Radial curvature and transverse curvature

Notice that the radial curvature κ_r cannot change sign at an occluding contour because the object occludes the sight of farther parts. Let κ_r be positive (an arbitrary choice); then the apparent curvature κ_app and the Gaussian curvature K have the same sign. From κ_app, which can be obtained from the visual contours in the image, we can determine the sign of the Gaussian curvature, i.e. we can draw inferences about the local surface shape. The general rule is: convexities of the contour correspond to convex surface patches, concavities to saddle-shaped surface patches, and inflections of the contour to inflections (K = 0) of the surface. There are no exceptions to this rule. This theory allows us to interpret a smooth curved object at the rim through the occluding contours available in the image.
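The sign rule can be made concrete in a few lines; the function names and the zero threshold eps are assumptions for illustration, not part of the thesis:

def gaussian_curvature(kappa_app, kappa_r, d):
    # Invert kappa_app = d * K / kappa_r to recover the Gaussian curvature K
    # from the apparent contour curvature, the radial curvature, and the
    # viewing distance d.
    return kappa_app * kappa_r / d

def surface_class_at_rim(kappa_app, eps=1e-6):
    # With the radial curvature taken positive, sign(kappa_app) = sign(K):
    #   convex contour      -> convex surface patch
    #   concave contour     -> saddle-shaped (hyperbolic) patch
    #   inflection (K ~ 0)  -> parabolic point of the surface
    if kappa_app > eps:
        return "convex"
    if kappa_app < -eps:
        return "saddle"
    return "parabolic (K = 0)"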
For example, the contour map of the image of a sculpture and an orange is shown in Figure 5.8, where the labels convex (+), concave (-), or inflection (0) can be assigned along the contours. This is a richer labeling than the Huffman-Clowes scheme [Huffman 71, Clowes 71], in which occluding contours can only be labeled "occluding" (→ or ←).

Figure 5.8 Intensity gradient image of a sculpture, an orange and a block

5.5.2 Combination of image contours and stripes

The apparent curvature theory outlined above is valid under the smoothness assumption. That is, for rigid objects the tangent plane is defined for every point of the surface and moreover is a smooth function of position. In other words, the theory works on only one type of image contour, the extremum. There are contours of other types, such as blades, folds, shadows, and marks. For these contours, the apparent curvature theory does not apply. For example, in Figure 5.8, if we know the circular contour at the left of the image is the boundary of a smooth object, as it really is, we can label it convex according to the apparent curvature theory. But if the circular contour were the boundary of a disk, the surface at this boundary would not be convex; instead, it would be a blade. To distinguish extrema from blades, we may use the shape-from-shading idea [Horn 71], which states that the intensity received is a decreasing function of the angle between the surface normal and the incident direction of the light source. If surfaces are Lambertian, normals can be computed at every point of the surface using intensity information. But the shape-from-shading technique requires very carefully controlled illumination conditions and uniform surface characteristics (Lambertian, say). These may not be easily guaranteed in many cases. An easier alternative way to decide the type of an image contour is via light stripes.

5.5.3 Terminology and Definitions

First we define some terminology in order to describe more clearly the idea of fusion of intensity and light striping. The volume viewed by the camera is called the C-cone and the volume viewed by the projector is called the P-cone. We assume that the objects to be recognized are in both the C-cone and the P-cone, so that they are illuminated by the light stripes and visible to the camera. For an object in the C-cone, only the surfaces that are "toward" and closest to the camera are visible. In other words, the surface normal at any point p on the visible surfaces has a positive projection on the imaging vector at p, and there is no other surface point on the imaging ray between p and the camera focal point (no occlusion). The visible surfaces divide the C-cone into two parts: the C-visible part corresponding to the volume between the surfaces and the camera, and the C-shadow part that is the volume on the other side of the surfaces. The boundary of the object surfaces that are visible to the camera is called the C-rim; that is, the locus of surface points where the imaging rays are "tangent" to (or touch) the objects. The C-silhouette is the image of the C-rim in the image plane. For the projector, the P-visible part, P-shadow part, P-rim, and P-silhouette are defined similarly.

Under the assumption that the projector lighting is pervasive†, a striped image consists of two kinds of regions: striped regions (stripe networks) and non-striped regions. The non-striped regions correspond to the P-shadow part in 3-D. We call the non-striped regions in the striped images P-shadows.
The boundaries of P-shadows, or P-edges, according to [Hambrick, Loew and Carroll 87], can be categorized into four types as follows:

(1) shadow-making segment, along which the projector light is blocked, labeled m, which is part of the P-silhouette;
(2) shadow segment that projects from the m segment, labeled s;
(3) occluding segment, along which part of the surface is blocked from the camera, labeled o, which is part of the C-silhouette;
(4) shadow segment that projects from a hidden shadow-making segment, labeled h.

These segments of P-shadows are illustrated in Figure 5.9.

For a gray-scale image of the same scene illuminated by a light source at the position of the camera, no C-shadows are perceived. The intensity edges (also called C-edges) can be one of the following, as stated in Section 5.5: (1) occluding edge (extremum or blade, part of the C-silhouette), (2) surface normal discontinuity edge (fold), (3) surface mark.

† Assume objects in the C-cone are placed on a ground plane, so that if there were no objects on the ground plane the image would be filled with a regular grid pattern formed on the plane.

Figure 5.9 Four types of P-edge: m: shadow-making segment; s: shadow segment; h: shadow segment projected from a hidden shadow-making segment; o: occluding segment

5.5.4 Inference Rules

As stated in the preceding chapters, stripe networks and stripe texture provide 3-D solutions for the grid points, surface normals at these grid points, and rough surface shapes as well. Based on the computed rough shapes, we establish the following inference rules for interpreting adjacent contours under the general position assumption. The contours include both intensity edges (C-edges) and virtual edges of the stripe networks (P-edges).

It is clear that some contours in the image do not correspond to geometric characteristics of the objects, such as surface marks among the C-edges (due to surface reflectance) and s and h segments among the P-edges (cast from other segments). These contours, especially the surface marks, are largely irrelevant to the object shape. We first establish an inference rule for identifying surface marks.

Rule 1. If a C-edge is crossed by a stripe network, i.e. the stripes crossing the edge are continuous at the edge, and the changes of surface normal at the edge are small, then the C-edge is a surface mark (Figure 5.10 (1)).

S segments (P-shadow segments) always appear paired with m segments (P-shadow-making segments), as discussed for the virtual edges in Section 4.3. A virtual edge v of a stripe network is an s segment if there is another virtual edge u of some stripe network such that the stripe end-points on u lie on the projector rays of the stripe end-points of v. That is, the projector rays "enter" the P-shadow at points on u and "exit" the P-shadow at points on v. The two virtual edges u and v form an m-s segment pair that can be determined from the projector geometry†. The m-s segment identification is not handled via inference rules because it is a direct result of 3-D computation rather than 2-D inference, but it should be carried out before the other inference rules are applied, so that the s segments are properly labeled "P-shadow" and excluded from further consideration.

† It is possible, however, that u is in the C-shadow and hence undetected via 3-D computation.
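One way the m-s pairing test could be realized is as a 3-D collinearity check against the projector rays. The thesis determines the pairing from the projector geometry without prescribing this exact formulation, so the functions, the tolerance, and the array inputs below are assumptions:

import numpy as np

def on_projector_ray(proj_center, a, b, tol=2.0):
    # True if 3-D point b lies (within tol, in scene units) on the projector
    # ray through point a, on the far side of a from the projector.
    d = a - proj_center
    d = d / np.linalg.norm(d)
    w = b - proj_center
    off_ray = np.linalg.norm(w - np.dot(w, d) * d)   # distance of b from the ray
    return off_ray <= tol and np.dot(w, d) > np.linalg.norm(a - proj_center)

def is_m_s_pair(proj_center, u_endpoints, v_endpoints, tol=2.0):
    # Virtual edges u (candidate m segment) and v (candidate s segment) form
    # an m-s pair if every stripe end-point of v lies on the projector ray
    # through some end-point of u, farther from the projector than that point.
    return all(any(on_projector_ray(proj_center, a, b, tol) for a in u_endpoints)
               for b in v_endpoints)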
In what follows in this section, we assume that the surface marks and the projector shadows (s segments) are already identified and are excluded from the rest of the inference procedure (but not from object recognition). The important edge segments are those associated with occlusion, object boundaries, and surface normal discontinuities. The edge segments we will consider include both intensity edges (contours) and virtual edges. The virtual edges are boundaries of the stripe networks and hence part of the P-silhouette, while the intensity edges correspond to the C-silhouette (extremum and blade) and to surface normal discontinuities (fold). The following rules use the 2-D stripe networks and rough shape information to make inferences about the types (labels) of the edge segments.

Rule 2. If two striped regions corresponding to two surface patches meet (touch) at a "common" virtual edge and the stripes are staggered from one region to the other, then one surface patch occludes the other from the camera. The virtual edge represents a blade if the occluding surface patch is (a) planar or (b) concave, or an extremum if it is (c) convex. Usually the virtual edge is also coincident with an intensity edge because it is an occluding edge that is part of the C-silhouette. The occlusion relationship between the two surface patches may be determined by their computed locations in 3-space, or by some global contour and stripe analysis (Figure 5.10 (2)).

Rule 3. If an edge is covered by a stripe network and the stripes at the edge have an inflection (bending stripes), then the edge is a fold, because the inflection of the stripes shows a discontinuity of surface normals at the edge and both surface patches on the two sides of the edge are visible to the camera. The fold may be an intensity edge and/or a virtual edge. Whether it is a convex or concave fold can be determined by the directions of the surface normals (Figure 5.10 (3)).

The above three inference rules are used for edges "internal" to striped regions (both sides of the edge are striped regions). The next several rules deal with contours at the border of a stripe network, i.e. only one side of the contour is striped.

First, we need to identify contours that are o (occlusion) segments. These contours are part of the C-silhouette that occludes objects or parts of objects from the camera. There are two types of this kind of occluding contour: one at which the striped surface patch occludes the non-striped one, and the other at which the striped surface patch is occluded by the non-striped surface patch. A contour of the first type is part of the geometry of the striped surface, whereas a contour of the second type is irrelevant to the geometry of the striped surface and needs to be identified early in order not to confuse the surface-edge interpretation. It is clear that without other information there is no way to tell which side of the contour occludes the other side. Fortunately, the stripe network and the intensity edge provide some clues.

Rule 4. For a one-side-striped edge segment (intensity edge) e, if there is another edge segment r piercing into e to form a "T" junction, then the r-side of e (the T-stem side) is occluded by the other side. r may be (a) an intensity edge and/or (b) a virtual edge segment (Figure 5.10 (4)).
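Rules 1 to 3, which handle edges interior to striped regions, can be condensed into a small decision procedure. The sketch below uses illustrative predicate names; the thesis's own encoding of the full rule set, in PROLOG, is listed in the Appendix:

def label_internal_edge(stripes_staggered, stripes_bend, normal_change_small,
                        occluder_shape):
    # Rules 1-3: both sides of the edge are striped regions.
    # occluder_shape is 'planar', 'concave' or 'convex' for the patch judged
    # (from the 3-D data) to be in front.
    if not stripes_staggered and not stripes_bend and normal_change_small:
        return "mark"                                   # Rule 1
    if stripes_staggered:                               # Rule 2: occlusion
        return "extremum" if occluder_shape == "convex" else "blade"
    if stripes_bend:
        return "fold"                                   # Rule 3
    return "unknown"
    # Rule 4 and the later rules operate on one-side-striped edges and the
    # region adjacency relations, and are not shown in this sketch.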
Rule 5. For a one-side-striped intensity edge segment, if it is part of a relatively closed intensity contour, then the surface enclosed by the contour occludes the other side of the contour (Figure 5.10 (5)).

Rules 4 and 5 are based on psychological observations and are similar to rules used in line-drawing interpretation [Huffman 71, Clowes 71, Waltz 75, Binford and Lowe 85]. In the light striping case, if the non-striped side occludes the striped side, the non-striped side is also associated with an object some of whose surfaces are striped.

At an occlusion edge, if the non-striped side occludes the striped side, the edge is irrelevant to the surface geometry of the striped surface. Hence, for surface-edge interpretation we consider only edges at which the striped side occludes the non-striped side and edges that are not occlusion segments. In these cases, the striped region in the vicinity of the edge segment is "closest" to the camera: there is no obstacle blocking it from the camera.

There are two possibilities for such a striped region: the edge segment that bounds the striped region is only a virtual line, or it is also an intensity contour. If it is an intensity contour, it is either part of the C-silhouette or represents a surface normal discontinuity, since surface marks are already excluded. If the edge segment is only a virtual line, it is part of the P-silhouette but unlikely to be part of the C-silhouette. The following inference rules deal with edge segments that are one-side-striped and are "closest" to the camera. The rules are divided into two groups depending on whether the edge segment is only a virtual line or is also an intensity contour. See Figure 5.11 for examples where the rules are triggered.

Group 1. Rules in this group apply to cases where the edge segment that bounds a striped region is an intensity contour. A contour is said to be convex if every chord drawn on it lies within the striped region.

Rule 6. A contour c bordering a planar surface (indicated by straight stripes) represents (a) a fold if there is evidence indicating an unstriped surface meeting at the contour c, such as another intensity contour on the non-striped side connecting to one end of c and forming an "L" or an "arrow" junction, or (b) a blade otherwise. This is because on a planar surface the normal is constant. At the contour, the normal has to be discontinuous from the visible striped plane to either the invisible surface of the object (blade case) or the visible unstriped surface of the object (fold case). This rule can also be stated in terms of the position of the contour relative to the striped region. If the contour bounds the "upper" boundary of the planar surface, it is a blade (see Section 5.5.5 for a mathematical proof) and hence an o segment; whereas if it bounds the "lower" boundary of the surface, it is either a fold or a blade and is an m segment (P-shadow-making segment). The direction indicators "upper" and "lower" were defined in Chapter 3 and are redescribed in Section 5.5.5.

Rule 7. A convex surface patch bounded by a straight contour c is convex cylindrical at c. If c bounds the upper border of the surface, which is usually the case since an edge segment at the lower border of a convex surface patch is often a virtual edge rather than an intensity contour, c is an o segment and labeled extremum.

Rule 8.
If a convex surface patch is bounded by a convex contour c, the contour is (a) an extremum if the surface normal gradually turns away from the line of sight in the direction towards the contour, or (b) a blade if the normal near the contour is not perpendicular to the line of sight. In both cases, c is an occluding segment.

Rule 9. A contour c at the lower border of a convex surface patch is either (a) labeled fold if an unstriped visible surface meets the contour, or (b) it represents an extremum and the surface is saddle-shaped. In the former case c is a P-shadow-making segment, while in the latter case c is an occluding segment.

Rule 10. A contour bounding a concave surface patch is a blade or a fold.

Group 2. Rules in this group apply to edges that are virtual boundary segments of one-side-striped regions.

Rule 11. The same inference stated in Rule 6 holds for virtual edges that bound planar surface patches. Furthermore, a virtual edge at the lower border of a planar surface is more likely to be a fold than a blade, since a blade is also an occluding segment and occluding segments are often intensity edges.

Rule 12. The virtual edge v at the upper border of a convex surface is an occluding segment and labeled extremum; the surface patch at v is either convex if v is convex, or saddle-shaped if v is concave, similar to Rules 8 and 9.

Rule 13. The virtual edge v at the lower border of a convex surface patch is a P-shadow-making segment.

Figure 5.10 Inference rules 1 - 5 (for each rule: an example sketch, the stripe network behaviour, the contour segment type, the surface type at the contour, and the resulting contour label)

Figure 5.11 Inference rules 6 - 13 (same format; example objects: step, knife, can, ball, lipstick, donut, bowl)

5.5.5 Some mathematical arguments about the inference rules

In Chapter 3, we defined the direction indicators "up" and "down", or "upper" and "lower", in the image plane in terms of the spatial relationship between the camera and the projector. The projecting rays in 3-space emanating from the projector are defined with a direction. Accordingly, the projection of these rays in the image plane also defines a direction in the image. The direction in the image is from "up" to "down", or from "upper" to "lower". If a striped region is bounded by a contour segment, then the contour segment is said to be the "upper" boundary if at every contour point the direction is from the non-striped side to the striped side of the contour, and the "lower" boundary if the direction at every contour point is from the striped side to the non-striped side.
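Under these definitions, the upper/lower test amounts to comparing the projected ray direction with the direction into the striped region along the contour. The sketch below is an assumed 2-D formulation, not the thesis's implementation:

import numpy as np

def boundary_type(ray_dir_img, into_region_normals, eps=1e-6):
    # ray_dir_img: unit 2-vector, the projector rays projected into the image
    #              plane (the image "up -> down" direction).
    # into_region_normals: for each contour point, a unit 2-vector pointing
    #              from the non-striped side into the striped region.
    # The contour is the "upper" boundary if the ray direction points into the
    # striped region at every point, the "lower" boundary if it points out.
    dots = np.asarray([np.dot(ray_dir_img, n) for n in into_region_normals])
    if np.all(dots > eps):
        return "upper"
    if np.all(dots < -eps):
        return "lower"
    return "mixed"      # split the contour where the sign changes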
Assumption A. In the following two lemmas, we only consider surface patches that are one-side-striped and "closest" to the camera (not occluded by the non-striped region on the other side of the edge), and whose attached edges are neither P-shadow edges nor surface marks.

Lemma 5.1 Under Assumption A, the upper boundary of a planar surface patch represents a blade, as stated in Rule 6.

Proof. Since the non-striped side of the edge has no stripes, there are two possibilities: (1) either that side is invisible to the camera and hence the stripes on it cannot be seen, in which case the edge is obviously a blade; or (2) that side is invisible to the projector but visible to the camera. In this case, one of the following two situations must occur: (a) it is occluded by another object with respect to the projector (P-shadow) or to the camera, but this kind of occlusion has already been identified in an earlier stage under Assumption A and is excluded here; or (b) it is occluded by the object itself. Here "occlusion" is with respect to the projector. This is illustrated in Figure 5.12. Let the "projector vector" (reverse direction of the projecting ray) at the edge point A be p, the normal of the non-striped surface (the other side of the edge) at A be n, and the imaging vector be i. Let P be the plane perpendicular to p and N be the plane perpendicular to n.

Figure 5.12 Lemma 5.1

Since the non-striped side of the edge is occluded by the object itself, p and n must lie on different sides of plane P and of plane N, i.e. their angle is greater than 90°. But since the camera sees both sides of the edge, i must be on the same side of P as p, and on the same side of N as n. That is, i is in the "quadrant" between plane P and plane N. Hence, the projection of p in the image plane (see Figure 5.12) would make the edge the "lower" boundary of the striped region, which contradicts the condition that the edge is the "upper" boundary of the region. So the non-striped side of the edge cannot be occluded by the object itself with respect to the projector. This excludes possibility (2) and the lemma follows. □

Lemma 5.2 Under Assumption A, if a striped region bounded below by an edge indicates a smooth curved surface patch, the non-striped side of the edge is visible to the camera.

Proof. Referring to Figure 5.13, let the projector vector and the imaging vector be p and i respectively, and the surface normal at an edge point A be n. Since the surface is smooth and the stripes stop at the edge, p · n = 0 (they are perpendicular). Because point A is visible to the camera, i · n ≥ 0. We now prove that equality does not hold, hence i · n > 0. Suppose i · n = 0. Since both p and i are perpendicular to the surface normal n at A, they span the tangent plane of the surface at A. The object in the vicinity of A lies on one side of the tangent plane. Since the projection of p in the image plane, p_i, is the intersection of the image plane and the plane spanned by p and i, the striped region in the vicinity of A would also lie on one side of p_i in the image. That is, p_i is "tangent" to the striped region at A_i, the image of the edge point A. But p_i should run across the edge at A_i because the edge is the "lower" boundary of the striped region. This contradiction shows that i · n > 0. Hence, not only is the edge point A visible to the camera, so is a vicinity of A, including the non-striped side of the edge at A. □

Figure 5.13 Lemma 5.2
From Lemma 5.2, we know that if a virtual edge is the lower border of a smooth curved surface patch (indicated by a striped region), the non-striped side of the edge is also visible to the camera. This leads us to search for intensity edges farther down in that area, which may provide "real" surface boundary information about the non-striped surface patch. Of the other inference rules, some are heuristic and some can be proved mathematically.

5.5.6 Implementation of the inference rules

The implementation of the inference rules may be done via a set of production rules and a control scheme. These production rules may be used to form a decision tree that controls the labeling process. The input and output are the following:

input: (1) contours in some form (chain-code, say); (2) surface patches, their types and boundaries.
output: (1) labels associated with each contour (the label can change along a single contour); (2) surface types.

The labeling process is a contour-based procedure. It takes one contour at a time, segments the contour based on its adjacency relationships with the striped regions, and labels each segment of the contour according to the inference rules. Since the production rules are triggered in a specified order, surface marks and m-s segments are determined first. Occlusion with respect to the camera is then determined, and only those striped surfaces and corresponding edges that are "closest" to the camera are processed, since otherwise the edges are not very useful in inferring the surface geometry. Then the rules in group two are applied. As output, each contour segment may have one or more labels associated with it.

Multiple labels can be disambiguated using knowledge about the spatial relationship between the camera and the projector. For example, if a contour in the image bounds the "upper" part of a planar surface, it indicates a blade, rather than the ambiguous label "blade or fold", as claimed in Lemma 5.1. In addition, a relaxation procedure based on the relationships between adjacent image contours can also be used to sort out some multiple labels.

5.5.7 Examples

We take two scenes as examples. The first is the image of a solid block as shown in Figure 5.1, in which all surface patches are planar. The second example scene consists of a cobra sculpture, an orange and a block, in which the surface patches are curved. The labeling of the surface patches and associated edges in these two examples was done manually: a human account of what coordinated use of the individual processes and rules developed in Section 5.5.4 might produce.

Example 1: the solid block. The intensity gradient contours and the striped networks are reprinted in Figure 5.14.

Figure 5.14 Image of a solid block (a) intensity gradient contours (b) overlay of contours and stripe networks with named surface patches and edges

There are two striped regions, but the lower one can be divided into two sub-regions by the bending of the stripes and the intensity contour H at the bending place. Let the three striped surface patches be P1, P2 and P3, and the contour segments be A, B, C, ..., R, as indicated in Figure 5.14(b). First, by the 3-D relationships between the virtual edges, F is a P-shadow cast by D. By inference rule 3, H is labeled "fold". Furthermore, using the 3-D information (normals), H is classified as a "convex fold". According to inference rule 6 and Lemma 5.1, contours A, B, E are labeled blade, whereas C, D, G, I, J are labeled "fold or blade".
By rule 6, since D, A and a third contour K (not adjacent to a striped region) form an "arrow", D must be a fold, either convex or concave. But it cannot be concave because it generates the shadow F. So D is a convex fold, and similarly so are C and J. For edges D and K, it is very likely that they form a plane P4, which can be verified by the intensity in this area. Similarly, plane P5, formed by the straight lines C and L, is decided. Since the intensity contour N connects to L and is parallel to C, it is determined to be in the same plane P5. Because of the plane P5, the contours G and I are determined to be folds by rule 6. Since P4 and P5 are planar surfaces adjacent to P1 at the fold edges D and C, respectively, they are very likely to intersect at a common edge which starts at the junction of C and D and is actually indicated by the weak intensity edge M. Because both P4 and P5 are visible to the camera, M is a fold. Finally, plane P6, formed by J, Q and R, is claimed. Now, for the block, we obtain the following representation (including processing described in Chapter 3). There are six planar surfaces:

P1: blades A, B; convex folds C, D; z = 80 mm in the global coordinate system within the x-y domain defined by the stripe end-points (specific numbers are omitted here)
P2: blade E; P-shadow F; convex folds G, H; z = 38 mm in the global coordinate system within the x-y domain defined by the stripe end-points (specific numbers are omitted here)
P3: convex folds H, I, J; adjacent to P2 at the common edge H; surface normal perpendicular to the planar surface determined by fitting a plane to the stripe grid points
P4: convex folds D, M; occluding contour K; P-shadow F; adjacent to P1 at the common edge D and to P5 at M
P5: convex folds C, G, I; occluding contours L, N; fold M; adjacent to P1 at C, to P2 at G, to P3 at I, and to P4 at M; surface normal determined by the 3-D stripe end-points on C, G and I
P6: convex fold J; occluding contours Q and R; adjacent to P3 at J

Example 2: a cobra sculpture, an orange, and a block. The gradient contours and the overlay of contours and stripes are shown in Figure 5.15 (a) and (b), respectively.

Figure 5.15 Image of a cobra sculpture, an orange and a block (a) intensity gradient contours (b) overlay of contours and stripe networks with named surface patches and labeled edges (e = extremum, b = blade)

There are three stripe networks in the image: G for the orange, B for the cobra body, and H for the cobra head. Since there are no stripes on the surfaces of the block, it cannot be interpreted using the fusion rules that require stripe information, so it will not be discussed here. The surface patch G is determined to be spherical by the B-spline fitting of 3-D data, and its boundary is an extremum according to inference rule 8. Surface patch H is convex by the stripe texture analysis; its upper bordering contour is an extremum. The left border of H consists of a broken contour which is bridged using the virtual edge of the stripe network, and is determined to be an extremum also. For the surface B, both surface fitting and texture analysis suggested that it should be interpreted locally. Roughly, B can be subdivided into three patches B1, B2 and B3 as indicated in the figure, where B1 and B2 are basically concave whereas B3 is convex. Parts of the boundaries of B are labeled as indicated in the figure. Notice that the lower borders (virtual edges) of B2 and B3 are "extremum" with respect to the projector and hence P-shadow-making segments according to rule 13.
Also, they have small z-coordinate values (5 to 15 mm). Intensity edges farther down are found that indicate the real surface boundaries of these surfaces, very close to the ground plane (z = 0), so B2 and B3 are labeled blade. The intensity contours interior to the striped regions are all marks by inference rule 1. For the virtual edges R (at the right border of H) and L (at the left border of B2), it is determined that L is the P-shadow cast by R. Also, the surface patch H occludes B1 according to inference rule 2. The representation of this image can be described as follows. There are five surface patches:

G: convex sphere with radius 36 mm (by B-spline fitting and Gaussian curvature computation), at a 3-D location determined by the stripe grid points (specific numbers were not computed), bordered by an extremum contour
H: convex; occludes surface patch B1 at its left extremum contour; its right virtual edge casts a shadow at the left of surface patch B2
B1: basically concave; occluded by H; boundaries labeled as indicated in the figure
B2: basically concave; boundaries labeled as indicated in the figure
B3: convex; boundaries labeled as indicated in the figure

B1, B2 and B3 lie on the same surface, with locations in the global coordinate system determined by their grid points in 3-space.

5.5.8 Experiments

Five more images of multiple object scenes were analyzed, and the surface patches and edges were labeled using the inference rules. The inference rules were coded in PROLOG. The input to the PROLOG predicates was a set of hand-coded "facts" about the edges and surface patches in terms of their geometric and topological properties (adjacency, planarness of surface patches, etc.). The scenes were rather complex. Although the representations were hand-coded, we assume that they may be derived from the lower level processing described in Chapters 3 and 4. The PROLOG coding of the inference rules and the output labels for the five images are listed in the Appendix. The results are summarized in Table 5.2. They show that 75% of the edges (101 out of 135) were labeled and only one was mislabeled. That mislabeled edge is part of a "fold", but due to lack of fold evidence it was labeled "blade". This mistake can be corrected by a higher level process that uses connection rules and segments these edges and surface patches into objects. The remaining 25% of the edges were not labeled; for two of them, no conclusion could be drawn from the inference rules because there is not enough information about the occlusion relationship between the two sides of the edge. The rest of the unlabeled edges are simply not associated with striped surface patches. The occlusion relationships between the 26 surface patches were also correctly discovered.

Table 5.2 Summary of the results of labeling surface patches and edges of five scenes using the inference rules

objects in the scene            surfaces  edges  surfaces labeled  occluded surfaces discovered†  edges labeled*  edges mislabeled
book, orange, metronome             5       28          5                    2                         24               0
shoe, bowl, box                     6       22          6                    2                         22               0
can, cup, cup with cover            5       32          5                    2                         16               0
tomato can, vacuum cleaner          5       26          5                    3                         21               1
tea pot, plate, roll of paper       5       27          5                    3                         18               0

† All occlusions discovered are correct.
* Only edges that are associated with striped surfaces are labeled.

5.6 Conclusions

This chapter investigated a multi-sensor fusion approach that combines information from both the striped image and the intensity image.
The two images can easily be accurately registered and hence can be treated as a single image. The gray-tone image is used primarily for extracting gradient contours, while the striped image provides 3-D surface shapes.

Object contours are essential to visual perception. Humans can often recognize objects using only contours. For smooth object surfaces, some occluding contours can be interpreted as "convex", "concave", etc. according to Koenderink's apparent curvature theory. But for objects that are not smooth, or contours that are not occluding ones, we need information about the surface itself. Light striping is just the means that meets this need. Hence, fusion of intensity contours and striped surfaces is a natural and easy way to obtain a good scene representation and a low level interpretation.

Light striping offers shape information mostly interior to the visible surfaces; the information about the boundary areas is often quite vague, because of either fading stripes at extrema (and hence less accurate computation toward the boundaries) or no boundary information being available from surface fitting. On the other hand, intensity contours provide just the complement. To combine them in constructing a reasonable object representation appropriate for model matching, we developed a set of inference rules that label the surface patches and contours according to their detected features and their spatial relationships. Although some are heuristic and empirical, most of these rules are mathematically elegant and performed well in experiments. Almost all of the contour labels assigned by the rules are useful for object model indexing, and some are useful in matching to recognize objects and determine pose. In the future, we hope to add more rules to help in further disambiguation and in segmentation of the scene into objects. It is important to note that our approach has been heuristic: we chose not to rigidly define the class of objects, and we have relied on general viewpoint assumptions, which will not always hold true.

CHAPTER 6

Concluding Discussion and Future Work

This thesis generalized the use of striped light toward obtaining a 2½-D representation of a scene to be used ultimately for object recognition and pose detection. This was demonstrated by using structured lighting in three stages of 3-D vision, namely 3-D sensing, surface shape inference through stripe textures, and 2½-D representation via fusion of striped images and gray scale images. This chapter summarizes the work described in this thesis, discusses its implications, and suggests directions for future work.

6.1 Summary

The first chapter laid the groundwork for 3-D machine vision by briefly discussing the three basic parts of 3-D vision: feature extraction, object representation, and model matching. 3-D data acquisition, which provides spatial information about the object surfaces, is one of the first tasks in feature extraction. From 3-D data, geometrical relationships between surface points can be established, which are used for describing surface shapes. In object representation, surface shapes must be inferred and represented in an appropriate format in order that the object may be recognized by matching its representation against some predefined models. Many shape-from-x techniques have been developed in the past two decades that infer surface shapes using information from different sources. Structured light is one of these sources, and it is the one investigated in this thesis.
Model matching is an object recognition procedure that matches the representation of the sensed data to a stored representation of the object. Constructing models automatically is itself a very difficult problem, and developing a very large model base is even harder. This thesis concentrates on the first two tasks in 3-D vision, namely feature extraction and construction of surface representation. In particular, a structured lighting approach is developed.

The structured lighting approach in a general context was discussed in Chapter 2. Structured lighting is a technique that imposes artificial features on the surfaces of the objects by illuminating the scene with a structured light pattern. When using a grid as the illumination pattern, the light stripes weave "networks" on the surfaces that give an image in which the grid distortion is a function of the geometric shapes of the surfaces, as well as of the spatial relationships between the sensor, the light source, and the object surfaces. The light source (a projector) and the sensor (a camera) form a stereo system in which surface points in 3-space can be solved through triangulation. The major difficulty is the stripe identification problem, which is the general "correspondence problem" in the structured light setting. This problem is attacked by applying a set of general constraints to the computed surface points that relate to each other in the stripe networks. In addition to 3-D data, light striping also provides rich information about surface shapes through the 2-D stripe textures. Analysis of stripe texture may confirm the surface shapes obtained from the 3-D data, and hence increase the certainty of inference about surfaces. It is apparent that the sensed 3-D points are sparse due to the way the grid of light is projected. We must assume the "smoothness" of the surfaces in order to make inferences about the surfaces using only the sensed data; this may not be valid in general but is often satisfied in practice, such as in industrial environments. To make surface inferences more reliable, and to obtain information in the non-striped regions, structured lighting is fused with the gray tone image of the same scene. A surface-based representation may be constructed based on the combination of the two information channels.

6.1.1 Computing 3-D surface solutions

Chapter 3 is dedicated to 3-D sensing using light striping. The striped image is first thresholded to generate a binary image, which is then processed to extract the grid points (stripe intersections and end points). For the grid points extracted, a stereo computation is carried out for their locations in 3-D space. The stereo pair here is the image sensed by the camera and the grid slide that is projected onto the scene. In order to use stereo, the spatial relationships between the projector, the camera and the scene must be fixed and known in advance; this is accomplished through a process commonly referred to as calibration. After the projector and the camera have been calibrated, a point in 3-D space can be computed via triangulation if it is identified in the stereo pair.
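One common way to carry out such a triangulation, given a calibrated camera ray and projector ray for a matched grid point, is to take the midpoint of the shortest segment joining the two rays. The sketch below is a generic formulation under that assumption, not the thesis's specific implementation:

import numpy as np

def triangulate(cam_center, cam_dir, proj_center, proj_dir):
    # Rays: P1(s) = cam_center + s*cam_dir,  P2(t) = proj_center + t*proj_dir.
    # Because of noise the rays rarely intersect exactly, so return the
    # midpoint of the shortest segment joining them, plus the gap length.
    d1, d2 = np.asarray(cam_dir, float), np.asarray(proj_dir, float)
    w0 = np.asarray(cam_center, float) - np.asarray(proj_center, float)
    a, b, c = np.dot(d1, d1), np.dot(d1, d2), np.dot(d2, d2)
    d, e = np.dot(d1, w0), np.dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are (nearly) parallel")
    s = (b * e - c * d) / denom        # parameter along the camera ray
    t = (a * e - b * d) / denom        # parameter along the projector ray
    p1 = cam_center + s * d1
    p2 = proj_center + t * d2
    return 0.5 * (p1 + p2), float(np.linalg.norm(p1 - p2))

The returned gap between the two rays can also serve as a consistency check: a candidate stripe label whose gap exceeds the error tolerance would be rejected.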
Identifying matching points, or solving the stripe labeling problem, is very difficult when the stripes are generated from a homogeneous pattern (a grid, for example). But unlike the general "correspondence problem", light striping not only substantially reduces the search space but also makes the relationships between the stereo matching points more explicit and easier to resolve. Two general constraints, the uniqueness constraint and the continuity constraint, are established and are further refined into a set of geometric rules and a set of topological rules specially derived for solving the line labeling problem for the stripe networks. The relationships of the grid points within and between the stripe networks are specified by these rules, which are obeyed when the striped images are formed under the general position assumption. A graph traversal procedure is carried out to propagate the constraints within a stripe network and to sort out those stripe labels that violate any of the constraints. A global constraint test is then applied to relate several individual stripe networks in a multi-network image, which further reduces the ambiguity in the stripe labels. Five algorithms are described in detail which, applied in sequence, take the sensed striped image as input and produce 3-D surface solutions for the grid points in the stripe networks. The five algorithms are: 2-D processing for extracting grid points of interest, 3-D computation for a single grid point via triangulation, within-network constraint propagation, stripe network boundary extraction, and between-network constraint propagation. Time complexity was analyzed. Experiments were performed on various real images. The results for three of these images are listed in Table 3.2. These images contain both polyhedral and curved objects with arbitrary poses and occlusion relationships, some of which have complex shapes (e.g. the cobra sculpture). The experiments showed that the 3-D surface solutions delivered by the five algorithms are of reasonable accuracy, and the small degree of ambiguity which remains may be further reduced using other information. Each of the 3-D surface solutions can be used in inferring surface shapes. A mathematical analysis showed that the degree of ambiguity in the stripe labeling is proportional to the camera standoff (the distance from the camera to the scene point) and to the error tolerance in computation, and inversely proportional to the stripe spacing.

The conclusion from this work on light striping in 3-D sensing is that it is effective, reliable, and easy to apply, but yields a small degree of ambiguity in some surface locations.

6.1.2 Classification of surface patterns

3-D surface data by themselves do not constitute a surface description for scene understanding; they are still a low-level representation of the objects, but they can be used in deriving a higher level surface description. In addition to 3-D surface solutions, structured lighting also directly reveals surface shapes through the stripe networks imposed on the surfaces. These stripe networks make the surface shape vividly apparent to a human viewer because of the way the stripes (surface contours) are formed. Chapter 4 discussed surface classification using 3-D data and stripe textures.

Based on the computed 3-D points, a surface can be fitted using the B-spline technique, which was the first topic in Chapter 4. Instead of generating an analytical form of the fitted surface, the two principal curvatures and the intrinsic Gaussian curvature were
The curvatures were then clustered and the surface patch (corresponding to a stripe network) was classified accord- ing to the curvature clusters and classical differential geometry theory. Several surface patches obtained from the real striped images were measured in terms of the curvature properties. The experiment was described in Section 4.1.3. These surface patches (a planar surface, a Coke can, and a potato) were correctly labeled planar, cylindrical, and elliptic. Furthermore, the radius of the cylindrical surface (the Coke can) was also suc- cessfully estimated using the principle curvature measures. A second direction in surface shape classification is through the 2-D stripe net- works, or 2-D stripe textures. Since the stripe texture in the 2-D image is a distorted ver- sion of the grid on the projector slide, and the way the grid is distorted is a function of the surface shapes, we may recover the surface shapes, to some extent, by just looking at the 2-D stripe textures. Planar surfaces can be identified by the straight stripes in the image; convex and concave surfaces can be determined by the way the stripes are curving in the 2-D image. Texture changes in certain directions (X-stripe direction, Y-stripe direction, and diagonal directions) are also useful in determining the surface variance along these directions, which may be used in the fusion of striping and intensity. In an experiment stated in Section 4.2.4, 25 of 27 object surfaces were correctly classified as planar, con- vex, concave, and irregular, using only 2-D information from the striped networks. Also seven spherical or cylindrical surfaces were distinguished using texture gradient meas- ures. It is convincing that the 2—D stripe patterns make available a rich piece of informa- tion for surface interpretation that may work alone without support of 3D data. 6.1.3 Fusion of intensity and stripe image data A fusion scheme that combines information obtained from light striping and inten- sity was introduced in Chapter 5. As is often the case, the intensity image provides edge information that is important for edge—based segmentation, which some psychologists 156 claim [Biederman 87] is the primary information source for human perception. As the dual of the striped image, an intensity image of exactly the same scene can be easily obtained by simply switching off the projector and turning on a light at the position of the camera. This intensity image offers gradient contours that significantly complement what is in the striped image. But edge detection is itself very noise sensitive and it is usually very hard to extract perfect edges. On the contrary, the striped image offers good surface information and is much more noise insensitive. To overcome the weaknesses of each of the two types of images and facilitate their strengths, fusion of the two was investigated for achieving better segmentation and constructing surface-edge representations. A set of rules were developed that make inferences about striped surface patches and related edge segments and assign labels (extremum, blade, fold, shadow, mark) to the edges obtained from the intensity gradient. In the experiments reported in Section 5.5.8, a set of PRO- LOG predicates that implements the rules was applied to five images of multiple object scenes. Out of 101 edges only one was mislabeled. 
The surface labels assigned by the surface analysis in Chapter 4 and the edge labels assigned by the fusion rules in Chapter 5 are useful for object model indexing, and some are useful in matching to determine object pose and recognition. From the fusion of the light striped image and gray-tone image, we reach more cer- tain and reliable object descriptions. There are some other information sources that may be useful, such as a second striped image taken by projecting a grid from a different angle (camera not moved), that may put stripes on the areas that were P-shadows before. 6.2. Conclusions This thesis has investigated the usefulness of structured lighting in machine vision for 3-D scene representation and ultimately for object recognition. In a controlled environment such as an industrial robot workspace, a single grid-illuminated image can be used to sense the 3-D surface points in a rather efficient way as long as the objects are 157 relatively smooth and not too small to be covered by multiple stripes. Since our sensing system is a "snapshot" system rather than a "scanning" system, it has potential speed and cost advantages. The constraints employed in developing the surface solutions are from the real world and are very powerful in solving the stripe identification problem. Although the resulting surface solutions may not be unique in general, the degree of ambiguity is often small in practice and each of the possible locations in the 3-space serves equally well in shape analysis. For object grasping, the ambiguity can be resolved via tactile sensing. Not only is the structured lighting a tool for 3-D sensing, but it also provides a framework in which surface shape can be inferred and the scene representation can be established. The projected grid is easy to generate. The image processing task is simplified? because only one striped image is required and the grid is a uniform pattern. The 2-D stripe networks clearly reveal the surface geometry through their texture varia- tion. A gray scale image of the same scene is obtained by simply turning off the projec- tor, and hence it can be exactly registered with the striped image. The two images pro- vide information where both edge-based techniques and region-based techniques can be applied. The advantages of using structured light in 3-D scene analysis are balanced by some restrictions in implementation. First, it is not trivial to process the striped image. To extract the bright stripes, we must assume little interference from other factors, such as no strong reflection from the object surface, otherwise the strong reflection surface spots might be treated as part of the stripes. Second, accidental alignment of stripes can not always be avoided; it occurs infrequently but not rarely, especially when stripe spacing is small for obtaining denser data. Theoretically, this will not be a fatal problem since when no surface solution can be found (this is often the case when two or more surface patches T but is still not easy 158 are treated as one single surface patch due to accidental stripe connection) we can always break the network into pieces and apply the algorithms to each piece. But this will con- siderably complicate the process because deciding how to break the large network into small ones is itself not trivial. The third restriction lies on the ambiguity in the location of 3—D surface solutions. 
However, this restriction is generally not a severe problem, since the degree of ambiguity is usually small and can be further reduced by additional knowledge which is often available in practice. Also, as far as 3-D surface shape is concerned, any of the multiple solutions works.

6.3 Future work

The research presented in this thesis can be carried forward in several ways: the theoretical understanding of the structured lighting technique can be improved, the algorithms themselves can be improved, the fusion scheme can be integrated into an expert system for generating symbolic representations, and the representation should be used for modeling and matching. The results of this thesis provide a method which could form the basis for the development of a practical 3-D sensor system for bin-picking.

The theory of this research lies in the field of geometry (analytical and differential), as well as classical image processing. In 2-D processing, we need a general approach to extract the stripes, particularly those stripes that are very close to each other (when the line of sight of the camera is almost parallel to the surface). The geometric and topological constraints discussed in Chapter 3 and the inference rules for fusion in Chapter 5 are for ideal mathematical cases. Violations of these rules may occur when they are applied to real images, although they will occur infrequently. Hence, uncertainty might be taken into consideration, or feedback from other processes needs to be integrated into this step.

The algorithms were developed with no special attention to efficiency. For example, files were created to hold intermediate results that were read as the input by the next algorithm in the processing sequence. Sometimes routine work was done by hand rather than by programs. This scheme may be improved using pipeline-like processing to save disk I/O. Algorithm 5 in Chapter 3, which projects striped regions back to the slide plane and finds overlap, is combinatorially complex. This complexity may be reduced by applying rules such as "overlap is checked only for adjacent regions" and the like. The B-spline fitting algorithm needs to be supplemented by some other surface fitting method that can handle smaller surface patches or surface borders.

Computer algorithms for the fusion of intensity and stripes have not yet been completely implemented. The algorithm for that may be part of an expert system, but the detailed steps (for example, to segment contours, to relate striped regions to contour segments, etc.) need to be implemented before the symbolic processing can be carried out.

Finally, for the ultimate goal of object recognition, a model matching step is needed. How structured lighting techniques can be employed in generating models automatically, and how the matching would be performed based on the surface-edge representation, is the topic of current research. Research on this topic has been carried on for a few years in the PRIP lab at Michigan State University [S. W. Chen and Stockman 87]. That work has to be integrated with what has been discussed in this thesis to complete the basis for a 3-D machine vision system.

APPENDIX

1. Inference rules coded in PROLOG
APPENDIX

1. Inference rules coded in PROLOG

borders(E,R) :- lower_borders(E,R).
borders(E,R) :- upper_borders(E,R).
planar_region(R) :- planar(R) ; ground_plane(R).
mark(E) :- interior(E,R), stripes_smoothly_cross(R,E).
pshadow(E) :- shadows(E1,E).
shadow_making(E) :- shadows(E,E1).
occludes_at(R1,R2,E) :- ground_plane(R2), two_side_striped(E,R1,R2).
occludes_at(R1,R2,E1) :- borders(E1,R1), t_junction(E2,E1), borders(E2,R2).
occludes_at(R1,R2,E1) :- t_junction(E2,E1), borders(E2,R2), two_side_striped(E1,R1,R2).
occluded_at(R,E) :- occludes_at(R1,R,E).
occluded_at(R,E) :- one_side_striped(E,R), borders(E1,R), t_junction(E1,E).
occluded(R) :- occluded_at(R,E).
stage2(E) :- not (mark(E)), not (pshadow(E)).
stage3(E) :- region(R), stage2(E), not (occluded_at(R,E)).
blade1(E) :- two_side_striped(E,R1,R2), staggers(R1,R2), occludes_at(R1,R2,E), (planar(R1) ; concave(R1)).
blade1(E) :- two_side_striped(E,R1,R2), staggers(R1,R2), concave(R1), region(R1), region(R2).
blade1(E) :- planar(R), lower_borders(E,R), not (arrow(E1,E,E2), not (borders(E1,R), borders(E2,R))).
blade1(E) :- planar(R), upper_borders(E,R), one_side_striped(E,R).
blade1(E) :- borders(E,R), convex(R), convex(E), normal_constant_to_edge(E,R).
blade1(E) :- upper_borders(E,R), concave(R).
blade1(E) :- concave(R), lower_borders(E,R), not (arrow(E1,E,E2), not (borders(E1,R), borders(E2,R))).
blade(E) :- blade1(E), stage3(E).
fold1(E) :- two_side_striped(E,R1,R2), stripes_continuous(E), normal_change_2(R1,R2,E).
fold1(E) :- interior(E,R), edge(E), region(R), normal_change_1(R,E).
fold1(E) :- lower_borders(E,R), l_connects(E,E1), not borders(E1,R).
fold1(E) :- planar(R), lower_borders(E,R), arrow(E1,E,E2), not (borders(E1,R), borders(E2,R)).
fold1(E) :- lower_borders(E,R), convex(R), l_connects(E,E1), not (borders(E1,R)).
fold1(E) :- lower_borders(E,R), convex(R), arrow(E1,E,E2), (not borders(E1,R) ; not borders(E2,R)).
fold1(E) :- concave(R), lower_borders(E,R), arrow(E1,E,E2), not (borders(E1,R), borders(E2,R)).
fold(E) :- fold1(E), stage3(E), not occluded_at(R,E).
extremum1(E) :- two_side_striped(E,R1,R2), edge(E), region(R1), region(R2), occludes_at(R1,R2,E), staggers(R1,R2), convex(R1), not normal_constant_to_edge(E,R1).
extremum1(E) :- convex(R), straight(E), intensity(E), borders(E,R), region(R), edge(E).
extremum1(E) :- convex(R), borders(E,R), convex(E), intensity(E), normal_changes_to_edge(R,E), region(R), edge(E).
extremum1(E) :- borders(E,R), convex(R), concave(E), intensity(E), region(R), edge(E).
extremum1(E) :- upper_borders(E,R), convex(R), virtual(E), region(R), edge(E), not intensity(E).
extremum(E) :- extremum1(E), stage3(E).
saddle(R) :- borders(E,R), convex(R), concave(E), intensity(E).
cylindrical(R) :- convex(R), straight(E), intensity(E), borders(E,R).
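The rules above use not/1 for negation as failure (written \+ in many modern Prolog systems). A hypothetical way of using them, once the rules and the scene facts of the following sections have been loaded (the file names and driver queries below are illustrative assumptions, not part of the thesis programs), is:

    ?- consult(rules), consult(scene1).
    ?- findall(E, blade(E), Blades).
    ?- findall(E, fold(E), Folds).
    ?- findall(E, extremum(E), Extrema).
    ?- findall(E, pshadow(E), PShadows).

The label sets listed with each scene below are answer sets of queries of this kind. For instance, in scene 1 the fact shadows(e4,e5) makes pshadow(e5) and shadow_making(e4) succeed, which is why e5 appears in the set of P-shadows and e4 in the set of shadowing edges.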
2. Representation primitives of scene 1

edge(e1). edge(e2). edge(e3). edge(e4). edge(e5). edge(e6). edge(e7). edge(e8). edge(e9).
edge(e10). edge(e11). edge(e12). edge(e13). edge(e14). edge(e15). edge(e16). edge(e17). edge(e18).
edge(e19). edge(e20). edge(e21). edge(e22). edge(e23). edge(e24). edge(e25). edge(e26). edge(e27).
normal_constant_to_edge(r2,e7). normal_constant_to_edge(r2,e8).
convex(e1). convex(e2). convex(e14). convex(e16). convex(e18). convex(e19). convex(r2). convex(r4).
concave(r1). concave(r3).
straight(e6). straight(e9). straight(e10).
intensity(e1). intensity(e2). intensity(e3). intensity(e4). intensity(e6). intensity(e7). intensity(e8). intensity(e10).
intensity(e14). intensity(e15). intensity(e18). intensity(e19). intensity(e21). intensity(e22). intensity(e23). intensity(e24). intensity(e25).
virtual(e5). virtual(e9). virtual(e11). virtual(e12). virtual(e13). virtual(e17). virtual(e20). virtual(e26). virtual(e27).
shadows(e4,e5). shadows(e7,e12). shadows(e8,e13). shadows(e9,e11). shadows(e15,e17).
l_connects(e6,e7). l_connects(e7,e6). l_connects(e6,e8). l_connects(e8,e6).
l_connects(e7,e10). l_connects(e10,e7). l_connects(e8,e10). l_connects(e10,e8).
region(r1). region(r2). region(r3). region(r4). region(r5).
ground_plane(r5).
normal_changes_to_edge(r2,e6).
adjacent(r1,r2). adjacent(r2,r1). adjacent(r1,r5). adjacent(r5,r1). adjacent(r2,r4). adjacent(r4,r2).
adjacent(r2,r5). adjacent(r5,r2). adjacent(r3,r5). adjacent(r5,r3). adjacent(r4,r5). adjacent(r5,r4).
staggers(r1,r2). staggers(r2,r1). staggers(r1,r5). staggers(r5,r1). staggers(r2,r4). staggers(r4,r2).
staggers(r2,r5). staggers(r5,r2). staggers(r3,r5). staggers(r5,r3). staggers(r4,r5). staggers(r5,r4).
one_side_striped(e4,r1). one_side_striped(e20,r1). one_side_striped(e9,r2). one_side_striped(e16,r3).
one_side_striped(e17,r4). one_side_striped(e5,r5). one_side_striped(e11,r5). one_side_striped(e12,r5).
one_side_striped(e13,r5). one_side_striped(e26,r5). one_side_striped(e27,r5).
two_side_striped(e1,r1,r5). two_side_striped(e2,r1,r5). two_side_striped(e3,r1,r5). two_side_striped(e4,r1,r5).
two_side_striped(e6,r4,r2). two_side_striped(e6,r5,r2). two_side_striped(e7,r5,r2). two_side_striped(e8,r5,r2).
two_side_striped(e14,r3,r5). two_side_striped(e18,r4,r5). two_side_striped(e19,r4,r5).
upper_borders(e1,r1). upper_borders(e2,r1). upper_borders(e6,r2). upper_borders(e7,r2). upper_borders(e8,r2).
upper_borders(e14,r3). upper_borders(e17,r4). upper_borders(e18,r4). upper_borders(e19,r4).
lower_borders(e3,r1). lower_borders(e4,r1). lower_borders(e6,r4). lower_borders(e9,r2). lower_borders(e16,r3).
t_junction(e1,e22). t_junction(e2,e25). t_junction(e4,e7). t_junction(e3,e6). t_junction(e18,e6).
t_junction(e18,e15). t_junction(e19,e6). t_junction(e21,e14). t_junction(e24,e15).
smoothly_connects(e1,e4). smoothly_connects(e2,e3).

3. scene 1 - a tea pot, a plate, and a roll of paper

Labeling results:

set of blades : [e1,e14,e16,e2,e3,e4]
set of extremum : [e18,e19,e6]
set of P-shadows : [e11,e12,e13,e17,e5]
set of shadowing : [e15,e4,e7,e8,e9]
planar surfaces : [r5]
convex surfaces : [r2,r4]
concave surfaces : [r1,r3]
occluded regions : [r1,r4,r5]

4. scene 2 - a book, an orange, and a metronome

Labeling results:

set of blades : [e13,e19,e20,e6,e8]
set of folds : [e14,e18,e22,e7,e9]
set of extremum : [e1]
set of P-shadows : [e11,e12,e24,e25,e26,e4,e5]
set of shadowing : [e14,e20,e22,e3,e7,e9]
planar surfaces : [r2,r3,r4,r5]
convex surfaces : [r1]
occluded regions : [r2,r5]

5. scene 3 - a shoe, a bowl, and a box

Labeling results:

set of blades : [e11,e12,e13,e14,e15,e16,e17,e2]
set of folds : [e18,e19,e6]
set of extremum : [e1,e3,e4]
set of P-shadows : [e10,e20,e21,e22,e9]
set of shadowing : [e13,e14,e17,e6,e8]
set of marks : [e5]
planar surfaces : [r2,r4]
convex surfaces : [r1_1,r1_3]
concave surfaces : [r1_2,r3]
occluded regions : [r2,r4]

6. scene 4 - a can, a cup, and a cup with cover

Labeling results:

set of blades : [e1,e10,e26]
set of folds : [e11,e2,e25]
set of extremum : [e27,e9]
set of P-shadows : [e15,e16,e17,e18,e31,e8]
set of shadowing : [e11,e13,e2,e29]
set of marks : [e6,e7]
planar surfaces : [r1,r4,r5]
convex surfaces : [r2,r3]
7. scene 5 - a vacuum cleaner, a can, and a tomato

Labeling results:

set of blades : [e12,e13,e19,e6]
set of folds : [e14,e18,e23,e7]
set of extremum : [e1,e20,e21,e22,e5]
set of P-shadows : [e10,e16,e25,e26,e27,e4]
set of shadowing : [e14,e19,e2,e23,e6,e8]
planar surfaces : [r3,r4,r5]
convex surfaces : [r1,r2,r4_1]
occluded regions : [r1,r4,r5]

BIBLIOGRAPHY

Agin, G. J. and Binford, T. O., "Computer Description of Curved Objects", IEEE Trans. on Computers, Vol. C-25, April 1976, 439-449.

Agin, G. J. and Duda, R. O., "SRI Vision Research for Advanced Automation", Proc. of the 2nd USA-Japan Computer Conference, Tokyo, August 1975, 113-117.

Bajcsy, R., "Three-Dimensional Scene Analysis", Proc. of the 5th ICPR, Miami Beach, Dec. 1-4, 1980, 1064-1074.

Bajcsy, R. and Lieberman, L., "Texture Gradient as a Depth Cue", Computer Graphics and Image Processing, 5, 1976, 52-67.

Ballard, D. H. and Brown, C. M., "Computer Vision", Prentice-Hall, Inc., 1982.

Baumgart, B. G., "Winged Edge Polyhedron Representation", Tech. Rep. AIM-179, Comp. Sci. Dept., Stanford Univ., Stanford, Calif., 1972.

Besl, P. and Jain, R., "Three-Dimensional Object Recognition", ACM Computing Surveys, Vol. 17, No. 1, March 1985, 75-145.

Besl, P. and Jain, R., "Range Image Understanding", Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, San Francisco, CA, June 19-23, 1985, 430-449.

Besl, P. and Jain, R., "Invariant Surface Characteristics for 3-D Object Recognition in Range Images", Computer Vision, Graphics, and Image Processing, 33, 1 (January), 1986, 33-80.

Bhanu, B. and Ho, C. C., "CAD-Based 3D Object Representation for Robot Vision", IEEE Computer, Vol. 20, No. 8, August 1987, 19-35.

Biederman, I., "Aspects and Extensions of a Theory of Human Image Understanding", in Computational Processes in Human Vision: An Interdisciplinary Perspective (Z. Pylyshyn, Ed.), Ablex, New York, 1987.

Birk, J. R., Kelley, R. B., and Martins, H., "An Orienting Robot for Feeding Workpieces Stored in Bins", IEEE Trans. on Syst., Man, Cybern., 11, 2 (Feb.), 1981, 151-160.

Bolles, R. C., "Overview of Applications of Image Understanding to Industrial Automation", Proc. of the SPIE Conference on Techniques and Applications of Image Understanding, Washington, D.C., April 1981, Vol. 281, SPIE, Bellingham, Wash., 134-140.

Bolles, R. C. and Fischler, M. A., "A RANSAC-Based Approach to Model Fitting and Its Application to Finding Cylinders in Range Data", Proc. of the 7th IJCAI, Vancouver, August 1981, 637-643.

Brady, M., "Computational Approaches to Image Understanding", ACM Computing Surveys, Vol. 14, No. 1, March 1982, 3-71.

Brooks, R. A. and Binford, T. O., "Geometric Modeling in Vision for Manufacturing", Proc. of the SPIE Conf. on Robot Vision, Washington, D.C., August 1981, Vol. 281, SPIE, 141-159.

Charniak, E. and McDermott, D., "Introduction to Artificial Intelligence", Addison-Wesley, 1985.

Chin, R. T. and Dyer, C. R., "Model-Based Recognition in Robot Vision", ACM Computing Surveys, Vol. 18, No. 1, March 1986, 66-108.

Chen, S. W., "Computing a Pose Hypothesis from a Small Set of 3-D Object Features", Computer Science Tech. Rep., Michigan State Univ., 1986.

Clowes, M. B., "On Seeing Things", Artif. Intell., 2, 1971, 79-112.

Faig, W., "Calibration of Close-Range Photogrammetry Systems: Mathematical Formulation", Photogrammetric Engineering and Remote Sensing, Vol. 41, No. 12, 1975, 1479-1486.

Faugeras, O. D. and Toscani, G., "The Calibration Problem for Stereo", Proc. of IEEE Conf. on CVPR, Miami Beach, June 22-26, 1986, 15-20.
Fukunaga, K., "Introduction to Statistical Pattern Recognition", Academic Press, New York, 1972.

Gibson, J. J., "The Perception of the Visual World", Houghton Mifflin Company, Boston, 1950.

Gleason, G. J. and Agin, G. J., "A Modular System for Sensor-Controlled Manipulation and Inspection", Proc. of the 9th International Symposium on Industrial Robots, Washington, D.C., March 1979, 57-70.

Goad, C., "Special-Purpose Automatic Programming for 3-D Model-Based Vision", Proc. of the Image Understanding Workshop, Arlington, VA, June 1983, 94-104.

Grimson, W. and Lozano-Perez, T., "Model-Based Recognition and Localization from Tactile Data", Proc. of the International Conference on Robotics, Atlanta, GA, March 1984.

Hall, E. L., Tio, J. B. K., McPherson, C. A., and Sadjadi, F. A., "Curved Surface Measurement and Recognition for Robot Vision", IEEE Workshop on Industrial Applications of Machine Vision, May 1-3, 1982, 187-199.

Hambrick, L. N., Loew, M. H., and Carroll, R. L., "The Entry-Exit Method of Shadow Boundary Segmentation", IEEE Trans. on PAMI, Vol. PAMI-9, No. 5, Sept. 1987, 597-607.

Henderson, T. C., "Efficient Segmentation Method for Range Data", Proc. of the SPIE Conf. on Robot Vision, Arlington, VA, May 1982, 46-47.

Henderson, T. C. and Bhanu, B., "Three-Point Seed Method for the Extraction of Planar Faces from Range Data", Conf. Record of the Workshop on Industrial Applications of Machine Vision, Research Triangle Park, NC, May 1982, 181-186.

Haralick, R. M., "Statistical and Structural Approaches to Texture", Proc. IEEE, 67, 1979, 786-804.

Hoffman, R. and Jain, A. K., "Segmentation and Classification of Range Images", IEEE Trans. on PAMI, 9, 5 (Sept.), 1987, 608-620.

Horn, B. K., "Understanding Image Intensities", Artificial Intelligence, 8, 1977, 201-231.

Horn, B. K., "Obtaining Shape from Shading Information", in The Psychology of Computer Vision (P. H. Winston, Ed.), McGraw-Hill, 1975.

Hu, G., Jain, A. K., and Stockman, G., "Shape from Light Stripe Texture", Proc. of IEEE Conf. on CVPR, Miami Beach, June 22-26, 1986, 412-414.

Hu, G. and Stockman, G., "3-D Surface Solution Using Structured Light and Constraint Propagation", to appear in IEEE Trans. on PAMI, 1988.

Huffman, D. A., "Impossible Objects as Nonsense Sentences", in Machine Intelligence, Vol. 6 (Meltzer and Michie, Eds.), Edinburgh Univ. Press, Edinburgh, UK, 1971.

Ito, M. and Ishii, A., "Range and Shape Measurement Using Three-View Stereo Analysis", Proc. of IEEE Conf. on CVPR, Miami Beach, June 22-26, 1986, 14-19.

Ittner, D. J. and Jain, A. K., "3-D Surface Discrimination from Local Curvature Measures", Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, San Francisco, CA, June 19-23, 1985, 119-123.

Jarvis, R. A., "A Perspective on Range Finding Techniques for Computer Vision", IEEE Trans. on PAMI, Vol. 5, No. 2, March 1983, 122-139.

Kanade, T., "A Theory of Origami World", Artificial Intelligence, 13(1), 1980, 279-311.

Kanade, T., "Recovery of the Three-Dimensional Shape of an Object from a Single View", Artificial Intelligence, 17, 1981, 409-460.

Kender, J., "Shape from Texture", Tech. Rep. CMU-CS-81-102, Dept. of Computer Science, Carnegie-Mellon Univ., Pittsburgh, PA, 1980.

Koenderink, J. J., "What Does the Occluding Contour Tell Us About Solid Shape?", Perception, Vol. 13, 1984, 321-330.

Koenderink, J. J. and van Doorn, A. J., "The Internal Representation of Solid Shape with Respect to Vision", Biol. Cybern., 32, 1979, 211-216.
Lieberman, L., "Model-Driven Vision for Industrial Automation", in Advances in Digital Image Processing (P. Stucki, Ed.), Plenum, New York, 235-246.

Laffey, T. J., Haralick, R. M., and Watson, L. T., "Topographic Classification of Digital Image Intensity Surfaces", Proc. of IEEE Workshop on Computer Vision: Representation and Control, August 1982, 171-177.

Le Moigne, J. and Waxman, A. M., "Multi-Resolution Grid Patterns for Building Range Maps", Tech. Rep., Univ. of Maryland.

Levine, M. D., O'Handley, D. A., and Yagi, G. M., "Computer Determination of Depth Maps", CGIP, Vol. 2, 1973, 134-150.

Lipschutz, M., "Differential Geometry", McGraw-Hill, 1969.

Lowe, D. G. and Binford, T. O., "The Recovery of Three-Dimensional Structure for 3-D Machine Vision", IEEE Trans. on PAMI, Vol. 7, No. 3, May 1985, 320-325.

Malik, J., "Interpreting Line Drawings of Curved Objects", International Journal of Computer Vision, Vol. 1, No. 1, 1987, 73-103.

Marr, D., "Vision", Freeman & Company, 1982.

Marr, D. and Nishihara, H. K., "Representation and Recognition of the Spatial Organization of Three-Dimensional Structure", Proc. R. Soc. London B, 200, 1978, 269-294.

Marr, D. and Poggio, T., "Cooperative Computation of Stereo Disparity", Science, 194, 1976, 283-287.

Medioni, G. and Nevatia, R., "Description of 3-D Surfaces Using Curvature Properties", Proc. of DARPA Image Understanding Workshop, New Orleans, LA, October 3-4, 1984, 291-299.

Milgram, D. L. and Bjorklund, C. M., "Range Image Processing: Planar Surface Extraction", Proc. of the 5th Intl. Conf. on Pattern Recognition, Miami Beach, FL, Dec. 1980, 912-919.

Millman, R. S. and Parker, G. D., "Elements of Differential Geometry", Prentice-Hall, Inc., 1977.

Mishkin, M., Ungerleider, L. G., and Macko, K. A., "Object Vision and Spatial Vision: Two Cortical Pathways", Trends in NeuroSciences, 6, 1983, 414-417.

Nevatia, R. and Binford, T. O., "Description and Recognition of Curved Objects", Artificial Intelligence, 8, 1977, 77-98.

Oshima, M. and Shirai, Y., "Object Recognition Using Three-Dimensional Information", IEEE Trans. on PAMI, 5, 4 (July), 1983, 353-361.

Perkins, W. A., "A Model-Based Vision System for Industrial Parts", IEEE Trans. on Computers, 27, 2 (Feb.), 1978, 126-143.

Persoon, E. and Fu, K. S., "Shape Discrimination Using Fourier Descriptors", IEEE Trans. on Syst., Man, Cybern., 7, 3 (March), 1977, 170-179.

Pietikainen, M. and Harwood, D., "Depth from Three Camera Stereo", Proc. of IEEE Conf. on CVPR, Miami Beach, June 22-26, 1986, 2-8.

Ponce, J. and Brady, M., "Toward a Surface Primal Sketch", Proc. of IEEE International Conference on Robotics and Automation, St. Louis, MO, March 22-28, 1985, 420-425.

Popplestone, R. J., Brown, C. M., Ambler, A. P., and Crawford, G. F., "Forming Models of Planar-and-Cylinder Faceted Bodies from Light Stripes", Proc. of the 4th IJCAI, Tbilisi, USSR, Sept. 1975, 664-668.

Posdamer, J. L. and Altschuler, M. D., "Surface Measurement by Space-Encoded Projected Beam Systems", CGIP, 18, 1982, 1-17.

Rocker, F. and Kiessling, A., "Methods for Analyzing Three Dimensional Scenes", Proc. of the 4th IJCAI, 1975, 669-672.

Rosenfeld, A. and Kak, A. C., "Digital Picture Processing", 2nd Ed., Academic Press, 1982.

Sekuler, R. and Blake, R., "Perception", Alfred A. Knopf, Inc., 1985.

Shirai, Y., "Recognition of Polyhedrons with a Range Finder", Artificial Intelligence, Vol. 4, No. 3, Oct. 1972, 243-250.

Shirai, Y., "Recognition of Real-World Objects Using Edge Cues", in Computer Vision Systems (A. R. Hanson and E. M. Riseman, Eds.), Academic Press, New York, 1978, 353-362.
Shneier, M., "Models and Strategies for Matching in Industrial Vision", Tech. Rep. TR-1073, Comp. Sci. Dept., Univ. of Maryland, College Park, July 1981.

Shrikhande, N. and Stockman, G., "Surface Orientation from Striped Lighting", Proc. of the 5th Scandinavian Conf. on Image Analysis, Stockholm, Sweden, June 1987.

Stevens, K. A., "The Visual Interpretation of Surface Contours", Artificial Intelligence, 17, 1981, 47-73.

Stockman, G. C., Kopstein, K., and Benett, S., "Matching Images to Models for Registration and Object Detection via Clustering", IEEE Trans. on PAMI, 4, 3 (May), 1982, 229-241.

Stockman, G. C. and Chen, S. W., "Detecting the Pose of Rigid Objects: A Comparison of Paradigms", Proc. of the SPIE Conf. on Electro-Optic Imaging Systems and Devices, Los Angeles, CA, Jan. 11-16, 1987.

Stockman, G. C., Chen, S. W., Hu, G., and Shrikhande, N., "Recognition of Rigid Objects Using Structured Light", Proc. of IEEE Conf. on Systems, Man and Cybernetics, Washington, D.C., Oct. 1987.

Stockman, G. C., "Object Recognition and Localization via Pose Clustering", Computer Vision, Graphics, and Image Processing, 40, 1987, 361-387.

Strand, T. C., "Optical Three-Dimensional Sensing for Machine Vision", Opt. Eng., 24(1), 33, 1985.

Tsai, R. Y., "Multiframe Image Point Matching and 3-D Surface Reconstruction", IEEE Trans. on PAMI, Vol. 5, No. 2, March 1983, 159-173.

Tsai, R. Y., "An Efficient and Accurate Camera Calibration Technique for 3-D Machine Vision", Proc. of IEEE Conf. on CVPR, Miami Beach, June 22-26, 1986, 364-374.

Wallach, H. and Floor, L., "The Use of Size Matching to Demonstrate the Effectiveness of Accommodation and Convergence as Cues for Distance", Perception and Psychophysics, 10, 1971, 423-428.

Waltz, D. L., "Generating Semantic Descriptions from Drawings of Polyhedral Scenes with Shadows", in The Psychology of Computer Vision (P. H. Winston, Ed.), McGraw-Hill, New York, 1975.

Wang, Y. F., Mitiche, A., and Aggarwal, J. K., "Computation of Surface Orientation and Structure of Objects Using Grid Coding", IEEE Trans. on PAMI, Vol. PAMI-9, No. 1, 1987, 129-137.

Will, P. M. and Pennington, K. S., "Grid Coding: A Preprocessing Technique for Robot and Machine Vision", Proc. of the 2nd IJCAI, Imperial College, London, 1971, 66-70.

Witkin, A. P., "Recovering Surface Shape and Orientation from Texture", Artif. Intell., 17, 1 (Aug.), 1981, 17-47.

Woodham, R. J., "Photometric Stereo: A Reflectance Map Technique for Determining Surface Orientation from Image Intensity", Proc. of the SPIE Conf. on Image Understanding Systems and Industrial Applications, San Diego, Calif., August 1978, Vol. 155, SPIE, 136-143.

Yakimovsky, Y. and Cunningham, R., "A System for Extracting Three-Dimensional Measurements from a Stereo Pair of TV Cameras", Computer Graphics and Image Processing, 7, 1978, 195-210.

Yang, H. S., Boyer, K. L., and Kak, A. C., "Range Data Extraction and Interpretation by Structured Light", Tech. Rep., 1984, 199-205.

Yang, H. S. and Kak, A. C., "Determination of the Identity, Position and Orientation of the Topmost Object in a Pile", CVGIP, 36, 1986, 229-255.