DEVELOPMENT OF LARGE SCALE STRUCTURED LIGHT BASED MEASUREMENT SYSTEMS

By

Chi Zhang

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Electrical Engineering

2012

ABSTRACT

DEVELOPMENT OF LARGE SCALE STRUCTURED LIGHT BASED MEASUREMENT SYSTEMS

By

Chi Zhang

This study presents the development of large scale structured light based dimensional measurement systems. Two structured light systems are developed for two different applications: a 3D inspection system that emphasizes accuracy, and a fast environmental reconstruction system that emphasizes measurement speed.

Quality inspection is a process that evaluates the 3D shape of a manufactured work piece. Compared with a traditional Coordinate Measurement Machine (CMM), structured light based optical measurement offers fast speed, a high inspection sample rate, and overall simplicity of operation. The measurement result is further compared with the Computer Aided Design (CAD) model to evaluate the work piece quality. Three main issues arise when the system scale is enlarged. First, calibrating such a large system, with its long standoff distance, large Field Of View (FOV), deep Depth Of View (DOV), and multiple sensors, is a challenging task. In order to maintain high accuracy, an innovative 3D sensor model with fewer calibration parameters was developed. Instead of employing the incident light in the projector frame, 3D points are recovered within the camera frame via plane induced parallax information, so that the projector's intrinsic and orientation parameters are removed from the 3D model. Second, structured light systems are usually sensitive to surface optical properties. Materials exhibiting different color textures, different reflection ratios, and especially a mixture of specular and diffuse reflection prevent the optical sensor from correctly acquiring the incident light information, and the traditional structured light method is only valid for parts with diffuse reflection. A robust surface decoding method for industrial applications was therefore developed. Monochromatic light is used to cope with different object color textures, and the illumination of the projector is automatically adjusted based on the test material. Furthermore, an extrapolation model to solve the internal reflection problem and a sub-pixel interpolation model to increase the measurement accuracy are proposed. Finally, registration from each sensor frame to the common frame was achieved with a modified Iterative Closest Point (ICP) method. The final point cloud, joined from the individual point clouds, represents the 3D shape of a large work piece. Object-oriented on-line calibration based on ICP and Geometric Dimensioning and Tolerancing (GD&T) information was developed to register the final point cloud to the CAD model.

Environmental reconstruction for navigation is a process that quickly acquires the surroundings of a vision sensor and presents a fused 3D display to the operator. A traditional navigation system only employs a camera to obtain 2D information. A structured light system based on infrared light can quickly rebuild object depths and fuse them into the displayed images without influencing the operator's normal vision. The sensed points on the objects are highlighted by color codes, from red to blue, which indicate the object distances. In order to measure moving objects, a one-shot surface coding algorithm was developed.
Only one projection image is needed to acquire the 3D information; the codeword is determined within a single pattern because the code of each primitive depends on the values of the primitive and its neighbors. In order to reconstruct the environment without blind areas, an omni-directional panoramic structured light sensing system was developed to increase the FOV. Hyperbolic mirrors are placed in front of a projector and a camera, and a 3D reconstruction model associated with the hyperbolic mirror was built. Finally, a 360 degree image fused with depth information is produced by the designed system.

In summary, this study developed large scale structured light systems for two applications: accurate inspection for quality control and fast environmental reconstruction for robot navigation.

Copyright by
Chi Zhang
2012

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES

1 Introduction
1.1 Applications and Motivations
1.2 Background and Literature Review
1.2.1 Implementation of a Measurement System
1.2.2 Review of 3D Inspection Sensors Based on Optical Methods
1.2.3 Sensor Calibration
1.2.4 Surface Coding via Structured Light
1.2.5 3D Data Registration
1.2.6 Omni-directional 3D Reconstruction Vision Systems
1.3 Difficulties and Challenges
1.4 Objectives of the Work
1.5 Organization of the Study

2 Development of a 3D Sensor Based on Structured Light
2.1 Introduction of Imaging Devices
2.1.1 A Pinhole Model of an Image Device
2.1.2 Homography Matrix from Single View Geometry
2.1.3 Epipolar Geometry and Fundamental Matrix
2.1.4 Plane Induced Parallax in an Imaging System
2.2 Optimize the 3D Sensor Model via Parallax Information
2.2.1 Problem Statement
2.2.2 Recording the Reference Information with Homography Matrix
2.2.3 Correspondence between Measurements
2.2.4 Working Principle of the New Sensor Model
2.3 Chapter Summary

3 Sensor Calibration
3.1 Precision Analysis of the Given Sensing System
3.1.1 Uncertainty Analysis of an Imaging Device
3.1.2 Simulation of the System Error Model
3.1.3 System Precision Test Results
3.2 Sensor Calibration
3.2.1 Calibrate Homography Matrix between the Camera Image Plane and the Reference Plane
3.2.2 New Calibration Tasks of the Camera
3.2.3 The Choice of Non-singular Matrix H in New Calibration Tasks
3.2.4 Calibration of Camera Translation Vector Oc and Baseline Distance d
3.3 Chapter Summary

4 Surface Coding for a Structured Light System
4.1 Introduction of the Surface Coding Methods
4.2 Single-shot Coding Method
4.2.1 Pattern Generator
4.2.2 Primitive Design
4.2.3 Primitive Detection and Codeword Decode
4.3 Multi-shot Projection Coding Method
4.3.1 Problem Statement
4.4 Encoding and Decoding Strategy
4.4.1 Derivation of the Camera Photo Response Function for a Given Scene
4.4.2 Initial Guess of the Crosstalk Function
4.4.3 Advanced Projection Pattern Generation
4.4.4 Interpolation Model for Diffused Reflection Region and Extrapolation Model for Internal Reflection Region
4.5 Verification of the Projection Mask Performance
4.5.1 The Verification of the Illumination Adjustment via Projection Mask
4.5.2 The Verification of the Interpolation and Extrapolation Models
4.6 Chapter Summary

5 Registrations between Different Coordinates after 3D Reconstruction
5.1 Registration between Different 3D Coordinates
5.2 Off-line Registration Calibration from Sensor Frame to Common Frame
5.2.1 Calculation of Rotation Matrix via Quaternion Algorithm
5.2.2 Iterative Closest Point (ICP) based Registration
5.2.3 System Calibration Procedure
5.3 Registration between the Final Point Cloud and CAD Model
5.3.1 Octree Space Labeling Method and Multi-resolution Method based ICP
5.3.2 Feature Weighted Registration
5.3.3 The Object-oriented Online Registration Model
5.3.4 The Methodology in Current Task
5.3.5 Registration Process
5.3.6 Error Map Generation
5.4 Sensor Fusion
5.5 Chapter Summary

6 The Development of an Omni-direction 3D Camera for Mobile Robot
6.1 The Working Principle of the Omni-directional 3D Sensor
6.2 Calibration of the Omni-directional 3D Camera
6.3 Error Analysis for the Designed 3D Camera
6.4 One-shot Surface Coding for the Sensor
6.4.1 Image Warping in an Omnidirectional Camera
6.4.2 One-shot Projection Pattern Design for the Omnidirectional 3D Camera
6.4.3 Epipolar Constraints and the Decoding Method
6.5 Chapter Summary
7 Current Experimental Results and Data Analysis
7.1 The Rapid Inspection System (RIS)
7.1.1 The Configuration of the System
7.1.2 The Inspection Result via Single-shot Coding Algorithm
7.1.3 The Inspection Result via Multi-shot Coding Algorithm
7.1.4 The Registration Result from Four Pairs of Sensor Frames
7.1.5 Inspection for Different Work Pieces
7.2 The Navigation System for Single Direction
7.2.1 The Prototype of the Navigation System for Single Direction
7.2.2 The Fusion Result for the System
7.2.3 Integration of the Single Direction Sensor with a Mobile Robot
7.3 The Omni-direction Navigation System
7.4 Chapter Summary

8 Conclusions
8.1 Conclusions

BIBLIOGRAPHY

LIST OF TABLES

3.1 The standard deviation results for all sensors
4.1 Pixel binarization without the projection mask
4.2 Pixel binarization with the projection mask
7.1 The number of the point clouds in different experiments

LIST OF FIGURES

2.1 The pinhole model of a camera. (For interpretation of the references to color in this and all other figures, the reader is referred to the electronic version of this dissertation.)
2.2 (a) The homography matrix between the image plane and the world plane.
2.3 The epipolar geometry between two image planes.
2.4 The pencil of epipolar lines in each image plane on the epipole.
2.5 A plane W induced parallax in the camera image plane.
2.6 Traditional 3D reconstruction model.
2.7 Recording reference planar information.
2.8 Correspondence between reference recording and work piece inspection.
2.9 The optimized 3D model.
2.10 3D sensor model in the parallel configuration.
3.1 The difference between precision and accuracy.
3.2 (a) High accuracy with low precision; (b) low accuracy with high precision.
3.3 Camera error model.
3.4 Projector error model.
3.5 Error analysis results: height errors when the input error equals 0.1, 0.2, 0.3, 0.5 pixel.
3.6 (a) The illustration of homography matrix calculation; (b) the used chessboard.
3.7 The light ray passing two pinhole models.
3.8 Multi-plane calibration strategy.
3.9 The distribution of p_sr^c(h1, u, v) for each layer.
3.10 The second order polynomial to approximate the residual function for a pixel.
4.1 The pattern generation procedure.
4.2 An example of occlusion.
4.3 The comparison of different primitives.
4.4 The primitive value.
4.5 Image processing: (a) contour extraction; (b) location detection; and (c) principal axis identification.
4.6 Determination of the codeword for each primitive.
4.7 Traditional encoding and decoding.
4.8 The flow chart of the encoding and decoding strategy.
4.9 The captured picture when projecting an image with intensity 100.
4.10 The photo response curve for 5 points in the last figure.
4.11 The proper projection values in the camera frame.
4.12 (a) The homography mapping among the projector image plane, the camera image plane and the inspection table; (b) the correspondence pixel PP changes according to PW.
4.13 (a) The initial projection mask; (b) the dilated projection mask; (c) one example of a projection pattern.
4.14 (a) A received image of the projected pattern; (b) a received image for the inverse pattern; (c) interpolation of the edge location; (d) estimation of the edge location in a strong illumination condition.
4.15 The final projection mask.
4.16 The received image without the projection mask.
4.17 The received image with the initial mask.
4.18 The received image with the final mask.
4.19 (a) A selected detail view of the black part when projecting the pattern without the mask; (b) the same view when projecting the inverse pattern; (c) the intensity curve around point B / B′.
4.20 (a) A selected detail view of the shiny part when projecting the pattern without the mask; (b) the same view when projecting the inverse pattern; (c) the intensity curve around point C / C′.
4.21 (a) A selected detail view of the black part when projecting the pattern with the mask; (b) the same view when projecting the inverse pattern; (c) the intensity curve around point B / B′.
4.22 (a) A selected detail view of the shiny part when projecting the pattern with the mask; (b) the same view when projecting the inverse pattern; (c) the intensity curve around point C / C′.
5.1 ICP algorithm.
5.2 System registration gauge.
5.3 Sensor 1 measurement result.
5.4 The center of each marker.
5.5 The rough alignment between sensor 1 and CMM data.
5.6 The fine alignment between sensor 1 and CMM data.
5.7 System calibration results.
5.8 Space partition by the octree space method.
5.9 The node tree of the octree method.
5.10 Octree for the CAD model.
5.11 Octree for the final point cloud.
5.12 The difference between two algorithms.
5.13 Check fixture and the stopper bar touching datum points of the work piece.
5.14 (a) CAD model with datum points; (b) the registration result based on datum points.
5.15 The color coded error distribution map for a piece of windshield.
5.16 The fusion result for the navigation application.
6.1 The working principle of the designed omni-direction system.
6.2 Calibration of the omni-directional camera.
6.3 Calibration of the omni-directional projector.
6.4 Received image from the world frame: (a) the received image; (b) illustration of the optical path.
6.5 The object depth error from the resolution limitation.
6.6 The omnidirectional camera resolution: (a) the resolution curve in the image frame; (b) the resolution curve in the camera frame.
6.7 Image warp for a panoramic view.
6.8 The designed one-shot projection pattern.
6.9 The epipolar geometry of the omnidirectional system.
7.1 RIS system configuration.
7.2 The inspected pillar with surface pre-treatment.
7.3 The inspected point cloud of the pillar.
7.4 The 3D point clouds for the work pieces.
7.5 The comparison between CMM data and the RIS data.
7.6 The CMM error map.
7.7 The first tested work piece with bright color and more complex shape.
7.8 The corresponding CAD model for the first work piece.
7.9 The point cloud for the first work piece.
7.10 The error map for the first work piece.
7.11 The configuration of the navigation system for single direction.
7.12 The tested objects.
7.13 The captured image when projecting the near infrared light pattern.
7.14 The fusion result for the sensor.
7.15 The integrated mobile robot with the designed sensor.
7.16 The description of the integrated system.
7.17 The result of the designed system.
7.18 The developed omnidirectional 3D sensor.
7.19 Examples of the projected pattern on the left corner of the room.
7.20 The received image by the omnidirectional camera.
7.21 The fused result with the object depth.
7.22 The unwarped fusion result.

CHAPTER 1

Introduction

1.1 Applications and Motivations

The development of rapid and accurate measurement systems is an active research area. Among current 3D measurement techniques, structured light based 3D vision has made significant strides and has been utilized in many applications, such as manufacturing quality control, 3D profile reconstruction, reverse engineering, and environmental reconstruction. According to the required system performance, the applications can be roughly separated into two branches: accurate inspection and fast reconstruction.

Inspection for industrial applications usually emphasizes result accuracy. A structured light based inspection system acquires the light reflected from the test work pieces and rebuilds the 3D result as a set of points, named a point cloud. In quality control, the point cloud is further compared with the Computer Aided Design (CAD) model to find the manufacturing error distribution. The color-coded error distribution map is evaluated against the acceptable tolerance; work pieces exceeding the tolerance are considered unqualified.

Fast reconstruction, such as robot navigation, usually considers acquisition speed as the most important parameter. Its accuracy requirement is relatively lower than inspection's, but the navigation application is very sensitive to measurement speed: the system should be able to quickly rebuild a moving object. The developed system employs infrared light to sense the objects and uses a one-shot coding method to quickly rebuild the 3D information of the scene. The reconstruction result is further fused with the camera captured image so that the operator can perceive the 3D information.

Currently there are still some limitations preventing further application of the technology. First, a large scale sensing system is needed by the manufacturing industry to inspect big work pieces. Traditional inspection machines, such as CMMs, are very sensitive to the inspection scale: the price can increase dramatically once the scale increases. An imaging system with a large Field Of View (FOV), a long standoff distance, random image noise from the environment, and registration between different sensor coordinates greatly increases the difficulty of system implementation and calibration.

Second, vision based measurement systems always suffer from surface optical properties. Work pieces with various properties, such as bright, white, dark, or black parts, and especially surfaces exhibiting a mixture of specular and diffuse reflection, greatly influence the vision system. Manufacturers have to pre-treat the surface into a diffusely reflective one and then use the optical system to measure the work pieces, which increases the cost and the processing time. Therefore, a system that can handle optically challenging objects is needed for industrial application. Third, a fast on-line navigation system for mobile robots is still lacking.
Since the mobile base is moving all the time, the system should reconstruct the environment based on only one projection; traditional structured light systems that project multi-image patterns fail in this application. Furthermore, the results should simultaneously satisfy efficiency, reliability, high density, and accuracy requirements. Finally, an omni-directional 3D navigation sensor is needed for the operator to obtain 360 degree environmental information. Without blind areas in the vision system, the operator can control the mobile robot and view the surroundings more efficiently. The structured light system should simultaneously compensate the mirror distortion, sense a 360 degree view, reconstruct objects in the environment, and fuse the 3D information into the 2D image. Aiming at these demands, this study develops proper methodologies and reliable machine vision based systems for these applications.

1.2 Background and Literature Review

1.2.1 Implementation of a Measurement System

The development of a structured light based measurement system can be described by the following parts: the 3D recovery model for a traditional active vision system, calibration of the system sensor, encoding and decoding of the structured light, registration between different work frames, and the omni-directional system.

The imaging device, a camera for example, is usually modeled as a pinhole. Objects in the 3D world frame map into the image frame by a 3 × 4 perspective matrix. Only 2D information is preserved through the mapping, and the constraint on the last degree of freedom is missing: the recovery from an image pixel forms a light ray instead of a point. In order to accomplish the 3D reconstruction task, an additional constraint is needed to fix the last degree of freedom. The 3D recovery model, as the starting point in designing a structured light system, establishes the working principle of the system sensors. Aided by another device, such as a laser, a projector, or another camera, the third constraint equation can be built. The 3D recovery model controls the subsequent implementation steps of the system, such as configuration and calibration, so a simple, precise, and easily implemented 3D recovery model plays an important role in the system design.

After the 3D recovery model is fixed, the next step is to find a proper way to calibrate the system. A structured light system usually consists of a projector and a camera, and the calibration tasks usually involve both devices' intrinsic and extrinsic parameters. The intrinsic parameters control the mapping from each device's image frame to its own sensor frame, and the extrinsic parameters transfer the sensor frame into the world frame. Together there are 22 parameters for both devices. If the 3D model is properly designed, fewer parameters need to be calibrated.

A structured light system derives from a stereo vision system by replacing one camera with a projector. A stereo vision system suffers from the correspondence problem: only corresponding pixels in both imaging devices can build up 3D points. Without the aid of features, a stereo vision system cannot establish the correspondence, so only a few points with clear features can be recovered. A structured light system replaces one camera with a projector. The projected pattern is properly designed to encode a codeword into the incident light. The camera records the reflected light and extracts the codeword.
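For reference, the pinhole mapping described above can be written compactly in generic notation (these symbols are standard pinhole quantities, not the particular calibration variables defined later in this dissertation):

    s [u, v, 1]^T = K [R | t] [X, Y, Z, 1]^T,

where K is the 3 × 3 intrinsic matrix, [R | t] holds the extrinsic parameters, and s is an unknown scale. Inverting the mapping only constrains the world point to the ray

    X(λ) = O_c + λ R^T K^{-1} [u, v, 1]^T,   λ > 0,

through the optical center O_c; the codeword decoded at a camera pixel identifies the corresponding projector pixel, whose own ray (or stripe plane) supplies the missing constraint.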
Since the incident light and the reflected light preserve the codeword, pixels sharing the same codeword in the two imaging devices correspond to each other. A properly designed projection pattern can generate the correspondence without influence from the textures and geometries of the test objects. Therefore, a robust encoding and decoding strategy is critical for a structured light system. According to the number of projections, the encoding and decoding methods can be separated into two categories: one-projection (one-shot) methods and multi-projection methods. The acquisition time of a one-shot method is much shorter than that of a multi-projection method, so it can be used to measure moving objects; however, its point density and accuracy are lower. Thus, the one-shot method is usually used for fast 3D reconstruction, while the multi-projection method is mainly utilized to inspect stationary work pieces. With multiple projections, the method can inspect optically-challenging objects without pre-treatment, such as objects with different colors, different reflection ratios, or material exhibiting a mix of diffuse and specular reflection. The two encoding and decoding methods are introduced separately in this study.

After the 3D point cloud is recovered in each individual frame, some post-processing tasks are usually needed to obtain the final results. If the system contains more than one pair of sensors, registration between each sensor frame and a common frame is needed to merge all the small point clouds into the final one. For the quality control application, the final point cloud needs to be compared with the CAD model, so an additional registration between the common frame and the CAD frame is also needed. For the navigation application, the final point cloud should be fused into the 2D image display. The third dimension, the object depth (the distance from the objects to the camera), is fused into the image via color codes: distances from far to near are assigned different colors, from blue to red. Therefore, a registration from the 3D world frame to the image frame is needed to complete the fusion. Omni-directional systems usually have a different working principle from traditional structured light systems because of the additional convex mirrors in front of the cameras and projectors; their literature review is given in a separate section of this chapter.

From the view of applications, structured light systems can be roughly separated into two branches: accurate inspection and fast environment reconstruction. From the view of development, the difference between the two applications lies in the encoding and decoding strategy. Accurate inspection usually employs multi-shot projection to acquire the test object information and obtain a dense, accurate point cloud, whereas fast environment reconstruction on a mobile robot usually adopts single-shot projection to record the surrounding information and increase the sampling speed. Once the application scale increases, especially the FOV of the system, the two branches also utilize different methods to enlarge the detection range. In order to increase the FOV, accurate inspection uses multiple sensors to cover a big work piece. Traditional methods, such as enlarging the standoff distance, can also increase the FOV of the system; however, the resolution area of an imaging device, the area that one camera / projector pixel covers in the world frame, also increases.
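As a rough worked example of this effect (the numbers here are purely illustrative, not the specifications of the system described later): for a pinhole device with focal length f, pixel pitch p, and standoff distance Z, one pixel covers approximately Z·p/f on a fronto-parallel surface. With f = 16 mm, p = 7 µm, and Z = 1 m the footprint is about 0.44 mm; doubling the standoff distance to 2 m to enlarge the FOV doubles the footprint to about 0.88 mm.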
Deviations smaller than the resolution area cannot be detected, and thus the result accuracy is reduced. Therefore, the multi-sensor strategy can enlarge the FOV and maintain high accuracy at the same time. For the navigation application, there is limited space on the mobile robot, so it is very difficult to mount multiple sensors on the robot. Instead of multiple sensors, convex mirrors are placed in front of the projector and the camera to enlarge the FOV. The method can also cover a 360 degree view without blind areas within the limited space. Since the accuracy requirement of navigation is much lower than that of quality inspection, the error within the resolution area can be tolerated.

In the following parts of this chapter, the literature review mainly follows the development steps of a structured light system: the 3D recovery model, system calibration, encoding and decoding of the structured light, registration between different work frames, and finally the omni-directional system. Special attention is given to the two major applications: quality inspection and environment reconstruction.

1.2.2 Review of 3D Inspection Sensors Based on Optical Methods

Quality inspection is the process of finding the deviation between the work pieces and the CAD model according to a set of specifications [1]. The whole process can be divided into two parts: the measurement of the work pieces and the generation of the error distribution map. Basically, existing measurement techniques can be separated into two categories: contact based and non-contact based. A CMM is a traditional contact based system commonly used for dimensional inspection: a touching probe is employed to contact the work piece surface and obtain the coordinates of the points. Although the accuracy of a CMM is very high, the point-wise nature of the contact based method limits both the CMM's measurement speed and its capability of inspecting complicated work pieces. Non-contact measurement via machine vision [2], with the merits of fast speed, high inspection sample rate, and overall simplicity of operation, has become a popular alternative to the CMM. In many manufacturing applications, inspection of a large part requires multiple 3D vision sensors to cover the whole piece, in which case an additional registration from each sensor frame to the common frame is needed to complete the final point cloud. Finally, the measurement result, the point cloud representing the test work piece, is compared with the Computer Aided Design (CAD) model to find the error distribution map. The color-coded error distribution map is further evaluated against the required tolerance; only work pieces within the manufacturing tolerance can be delivered to the customers. Machine vision systems are also widely utilized in other industrial fields [3] [4].

The problem of retrieving 3D information from an object by vision methods starts from the basic imaging device: a camera. A camera, usually simulated by a pinhole model, presents a perspective mapping from the 3D world frame into the image frame. Each pixel in the image frame recovers a light ray passing through the optical center instead of a point in the world frame. In order to achieve 3D recovery, additional constraints are needed to limit the last degree of freedom along the ray. Stereo vision [5] was developed for 3D recovery based on two cameras viewing the same object from two different viewpoints. The corresponding pixels from the two cameras define two light rays under the pinhole approximation, and the intersection of the two corresponding rays recovers the 3D coordinates in the world frame. Sometimes, due to calibration error, the midpoint of the common perpendicular of the two corresponding rays is used as the reconstructed 3D point instead.
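A minimal sketch of this triangulation step, assuming the two corresponding rays have already been expressed in a common frame (the function and variable names are illustrative, not from this thesis):

import numpy as np

def triangulate_midpoint(o1, d1, o2, d2):
    """Midpoint of the common perpendicular of two rays.

    Each ray is given by an origin o (3-vector) and a direction d.
    Solves for the scalars s, t that minimize |(o1 + s*d1) - (o2 + t*d2)|.
    """
    d1 = d1 / np.linalg.norm(d1)
    d2 = d2 / np.linalg.norm(d2)
    b = o2 - o1
    a = d1 @ d2
    denom = 1.0 - a * a                # zero only for parallel rays
    if abs(denom) < 1e-12:
        raise ValueError("rays are (nearly) parallel")
    s = (d1 @ b - a * (d2 @ b)) / denom
    t = (a * (d1 @ b) - d2 @ b) / denom
    p1 = o1 + s * d1                   # closest point on ray 1
    p2 = o2 + t * d2                   # closest point on ray 2
    return 0.5 * (p1 + p2)             # reconstructed 3D point

# Example: two rays whose closest point is (0, 0, 1).
X = triangulate_midpoint(np.array([0., 0., 0.]), np.array([0., 0., 1.]),
                         np.array([0.1, 0., 0.]), np.array([-0.1, 0., 1.]))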
Stereo vision is also called passive vision, since none of the sensors emit any radiation; they rely instead on detecting the radiation reflected by the objects. Although it is widely used in computer vision, it presents several drawbacks for 3D reconstruction. The first is the correspondence problem: determining the relationship between pixels of different views is not a trivial step. The second is the resolution of the 3D points: the technique can only reconstruct points with unique features, so dense reconstructions are difficult to obtain. In order to overcome both drawbacks, two solutions have been developed for cases where dense reconstruction is required. The first is to add an active imaging device, such as a laser or a projector, projecting a group of pre-defined image patterns, which is also called structured light. The codeword carried by each pixel from the projector is designed to be unique, so the correspondence between the two cameras can be established. The limitations of this method are: a) the Field Of View (FOV) is the overlap of three imaging devices, b) the Depth Of View (DOV) is the minimum of the three imaging devices, and c) the additional device increases the cost of the system. Facing the limitations of the first solution, the second one simply replaces one camera with a projector in the stereo vision system. The projector provides both the correspondence and the incident light ray equation. Treating the projector as a pseudo camera, 3D reconstruction can again be achieved by intersecting corresponding light rays. Since the projector emits light, the system is also called a structured light based active vision system. In this study, 4 pairs of structured light based active vision sensors are employed to build a large scale measurement system as a prototype for an industrial quality control system.

1.2.3 Sensor Calibration

Calibration of the 3D sensor is crucial for a measurement system to obtain accurate results. Since a 3D sensor consists of a camera and a projector, its calibration can be roughly separated into two categories: camera calibration and projector calibration. The pinhole approximation of each imaging device adopts intrinsic and extrinsic parameters to construct the light ray equation from a given pixel. The intrinsic parameters transfer an image pixel into a light ray in the camera frame, and the extrinsic parameters transfer the camera frame into the world frame on the inspection table. Conventional calibration methods calculate the intrinsic and extrinsic parameters of a camera or a projector.

Camera calibration has been studied for decades. With the aid of the correspondence between a calibration gauge and pixel locations in the image, the parameters of a camera can be regressed. In the calibration process, the calibration gauge plays an important role, and a large scale 3-dimensional gauge with high accuracy is usually very expensive. Based on the dimension of the gauge, calibration techniques can be roughly classified into the following categories: 1) 3D reference object based calibration.
Camera calibration is performed by observing a gauge whose geometry in 3D space is known with high precision [6]. This kind of approach usually requires an expensive calibration apparatus, because it costs a lot to manufacture a gauge with accurate 3D coordinates, especially a large scale one. 2) 2D plane based calibration, which is also the most commonly used method. In order to reduce the cost of the gauge, the calibration algorithm is optimized to obtain the camera intrinsic and extrinsic parameters from planar objects. Tsai [7] adopted a planar gauge to obtain the camera parameters in two steps. In the initial step, constraints between the parameters, such as the orthonormality of the rotation matrix, are not enforced; instead, the Radial Alignment Constraint (RAC) provides a quantity that simplifies the analysis and reduces the task to linear equations. The solution of the linear equations is calculated by the least squares method and used as the starting value for the final optimization in step two. In the second step, the remaining parameters are obtained via the nonlinear Levenberg-Marquardt Algorithm (LMA) [8], which finds the best fit between the observed image points and the points predicted by the target model. Tsai's method has two known limitations: a) the solution procedure fails if the camera plane is nearly parallel to the calibration board. Tsai et al. [9] also devised an algorithm that handles the exactly parallel case; years later, Zhuang and Wu [10] extended Tsai's method to calibrate a camera in a near parallel configuration when the camera intrinsic parameters were known, and Luo et al. [11] proposed another algorithm to calibrate a camera when the intrinsic parameters were unknown. b) The second limitation is that the translation between the calibration planes is needed to regress the camera parameters, which results in an elaborate setup. Different from Tsai's method, Zhang [12] proposed a new planar calibration method in which knowledge of the plane motion is not necessary. The orthogonality constraints from the rotation matrix are utilized as the calibration conditions: the homography matrix between the image plane and the calibration plane is factorized into two parts, the intrinsic parameters and a matrix consisting of two columns of the rotation matrix and the translation vector. An analytical solution can be calculated as the initial condition, and the numerical solution can be found iteratively through LMA. 3) 1D calibration. In order to further reduce the requirements on the calibration gauge, camera calibration based on a 1D gauge has been proposed; the results can reach a fair level of accuracy. Zhang [13] first proposed calibrating a camera with a 1D object: the solution can be found if one point is fixed, and a closed-form solution is developed if six or more observations of such a 1D object are made. Qi et al. [14] extended Zhang's 1D calibration method and calibrated a camera with the aid of a gravity driven 1D object. 4) The last group of methods is called semi-calibration or self-calibration. This kind of calibration usually assumes that some of the parameters are known and can calibrate the rest without any reference gauge; therefore, it is also sometimes called 0D calibration. Semi-calibration usually adopts the epipolar geometry constraints of the same object seen from two views. The essential matrix was first mentioned by H. C. Longuet-Higgins [15].
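In generic notation (not the specific variables of later chapters), the epipolar relations used by these self-calibration methods are: for corresponding homogeneous image points x and x' in the two views,

    x'^T F x = 0,

and when the intrinsic matrices K and K' are known, the essential matrix

    E = K'^T F K = [t]_x R

encodes the relative rotation R and translation t between the views, with [t]_x the skew-symmetric matrix of t.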
If two camera’s intrinsic parameters are known and normalized, the extrinsic parameters can be solved up to 4 analytical solutions by SVD method. The essential matrix can be determined from eight-point algorithm [?]. If the transformation between two positions is known, a camera’s intrinsic parameter can estimated by fundamental matrix from epipolar constraints [16] [17]. The history of projector calibration is much shorter than camera calibration. After structured light based 3D reconstruction model was developed, people began to investigate different methods on projection calibration. Compared with a camera, the calibration of a projector is always more difficult and the process is more complicated, since it is not a information receiving device. The dimensions of the gauge can not be directly applied to the calibration process and the projected pattern needs to employee another imaging receiving device, such as a camera, to acquire. Therefore, the projector calibration error is usually bigger than the camera’s, and becomes the major source of errors in a structured light system. Drarei et al. [18] treated a projector as a pseudo camera with the aid from a calibrated camera and then utilized camera 2D calibration method to calibrate the projector. The homography matrix between a projector pixel location and 3D points regressed from the camera is used to regress the projector parameters. However, this kind of methods brings 10 the camera calibration error into the projector calibration, which double the final error. Zhang et al. [19] first adopted structured light to set up a link table between a projector and a camera, and then applied traditional 2D camera calibration methods to regress projector parameters. The 3D coordinates of the points on the gauge is not regressed from the camera, so the camera calibration error will not participate into the projector calibration. However, the calibration steps are very complicated. The planar gauge was placed around 20 different positions and 3D points on each position were recorded by horizontal and vertical Gray Code and Phase Shifting (GCPS) structured light. Since GCPS is not robust against image noise, the miss-linked location by GCPS also costs calibration error. Eipoplar constraints describes the internal geometric relationship between two views and be welled used in self-calibration. Homography matrix which is used to calibration a camera by 2D gauge also receives a big success in a camera calibration. Combining these two, epipolar constraints and homogrpahy matrix, Zhang et al. [20] calibrated the relative rotation and translation between a camera and a projector. But the algorithm requests the intrinsic parameters need to be known first. In order to reduce the error from the projector calibration, Shi [21] did not regress the pinhole model parameters of the projector but created a link table between camera light array and projector light array with respect to each level of height within the whole work space. The results are accurate, but the method is limited by the size of link table, which means it can not be used for large scale system. Convectional calibration methods of a structured light system calculate both equipments intrinsic and extrinsic parameters, leaves a challenge task to calibrate a projector. Although projector calibration can be done via a complex way, the solution either contains too much error or time consuming. 
This thesis brings the camera planar calibration method and the epipolar information into the 3D reconstruction model to reduce the number of parameters in the 3D model. The novel strategy is that the camera calibration information based on the planar method can be saved as reference information, and the acquired information of the test object is compared with the pre-saved reference. The corresponding pixel deviation between the reference and the test object in the camera image is used to reconstruct the 3D point cloud, so the projector's intrinsic and orientation parameters are avoided in the calculation. Therefore, projector calibration is no longer necessary for 3D reconstruction. The correspondence function provided by the structured light is not only used to set up the correspondence between the camera and the projector, but also establishes corresponding pairs of pixels in the camera between two measurements: the camera pixel recording the reference information and the corresponding camera pixel recording the tested surface information.

Error analysis is another important topic in the calibration of a given 3D surface profile measurement system. As early as the 1980s, when active vision systems were invented, the error analysis problem was pointed out, and various kinds of error analysis have been carried out to evaluate systems from different aspects and error sources. The error model developed by Trobina et al. [22] for an active range sensor concentrated mainly on the errors from the camera perspective geometry. Yang et al. [23] proposed a more detailed mathematical model for each calibrated parameter in the 3D reconstruction. A relative error model [24] was presented for the calibration of an active triangulation sensing model. Uncertainty analysis and its effects in 2D camera calibration [25] have also been discussed. In summary, past work concentrated mainly on error analysis related to device parameters, the sensor model, and the calibration methods. As the scale of a 3D profile sensing system increases, the consistency of the system under random image noise also needs to be evaluated. In practical applications, especially industrial applications, consistency decides whether the measurement results are reliable or not. When the FOV and standoff distance of the system grow to a scale at which the consistency error is no longer small enough to ignore, a consistency analysis and test need to be done before the calibration starts, and a poor consistency test result requires a redesign of the system. In this study, the consistency performance of the designed system is analyzed, simulated, and tested, and tradeoff values are chosen for the standoff distance and the inspection area by considering the consistency error.
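As an illustration of this kind of consistency evaluation, the following sketch perturbs the image measurements of an idealized two-device triangulation with random pixel noise and reports the spread of the recovered depth. The geometry, the noise level, and the simple Z = f·B/d relation are illustrative assumptions, not the error model or sensor model developed in Chapter 3.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative geometry (not the thesis's actual configuration):
f_pix  = 2000.0                      # focal length, in pixels
B      = 300.0                       # baseline between the two devices, in mm
Z_true = 2500.0                      # true standoff distance, in mm
disparity = f_pix * B / Z_true       # ideal disparity, in pixels

sigma = 0.3                          # assumed image-point noise, in pixels
samples = []
for _ in range(5000):
    # perturb the measured positions in both devices, then re-triangulate
    d_noisy = disparity + rng.normal(0.0, sigma) - rng.normal(0.0, sigma)
    samples.append(f_pix * B / d_noisy)

print("mean depth (mm):", np.mean(samples))
print("depth spread (mm, 1 sigma):", np.std(samples))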
1.2.4 Surface Coding via Structured Light

The encoding and decoding procedure of the structured light is one of the most critical factors for an optical measurement system. The biggest challenge is the capability of simultaneously satisfying the efficiency, reliability, and accuracy requirements. The projected patterns are deformed by the free-form test surface; therefore, the codeword is designed to withstand this distortion while the encoded pattern carries as many codewords as possible. The corresponding on-line decoding algorithm should dynamically calibrate the received image against the distortion and image noise to extract the codeword. Since these two procedures are always coupled with each other, the literature for both functions is reviewed together.

Several pattern strategies have been proposed to make a fast 3D inspection using a single pattern or a few patterns, which can be categorized into two types in terms of the test objects: one-shot patterns for moving objects and multi-pattern methods for static objects. Moving objects can only be inspected by a one-shot pattern algorithm; otherwise, the system may acquire incorrect pattern codes due to the movement of the parts. The major challenge for a one-shot pattern is the capability of compressing enough codewords. Different strategies have been employed for this, such as color and spatial-neighborhood patterns. In the color method, the codeword is determined in a single pattern because the colors of different primitives, such as dots, lines, and grids, are different; the decoding step can separate each codeword via color filters. Spatial-neighborhood patterns can be encoded because the code of each primitive (symbol) depends on the value of the primitive and its neighbors. Usually both strategies are combined to create a one-shot pattern. Patterns with round disks, small color dots on a black background, were developed in [26] [27]: the decoding function first separates each image primitive and then employs the color information to assign a unique codeword to each primitive. The codeword density can be increased via stripe designs [28] [29] or grid designs [30] [31]. The drawback of the color strategy is that the color contrast ratio can be affected by the color reflectivity, the inherent texture of the work piece, and ambient light, which results in less reliability than monochromatic (black-white) light. In this study, a new structured light pattern based on the spatial-neighborhood strategy is proposed [32]. The primitive of the pattern is based on the corner of a black-white checkerboard pattern, which is also called the X-point. The X-point provides a more robust and accurate position than a conventional pattern in which the primitive is based on symmetrical symbols such as disks, stripes, etc. The orientation of the X-point is used to encode the primitive of the pattern. Moreover, the point cloud density in different regions of the inspected part can be adjusted; this strategy is very useful for applications in which high point density is only required in some critical regions such as connection interfaces.

The multi-shot strategy is only available for stable objects, but it is more robust against image noise than the one-shot method. After detecting illumination problems, such as shadows, under- / over-exposure, and internal reflection, the projection pattern can be readjusted through a proper feedback. Recent research is mainly focused on how to improve the inspection accuracy and point cloud density. Among many strategies, Gray Code [33] combined with Phase Shifting, such as a sinusoid wave [34], achieves good results and has been utilized in many areas. After successfully inspecting diffusely reflecting material via multi-shot methods, people started to apply the multi-shot method to detect optically-challenging objects without pre-treatment. As a pioneer, Nayar [35] [36] built upon a shape-from-intensity method and iteratively approximated the 3D shape of the test objects, but the method can only model relatively simple geometries and is not suited for inspection. Later, people began to investigate more robust methods for optically-challenging objects.
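Before turning to those methods, a generic illustration of the spatial-neighborhood idea described earlier in this subsection may help; this is not the X-point scheme of Chapter 4, and the symbol values, grid, and window size here are arbitrary:

import numpy as np

def neighborhood_codewords(grid):
    """Assign each interior primitive a codeword built from the 3x3
    window of symbol values centred on it (row-major order)."""
    rows, cols = grid.shape
    codes = {}
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            window = grid[r - 1:r + 2, c - 1:c + 2].flatten()
            codes[(r, c)] = tuple(window)   # codeword for primitive (r, c)
    return codes

# A small pattern with 4 possible primitive symbols (0..3).
pattern = np.array([[0, 1, 2, 3, 0],
                    [2, 3, 0, 1, 2],
                    [1, 0, 3, 2, 1],
                    [3, 2, 1, 0, 3]])

codes = neighborhood_codewords(pattern)
# For decoding to be unambiguous, every codeword should occur only once:
assert len(set(codes.values())) == len(codes)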
The literature shows approximately four categories of approaches used to increase the measurement performance: direct coding methods, color invariance based methods, polarization based methods, and adaptive projection methods.

In direct coding methods, people try to develop encoding patterns that are robust against image noise. Instead of the phase shifting method, digitized Gray Code Line Shifting (GCLS) [37] [22] uses binary square waves as the projection patterns, and the binary wave achieves better performance than sinusoid waves. Different from the phase shifting method, the decoding process changes from analog sampling to digitized sampling: codeword 1 is encoded as intensity 255 and codeword 0 as intensity 0 in the projector frame, and a high image intensity is decoded as '1' while a low image intensity is considered '0' in the camera frame. However, the digitized direct coding method is still weak for shiny material: an unilluminated area can also produce high image intensity in the camera frame due to internal reflection, and a low reflection coefficient can cause low image intensity even when a high illumination pattern is projected. By combining binary patterns with color information [38], some shiny objects were successfully measured. The authors hypothesized that the effects of internal reflection and ambient light in the scanning process could be eliminated via color invariants, since the inter-reflection is not preserved under the generalized bas-relief (GBR) transformation. However, the method cannot inspect black material, which absorbs all incident colors, and the under- / over-exposure problem was not addressed in the study; the capability of the method is therefore limited. Based on the fact that specular reflection changes the polarization direction of the incident light, polarizers were added in front of the cameras / projectors to filter out the internally reflected light [39] [40], so shiny objects could be measured in those systems. However, without pre-knowledge of the test material's optical properties, the surface normals, and the exact number of internal reflections, the authors had to manually and carefully adjust the polarizers to a proper angle to filter out the internally reflected light. Once the position of the part changes, or a new part is put into the system, the polarizers have to be re-adjusted [41]; therefore, the system is not very flexible. Adaptive coding methods can dynamically modify the projection illumination based on the scene, which avoids the under- / over-exposure problem and can be further used to eliminate internal reflection issues. The approach is more robust than the polarization based method, since no modification is needed when a new part is inspected. However, the biggest challenge is the correspondence problem between the camera and the projector: the projector illumination needs to be adjusted, but the receiver is the camera, not the projector. The correspondence, usually called the crosstalk function, is needed to create the feedback from the camera to the projector. After the feedback information is obtained, the illumination of the structured light can be dynamically adjusted to measure optically-challenging objects. However, the crosstalk function is a dilemma for a structured light system: the structured light is used to find the correspondence, but the correspondence is needed to create a scene based structured light.
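A minimal sketch of the digitized decoding step described above, using the common pattern / inverse-pattern trick (which the verification figures of Chapter 4 also rely on): each camera pixel is classified by comparing the image of a bit pattern with the image of its inverse, giving a per-pixel threshold that is less sensitive to reflectance differences. The array names and the confidence margin here are illustrative, not the thesis's exact decoding rule.

import numpy as np

def decode_bit(img_pattern, img_inverse, margin=5):
    """Per-pixel binarization of one projected bit plane.

    img_pattern : camera image while the bit pattern is projected
    img_inverse : camera image while the inverted pattern is projected
    Returns 1 / 0 for decoded bits and -1 where the contrast is too low
    (shadow, very dark material, saturation) to decide reliably.
    """
    diff = img_pattern.astype(np.int32) - img_inverse.astype(np.int32)
    bits = np.where(diff > 0, 1, 0)
    bits[np.abs(diff) < margin] = -1    # mark unreliable pixels as undecided
    return bits

# Tiny example: a bright pixel, a dark pixel, and a shadowed pixel.
pattern = np.array([[220, 40, 12]], dtype=np.uint8)
inverse = np.array([[ 35, 180, 10]], dtype=np.uint8)
print(decode_bit(pattern, inverse))     # -> [[ 1  0 -1]]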
Koninckx et al. [42] proposed a scene adapted structured light system to avoid under- / over-exposure problems. Not only the reflection coefficient of the scene is considered, but also the nonlinear responses of both imaging devices are simulated. However, approximate knowledge of the geometry has to be known in advance to solve the dilemma; without this foreknowledge, the method is relatively inflexible for different work pieces. P. Sen [43] estimated the crosstalk function by turning on each projector pixel one at a time and applied an adaptive subdivision to speed up the algorithm. However, the authors only used the method to dynamically control the illumination condition, not to reconstruct the given scene, and the number of projection and recording loops is very large, so the efficiency is low. An adaptive correspondence algorithm was developed by Xu et al. [44] [45] to inspect objects with strong internal reflection. The authors classify the captured areas into certain areas and uncertain areas, iteratively peel off the certain areas, and enhance the illumination condition in the uncertain regions. Objects with strong internal reflection can be measured in their system without any pre-knowledge of the test geometry. The limitation is that the number of iterations is not fixed; it may take hours to inspect a scene [44] [45], so the algorithm efficiency is relatively low. In summary, a robust, flexible, and efficient inspection system for optically-challenging objects is needed for quality inspection in the industrial field. In this study, a robust encoding and decoding algorithm to measure optically-challenging objects is proposed. Monochromatic white-black stripes are applied to avoid the influence of the test objects' colors. A scene based projection mask is utilized to adjust the projection illumination so that the under- / over-exposure problem and the shadow problem can be eliminated. Combined with the projection mask, an extrapolation model is developed to estimate the edges between highly and weakly illuminated regions. Therefore, shiny objects can also be inspected in a structured light system.

1.2.5 3D Data Registration

Registration of point clouds is the task of finding the Euclidean motions among a set of point clouds, or between a point cloud and the model of a given part, with respect to a common reference frame. For a large scale inspection system, two categories of registration are needed: 1) registration of multiple point clouds into one common frame, and 2) registration between the final point cloud and the CAD model. The registration problem can be roughly divided into two parts: coarse alignment and fine registration. Coarse alignment determines an initial condition, and fine registration estimates a solution as close to the truth as possible. Without coarse alignment, fine registration is time consuming and often fails to find a solution. Methods for coarse alignment include principal component analysis (PCA) [46], point signatures [47], and image transformation [48], etc. One of the most famous fine registration methods is the Iterative Closest Point (ICP) algorithm [49] [50]. ICP searches for pairs of nearest points in the two data sets to estimate a coordinate transformation, and this iterative process executes until the result reaches a pre-defined threshold.
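A minimal sketch of the basic ICP loop just described, with nearest-neighbour matching followed by a closed-form rigid-body update via SVD; this is the standard textbook formulation, not the modified ICP variants developed later in this study:

import numpy as np
from scipy.spatial import cKDTree

def icp(source, target, iters=50, tol=1e-6):
    """Align 'source' (N x 3) to 'target' (M x 3); returns R, t, aligned points."""
    src = source.copy()
    R_total, t_total = np.eye(3), np.zeros(3)
    tree = cKDTree(target)
    prev_err = np.inf
    for _ in range(iters):
        dist, idx = tree.query(src)          # closest-point correspondences
        matched = target[idx]
        # Closed-form rigid transform between the matched sets (SVD).
        mu_s, mu_t = src.mean(0), matched.mean(0)
        H = (src - mu_s).T @ (matched - mu_t)
        U, _, Vt = np.linalg.svd(H)
        D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
        R = Vt.T @ D @ U.T
        t = mu_t - R @ mu_s
        src = src @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t
        err = dist.mean()
        if abs(prev_err - err) < tol:        # stop when the error settles
            break
        prev_err = err
    return R_total, t_total, src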
[51] adopted color information combined with Euclidean distance to set up correspondence in order to improve the registration results. Instead of searching the distance between corresponding points, Pottmann used the distance between the surfaces as a registration evaluation standard [52]. Gregory investigated the Euclidean invariant features in generalization ICP in order to decrease the probability of 17 being trapped in a local minimum [53]. Moreover, ICP method needs a lot of processing time to find the corresponding closest points. In order to deal with this problem, people tried to increase the speed of the algorithm. Timothee used multi-resolution way to speed up the ICP searching process [54]. ICP family is a good algorithm in fine registration, and the merit of this algorithm is that the total linear least square error will be minimized after the final alignment. In this study, ICP based method is utilized to solve the registration of multiple point clouds into one common frame. However, this merit will change into a drawback in error map registration between final point cloud and CAD. The error between work piece and CAD model should not be minimized by registration process. The reduced error map can not reflect the correct manufacture error, therefore, an object-oriented registration method based on ICP is proposed to solve the problem. 1.2.6 Omni-directional 3D Reconstruction Vision Systems The capability of vision based control is highly related with information provided to the operator / computer. An omnidirectional 3D camera is a system that provides rich 3D information with a wide rang viewing and thus becomes a valuable choice for such applications. An omnidirectional camera offers a 360 Field Of View (FOV) and a complete information about the environment surrounding the vision sensor. The corresponding realtime 3D perception algorithm could further extend the capability of the omnidirectional camera to measure the depth information. Applications, such as robot navigation, remote manipulation, auto-driving system, map reconstruction, and pedestrian detection, demand this kind of vision sensor to quickly and accurately collect necessary information. With the aid of an omnidirectional 3D camera, the navigated robot could be easily avoid collisions; the remote controlled robot arm could freely pick and place parts all around; map reconstruction for an unknown scene could be achieved within one pass, and the pedestrians approaching from the rear could also be detected. Aiming on this demand, we develop a low cost structured light 18 based omnidirectional 3D camera, consisting of a projector, a camera, and two hyperbolic mirrors. A panoramic view is provided around the sensor without any blind areas, and the depths from the sensor to the objects are measured within one-shot projection. Basically there were two ways to obtain omnidirectional images [55]. The first approach was to generate the omnidirectional images through the image mosaic method. The method either employed multiple cameras viewing different directions or utilized one rotating camera [56] to obtain multiple images of the environment. The final omni-directional image was the merged result using the mosaicing method. 3D reconstruction by such methods was usually achieved by rotating a stereo vision system consisting of two cameras [57] [58]. In each rotation angle, the system was considered as a normal stereo vision system and acquired the 3D coordinates. 
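Returning briefly to the ICP-based fine registration reviewed in Section 1.2.5, the basic point-to-point iteration is compact enough to sketch. The following is a minimal illustrative implementation in Python/NumPy (closest-point search plus a least-squares rigid motion); it is not the modified ICP or the object-oriented registration developed in this study, and correspondence rejection and coarse alignment are omitted.

```python
# Minimal point-to-point ICP sketch (illustrative only, not the modified ICP
# developed in this study). Assumes roughly pre-aligned NumPy point clouds.
import numpy as np
from scipy.spatial import cKDTree

def best_rigid_transform(src, dst):
    """Least-squares rotation R and translation t mapping src onto dst."""
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:          # guard against reflections
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = c_dst - R @ c_src
    return R, t

def icp(source, target, iters=50, tol=1e-6):
    """Iteratively align `source` (N x 3) to `target` (M x 3)."""
    tree = cKDTree(target)
    src = source.copy()
    R_total, t_total = np.eye(3), np.zeros(3)
    prev_err = np.inf
    for _ in range(iters):
        dist, idx = tree.query(src)            # closest-point correspondences
        R, t = best_rigid_transform(src, target[idx])
        src = src @ R.T + t                    # apply the incremental motion
        R_total, t_total = R @ R_total, R @ t_total + t
        err = dist.mean()
        if abs(prev_err - err) < tol:          # stop at a pre-defined threshold
            break
        prev_err = err
    return R_total, t_total, err
```

In practice a coarse alignment (for example the PCA-based approach cited above) would seed the source cloud before this iteration, since the nearest-neighbor step otherwise tends to converge to a local minimum.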
An omni-directional 3D rebuild was achieved by scanning the environment a full 360 degrees. For such a system, it could not build all the environment at same time and the mechanic rotation could also influence the accuracy of the result. The second approach is to use an omni-direction camera [59] [60] [61] to obtain panoramic vision. An omni-directional camera usually consists of a convex shape mirror, such as a hyperbolic, parabolic, conic, or sphere mirror, and a normal camera. By viewing the back image of the mirror, a panoramic vision can be obtained. The math models [62] [63] and the configurations [64] have been discussed a lot in the past. An omnistereo vision system could be formed to obtain 3D coordinates by employing two omni-direction cameras. Via vertically-aligning two omnidirectional cameras [65], an omnistereo system was successfully designed and was implemented. A 3D sample point was rebuild by the intersection of two corresponding light arrays from two calibrated omnidirectional cameras. Monocular vision could also rebuilt 3D information but less commonly compared with stereo vision method. In order to reconstruct an environment map, an omnidirectional camera was put onto a mobile robot [66] [67]. Via the estimation of the robot motion, the feature points in the scene could be reconstructed. However, both systems usually suffered from the correspondence problem and both measurement accuracy and result 19 density were limited by this correspondence issue. Facing on the correspondence problem, a structured light based omnidirectional vision sensor was developed [68] [69]. The authors put a laser facing to a conic mirror and then formed their structured light emitter. An omni-direction camera and such a light source were put vertically to form their omni-direction structured light based 3D sensor. The laser was split into a light ring by the conic mirror and the light ring was emitted to the environment. The omnidirectional camera captured the light ring and the radius of the light ring was utilized to calculate the depth information. Since there was only one light ring in the image, correspondence problem did not exist in their system. The scene points illuminated by the light ring cloud be detected in realtime. However, the system could only scan a thin ring from the scene and limited 3D information was provided. In order to sample more points from the scene, Ukida et al. [70] developed a structured light aided omnistereo vision sensor. Three projectors shooting in 3 different directions were added to an omnistereo vision system. With the aid from the structured light, a more accurate and more dense 3D reconstruction result was achieved. However, 3 projectors did not cover the whole FOV of the omnidirectional camera, several blind areas were failed to obtain 3D information. In short, there is a demand to design an omnidirectional 3D sensing system capable to fully measure a 360 degree surrounding environment with a high dense sample result. In this paper, a structured light based omnidirectional 3D camera, consisting of a projector, a camera, and two hyperbolic mirrors, was developed. The hyperbolic mirror and the projector formed an omnidirectional projector. The omnidirectional projector and the omnidirectional camera were aligned vertically and cloud sample dense 3D points around all 360 degree surroundings. The system can measure the environment without any blind areas. Besides the high density 3D sample results, the measurement speed is also very important. 
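As a concrete illustration of how a panoramic view is read off the mirror image mentioned above, the donut-shaped catadioptric image can be resampled into a cylindrical panorama with a simple polar-to-Cartesian remap. The sketch below is a simplified, assumption-laden version: it presumes the image center and the inner/outer radii of the mirror ring are known, and it applies no mirror model, so image rows only approximate elevation.

```python
# Unwarp a catadioptric (mirror) image into a cylindrical panorama by polar
# resampling. Simplified sketch: assumes the image center (cx, cy) and the
# inner/outer radii of the mirror ring are known; no mirror model is applied.
import numpy as np

def unwarp_panorama(img, cx, cy, r_in, r_out, out_w=1440, out_h=240):
    """img: HxW(x3) array; returns an out_h x out_w panorama."""
    theta = np.linspace(0.0, 2.0 * np.pi, out_w, endpoint=False)   # azimuth
    radius = np.linspace(r_out, r_in, out_h)                       # top row = outer ring
    rr, tt = np.meshgrid(radius, theta, indexing="ij")
    xs = np.clip((cx + rr * np.cos(tt)).round().astype(int), 0, img.shape[1] - 1)
    ys = np.clip((cy + rr * np.sin(tt)).round().astype(int), 0, img.shape[0] - 1)
    return img[ys, xs]                         # nearest-neighbor lookup
```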
Many applications, such as navigation, remote manipulation, and pedestrian detection, not only require measuring the environment but also need a realtime sampling method. Based on the number of projection images, structured light systems can be roughly separated into two categories: one-shot projection systems and multi-shot projection systems. Once the imaging sensor was moving or a moving object was approaching, multi-shot methods failed to handle such cases. The system [68] [69] that Gluckman et al. developed could measure the scene in realtime, but its point density was very limited. The system [70] that Ukida et al. developed applied several multi-projection methods to set up correspondence, so the measurement could not be finished in real time. In order to sample dense 3D points in realtime, a robust one-shot surface coding method was needed. In the past, one-shot coding methods were developed only for conventional structured light systems. The Kinect system developed by Microsoft could detect 3D in realtime and obtain dense samples, but its view angle was very limited compared with an omnidirectional system, so much less information was presented to the operator than with an omnidirectional 3D camera. Color based and spatial-neighborhood based coding methods were the two most common one-shot methods for a conventional structured light system. In the color method, the codeword was determined in a single pattern because the colors of different primitives were different. The drawback of the color strategy was that the color contrast ratio could be affected by the background colors, which resulted in lower reliability than monochromatic light, such as black-white light. Spatial-neighborhood coding [71] using monochromatic light was more robust than the color based method: the codeword of each primitive (symbol) depended on the value of the primitive and its neighbors, and each codeword was unique so that correspondence could be established. However, this method was not suitable for an omnidirectional sensor, because the geometric relationship can be destroyed by the convex shaped mirrors. The projection pattern is first warped by the hyperbolic mirror in front of the projector and then further warped by the hyperbolic mirror in front of the camera. Without knowing the two warping functions, the spatial-neighborhood method cannot be directly applied to the omnidirectional 3D sensor. Therefore, existing solutions for one-shot surface coding were not suitable for an omnidirectional 3D system. Besides structured light systems, a ladar system [72] can also obtain realtime 3D results from the scene. However, such a system can only acquire a limited number of 3D samples, and its cost is much higher than that of our system. In order to meet this demand, a one-shot surface coding method for a structured light based omnidirectional 3D system was developed to measure the environment in realtime. Light rings embedded with different angular frequencies and intensities were emitted from the omnidirectional projector to the environment through the hyperbolic mirror. Taking advantage of the epipolar constraints, the invariance of the phase angle and angular frequency was utilized to establish the correspondence, which was then triangulated to obtain the depth information.
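To make the ring-based one-shot idea concrete, the following sketch generates a pattern of concentric rings whose intensity is modulated along the angular direction, each ring carrying its own angular frequency. The ring count, frequencies, and intensity profile here are hypothetical placeholders for illustration, not the codeword design actually used in the developed system.

```python
# Illustrative ring-coded projector pattern: concentric rings whose intensity
# is a sinusoid of the polar angle, with a different angular frequency per
# ring. Parameters are made-up placeholders, not the developed codeword design.
import numpy as np

def ring_pattern(size=1024, n_rings=16, freqs=None):
    freqs = np.asarray(freqs) if freqs is not None else 4 + 2 * np.arange(n_rings)
    y, x = np.mgrid[0:size, 0:size]
    cx = cy = size / 2.0
    r = np.hypot(x - cx, y - cy)
    theta = np.arctan2(y - cy, x - cx)
    # Assign each pixel to a ring by its radius, then modulate along theta.
    ring_idx = np.minimum((r / (size / 2.0) * n_rings).astype(int), n_rings - 1)
    pattern = 0.5 + 0.5 * np.cos(freqs[ring_idx] * theta)
    pattern[r > size / 2.0] = 0.0          # outside the projector's circular field
    return (255 * pattern).astype(np.uint8)
```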
Compared with traditional one-shot method, the contribution was the one-shot design for an omnidirectional projector and the challenge was the pattern design against both mirror distortion and scene distortion. The codeword should be designed based on the invariance from scene distortion and the two warps that were affected by the hyperbolic mirrors were also needed to develop the pattern. In order to obtain an accurate 3D profile reconstruction, it was also of great importance to calibrate the geometrical relation between the optical components of omnidirectional stereo vision. Generally, a uniform theoretical model was used to present the geometric relationship between the 3D point in the scene and the corresponding 2D point in the image plane of the camera or projector. Then, the model parameters were estimated through several features with known locations in the scene [73] [74]. The calibration accuracy, however, was limited by the imprecise alignment of optical components and residual errors due to the manufacturing error of the off-the-shelf optical components. To improve the accuracy, the vector-based calibration strategy was proposed to uniformly calibrate both camera and projector, where an incident vector for a pixel of the projector and a reflected vector for that of the camera were assigned respectively. In summary, The development of the omnidirectional 3D camera contributed in the following aspects: 22 1) Developed and configured a structured light based omnidirectional 3D camera. It was capable to measure 360 degree environment with dense 3D samples. 2) Developed and implemented a one-shot algorithm for such a 3D camera. 3D samples could be measured in real time. It was essential when the sensor was located on a moving base. 3) Developed a vector based calibration for such a system. The proposed method was able to calibrate an omnidirectional projector, which people have not discussed yet based on my literature review. 1.3 Difficulties and Challenges The objective of this study is to develop large scale structured light based measurement systems, which include topics not only in the development of theoretical methodology but also application implementation. Although several 3D sensors have been developed in past, this study aims at the the development of large scale optic systems for accurate inspection and fast environment reconstruction. The main challenges involved in this study include: 1) Calibration tasks for a large scale structured light system. It is very difficult to maintain high inspection accuracy when the system scale increases. One pixel error, 3D reconstruction error cost by one image pixel, in a large scale system is much bigger than the error in a small one. A complete calibration methodology for a large scale structured light sensor, including 3D model optimization and precision analysis, needs to be developed. Both difficulty during the calibration implementation and the calibration requirement arise when the scale is enlarged. Traditional calibration tasks involves too much parameters in their 3D recovery model, 22 parameters including both imaging devices’ intrinsic and extrinsic parameters. Especially the projector parameters is the major source of calibration error. 2) Consistence analysis because of the large scale of the system. Conventional structured 23 light inspection system usually much less FOV. The influence costed by the random image noises can tolerated in small systems. 
However, for a large scale system, the random image noise begins to affect the consistence of the system, which is a new problem for this field. 3) A robust encoding and decoding algorithm is needed to inspect optically-challenging objects. Without pretreatment, the reflection ratios across a large work piece can vary a lot, so the incident illumination should be dynamically changed correspondingly. Otherwise, adjusting the camera aperture becomes a dilemma: the dark part requires a big aperture while the shiny part needs a small aperture. Optically challenging objects, ranging from shiny to dark and even blank, should be inspected simultaneously. Shiny objects can cause the mis-coding problem due to internal reflection: the reflected light from a point in the scene is related not only to its own incident light but also to the incident light in its neighborhood. An un-illuminated area can therefore still produce high intensity in the camera image because of internal reflection. 4) Registration from each sensor frame to the common frame. A large inspection system employs more than one pair of sensors to cover a work piece, and the point cloud saved in each individual sensor frame should be merged into a common frame. Sensor calibration error can enlarge the registration error and make the final result unacceptable. Since the test work piece may not have any features to establish correspondence, and each sensor frame shares only a very small overlap with the others, the registration is a challenging task for a multi-sensor inspection system. 5) On-line registration calibration between the final point cloud and the CAD model within limited time. The color-coded error distribution map is the final purpose of a quality inspection system, and it is based on the registration between the final point cloud and the CAD model. A valid registration reveals the distribution of manufacturing errors; failure to obtain a correct error map is expensive for manufacturing. 6) A robust one-shot encoding and decoding algorithm is needed for environment reconstruction. Since the system needs to measure moving objects, it is sensitive to the acquisition time: all necessary information for 3D reconstruction must be obtained within one projection and acquisition. The encoding and decoding algorithm should simultaneously satisfy the requirements of accuracy, robustness, and efficiency. 7) A 360 degree navigation system with realtime 3D perception ability should be built up in the limited space on a mobile robot. 1.4 Objectives of the Work 1) The development of a new 3D sensor model with fewer calibration parameters. Compared with camera calibration, projector calibration is more complicated and introduces more calibration errors. Therefore, a novel sensor model which avoids the projector intrinsic parameters and orientation should be developed. The 3D point is not calculated through the traditional triangulation method but through the plane induced parallax information. 2) Analysis, simulation, and verification of the precision of the system were done for the large scale structured light system. The effect of random image noise was analyzed and numerically evaluated. The precision of the designed and implemented system reached the system requirement. 3) A robust surface coding method was developed to inspect optically-challenging objects. Monochromatic White and Black (WB) stripes instead of color patterns are utilized to eliminate the effect of the material's texture and color.
A projection mask, dynamically adjusting the illumination of a projector, is applied to avoid under- / over-exposure issues: incident lights shooting onto the shiny or high reflection area are reduced and lights onto the shadow or low reflection area are enhanced. An extrapolation model combined the projection mask successfully solves the internal reflection problem. A pattern of WB stripes and its inverse are sequentially projected onto the test material. The edges between white and black stripes are estimated through an extrapolation model even internal reflection occurs. 4) A registration method was developed to merge multi-sensor frame into a common 25 frame, and thus a final point that cloud could represent the entire work piece was derived. Based on a pre-measured painted windshield, four individual sensor frame were transferred into a common frame. 5) A color-code error distribution map was created to evaluate a given work piece to finish the quality inspection task. People could easily find out whether a work piece is qualified or not. 6) A new structured light pattern based on the spatial-neighborhood strategy was proposed. The primitive of the pattern was based on the corner of the black-white checkerboard pattern, called the X-point. The X-point provides more robust and accurate position compared with a conventional pattern in which the primitive is based on symmetrical symbols. The orientation of the X-point is used to encode the primitive of the pattern. 7) Adding hyperbolic mirrors in front of the camera and the projector, an omni-directional 3D optical sensor for mobile robot navigation was developed. Its corresponding sensor model, coding method, and calibration were developed. 1.5 Organization of the Study The development of a structured light based 3D measurement system roughly consists of 3 steps: the working principle of the 3D sensor, calibration of the sensor, and surface coding method for the system. For an accurate inspection system, multiple sensors are utilized to enlarge the system. Therefore, additional registration is needed after each sensor rebuild each individual point cloud. For a fast navigation system, a convex shaped mirror based omni-direction imager is used to enlarge the system field of view. Therefore, omni-direction 3D reconstruction navigation system is explained after the introduction of a traditional single direction navigation system. Chapter 2 explains the working principle of the 3D sensor. The corresponding sensor calibration is given in chapter 3. Surface coding emphasizing single-shot or multi-shot al- 26 gorithms are introduced in chapter 4. Chapter 5 provides the registration algorithm to transfer data in different coordinates. Chapter 6 gives the development of omni-direction 3D rebuild system. Current experimental results are given in chapter 7. Chapter 8 presents the conclusions, discussions and further works. 27 CHAPTER 2 Development of a 3D Sensor Based on Structured Light The 3D shape of a work piece is conventionally measured by a Coordinate Measurement Machine (CMM) via adopting a touching probe onto the inspected surface and acquiring coordinates. The nature of point-wise contact based method limits both CMM’s measurement speed and the capability of inspecting complicated work pieces. Non-contact optic measurement method via machine vision, with the merits of fast speed, high inspection sample rate, and overall simplicity of operation, becomes a popular alternative inspection method for CMM. 
The implementation of a structured light based inspection system is highly related to its scale. A successfully designed and implemented system is capable of measuring large work pieces while maintaining high accuracy. A large scale structured light system with long standoff distance, large Field Of View (FOV), deep Depth Of View (DOV), and multiple sensors raises the inspection error, and thus raises the difficulty of its calibration. The final inspection result for such a system carries the sensor calibration error, random image noise, and multi-sensor registration error. The traditional 3D reconstruction model requires the intrinsic and extrinsic parameters of both the camera and the projector. In order to reduce the sensor calibration error, an optimized 3D reconstruction sensor model with fewer parameters is developed. By defining a reference plane in the sensor model, the intrinsic parameters and the orientation of the projector can be avoided in the 3D recovery calculation. This chapter describes this innovative 3D reconstruction sensor model; it solves a practical industrial problem arising from the large scale.

2.1 Introduction of Imaging Devices

A structured light based 3D scanner usually consists of a projector and a camera. The camera obtains a perspective mapping from the world frame to the image frame. The projector provides an additional constraint to fix the last degree of freedom in the perspective geometry and thus achieve the 3D reconstruction. In order to optimize the 3D recovery, background information on the imaging devices is given in this section: the pinhole model approximation, the homography matrix between two planes, the epipolar geometry between two pinhole models, parallax imaging, lens distortion, Field Of View (FOV), and Depth Of View (DOV).

2.1.1 A Pinhole Model of an Image Device

An imaging device, such as a projector or a camera, can be approximated by a pinhole model. Since there is no difference between a camera pinhole model and a projector pinhole model, this section introduces the camera pinhole model as an example. The geometric model of a pinhole camera consists of an optic point $O_c$ and an image frame $O_I uv$ on the focal plane $I$. The fundamental property of the perspective geometry is that an image point $p = [u_p, v_p, 1]^T$ is collinear with $O_c$ and its corresponding point $P = [x_P, y_P, z_P, 1]^T$. The point $O_c$ is called the optical center, and the camera frame is $O_c X_c Y_c Z_c$ with the axis $O_c Z_c$ perpendicular to the image plane $I$. $O_I$ is the intersection point between the axis $O_c Z_c$ and plane $I$, which is called the principal point. The rotation matrix $R_c$ and translation vector $t_c$ provide the geometric relationship between the world frame $O_W X_W Y_W Z_W$ and the camera frame $O_c X_c Y_c Z_c$.

Figure 2.1: The pinhole model of a camera. (For interpretation of the references to color in this and all other figures, the reader is referred to the electronic version of this dissertation.)

The light ray passing from any point in the world frame into the image plane can be described by a $3 \times 4$ perspective transformation matrix $C_c$:

\lambda_c \, p = C_c P \quad (2.1)

in which $\lambda_c$ is a scale factor. The matrix $C_c$ can be factored into three parts: the intrinsic parameter matrix $K_c$, the rotation matrix $R_c$, and the translation vector $t_c$ in the world frame, so that Eq. 2.1 can be written as:

\begin{bmatrix} u_p \\ v_p \\ 1 \end{bmatrix} \sim K_c \begin{bmatrix} R_c & t_c \end{bmatrix} \begin{bmatrix} x_P \\ y_P \\ z_P \\ 1 \end{bmatrix} \quad (2.2)

K_c = \begin{bmatrix} f_u & \alpha & u_0 \\ 0 & f_v & v_0 \\ 0 & 0 & 1 \end{bmatrix} \quad (2.3)

The symbol $\sim$ means the equation holds up to a scale factor.
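As a quick numerical check of Eqs. 2.1-2.3, the sketch below projects a world point through an assumed $K_c [R_c \; t_c]$; the intrinsic values and pose are made-up placeholders for illustration, not calibration results from this system.

```python
# Numerical sketch of the pinhole projection of Eqs. 2.1-2.3: p ~ Kc [Rc | tc] P.
# The intrinsic values and pose below are placeholders, not calibration results.
import numpy as np

def project(points_w, K, R, t):
    """points_w: N x 3 world points -> N x 2 pixel coordinates."""
    P = np.hstack([points_w, np.ones((len(points_w), 1))])   # homogeneous points
    C = K @ np.hstack([R, t.reshape(3, 1)])                  # 3 x 4 matrix Cc
    p = (C @ P.T).T
    return p[:, :2] / p[:, 2:3]                              # divide out the scale

K = np.array([[1200.0, 0.0, 640.0],      # fu, skew, u0
              [0.0, 1200.0, 512.0],      # fv, v0
              [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.array([0.0, 0.0, 1600.0])   # camera 1600 mm from the world origin
print(project(np.array([[100.0, 50.0, 0.0]]), K, R, t))
```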
fu and fv in Kc are the focus = lengths along the u direction and v direction in image frame respectively. α is the skew 30 factor of a camera. For a square pixel camera, fu = fv = f and α is 0. The original point of the camera frame Oc can also be described by rotation matrix Rc and tc . (2.4) tc = −Rc Oc Substituting Eq. 2.4 into Eq. 2.2, the perspective transformation of the pinhole model can be expressed as:   up   =  vp  ∼ Kc Rc 1 I, −Oc      xP yP zP 1      (2.5) Back projection from Eq. 2.5, each light array from a pinhole model can be denoted as Eq. 2.6. L denotes as the light array passing point P and λc is the scale factor. L = Oc + λc (Kc Rc )−1 pp (2.6) In order to reconstruct a light array from a given pixel pp of a pinhole model. The parameters in the linear equation: intrinsic parameters, rotation matrix and translation vector are needed. 2.1.2 Homopraphy Matrix from Single View Geometry Planar objects, as one of the most common entities, is widely utilized in machine vision. The link between the planar object and their corresponding image points under perspective projection is described in this section. Homography matrix describes the projective geometry of a camera image plane and a world plane. Image of points on the image plane are related to corresponding points on world plane by homography matrix as shown in figure 2.2. The planar surface normal is defined as the z axis of the world frame, and the z values of all points on the plane equal to 0. Light array passing the pixel p intersects the world plane at point P . The homography 31 matrix between two plane can be derived from the camera perspective mapping by Eq. 2.7. The ith column of the rotation matrix Rc is denoted by ri . Figure 2.2: (a) The homography matrix between the image plane and the world plane.   up   λ  vp  = Kc r1 r2 r3 tc 1     XP XP Y   P     = Kc r 1 r 2 t c  Y P   0  1 1 (2.7) From Eq. 2.7, the definition of the image plane and the world plane can be expressed as a 3 × 3 nonsingular matrix: H = Kc r1 r2 tc (2.8) From the definition in Eq. 2.8, homography matrix H can be factorized into three parts: the intrinsic parameters Kc , the first two columns r1 and r2 of the rotation matrix Rc and the translation vector tc . Utilizing the factorization information, homography matrix is usually used to calibrate a camera or a projector by regressing into the Kc , R and t ??. 2.1.3 Epipolar Geometry and Fundamental Matrix A typical structured light sensor consists of a camera and a projector. The epipolar geometry is the intrinsic projective property between two the views. The most important property 32 of the epipolar constraint is that it is independent of scene structure, and only depends on the imaging devices’ internal parameters and relative poses. The fundamental matrix F encapsulates such an intrinsic geometry. Fundamental matrix F is 3 × 3 matrix of rank 2. Figure 2.3: The epipolar geometry between two image planes. Consider the case shown in figure 2.3, P is a point on the tested surface; pc and pp is the projections of the point P on the image planes Ic and Ip respectively; Oc and Op are the optical centers of the camera and the projector. Point P , Oc and Op form a plane, which is defined as the epipolar plane. The intersection of the epipolar plane with Ic is the epipolar line lc , and the intersection between the epipolar plane with Ip is lp . 
The position of point P varies for different tested surface, but the correspondence point pp in Ip is always on the epipolar line lp , which is the most important properties of the epipolar geometry. Each focal point projecting on the other image plane forms point ec and ep , named epipolar or epipolar point. The point correspondence between pc and pp is: pc F pp = 0 , in which F presents the 3 × 3 fundamental matrix between two pinhole model. 33 (2.9) The correspondence between the pixel and its epipolar line is: lp = F p c (2.10) lc = F T p p (2.11) The epipolar lines in both image are related by: lp = F [ec ]× lc (2.12) lc = F T ep × lp (2.13) , in which [e]× is skew symmetric matrix of an epipolar e. The epipolar plane is formed by an arbitrary point from scene and two optical points of two pinhole models. Illustrated by figure 2.4, points P1 , P2 , and P3 forms three eqipolar 1 2 planes with the optical points Oc and Op . The epipolar planes intersect Ic at line lc , lc and 3 1 2 3 lc and intersect Ip at line lp , lp and lp . Since all the eqipolar planes share with the baseline Oc Op , and ec and ep are the intersection points of baseline with Ic and Ip respectively. Therefore all the epipolar lines in Ic intersections at point ec and all the epipolar lines in Ip intersections at point ep . ec and ep is the null space vector of fundamental matrix F and F T respectively. The eipolars in both images are: F ec = 0 (2.14) F T ep = 0 (2.15) The fundamental matrix F , presenting the epipolar geometry in two pinhole models, is widely used solving correspondence problem in stereo vision. It is also one of the easiest 34 Figure 2.4: The pencil of epipolar lines in each image plane on the epipole. parameters to be calibrated, since the calculation of fundamental matrix does not require any foreknowledge of the scene. No calibration gauge is needed. 2.1.4 Plane Induced Parallax in an Imaging System In the figure 2.5, Point P in the scene projects itself as pc and pp in image plane Ic and Ip . Oc and Op are the two optical points of two pinhole models. The epipolar in each image frame are ec and ep respectively. lc and lp are the corresponding epipolar lines in both image W planes. The ray P Op intersects the plane W at point Pp and ray P Oc intersects the plane W W at point Pc . The images of point P and Pp are coincident points at pp in the image plane W W c W Ip . pW = (Hc )−1 Pp = Hp pp is the projection of point Pp on the image plane Ic . The c vector between points pc and pW in the camera image plane is the parallax related with the c homography induced by the plane W . If the point P is in the other side of the plane W , and then the point pW will be on the other side of point pc along the epipolar line lc . c Point pc can be expressed by: c pc = pW + ρec = Hp pp + ρec c (2.16) Since pc , pc and ec are collinear on lc , the scalar ρ and ec form the parallax vector relative 35 Figure 2.5: A plane W induced parallax in camera image plane. c to the homography Hp induced by the plane W . ρ is viewed as depth information related with the plane W . If ρ = 0 then point P is on the plane W . The sign of ρ indicates the side between point P and plane W . if ρ is smaller than 0, and then point P is behind plane W . Otherwise, point P is before plane W . 2.2 Optimize the 3D Sensor Model via Parallax Information 2.2.1 Problem Statement Traditional 3D reconstruction model based on structured light utilized the intersection between incident light and reflection light in the epipolar plane. 
As shown in figure 2.6, this constraint is interpreted geometrically in terms of the rays in space corresponding to the two image points. pp and pc are two projection points from Pw in the projector plane Ip and camera plane Ic respectively. Op and Oc are optical points of the projector and the camera respectively. The incident light from pp is named as Lp and the Lc denotes the reflection light. From the epipolar constraints, pp , Op , Pw , Oc and pc are in the same plane. Pw is the 36 intersection point of Lp and Lc . Since pp , pc and Pw constructs a triangle in this plane, this method is called triangulation method. Figure 2.6: Traditional 3D reconstruction model. The triangulation 3D reconstruction model requires to generate the back projection equations for the correspondence pixels: Lc = Oc + λc (Kc Rc )−1 pc Lp = Op + λp Kp Rp −1 pp (2.17) (2.18) The solution of Eq. 2.17 and Eq. 2.18 is the coordinate of point Pw . Due to the calibration error or image noise, Lc and Lp may not be in the same plane resulting that there is no intersection point between this two rays. The middle point of common perpendicular is viewed as the solution in such a case. This traditional structured light 3D model is known at least two limitations: 37 1) The first one is the large amount of parameters to be calibrated. It includes the intrinsic and extrinsic parameters of both imaging devices, together 22 parameters. The more parameters to be calibrated, the more errors induced and the less accuracy is received in the result point cloud. 2) The second one it the projector calibration. Due to the property of the projector, not serving as an information receiving device, its calibration is more complicated than a camera and contains more errors than a camera. This thesis combines the camera planar calibration method into 3D reconstruction to optimize the number of unknowns in the equation. The novel strategy is that the camera calibration information based on planar method can be saved as the reference information. The acquisition information of the test object is used to compare with the pre-saved reference information. The corresponding deviation between reference and test object is applied to reconstruct the 3D point cloud, in which only few parameters need to be calibrated. Therefore, the projector calibration is not necessary for 3D reconstruction anymore. The correspondence function provided by the pattern design is not only used to set up correspondence between a camera and a projector, but also establishes corresponding pair of pixels in camera between two measurements: the camera pixel recording reference information and the corresponding camera pixel recording tested surface information. 2.2.2 Recording the Reference Information with Homography Matrix If the camera intrinsic Kc and extrinsic parameters, Rc and tc , are calibrated corresponding to plane W through Eq. 2.8, the light array vector from projector can be calculated without projector intrinsic parameters and its rotation matrix. In figure 2.7 Lp coming from the projector intersects reference plane W at point Pr and is reflected into the camera at pr == c r [ur , vc , 1]T by the light array Lr . tpc is the baseline distance between the projector and the c c camera. By vector relationship, each light ray from the projector can be represented by 38 Figure 2.7: Recoding reference planar information. 
reference point Pr and light array Lr : c −− −→ −− − → −−→ −− O P Pr = O C Pr − O C O P (2.19) Since the camera was already calibrated with the plane W via the homography matrix W W W Hc . Point Pr can calculated through the definition of Hc : Pr = Hc pr . Substitute Pr , c pr and camera calibration information into Eq. 2.6, the depth information λPr of point Pr c can be calculated by: λP r = |Pr − Oc | (Kc Rc )−1 pr c (2.20) −− −→ Once the camera’s parameters are well calibrated with the reference board, Oc Pr can be considered as known. Vector tpc is the only value to calculate the linear equation of vector 39 −− −→ Op Pr from the projector, which also results that the intrinsic and orientation parameters of the projector are avoided in reconstructing light vector coming out from a projector. The plane information was saved as reference information and replaced projector parameters in 3D reconstruction. 2.2.3 Correspondence between Measurements Figure 2.8: Correspondence between reference recording and work piece inspection. Image patterns designed in projector plane carry unique information from pixel to pixel. In the reference recording process, codeword carried in Lp establishes the correspondence with light Lr . In the test surface measurement, codeword carried in Lp establishes the c correspondence with light Ls . Combing with two measurements, the correspondence between c Lr and Ls is set up in two measurements as shown in figure 2.8. c c 40 In figure 2.8, a tested surface S in blue color is located on the reference board. Lp from the projector is shot onto the surface at point Ps instead of Pr on the reference board and s is reflected into the camera image plane Ic at point ps = [us , vc , 1]T by the light array c c Ls . From the codeword provided in the light Lp , the correspondence betweenLr and Ls c c c is established. The Correspondence pixels pr and ps , which are both in camera frame, are c c used to reconstruct point Ps . In contrast, traditional method constructs point Ps needs correspondence pixels in different image frames: projector frame Ip and Ic , which results calibration in both image devices is needed instead of one. 2.2.4 Working Principle of the New Sensor Model Figure 2.9: The optimized 3D model. 41 Shown in figure 2.9, the reference reflection light Lr can be written as: c Lr = Oc + (Kc Rc )−1 λr pr c c (2.21) The reflection reference point Pr on the reference W is: Pr = Oc + (Kc Rc )−1 λPr pr c (2.22) Kc is the camera intrinsic parameter. Rc is the rotation matrix, and Oc is the coordinate of the camera optic point. λr is the depth along the array Lr and λPr is the depth of point Pr c along Lr . c The surface reflection light Ls denotes: c Ls = Oc + (Kc Rc )−1 λs ps c c (2.23) The incident light Lp is: Lp = Op + Kp Rp −1 λ p pp (2.24) p The projection of projector optical point Op in camera image plane Ic at epipolar ec = [ue , ve , 1]T . The coordinate of Op can be expressed in camera frame by: p Op = Oc + (Kc Rc )−1 λOp ec (2.25) , in which λOp the depth along array Op Oc . Since point Op and Pr can be expressed in the camera frame, substitute Eq. 2.25 and Eq. 2.22 into the following: −− −→ Lp = Op + λ Op Pr (2.26) The incident light, which previously be expressed in the projector frame by Eq. 2.24, can be written in camera frame by: 42 p p Lp = Oc + (Kc Rc )−1 λOp ec + λ λPr pr − λOp ec c (2.27) Through Eq. 2.23 and Eq. 2.27, an equation of depth λs can be written by: p p λs ps = λOp ec + λ λPr pr − λOp ec c c (2.28) Extend Eq. 
2.28, we can get: λs ups = λOp ue + λ λPr upr − λOp ue (2.29) λs vps = λOp ve + λ λPr vpr − λOp ve (2.30) When the depth λs equals the following, the surface point Ps can be derived. λOp λPr ue vpr − ve upr λPr ups vpr − vps upr − λOp ups ve − vps ue Transfer the Eq. 2.31 into the following: (2.31) λs = λPs = λOp λPr λs = u e u pr ve vpr (2.32) u ps u pr u ps u e λP r − λOp vps ve vps vpr The Eq. 2.32 can be simplified into a determinant form: ue ve 1 λOp u pr vpr 1 λP r u ps vps 1 =0 λPs (2.33) Substitute λs into the Eq. 2.23, The point of Ps can be calculated: Ps = Oc + (Kc Rc )−1 λOp λPr ue vpr − ve upr λPr ups vpr − vps upr − λOp ups ve − vps ue 43 ps c (2.34) From Eq. 2.34, the 3D reconstruction is all done in the camera frame via the pixel location deviation between two measurements: reference recording and surface measurement. The p parameters that need to be calibrate are: Oc Kc , Rc , ec , λOp . The parameters λPr , upr and vpr are calculated though the reference recording process. Parallel Configuration In last section, 3D reconstruction has been achieved only the camera frame by adopting the pixel deviation between two measurement as the system input. Projector calibration can be avoided, resulting that less parameters is utilized in the 3D recovery model. If the baseline Op Oc is configured parallel with the reference plane W , the 3D recovery can be even more easier via similar triangle theory. Shown in figure 2.10, The base line Op Oc is parallel with the reference recording plane W , so translation vector tpc can be consider as one scale value as d. Both imaging devices, the projector and the camera, share with the same standoff distance s s S. Pr is the intersection point between plane W and extension of line Oc P s, and Pr can be consider the projection of Ps on plane W when viewing from the camera. hP s denotes the s height of point Ps . Since line Op Oc is parallel with Pr Pr , triangle Ps Op Oc is similar with s the triangle Ps Pr Pr . The ratio of similar triangles is constant: s hP s Pr Pr = S − hP s d (2.35) s s Pr Pr denotes the length between point Pr and Pr in the reference plane and it can be transferred into the camera image frame by: s s w w Pr Pr = |Pr − Pr | = |Hc (ps − pr )| = |Hc psr | c c c (2.36) The point Ps can calculated through the following equations: hPs = s w (Pr Pr ) S |Hc psr | S c = sP w d + Pr r d + |Hc psr | c 44 (2.37) Figure 2.10: 3D sensor model in the parallel configuration.       u ps f 0 u0 xPs       λ  vps  =  0 f v0  Rc I −Oc  yPs  0 0 1 hPs 1 2.3 (2.38) Chapter summary For a large scale structured light sensor with multi-sensor, large field of view, and long standoff distance, the work principle of a 3D measurement sensor is critical for the system. It could greatly affect the system performances. A novel 3D reconstruction model is developed. Compared with the traditional structured light sensors, the intrinsic and rotation of the projector are avoided in the 3D reconstruction model. Since the projector calibration error is the major error source, the developed 3D sensor cloud further reduce the calibration error and improve the system accuracy. 45 CHAPTER 3 Sensor calibration Sensor calibration is to regress of the parameters in the 3D reconstruction model. It usually consists of two parts: the consistence evaluation of the system and the calibration method to regress of parameters. The two problems left to be solved from Eq. 2.37 and Eq. 
2.38 in sensor calibration are summarized as: 1) error analysis and consistence evaluation of the input $p_c^{sr}$ based on the system configuration; 2) camera calibration and regression of the standoff distance $S$ and the baseline distance $d$.

3.1 Precision Analysis of the Given Sensing System

Precision is also called the consistence of the system. It is often confused with accuracy, but the two are different concepts. Precision is a value that evaluates the system's repeatability; accuracy is a value describing the difference between the ground truth and the mean of multiple measurements. The difference between these two concepts is shown in figure 3.1, where the y axis denotes the probability density and the x axis denotes the measurement value. Precision can be viewed as the standard deviation of multiple measurements, which represents the repeatability. Accuracy is the deviation between the reference value (ground truth) and the mean of multiple measurements. There is a significant difference between these two quantities: the accuracy of the system can be improved by calibration, but the precision cannot, because precision is mainly affected by the random image noise from the environment.

Figure 3.1: The difference between precision and accuracy.

Two examples are given in figure 3.2. In case (a), the system's accuracy is high but its precision is low. The difference between the ground truth (the center of the rings) and the mean of all the dots is small, so the system can be considered a well calibrated system with high accuracy. However, industry cannot use such a system, since no single measurement is able to provide correct information: the deviation from the ground truth is almost random, which means the system cannot be improved by calibration. In case (b), the system's accuracy is low but its precision is high. The difference between the ground truth and the mean of all the dots is big, so the system can be considered a poorly calibrated system with high precision. Industry is able to utilize such a system by simply adding an offset to the measurement result. Calibration cannot improve the precision; on the contrary, poor precision may significantly lower the calibration accuracy. The calibration process uses measurements of a gauge to regress the model parameters; if the measurements have too large a standard deviation, the calibrated parameters will contain big errors. In the calibration process, the mean of multiple measurements of a given calibration gauge is used to regress the parameters. In industrial applications, a proper precision result is critical for the system: the calibration accuracy is bounded by the system precision.

Figure 3.2: (a) High accuracy with low precision; (b) low accuracy with high precision.

3.1.1 Uncertainty Analysis of an Imaging Device

The consistence performance of a 3D inspection system is mainly determined by the random image noise from the surrounding environment. The random image noise increases with the standoff distance $S$: a larger inspection range results in a longer standoff distance. A traditional inspection field, ranging from 10,000 to 50,000 mm², has relatively low image noise, which can be ignored in the measurement process. When a sensor's Field Of View (FOV) reaches about 2,400,000 mm², the uncertainty caused by random noise needs to be considered. In proportion to the FOV, the standoff distance is almost 10 times larger than in the traditional application, resulting in large random image intensity noise from the environment. A larger FOV also means that each pixel covers a bigger area.
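The distinction drawn above between precision and accuracy can be illustrated with a toy numerical example: repeated measurements of a gauge of known height, where the standard deviation plays the role of precision and the offset of the mean from the ground truth plays the role of accuracy. The numbers below are invented for illustration only.

```python
# Toy illustration of precision (repeatability) versus accuracy (bias) using
# simulated repeated measurements of a gauge of known height; the numbers are
# invented for illustration only.
import numpy as np

rng = np.random.default_rng(0)
ground_truth = 10.0                                      # mm

case_a = ground_truth + rng.normal(0.0, 0.5, 10)         # unbiased but noisy
case_b = ground_truth + 0.8 + rng.normal(0.0, 0.05, 10)  # biased but repeatable

for name, m in [("high accuracy / low precision", case_a),
                ("low accuracy / high precision", case_b)]:
    bias = abs(m.mean() - ground_truth)   # accuracy error (can be calibrated out)
    std = m.std(ddof=1)                   # precision (cannot be calibrated out)
    print(f"{name}: bias = {bias:.3f} mm, std = {std:.3f} mm")
```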
The same level of sub-pixel variation in the camera image therefore gives a bigger 3D reconstruction error. In figure 3.3, the pattern is shot from a projector onto a flat object at point $X_i$. Pixel $I_i$ is the projection of the point $X_i$ in the world frame. Due to random noise, $I_i'$ is used instead of $I_i$ to construct the light ray, which intersects the planar object at point $X_i'$. The angle between $I_i X_i$ and $I_i' X_i'$ is $\theta$, the focal length is $f$, and $S$ denotes the standoff distance. The error between $X_i$ and $X_i'$ is defined as $e$:

e = |X_i' - X_i| \approx S \theta \quad (3.1)

Figure 3.3: Camera error model.
Figure 3.4: Projector error model.

By the same principle as the camera model, the error model of a projector is also approximately proportional to the standoff distance $S$, as shown in figure 3.4. A longer standoff distance gives worse system precision. For an inspection system with a given design standoff distance $S$, the precision needs to be measured to decide whether further adjustment is required.

3.1.2 Simulation of the System Error Model

From Eq. 2.37, once the input $p_c^{sr}$ carries a certain amount of uncertainty, the height error will be:

\Delta h_{P_s} = \frac{S}{1 + \dfrac{d}{|H_c^W p_c^{sr}|}} - \frac{S}{1 + \dfrac{d}{|H_c^W (p_c^{sr} + \Delta p_c^{sr})|}} \quad (3.2)

The homography matrix $H_c^W$ is the mapping from the image plane to the reference plane in Eq. 2.36. $P_r P_r^s$ is the distance in the reference plane, and $p_c^{sr}$ is the pixel distance in the camera image plane. In the current system, the camera is configured almost parallel with the reference plane to reach a big FOV, so the resolution of the camera can be assumed to be a constant value $k$ which converts the pixel length to the physical length on the reference plane. The simulation formula can then be approximately expressed as:

\Delta h_{P_s} = \frac{S}{1 + \dfrac{d}{k |p_c^{sr}|}} - \frac{S}{1 + \dfrac{d}{k |p_c^{sr}| + k |\Delta p_c^{sr}|}} \quad (3.3)

in which $k$ is the resolution scale factor converting pixels into millimeters. Assuming the system has 0.1, 0.2, 0.3, or 0.5 pixels of error, the resulting height errors are shown in figure 3.5. $S$ is designed as 1600 millimeters, $d$ is designed as 500 millimeters, and $k$ equals 0.7 mm/pixel. The y axis in figure 3.5 denotes the height error in millimeters; the x axis is the input value of the sensor model, $p_c^{sr}$, the deviation in the camera image frame between point $p_c^s$ and point $p_c^r$. As the pixel error increases from 0.1 to 0.5 pixels, the error in height increases. In each sub-figure of figure 3.5, the error curve drops as the input $p_c^{sr}$ increases. $p_c^{sr}$ in the sensor model is caused by the height deviation between point $P_s$ and the reference plane: a large $p_c^{sr}$ means that point $P_s$ has a big height above the reference plane in figure 2.10, so the standoff distance from the sensor to the point $P_s$ becomes shorter, and from Eq. 3.1 a shorter standoff distance results in a smaller error.

Figure 3.5: Error analysis results: height errors (in millimeters) versus the input $|p_c^{sr}|$ (in pixels) when the input error equals 0.1, 0.2, 0.3, 0.5 pixel.

Table 3.1: The standard deviation results for all sensors
Sensor one: 0.17 mm | Sensor two: 0.08 mm | Sensor three: 0.08 mm | Sensor four: 0.13 mm

3.1.3 System Precision Test Results

An inspection gauge is measured 10 times. The standard deviation is used to describe the precision of the system.
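The sensitivity predicted by Eq. 3.3 can be reproduced in a few lines using the design values quoted above (S = 1600 mm, d = 500 mm, k = 0.7 mm/pixel). The sketch below only mimics the simulation behind figure 3.5 and is not part of the system implementation.

```python
# Reproduce the height-error sensitivity of Eq. 3.3 for the design values
# quoted above (S = 1600 mm, d = 500 mm, k = 0.7 mm/pixel). Only a sketch of
# the simulation behind Figure 3.5.
import numpy as np

S, d, k = 1600.0, 500.0, 0.7          # mm, mm, mm/pixel

def height_error(m_pixels, dm_pixels):
    """|Delta h| of Eq. 3.3 for input deviation m (pixels) and pixel error dm."""
    h = lambda m: S / (1.0 + d / (k * m))
    return np.abs(h(m_pixels) - h(m_pixels + dm_pixels))

m = np.linspace(20, 300, 15)          # input deviation p_c^sr in pixels
for dm in (0.1, 0.2, 0.3, 0.5):
    # The error is largest at the smallest input deviation (longest standoff).
    print(f"dm = {dm} px: max height error = {height_error(m, dm).max():.2f} mm")
```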
The mean of the height map in each pixel is calculated by: 1 ¯ h(u, v) = n n hk (u, v) (3.4) k=1 , in which u and v stand for the column and row location of a pixel;n is the total measurement times, and k is the measurement order. The standard deviation of each pixel is: 1 n Std (u, v) = n 2 ¯ hk (u, v) − h (u, v) (3.5) k=1 The standard deviation for each couple of projector and camera can be calculated by: Stds = 1 nu · nv Std (u, v) , u = 1, ..., nu ; v = 1, ..., nv u (3.6) v where nu and nv are maximum row and column number of the camera resolution. In our application, they are 2044 and 2002, respectively. Current four precision results for four sensor are given in table ?? 52 3.2 3.2.1 Sensor Calibration Calibrate Homography Matrix between the Camera Image Plane and the Reference Plane Apply the definition of homography matrix in Eq. 2.7:   ui   W  vi  = Hc 1   Xi   T T  Yi  , Pi = [Xi , Yi , 1] ∈ W, pi = [ui , vi , 1] ∈ Ic , i = 1, ..., n 1 (3.7) , in which pi and Pi are a pair of corresponding pixel in image plane and point in reference plane. n is the number of corresponding pairs used to calculate the homography matrix. Convert the H matrix in vector form as h = (h11 , h12 , h13 , h21 , h22 , h23 , h31 , h32 , h33 )T , and substitute it into the Eq. 3.7. The homogeneous equations for n points become Ah = 0, with A the 2n × 9 matrix:  u1 v1 0 0    u2 v2  0 0  A=       un vn 0 0  −v1 X1 −X1 −v1 Y −Y1    −v2 X2 −X2   −v2 Y2 −Y2          1 0 0 0 −un Xn −vn Xn −Xn  0 un vn 1 −un Yn −vn Yn −Yn 1 0 0 0 u1 v1 1 0 0 0 u2 v2 0 −u1 X1 1 −u1 Y1 0 −u2 X2 1 −u2 Y2 (3.8) The solution of Ah = 0, can follow the standard procedure which can minimize the algebraic residuals |Ah|. Via singular value decomposition (SVD), the null space vector h is the solution which can minimize the residuals. Fig. 3.7 (a) shows the calibration method to calculate the homography matrix between the chessboard and the camera image plane. Fig. 3.7 (b) gives a top view of the chessboard. The camera frame is Oc xc yc zc . Oc is the optic center of the camera and zc axis is perpendicular to the image plane IC . The frame Ow xw yw zw is the world frame. The surface 53 normal of the chessboard is defined as the z direction of the world frame. Both frames follow the right-hand rule. The homography matrix can be viewed as a part of the perspective projection matrix Cc without the third column. Figure 3.6: (a) The illustration of homography matrix calculation; (b) The used chessboard. 3.2.2 New Calibration Tasks of the Camera The limitation of the traditional homography based calibration method lies in the regression process from homography matrix to the intrinsic and extrinsic parameters of the image devices. The regression solution goes into ill-conditioned when imaging devices are parallel with the inspection plane. The intrinsic and extrinsic parameters are used to construct a 54 light array from a given pixel such as the array in Eq. 2.17. If the light arrays can be expressed in different ways, the regression process can be avoided. 
Define a 3 × 3 non-singular matrix H, and multiply the pixel pc with (H −1 H), Eq.2.17 is changed into: Lc = Oc + λc (Kc Rc )−1 H −1H pc (3.9) The equation of the light Lc can be transferred into Lc = Oc + λc (HKc Rc )−1 (Hpc ) (3.10) The new calibration tasks of the Eq.3.10, which is mathematically not difference with Eq.2.17, are changed from finding Kc , Rc and Oc into: 1) Choose a proper 3X3 non-singular matrix H and find the matrix multiplication result of (HKc Rc ). 2) Find the translation vector Oc ; The forward transformation of Eq.3.10 can be expressed as:   uc   H  vc  ∼ HKc Rc = 1 I, −Oc      x y z 1      (3.11) ′ N N Defining p′ = (u′ , vc , 1)T = Hpc and Kc Rc = HKc Rc . Eq.3.11 is changed into the c c following:   u′ c  ′ ∼ N N  vc  = Kc Rc 1 I, −Oc      x y z 1      (3.12) N N , in which Kc is intrinsic parameter of a new virtual pinhole model, Rc is the new rotation matrix of the new pinhole model, and p′ is the new pixel location on the new image plane. c 55 Figure 3.7: The light array passing two pinhole models. In Fig. 3.7, Optic point of the camera denotes point O, image plane Ic has the distance ′ of f to the optic point. The virtual image plane denotes Ic , with distance f ′ to the optic ′ point. Homography matrix between image plane Ic to the Ic is the 3 × 3 matrix H. The N intrinsic parameter of both models are Kc and Kc respectively. The light array L can be both constructed from pixel (u, v) or by the pixel (u,′ v ′ ). If the homography matrix between two pinhole models is known, the light array equation can be reconstructed in both ways depending which one is easier to calculate. 3.2.3 The Choice of Non-singular Matrix H in New Calibration Tasks By factorization the homograhy matrix into the traditional format of Eq. 2.8, camera calibration [75] can be processed by iterations to find K, r1 , r2 , r3 = r1 × r2 , tc , and tc = −Rc Oc to fit parameters existed in the light array equation. Traditional calibration method stops facc torizing homograohy matrix here. However, the homography matrix Hw can be continually factorized: 56 c H w = Kc Rc i Rc j Rc −Oc (3.13) , in which i = [1, 0, 0]′ and j = [0, 1, 0]′ are two unit vectors along x and y directions in the world frame respectively. Oc = [xc , yc , zc ]′ is the coordinate of the optic point Oc in the world frame. Eq. 3.13 transfers into the following form:    1 0 −xc   c HW = Kc Rc 0 1 −yc  0 0 −zc (3.14)  1 0 −xc   matrix 0 1 −yc  seems a normal 3 × 3 matrix. In fact it is an inverse matrix of new 0 0 −zc pinhole model’s intrinsic parameter matrix:    −1 1 0 −xc −zc 0 xc     0 1 −yc  = (−z)  0 −zc yc  0 0 −zc 0 0 1 (3.15) Define a new pinhole model’s intrinsic parameter matrix as:   −zc 0 xc   K N =  0 −zc yc  0 0 1 (3.16) The final factorization of the homography matrix can go to:  −1 −zc 0 xc   c HW = Kc Rc (−z)  0 −zc yc  0 0 1 (3.17) The calibration board and the optic point form a new virtual pinhole model, in which new intrinsic parameters are expressed by the coordinate of the optic point Oc in world frame. The absolute value of zc can be considered as new focus length. The image center is [xc , yc ]T . The new pinhole model can be viewed as a ’square pixel camera’, since focus lengths along x direction and y direction are the same and skew factor equals to 0. Since the new pinhole model is defined in the world frame, the rotation matrix between the world 57 frame and the new camera frame is an 3 × 3 identity matrix. Substituting Eq. ?? 
into Eq. ??, the relationship between the image pixel and the point on the calibration board is expressed by the following equation:     u x ∼   NI  K  v  = Kc Rc  y  1 1 (3.18) w c Choose Hc = (Hw )−1 as the 3X3 non-singular matrix H in calibration tasks in section III and substitute it into Eq. 3.11:    u′ c ′ vc    −zc 0 xc ∼  −1 −1  =  0 −zc yc  R K KR 1 0 0 1 I, −tc The reconstruction of the light array changes into the final form:      x y z 1        −1   xc −zc 0 xc uc      w Lc =  yc  + λc  0 −zc yc  Hc  vc  zc 0 0 1 1  (3.19) (3.20) The forward transformation matrix becomes:           x x ′ uc uc −zc 0 xc 0         y  ′ y  w  Hc  vc  = v ′ c  =  0 −zc yc 0    = Cc   z  z  1 0 0 1 −zc 1 1 1 (3.21) ′ , in which −zc equals to the standoff distance S. The new perspective matrix Cc only w contains the translation vector Oc . Hc is the transformation from a pixel pc in the old image plane to a pixel p′ in the new image plane. c     u′ c uc   w  Hc  vc  = v ′ c  1 1 (3.22) w After the homography matrix Hc was chosen as the 3 × 3 nonsingular matrix in the W new calibration task, the multiplication of (Hc Kc Rc ) can be easily expressed by three 58 elements of Oc . And now the only unknown is Oc in the camera pinhole model equation. The traditional linear or nonlinear regression to find intrinsic and extrinsic parameters to rebuild the light array equation is avoided. The new image plane is mapped into a plane perfect parallel with the reference plane. 3.2.4 Calibration of Camera Translation Vector Oc and Baseline Distance d Figure 3.8: Multi-plane calibration strategy. A multi-plane pixel to pixel calibration strategy has been developed for current system. The chessboard shown in figure 3.6 is lifted up multiple times and its correspondingly position is recorded for calibration. In order to eliminate the random image noise in recording, all the recordings in the following calibration process are measured 5 times and the mean of measurement results is used in data regression. 1 ¯ I (u, v) = m m Ik (u, v), m = 5 (3.23) k=1 Ik (u, v) in Eq. 3.23 represents the intensity value of a pixel [u, v]T of the k th recording. The calibration procedure can be summarized as follow: 1) Place the chessboard as reference board underneath of a sensor, and the homography w r matrix Hc and lens distortion are calculated via the corners coordinates [xr , yw , 0, 1]T in w 59 r the world frame and their corresponding pixel locations ur , vc , 1 c T . 2) The Grey Code Line Shifting (GCLS) is projected onto the chessboard and is captured by the camera as the reference cording. 3) Lift the chessboard at height h1 called layer 1. Substitute the point Pw = w i i [xi , yw , hi , 1]T , the corresponding camera pixel [ui , vc , 1]T , and Hc into Eq. 3.21. c w 4) Project GCLS onto the broad and find the correspondence by the codeword explained in figure 2.8. The pixel distance psr (h1 , u, v) is calculated based on the correspondence in c each pixel u = 1, ..., nu , v = 1, ..., nv . Since every point on the board has the same height, psr (h1 ) is: c 1 nu nv u v , where nu and nv are the column number of the psr (h1 ) = c ph1 (h1 , u, v) c (3.24) camera 2044 and row number 2008, re- spectively. Each layers’s psr (h1, u, v) distribution is approximated as a Gauss curve shown c in figure 3.9, which is due to the random image noises and some implementation errors. 
3.2.4 Calibration of the Camera Translation Vector O_c and the Baseline Distance d

Figure 3.8: Multi-plane calibration strategy.

A multi-plane, pixel-to-pixel calibration strategy has been developed for the current system. The chessboard shown in figure 3.6 is lifted multiple times and its position at each height is recorded for calibration. To suppress random image noise, every recording in the following calibration process is measured 5 times and the mean is used in the data regression:

$\bar{I}(u,v) = \frac{1}{m}\sum_{k=1}^{m} I_k(u,v), \qquad m = 5$   (3.23)

where $I_k(u,v)$ is the intensity of pixel $[u,v]^T$ in the k-th recording. The calibration procedure can be summarized as follows:

1) Place the chessboard underneath the sensor as the reference board. The homography matrix $H_c^w$ and the lens distortion are calculated from the corner coordinates $[x_w^r, y_w^r, 0, 1]^T$ in the world frame and their corresponding pixel locations $[u_c^r, v_c^r, 1]^T$.

2) Project the Gray Code Line Shifting (GCLS) pattern onto the chessboard and capture it with the camera as the reference coding.

3) Lift the chessboard to height $h_1$ (layer 1). Substitute the points $P_w = [x_w^i, y_w^i, h_i, 1]^T$, the corresponding camera pixels $[u_c^i, v_c^i, 1]^T$, and $H_c^w$ into Eq. 3.21.

4) Project the GCLS pattern onto the board and find the correspondence via the codewords explained in figure 2.8. The pixel distance $p_c^{sr}(h_1, u, v)$ is calculated from the correspondence at each pixel $u = 1, \ldots, n_u$, $v = 1, \ldots, n_v$. Since every point on the board has the same height, $p_c^{sr}(h_1)$ is taken as the average

$p_c^{sr}(h_1) = \frac{1}{n_u n_v} \sum_u \sum_v p_c^{sr}(h_1, u, v)$   (3.24)

where $n_u = 2044$ and $n_v = 2008$ are the column and row counts of the camera, respectively. The distribution of $p_c^{sr}(h_1, u, v)$ within each layer is approximately Gaussian, as shown in figure 3.9, due to random image noise and implementation errors.

Figure 3.9: The distribution of $p_c^{sr}(h_1, u, v)$ for each layer.

5) Repeat steps 3) and 4) nine times, each time at a different height.

6) Solve Eq. 3.21 with the collected data by rearranging it into the homogeneous linear system

$\begin{bmatrix} 0^T & -P_w^{iT} & v'_i P_w^{iT} \\ P_w^{iT} & 0^T & -u'_i P_w^{iT} \\ -v'_i P_w^{iT} & u'_i P_w^{iT} & 0^T \end{bmatrix} \begin{bmatrix} c_c^{1T} \\ c_c^{2T} \\ c_c^{3T} \end{bmatrix} = 0$   (3.25)

where $c_c^i$ is the i-th row of the camera perspective matrix $C'_c$. Eq. 3.25 has the form $A c = 0$; for a set of n correspondence pairs, a $3n \times 12$ matrix $A$ is built by stacking Eq. 3.25 for each correspondence. The vector $c$ is the right null-space vector of $A$ and is solved by the SVD method.

7) Substitute $H_c^w$ and $p_c^{sr}(h_1)$ into Eq. 2.37 and rewrite it as

$\begin{bmatrix} h_1\, p_c^{sr\prime}(h_1) \\ \vdots \\ h_n\, p_c^{sr\prime}(h_n) \end{bmatrix} = \begin{bmatrix} -h_1 & p_c^{sr\prime}(h_1) \\ \vdots & \vdots \\ -h_n & p_c^{sr\prime}(h_n) \end{bmatrix} \begin{bmatrix} d \\ S \end{bmatrix}$   (3.26)

where $p_c^{sr\prime} = H_c^w p_c^{sr}$. The standoff distance $S$ and the baseline distance $d$ are solved by least squares.

8) Residual function compensation for each pixel. The sensor model requires the baseline to be parallel to the reference plane; implementation errors make the real setup deviate slightly from this ideal case. To further decrease the errors, a residual compensation function is approximated for each camera pixel by a second-order polynomial. The residual for a pixel is the difference between the ground truth height and the system measurement:

$e(u,v) = h_n(u,v) - \frac{S}{\,1 + \dfrac{d}{\left|H_c^w p_c^{sr}(h_n, u, v)\right|}\,}$   (3.27)

Since the residual is the difference between the ground truth and the measured value, it can be approximated by a Taylor expansion; a second-order polynomial is used for each pixel:

$e(u,v) = A(u,v)\,\left|H_c^w p_c^{sr}(h_n,u,v)\right|^2 + B(u,v)\,\left|H_c^w p_c^{sr}(h_n,u,v)\right| + C(u,v)$   (3.28)

Figure 3.10 illustrates the residual function approximation for the pixel $u = 80$, $v = 2$. The x axis is the pixel distance $|p_c^{sr}(u,v)|$ measured at different heights, and the y axis is the difference between the ground truth and the measurement (in millimeters). The blue curve is the residual between ground truth and measurement, and the red curve is its second-order polynomial approximation. The error reaches 1 millimeter before compensation and drops below 0.2 millimeter after compensation.

Figure 3.10: The second-order polynomial approximation of the residual function for one pixel.
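The compensation of Eq. 3.28 amounts to an independent second-order polynomial fit per camera pixel. A minimal sketch of that regression is given below; the pixel-distance and residual arrays are hypothetical placeholders standing in for one pixel's record over the calibration layers.

```python
import numpy as np

# Hypothetical record for one camera pixel (u, v): pixel distances |H_c^w p_c^sr|
# at the known layer heights, and the residuals of Eq. 3.27 (ground truth - measurement).
pixel_distance = np.array([12.1, 18.4, 24.9, 31.2, 37.8, 44.1, 50.7, 57.0, 63.5])
residual       = np.array([0.95, 0.71, 0.52, 0.31, 0.12, -0.05, -0.24, -0.41, -0.62])

# Eq. 3.28: e(u, v) = A d^2 + B d + C, fitted by least squares for this pixel.
A, B, C = np.polyfit(pixel_distance, residual, deg=2)

def compensate(raw_height, d):
    """Add the predicted residual (ground truth - measurement) back to a raw height."""
    return raw_height + (A * d**2 + B * d + C)

print(A, B, C)
```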
3.3 Chapter Summary

A complete calibration method has been developed for the structured light sensor, and all the parameters of the sensor model can be successfully regressed. To further increase accuracy, the residual function of each camera pixel is approximated by a second-order polynomial. The consistency of the developed system has been analyzed, simulated, and tested. The ill-conditioned problem in camera calibration is avoided by applying the homography mapping.

CHAPTER 4
Surface Coding for a Structured Light System

The encoded patterns, called structured light, are projected from the projector onto the test work pieces and recorded by the camera. The codewords carried by the reflected light are extracted in the camera frame during the decoding process. With the aid of the structured light, camera and projector pixels sharing the same codeword are corresponded to obtain the 3D coordinates of the test work piece. The measurement result, a point cloud representing the work piece, can then be used in different applications, such as quality inspection in industry and mobile robot navigation.

4.1 Introduction of the Surface Coding Methods

The task of the surface coding process is to set up the correspondence between two image frames: the camera frame and the projector frame. The projected codewords should be preserved against distortions caused by the test objects, such as those of their geometries, textures, and optical properties, and the decoding process should extract the codewords correctly. Efficiency, robustness, density, and accuracy are the most important characteristics for evaluating a coding method.

The efficiency of a coding method is usually evaluated by the system acquisition time: if the codewords can be encoded and decoded with fewer projection patterns, the system can finish the measurement in less time.

The robustness of a coding method is its ability to withstand the albedo effects of the tested objects. The light reflected from the tested objects into the camera depends strongly on the surface optical properties; textures, colors, and reflection coefficients can all disturb the decoding process. The encoding and decoding process should therefore be designed against the optical properties of the objects.

The density of the encoded codewords controls how dense the measurement point cloud is. More codewords sample more 3D points on the objects and thus capture more of the part's geometry. A dense sampling result is critical for industrial inspection, since manufacturing errors show up more readily in a high-density point cloud.

The codewords are encoded in the projector frame and decoded in the camera frame, so it is important to extract them accurately in both image frames. The image coordinates of the codewords are used to reconstruct the 3D measurement result; the accuracy of codeword extraction therefore determines the accuracy of the measurement.

According to acquisition time, coding methods can be roughly separated into two categories: multi-projection and single-projection. A multi-projection method usually takes more time than a single-projection method because it contains more projecting-recording loops. A single-projection method encodes the codewords within one projection pattern, while a multi-projection method creates the codewords across several patterns. The efficiency of a coding method determines what the system can measure: a moving object can only be measured by a single-projection method, whose robustness, density, and accuracy, however, are usually lower than those of a multi-projection method. Applications that emphasize measurement time rather than accuracy therefore prefer the single-projection method.

The multi-shot strategy is only applicable to stationary objects, but it is more robust against image noise than the one-shot method. After illumination problems such as shadows, under-/over-exposure, and internal reflection are detected, the projection pattern can be readjusted through a proper feedback. Since more projections are applied to the objects, more codewords can be encoded into the patterns. The advantages of the multi-projection method are therefore its robustness, density, and accuracy.
The multi-projection method is usually used in inspection applications for quality control in industry. Emphasizing their different performance factors, the coding methods are explained in two categories: the single-shot projection method and the multi-shot projection method.

4.2 Single-shot Coding Method

4.2.1 Pattern Generator

Patterns based on spatial neighborhoods can be generated as shown in figure 4.1. The generation procedure starts with a random 3 × 3 matrix at the top left. The window then moves to the next position, and the Hamming distances between the new primitive codeword and all previously generated codewords are calculated, where the Hamming distance is defined as

$h_{(uv,\,u'v')} = \sum_{k=1}^{9} \delta_k, \qquad \delta_k = \begin{cases} 0 & \text{if } p_{uv} = p_{u'v'} \\ 1 & \text{otherwise} \end{cases}$   (4.1)

As shown in figure 4.1, better results are obtained when the Hamming distance is greater than 3. If all the distances are no smaller than a designed threshold, the primitive is accepted and the next iteration is performed until all the gaps are filled; if not, a new primitive is assigned. The window is successively shifted from left to right with a step size of Δx pixels, and from top to bottom with a step size of Δy pixels. The shifting steps can be adjusted to satisfy the required reconstruction resolution as long as the primitives can be identified clearly in the camera image plane; for example, some critical regions of the inspected part may need a higher resolution than other regions.

Figure 4.1: The pattern generation procedure.
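A simplified, non-overlapping sketch of this generation loop is given below; the pattern size, window step, and Hamming threshold are assumed example values, not the parameters of the actual system.

```python
import numpy as np

rng = np.random.default_rng(0)
N_VALUES = 4          # primitive orientations 0..3 (0, 45, 90, 135 degrees)
H_MIN = 3             # minimum Hamming distance between 3x3 codewords (assumed)
STEP = 3              # window shift in primitives (assumed; the real system may overlap windows)
ROWS, COLS = 21, 21   # pattern size in primitives (assumed)

def hamming(a, b):
    """Eq. 4.1: number of differing primitives between two 3x3 windows."""
    return int(np.sum(a != b))

pattern = np.zeros((ROWS, COLS), dtype=int)
codewords = []
for r in range(0, ROWS - 2, STEP):
    for c in range(0, COLS - 2, STEP):
        for _ in range(1000):                  # retry until the window is distinctive enough
            win = rng.integers(0, N_VALUES, size=(3, 3))
            if all(hamming(win, w) >= H_MIN for w in codewords):
                pattern[r:r + 3, c:c + 3] = win
                codewords.append(win)
                break
        else:
            raise RuntimeError("no admissible primitive window found; relax H_MIN")
print(len(codewords), "codewords generated")
```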
4.2.2 Primitive Design

Designing a good pattern is critically important for achieving accurate correspondence with the optical triangulation technique, especially for a one-shot method. The primitive design should satisfy the following constraints: (a) monochromatic light and (b) robust and accurate detection.

Because of the monochromatic light constraint, the symbols should not carry color-coded information. Symbols with geometrical features are therefore adopted in this study instead of the traditional color-based coding patterns, and the image features must be extracted accurately. Center-symmetric primitives such as circles and disks are widely used in fringe patterns, and the intensity centroid of the symbol is taken as the symbol location; however, this is inaccurate when shadows and occlusions occur. As shown in figure 4.2, an inspected part with a step in the scene hides part of the projected pattern from the camera, so a partial occlusion shifts the centroid position. In this design, the symbol location is instead determined by the corner of a high-contrast checkerboard primitive (an X-point). Because it uses only the local information around the corner, a partially hidden pattern does not affect the location accuracy.

Figure 4.2: An example of occlusion.

Figure 4.3: The comparison of different primitives.

Figure 4.4: The primitive value.

Thus, portions of the X-point region hidden by smudges or occlusions are disregarded with no significant impact on accuracy, as shown in figure 4.2. Additionally, a disk in an image carries no geometrical information other than its center location, whereas an X-point has both a location and an orientation. This additional characteristic is used to discriminate the different primitives in the pattern. As shown in figure 4.4, the red arrow denotes the orientation of the primitive and its corresponding value. The line connecting neighboring primitives in the horizontal direction is taken as the base axis, and the angles between the principal axis of a primitive and the base axis are 0°, 45°, 90°, and 135°, with corresponding values 0, 1, 2, and 3, respectively.

4.2.3 Primitive Detection and Codeword Decoding

Once the pattern is projected onto the scene, a single frame is captured by the camera. The image processing is simple because the primitives are designed on a black background and are well separated, so there is a strong gradient in the image intensity around each primitive boundary. The contour of a primitive is therefore easily detected and can be extracted in real time, and contour detection is less sensitive to reflectivity variation and ambient illumination than threshold-based segmentation. The moment and principal axis method is then used to recognize the primitive. The moment of a geometrical primitive is

$M_{ij} = \int_{-\infty}^{\infty} x^i y^j f(x,y)\,dx\,dy$   (4.2)

The coordinates of the mass center are

$X_m = M_{10}/M_{00}, \qquad Y_m = M_{01}/M_{00}$   (4.3)

A central moment is the same as the ordinary moment except that x and y are displaced by the mean values:

$CM_{ij} = \int_{-\infty}^{\infty} (x - X_m)^i (y - Y_m)^j f(x,y)\,dx\,dy$   (4.4)

The angle between the principal axis and the x axis is

$\alpha = \frac{1}{2}\arctan\!\left(\frac{2\,CM_{11}}{CM_{20} - CM_{02}}\right)$   (4.5)

The mass center of the contour is detected by Eq. 4.3 and regarded as the initial rough location of the X-point. The final location of the X-point is then determined by the Harris corner detector within a local region, and the principal axis is extracted by Eq. 4.5. In fact, two perpendicular axes can be extracted, the long and the short axis; the long axis is taken as the principal axis. The orientations of the principal axis and the base axis are then compared to determine the value of the primitive: if the difference between them is below an experimentally predefined threshold, the primitive value is assigned. The image processing for a pattern with 5 × 5 primitives is shown in figure 4.5.

Figure 4.5: Image processing: (a) contour extraction; (b) location detection; and (c) principal axis identification.

Once each primitive value is determined, the next step is to find the eight neighboring primitives that determine its codeword. The closest primitives in the eight directions are easily found with the help of the direction of the long axis, and the codeword of a primitive then depends on its own value and its neighbors' values; an example of codeword determination is given in figure 4.6.

Figure 4.6: Determination of the codeword for each primitive.

When each primitive value on the camera image plane has been obtained, pixel matching is performed. The upper-left primitive on the camera image plane is selected as the matching window to find the corresponding primitive on the projector image plane. The matching windows on both the camera and the projector image planes are then shifted to the next primitive, from left to right and from top to bottom. The procedure is carried out by a recursive search algorithm until all corresponding primitives are found.
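A condensed sketch of the detection chain (contour, moments, principal axis) using OpenCV is given below. It is illustrative only: the blob-size threshold is an assumed value, the angle of Eq. 4.5 is computed in its arctan2 form, and the Harris sub-pixel refinement of the actual system is omitted.

```python
import cv2
import numpy as np

def detect_primitives(gray):
    """Return (x, y, angle_deg) for each primitive found in a grayscale camera image."""
    # Strong gradients around the primitives: an Otsu binarization is enough here.
    _, bw = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(bw, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    primitives = []
    for cnt in contours:
        m = cv2.moments(cnt)
        if m["m00"] < 50:                                  # reject tiny blobs (assumed threshold)
            continue
        xm, ym = m["m10"] / m["m00"], m["m01"] / m["m00"]  # Eq. 4.3: mass center
        # Eq. 4.5 (arctan2 form): principal-axis angle from the central moments.
        angle = 0.5 * np.arctan2(2.0 * m["mu11"], m["mu20"] - m["mu02"])
        primitives.append((xm, ym, np.degrees(angle) % 180.0))
    return primitives
```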
4.3 Multi-shot Projection Coding Method

4.3.1 Problem Statement

Although some commercial 3D area sensors have been developed and used in quality inspection, the technology still has to overcome several challenges before its applications can be extended. One of the biggest is its weakness against the optical properties of the test objects. Optically-challenging objects, such as painted parts of different colors, materials with different reflection coefficients, and shiny punched metal work pieces, have to be pre-treated into diffusely reflecting surfaces before inspection. The textures and colors of objects can corrupt a color-based structured light code. Differences in reflection coefficients cause the under-/over-exposure problem, which creates a dilemma when adjusting the camera aperture: a bright part requires a small aperture, while a dark area needs a large one. Shiny work pieces induce the internal reflection problem: the light reflected from a scene point is related not only to its own incident light but also to the incident light in its neighborhood, so a dimly illuminated area can be lit up by a bright area nearby and produce a wrong codeword.

In summary, there is a demand for a structured light system that is fully capable of measuring optically-challenging objects and coping with the following problems:

1) Color and texture problem: an object has different colors and textures, so a color-based codeword can be altered by the test object and decoded incorrectly.

2) Shadow problem: a directly illuminated area can appear darker than an indirectly illuminated one; as a consequence, a highly illuminated area may be interpreted as a low-intensity region and a dimly illuminated area as a high-intensity region.

3) Under-/over-exposure problem: the intensities of the light reflected from the same illumination can vary considerably because of different reflection coefficients. If the reflection coefficients of the work pieces differ significantly, adjusting the camera aperture becomes a dilemma.

4) Internal reflection: the light reflected from a point in the scene is related not only to its own incident light but also to the incident light in its neighborhood, so an unilluminated area can still produce high intensity in the camera image.

To solve these problems, this study develops a novel encoding and decoding method for a structured light based inspection system capable of sampling dense 3D point clouds from optically-challenging objects. To inspect such objects successfully, the developed structured light must simultaneously provide:

1) a color-invariant encoding and decoding algorithm;

2) dynamic control of the projector illumination to avoid the under-/over-exposure and shadow problems;

3) robust decoding against the internal reflection problem.

The symbols used in this part are as follows: I denotes a pixel intensity; the subscripts P and C denote the projector and the camera, respectively; (u, v) denotes a pixel location in the projector frame, and (u′, v′) a pixel location in the camera frame. For instance, $I_P(u, v)$ is the intensity value at pixel location (u, v) in the projector frame.
4.4 Encoding and Decoding Strategy

3D reconstruction with a structured light system requires establishing the correspondence table between two image devices, a projector and a camera, in which encoding and decoding play the key roles. Digitized binary monochromatic patterns, which are robust against image noise, are used as the encoded projection images. Traditional approaches assume that the inspected objects are perfectly diffusely reflecting. The encoding and decoding strategy is shown in figure 4.7. The first step is the structured light encoding in the projector frame: codeword 0 is encoded as intensity 0 and codeword 1 as intensity 255. During inspection the patterns are projected into the scene and reflected into the camera. In the decoding process, a low-intensity camera pixel is decoded as 0 and a high-intensity pixel as 1, and pixels in the two frames sharing the same codeword are corresponded.

Figure 4.7: Traditional encoding and decoding.

Once the inspected object is optically challenging, encoding and decoding become more complicated: the intensity of the projection pattern must compensate for the albedo effects of the object. The intensity-adjusted projection patterns are the combination of the GCLS patterns and an intensity mask matrix $M_P$. The intensity of a projected pixel $I_P(u, v)$ is

$I_P(u,v) = I_P^{GCLS}(u,v)\; M_P(u,v)$   (4.6)

where $I_P^{GCLS}(u,v)$ is the intensity of the traditional GCLS pattern at location (u, v) in the projector frame. $M_P(u,v)$ can be considered a weighting coefficient for the code $I_P^{GCLS}(u,v)$: if $M_P(u,v) = 0$ the pixel is turned off; if $0 < M_P(u,v) < 1$ the projected intensity is reduced; if $M_P(u,v) = 1$ it remains unchanged; and if $M_P(u,v) > 1$ it is enhanced. Internal reflection can be suppressed by reducing the corresponding incident light, and the camera can see a dark or shadowed region clearly if the incident light there is enhanced. In short, the projection mask works as a feedback function from the camera receiver to the projector.

To obtain the scene-adapted mask matrix $M_P$, two functions are needed: the camera photo response function $f_{ph}$ and the crosstalk function $f_{cr}$ from the camera frame to the projector frame. The photo response function $f_{ph}$ in Eq. 4.7 represents the relationship between the projected intensity $I_P(u,v)$ and the captured intensity $I_C(u',v')$, which is mainly shaped by the albedo of the scene. A proper projection intensity $I_P$ is first derived from the inverse photo response function so that the camera pixel (u′, v′) receives a suitable intensity. The crosstalk function $f_{cr}$ in Eq. 4.8 then allocates this intensity to the corresponding projector pixel (u, v), ensuring that the radiance from projector pixel (u, v) actually reaches camera pixel (u′, v′). After these two tasks are accomplished, the projection mask can be derived.

$I_C(u',v') = f_{ph}\big(I_P(u,v)\big)$   (4.7)

$(u',v') = f_{cr}(u,v)$   (4.8)

Considering the camera as the feedback receiver, the projected intensity $I_P(u,v)$ can be expressed as

$I_P\big(f_{cr}^{-1}(u',v')\big) = f_{ph}^{-1}\big(I_C(u',v')\big)$   (4.9)

In this study the projection-capture chain is normalized by the assumption that a projection of intensity 255 results in intensity 255 in the camera frame. Therefore the intensity mask equals

$M(u,v) = M\big(f_{cr}^{-1}(u',v')\big) = \frac{f_{ph}^{-1}(255)}{255}$   (4.10)

Since the intensity mask M(u, v) must be dynamically adjusted by the captured value $I_C(u',v')$, $f_{cr}$ is first used to map the camera pixel location (u′, v′) to the projector pixel location (u, v). $f_{ph}^{-1}(255)$ specifies the desired projection value that makes the received value 255, and the ratio between this desired projection value and 255 is the projection intensity mask value.
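A minimal sketch of Eq. 4.6 and Eq. 4.10 is given below, assuming the desired projector intensities $f_{ph}^{-1}(255)$ have already been resampled into the projector frame through the crosstalk function (that resampling is not shown here), and assuming a hypothetical projector resolution.

```python
import numpy as np

def intensity_mask(desired_projection):
    """Eq. 4.10: mask = f_ph^{-1}(255) / 255, already expressed in the projector frame."""
    return np.clip(desired_projection / 255.0, 0.0, None)

def masked_pattern(gcls_pattern, mask):
    """Eq. 4.6: weight each GCLS pixel by the mask and clip to the 8-bit range."""
    return np.clip(gcls_pattern.astype(float) * mask, 0, 255).astype(np.uint8)
```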
The flow chart of the inspection process is displayed in figure 4.8: the system first derives the intensity mask, then applies it to the structured light, and finally completes the 3D reconstruction.

Figure 4.8: The flow chart of the encoding and decoding strategy.

There are three steps in the inspection process:

(1) Initial projection to derive the camera photo response function, the initial guess of the crosstalk function, and the intensity mask.

(2) Advanced projection of the intensity-adjusted pattern to refine the crosstalk function and update the intensity mask.

(3) Final projection to obtain the correspondence and finish the 3D reconstruction.

The details of these steps are explained in the following sections.

4.4.1 Derivation of the Camera Photo Response Function for a Given Scene

Reflected light of high intensity can cause camera saturation: highly saturated pixels spill over and affect the intensity values of neighboring pixels, producing, for example, a shiny spot in the picture. On the other hand, reflected light of low intensity may not be captured by the camera at all. To capture the reflections from both highly and weakly reflective surfaces simultaneously, a proper feedback algorithm is needed. The photo response function describes the radiometric chain from the projector image frame to the camera image frame, and its inverse is used as the feedback function to eliminate scene albedo effects.

A photo response function consists of three parts: the radiance emitted by a projector pixel, $f_P$; the reflected radiance produced from the incident radiance, $f_R$; and the camera pixel intensity produced from the reflected radiance, $f_C$. In this study all three functions are approximated as linear. The radiance $R_P(u,v)$ from a projector pixel with intensity $I_P(u,v)$ is

$R_P(u,v) = f_P\big(I_P(u,v)\big) = k_1(u,v)\,I_P(u,v) + k_2(u,v), \qquad 0 \le I_P(u,v) \le 255$   (4.11)

The reflected radiance $R_C(x,y,z)$ at a scene point (x, y, z) produced by the incident radiance $R_P(u,v)$ is

$R_C(x,y,z) = f_R\big(R_P(u,v)\big) = k_3(x,y,z)\,R_P(u,v) + k_4(x,y,z)$   (4.12)

The camera intensity $I_C(u',v')$ produced by the reflected radiance $R_C(x,y,z)$ is

$I_C(u',v') = f_C\big(R_C(x,y,z)\big) = k_5(u',v')\,R_C(x,y,z) + k_6(u',v'), \qquad 0 \le I_C(u',v') \le 255$   (4.13)

where $k_1(u,v)$, $k_2(u,v)$, $k_3(x,y,z)$, $k_4(x,y,z)$, $k_5(u',v')$, and $k_6(u',v')$ are unknown coefficients of the linear functions; their subscripts show how the radiometric chain varies across projector pixels, scene points, and camera pixels. Substituting Eq. 4.11 and Eq. 4.12 into Eq. 4.13 and treating the composition as a single linear function gives

$I_C(u',v') = k_7(u',v')\,I_P(u,v) + k_8(u',v'), \qquad 0 \le I_C(u',v') \le 255, \quad 0 \le I_P(u,v) \le 255$   (4.14)

where $k_7(u',v')$ and $k_8(u',v')$ combine the previous coefficients. The radiometric chain is thus treated as a black box: only $k_7(u',v')$ and $k_8(u',v')$ are needed to adjust the projected intensity from the received intensity, and the coefficients $k_1$ through $k_6$ are not required for the photo response function.

To solve the photo response function in Eq. 4.14, two tasks need to be accomplished: (a) the coefficients $k_7(u',v')$ and $k_8(u',v')$, which determine what projection intensity is suitable for a camera pixel (u′, v′) to avoid under-/over-exposure; and (b) the crosstalk function between projector pixel (u, v) and camera pixel (u′, v′).
The crosstalk function determines where the suitable projection intensity should be placed in the projector frame. In short, the solution of the first task answers what the projection value should be, and the second answers where that value should go. Since it is difficult to solve both tasks simultaneously, task (a) is isolated first by projecting single-intensity images in which all projector pixels share the same value $I_P$. Because all pixel values in the projector frame are equal, the crosstalk can be ignored: no matter which projector pixel is crosstalked with the camera pixel (u′, v′), the intensity is always the same. Eq. 4.14 then becomes

$I_C(u',v') = k_7(u',v')\,I_P + k_8(u',v'), \qquad 0 \le I_C(u',v') \le 255, \quad 0 \le I_P \le 255$   (4.15)

Single-intensity images with values 0, 50, 100, 150, 200, and 250 are projected onto the test objects. Figure 4.9 gives an example in which an image of intensity 100 was projected. Five arbitrary points, A to E, are selected in the received image and marked in different colors to examine their intensity variations; their received intensities are given in figure 4.10, where the x axis is the projected intensity in the projector frame and the y axis the received intensity in the camera frame. The colored lines are the photo response curves of the different scene points. A is a point on a black object, so its received intensity is always very low: even when intensity 255 is projected onto it, the received value stays a little below 50. D and E are typical shiny points whose corresponding camera pixels saturate very easily. C can be viewed as a 'normal' diffuse point, neither too shiny nor too dark. B is a special case of a point on a black object: when the incidence and reflection angles satisfy a certain condition, a black object can also produce a mirror-like effect and become very sensitive to the incident light.

Figure 4.9: The captured picture when projecting an image with intensity 100.

Figure 4.10: The photo response curves of the five selected points.

The parameters $k_7$ and $k_8$ are regressed by substituting the projected and received intensities into Eq. 4.15. The resulting map of proper projection values in the camera frame is given in figure 4.11: much like film in a camera, the brighter the received value at a camera pixel, the darker its proper projection value.

Figure 4.11: The proper projection values in the camera frame.
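A minimal sketch of this per-pixel regression is given below; the captured stack is a random placeholder standing in for the averaged single-intensity recordings, and the image size is an assumed value.

```python
import numpy as np

# Placeholder stack of averaged captures: one HxW image per projected intensity.
projected = np.array([0.0, 50.0, 100.0, 150.0, 200.0, 250.0])
captured = np.random.default_rng(0).uniform(0, 255, size=(6, 480, 640))

# Per-pixel linear fit of Eq. 4.15: I_C = k7 * I_P + k8.
# (In practice, saturated samples with I_C = 255 should be excluded from the fit.)
A = np.stack([projected, np.ones_like(projected)], axis=1)     # 6 x 2 design matrix
flat = captured.reshape(6, -1)                                  # 6 x (H*W)
coeffs, *_ = np.linalg.lstsq(A, flat, rcond=None)               # 2 x (H*W)
k7 = coeffs[0].reshape(480, 640)
k8 = coeffs[1].reshape(480, 640)

# Desired projection value per camera pixel, f_ph^{-1}(255) = (255 - k8) / k7,
# clipped to the projector's 8-bit range (numerator of Eq. 4.10).
desired = np.clip((255.0 - k8) / np.maximum(k7, 1e-6), 0, 255)
```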
4.4.2 Initial Guess of the Crosstalk Function

The result given in figure 4.11 only determines how large the projection value should be; it does not determine where that value should be placed. $I_P$ is calculated from Eq. 4.14, but the indices u and v in the projector frame are still missing. The initial guess of the crosstalk function assumes that the test scene is a flat surface on the inspection table, so the crosstalk between the two image frames becomes a homography mapping:

$P_P = H_W^P P_W = H_W^P H_C^W P_C = H_C^P P_C$   (4.16)

As shown in figure 4.12 (a), $I_P$ and $O_P$ form the pinhole model of the projector, and $I_C$ and $O_C$ that of the camera. $P_P = (u_P, v_P, 1)$, $P_W = (x_W, y_W, 1)$, and $P_C = (u_C, v_C, 1)$ are points on the projector image plane, the inspection table, and the camera image plane, respectively; $H_W^P$, $H_C^W$, and $H_C^P$ are the 3 × 3 homography matrices between pairs of these planes.

Figure 4.12: (a) The homography mapping among the projector image plane, the camera image plane, and the inspection table; (b) the corresponding pixel $P_P$ changes according to $P_W$.

4.4.3 Advanced Projection Pattern Generation

Combining the initial guess of the crosstalk function with the camera photo response function, the initial projection mask is derived via Eq. 4.10; the result is shown in figure 4.13 (a). In reality, however, the scene differs from the flat-surface assumption, so the corresponding projector pixel will not be exactly at $P_P = (u_P, v_P)$ but somewhere near it. As shown in figure 4.12 (b), when the real scene is higher than assumed, the real location is $P_{P1}$; if it is lower, the real location is $P_{P2}$. Assuming that the probability of the real location follows a Gaussian distribution around pixel $(u_P, v_P)$, the projection mask is dilated by a certain radius, as displayed in figure 4.13 (b). A combination of a GCLS pattern with the projection mask is shown in figure 4.13 (c).

Figure 4.13: (a) The initial projection mask; (b) the dilated projection mask; (c) an example of a projection pattern.

4.4.4 Interpolation Model for the Diffuse Reflection Region and Extrapolation Model for the Internal Reflection Region

Both the crosstalk function and the correspondence for 3D recovery require binarizing the received images into codewords 1 and 0. The difference between the two binarizations lies in the accuracy requirement: pixel-level accuracy is enough for the crosstalk function that controls the illumination, but the correspondence for 3D recovery needs sub-pixel accuracy. On the other hand, the illumination conditions when finding the correspondence are better than when finding the crosstalk function, because the illumination has already been improved once the crosstalk function is known. In either case, a robust binarization method is critical for measuring optically-challenging objects. An extrapolation model for the internal reflection region and an interpolation model for the diffuse reflection region are therefore proposed.

Binarization methods for the received image can be roughly classified into three branches:

1) Naive binarization, based on a fixed intensity threshold.

2) Adaptive binarization, based on a different threshold for each pixel, determined by a weighted average of its neighborhood.

3) Dual-projection binarization, in which a pixel's binary value depends on whether the pattern or its inverse is brighter [34] [45].

The third method is more robust against image noise and has been widely used in 3D measurement. In this study, a modified dual-projection binarization method is used to establish the crosstalk function. Instead of using the stripes of the line-shifting pattern directly, the edge between every two stripes is employed; a sub-pixel interpolation of the edge location is added to increase measurement accuracy, and an estimation algorithm for the edge location is added for the cases where camera saturation or internal reflection occurs. As shown in figures 4.14 (a) and (b), images of a stripe pattern and its inverse are used for the binarization analysis; the black and white stripes are the projected pattern, and the highlighted blue lines indicate the edges.

Figure 4.14: (a) A received image of the projected pattern; (b) the received image of the inverse pattern; (c) interpolation of the edge location; (d) estimation of the edge location under strong illumination.
An edge generally falls inside a camera pixel, so a sub-pixel interpolation model is needed to improve the measurement performance. Figure 4.14 (c) illustrates the binarization and edge interpolation for a diffusely reflecting surface: the y axis is the received intensity in the camera frame, and the x axis the image coordinate in the vertical direction. The red line is the projected pattern and the blue line its inverse. A, B, C, and D are the four intensity values used for binarization and edge interpolation. Since intensity A is larger than its inverse B by a certain threshold, pixel n is assigned 1; for the same reason, pixel n + 1 is decoded as 0. Assuming that the intensity change within one pixel is linear, the location of the edge X can be interpolated from the intensities A, B, C, and D. Although the linear assumption may not hold perfectly in all cases, the error remains bounded within one pixel. The model in figure 4.14 (c) is used both for calculating the crosstalk and for the 3D correspondence, since it reaches sub-pixel accuracy.

$X = n + \frac{A - B}{(C - B) - (D - A)}$   (4.17)

In the advanced projection, the over-exposure problem and the internal reflection problem cannot be avoided entirely; in those cases the calculation of the edge location changes from interpolation to extrapolation, as shown in figure 4.14 (d). Under over-exposure, the incident intensities of the white stripes exceed 255, so the blue curve rises from point B and reaches the maximum intensity 255 before reaching point C, and the red curve likewise drops from the dashed line to the solid line; point A becomes A′ and C becomes C′. BC′ and DA′ then do not intersect until their extensions meet. If internal reflection occurs, things become more complicated: point C′ illuminates its neighborhood C′′ through internal reflection, moving line C′B backwards to C′′B′, and point A′ illuminates point D, moving line A′D forward to A′′D′. The step distance S between points B′ and D′ therefore grows beyond one pixel. Viewed in the received image, the white stripes erode the dark stripes: the white stripes become wider and the dark stripes narrower. The extrapolation model assumes that the intensity change is still linear over several pixels, so the stripe edge X can be derived via

$X = n + S\,\frac{A'' - B'}{(C'' - B') - (D' - A'')}$   (4.18)

The assumption behind the extrapolation model is stronger than that of the interpolation model: linearity is assumed not just within one pixel but across several pixels. In practice, a linear approximation within one pixel introduces little error, but across several pixels it can produce a noticeable error, and the estimated location X becomes less accurate as the width S increases. Since the error of the extrapolation model cannot be bounded within one pixel, this model is used only for approximating the crosstalk function, not for the correspondence calculation.
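A minimal sketch of the two edge estimators is given below, assuming the four (or four primed) intensity samples have already been read from the captured pattern and inverse-pattern images; the worked example reuses the intensities reported later in Section 4.5.2.

```python
def edge_interpolation(n, A, B, C, D):
    """Eq. 4.17: sub-pixel edge location between pixels n and n+1 on a diffuse
    surface. A, B are the pattern/inverse intensities at pixel n; D, C are the
    pattern/inverse intensities at pixel n+1."""
    return n + (A - B) / ((C - B) - (D - A))

def edge_extrapolation(n, S, A2, B1, C2, D1):
    """Eq. 4.18: estimated edge location when over-exposure or internal reflection
    widens the bright stripe by S pixels. A2, C2, B1, D1 stand for A'', C'', B', D'
    in figure 4.14 (d)."""
    return n + S * (A2 - B1) / ((C2 - B1) - (D1 - A2))

# Worked example with the intensities reported in Section 4.5.2:
# pattern 61/29 and inverse 34/64 at pixels 899/900 give an edge near 899.4.
print(edge_interpolation(899, A=61, B=34, C=64, D=29))   # ~899.44
```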
Figure 4.15 shows the final projection mask after the crosstalk function has been derived. The intensity mask controls the projection intensity so that the received intensity stays between under-exposure and over-exposure, and it also minimizes internal reflection by reducing the incident intensity on the shiny areas. From the figure we can also see that the projection mask contains a lot of noise in the blind areas, for instance in the zoomed parts (a) and (b). Comparing the same parts with the original image ?? (a) shows that these are blind areas: either the projector cannot illuminate these locations or the camera cannot view the projected patterns there. Since one of the two image devices always fails to observe a blind area, no corresponding pixels can be generated there to rebuild 3D points, and the projection mask error in these areas does not affect the 3D results.

Figure 4.15: The final projection mask.

4.5 Verification of the Projection Mask Performance

As shown in figure ?? (a), three work pieces, a black part, a dark part, and a shiny part, are used to test the designed system. To verify the ideas of this study, two aspects of the received results are given and discussed in this section: (a) verification of the projection mask, and (b) verification of the interpolation and extrapolation models.

4.5.1 Verification of the Illumination Adjustment via the Projection Mask

In this study a dynamic, scene-based projection mask is proposed to adjust the incident light intensity on optically-challenging parts. To verify this idea and validate the mask performance, the received images before and after applying the mask are provided. Figures 4.16, 4.17, and 4.18 illustrate the effect of the projection mask on the captured images of the scene. Figure 4.16 is the received image without the projection mask: several areas still suffer from over-exposure and internal reflection, and since the camera aperture is opened to its maximum, almost all the work pieces look shiny. Figure 4.17 is the captured image after the initial projection mask is applied; because the flat-scene assumption is too strong, the mask cannot adjust the incident light perfectly and some areas remain shiny. Figure 4.18 is the result after the final projection mask is applied: all the incident lights are dynamically adjusted and the shiny artifacts disappear.

4.5.2 Verification of the Interpolation and Extrapolation Models

Figures 4.16, 4.17, and 4.18 only show the general performance of the projection mask. More detailed intensity information is given to illustrate the following:

1) Verification of the linear interpolation model used to obtain sub-pixel accuracy. The interpolation model uses the intersection point to locate the intensity edges, and the model needs to be confirmed by real received data.

2) Demonstration of the extrapolation model proposed to explain the internal reflection and over-exposure issues. The real intensity distribution in an internal reflection region should support the extrapolation model.

3) Verification of the result of the extrapolation model. The extrapolation model is not only used to explain internal reflection but also to find the stripe edges in shiny regions, so the corresponding experimental results need to be provided.

4) The study proposes that the adaptive projection mask can reduce or eliminate internal reflection; the received intensity distributions before and after applying the projection mask need to be compared to demonstrate this.

Figure 4.16: The received image without the projection mask.

Figure 4.17: The received image with the initial mask.
Figure 4.18: The received image with the final mask.

Figures 4.19(a) and 4.19(b) are two detailed views of the black part when the stripe pattern and its inverse are projected without the projection mask. Both the internal reflection and the reflection coefficient of the black work piece are low, so these detail views are selected to verify the sub-pixel linear interpolation model. The widths of the white/black stripes AB equal the widths of the black/white stripes A′B′ at 39 pixels. The highly illuminated stripes do not erode the weakly illuminated ones, which indirectly confirms that the internal reflection is small and the surface can be treated as diffusely reflecting.

The pixels along the vertical lines marked in figures 4.19(a) and 4.19(b) are scanned, and the binarization results of the selected pixels are given in table 4.1. The intensity curves around points B / B′ along the vertical direction are shown in figure 4.19(c) to verify the interpolation model: the red curve is the intensity distribution along the red line in figure 4.19(a), and the blue curve the distribution along the blue line in figure 4.19(b). If the red curve is above the blue curve by a certain threshold, the pixels are decoded as "high" (1); otherwise they are classified as "low" (0). An intersection point marks an edge between high and low illuminated regions. The pixel intensities at location 899 are 61 and 34, and at location 900 they are 29 and 64; applying the model of Eq. 4.17, the intersection point (x-axis coordinate) lies at 899.4. Comparing the intensity distributions in figure 4.19(c) with the curves in figure 4.14 (c) verifies the interpolation model.

Figure 4.19: (a) A selected detail view of the black part when projecting the pattern without the mask; (b) the same view when projecting the inverse pattern; (c) the intensity curve around point B / B′.

Table 4.1: Pixel binarization without the projection mask
Pixel range: 821-860.1 | 860.1-899.4 | 899.4-938.7 | 938.7-975
Codeword:    Low       | High        | Low         | High

Figure 4.20: (a) A selected detail view of the shiny part when projecting the pattern without the mask; (b) the same view when projecting the inverse pattern; (c) the intensity curve around point C / C′.

To examine the over-exposure and internal reflection phenomena, detail views of the shiny part are displayed in figures 4.20(a) and 4.20(b). As explained for the extrapolation model, when internal reflection and over-exposure occur, the white stripe erodes the black stripe, so the white stripe becomes wider than the black one: the pixel distance CD is 33, while C′D′ is 38. This width difference is the major harm caused by internal reflection and over-exposure. Clearer information can be found in figure 4.20(c): instead of a single intersection point, the red and blue curves share a short common line segment. The pixels within this common segment are always highly illuminated and cannot be classified by traditional methods, which is what makes sampling 3D coordinates on shiny regions fail. The extrapolation sub-pixel model in figure 4.14 (d) fits this situation.
The intensity values at pixels 1292 and 1293 on the blue curve are 226 and 255, and the intensity values at pixels 1294 and 1295 on the red curve are 255 and 155. Applying Eq. 4.18, the stripe edge is estimated at location 1293.7 by extrapolation. The distributions in figure 4.20(c) thus confirm the extrapolation model of figure 4.14 (d): the extrapolation result successfully separates the high and low illuminated regions, which are decoded as 1 and 0 respectively.

As a comparison, the same views with the projection mask applied are shown in figures 4.21 and 4.22. Figure 4.21 verifies that the illumination variation caused by the projection mask does not affect pattern edge detection in the diffuse reflection region, while figure 4.22 shows that it reduces or eliminates the internal reflection problem in the shiny region.

Figures 4.21(a) and 4.21(b) are the two detail views of the black part when the adaptive mask is used. The stripe width AB still equals A′B′ at 39 pixels after the projection mask is applied. The binarization results are shown in table 4.2 and are almost identical to table 4.1. Figure 4.21(c) shows the intensity curves around points B / B′: the pixel intensities at location 899 are 80 and 28, and at location 900 they are 25 and 62. Applying the model of Eq. 4.17 again, the intersection lies at pixel location 899.6. Compared with the results of figure 4.19, we conclude that the illumination variation caused by the projection mask does not affect decoding in the diffuse reflection region.

Figure 4.21: (a) A selected detail view of the black part when projecting the pattern with the mask; (b) the same view when projecting the inverse pattern; (c) the intensity curve around point B / B′.

Figures 4.22(a) and 4.22(b) are two detail views of the shiny part when the stripe pattern and its inverse are projected with the mask. The 'W' and 'B' stripe widths are now the same: CD = C′D′ = 35. Compared with the result of figure 4.20, we conclude that the internal reflection has been eliminated. The same conclusion follows from figure 4.22(c): after the illumination is adjusted by the mask, the two curves intersect, so the interpolation model can be applied instead of the extrapolation model to find the edge. The intensities at pixel 1293 are 133 and 80, and at pixel 1294 they are 83 and 141; the intersection point found by Eq. 4.17 is 1293.5.

Figure 4.22: (a) A selected detail view of the shiny part when projecting the pattern with the mask; (b) the same view when projecting the inverse pattern; (c) the intensity curve around point C / C′.

Table 4.2: Pixel binarization with the projection mask
Pixel range: 821-860.2 | 860.2-899.6 | 899.6-938.7 | 938.7-975
Codeword:    Low       | High        | Low         | High

In summary, the illumination variation caused by the projection mask eliminates internal reflection in the shiny region without any adverse effect on the diffuse region.

4.6 Chapter Summary

In this chapter, two surface coding methods were introduced: a single-shot algorithm and a multi-shot algorithm. The single-shot algorithm greatly reduces the acquisition time and thus gains the ability to measure a moving object. The difficulty of this method is to encode all the codewords within one image.
A spatial-neighborhood method was developed for the single-shot approach: the mass center and orientation of each primitive serve as its value, and each primitive's value together with its eight neighbors' values forms a codeword. Compared with traditional color-based codewords, this method is more robust because it removes the color effects of the scene. The disadvantages of the single-shot method are its low density and its weakness against the optical properties of the test objects.

A multi-shot method was developed to inspect optically-challenging objects without surface pre-treatment. Compared with the single-shot method, it is more robust and denser, but less efficient. A projection mask that dynamically compensates for the albedo effects of the scene was developed, with a feedback loop controlling the illumination of the projector, so that shiny and black materials can be measured simultaneously. An extrapolation model combined with the projection mask was applied to avoid internal reflection issues. Compared with previous methods, the proposed method is more robust, efficient, and flexible.

CHAPTER 5
Registrations between Different Coordinates after 3D Reconstruction

5.1 Registration between Different 3D Coordinate Frames

The previous chapters described the sensor working principle and its calibration. The measured data are usually given as a "point cloud", which can be compared with the designed surface shape, such as the Computer Aided Design (CAD) model, for error analysis. The deviation between the point cloud and the CAD model can be displayed as a color-coded error map, and this error distribution map is used to evaluate part quality against the acceptable Geometry Dimension and Tolerance (GDT). To finish the quality inspection task, two questions remain:

1) Registration between the sensor frames and a common frame. Each point cloud is obtained in an individual sensor frame, and the clouds must be stitched together to represent the whole work piece; that is, all point clouds must be expressed in one common frame. The transformation matrix from each sensor frame to the common frame is first calculated and then applied to update each point cloud. The updated point clouds merge into one entire point cloud representing the whole work piece, which can then be compared with the CAD model to obtain the color-coded manufacturing error map. Since the sensor frames are fixed with respect to each other, the transformation from a sensor frame to the common frame does not change after the registration calibration; this registration is therefore also called off-line registration calibration.

2) Error map generation. This requires a mathematical definition of the deviation between the two data sets, the triangulated CAD model and the point cloud. The deviation along the surface normal of each triangulated CAD face to the closest point of the point cloud is used to describe the error distribution. A color code from blue to red visualizes the distribution, where blue denotes that the CAD surface is lower than the point cloud and red that it is higher. Error map generation requires (a) the on-line registration between the final point cloud and the CAD model, and (b) the definition of the deviation between the two data sets. Since the position of the work piece is not fixed, the registration matrix between the final point cloud and the CAD model changes from measurement to measurement.
The registration matrix must therefore be calculated after each inspection within a limited time, and this step is called on-line registration calibration. The deviation between the final point cloud and the CAD model is defined by the distance between the closest point of the point cloud and the triangulated surface of the CAD model: if the triangulated CAD surface lies above the closest point of the final point cloud, the deviation is negative and shown in blue; otherwise it is positive and shown in red.

5.2 Off-line Registration Calibration from the Sensor Frames to the Common Frame

The task of the off-line registration calibration is to find the rigid transformation matrices that bring all point clouds from their individual sensor frames into a common frame. The point cloud $P^{S_j}$ in sensor frame j can be written as

$P^{S_j} = \big\{\, p_i^{S_j}(x_i, y_i, z_i) \in \mathbb{R}^3 \,\big\}, \qquad j = 1, 2, 3, 4$   (5.1)

The registration that transfers the four point clouds into the common frame is defined as

$P^C = \bigcup_{j=1}^{4} T_{S_j}^C\, P^{S_j}$   (5.2)

where $T_{S_j}^C$ is the transformation matrix from sensor frame j to the common frame. The task is to find $T_{S_j}^C$ for each sensor frame. To solve for $T_{S_j}^C$ in Eq. 5.2, three tasks must be accomplished:

1) The two point sets $P^{S_j}$ and $P^C$ must first be obtained. $P^C$ can be determined from a pre-measured gauge, and $P^{S_j}$ is the sensor measurement result.

2) Correspondence pairs between the two data sets are needed. The well-known ICP algorithm is used to obtain the correspondence after a rough alignment: the closest points are assumed to be corresponding pairs, and the final alignment is found iteratively.

3) The rigid transformation matrix $T_{S_j}^C$ is derived after the correspondence is established. The rigid transformation consists of two parts: the rotation matrix R and the translation vector t. R is a unit orthogonal matrix with three degrees of freedom, and its derivation is somewhat involved because the unit, orthogonality, and three-degree-of-freedom constraints must all be respected.

5.2.1 Calculation of the Rotation Matrix via the Quaternion Algorithm

A quaternion is an extension of a complex number, defined as

$q = q_0 + i q_1 + j q_2 + k q_3$   (5.3)

where i, j, and k are three orthogonal 3D vectors forming the basis of 3D Euclidean space, with $i^2 = j^2 = k^2 = ijk = -1$. A unit quaternion q has norm one and can be expressed as

$q = \cos(\theta/2) + \sin(\theta/2)\, V$   (5.4)

in which θ is the rotation angle and $V = [x_V, y_V, z_V]^T$ is a unit vector representing the rotation axis; a unit quaternion q thus represents an arbitrary rotation with parameters V and θ in 3D space. The quaternion-based rotation matrix is

$R_q = \begin{bmatrix} q_0^2 + q_1^2 - q_2^2 - q_3^2 & 2(q_1 q_2 - q_0 q_3) & 2(q_1 q_3 + q_0 q_2) \\ 2(q_1 q_2 + q_0 q_3) & q_0^2 - q_1^2 + q_2^2 - q_3^2 & 2(q_2 q_3 - q_0 q_1) \\ 2(q_1 q_3 - q_0 q_2) & 2(q_2 q_3 + q_0 q_1) & q_0^2 - q_1^2 - q_2^2 + q_3^2 \end{bmatrix}$   (5.5)

A point $P = [x, y, z]^T$ can be considered a pure quaternion ($q_0 = 0$), and its rotation can be expressed as

$R_q P = q \circ P \circ q^*$   (5.6)

where ◦ denotes quaternion multiplication and $q^*$ is the conjugate of the quaternion q.
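As a quick illustration of Eq. 5.4 and Eq. 5.5 (not part of the registration pipeline), the sketch below builds $R_q$ from an axis-angle pair and applies it to a point; the axis and angle are arbitrary example values.

```python
import numpy as np

def quat_from_axis_angle(axis, theta):
    """Eq. 5.4: unit quaternion (q0, q1, q2, q3) for a rotation of theta about axis."""
    v = np.asarray(axis, dtype=float)
    v /= np.linalg.norm(v)
    return np.concatenate(([np.cos(theta / 2.0)], np.sin(theta / 2.0) * v))

def rotation_from_quat(q):
    """Eq. 5.5: 3x3 rotation matrix from a unit quaternion."""
    q0, q1, q2, q3 = q
    return np.array([
        [q0*q0 + q1*q1 - q2*q2 - q3*q3, 2*(q1*q2 - q0*q3),             2*(q1*q3 + q0*q2)],
        [2*(q1*q2 + q0*q3),             q0*q0 - q1*q1 + q2*q2 - q3*q3, 2*(q2*q3 - q0*q1)],
        [2*(q1*q3 - q0*q2),             2*(q2*q3 + q0*q1),             q0*q0 - q1*q1 - q2*q2 + q3*q3]])

# Example: 30 degrees about the z axis rotates the x axis as expected.
q = quat_from_axis_angle([0, 0, 1], np.deg2rad(30))
print(rotation_from_quat(q) @ np.array([1.0, 0.0, 0.0]))   # ~[0.866, 0.5, 0.0]
```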
The rigid transformation between different frames can then be solved in two steps via quaternion algebra:

$\begin{bmatrix} x_c \\ y_c \\ z_c \end{bmatrix} = R_q \begin{bmatrix} x_s \\ y_s \\ z_s \end{bmatrix} + t = q \circ \begin{bmatrix} x_s \\ y_s \\ z_s \end{bmatrix} \circ q^* + t$   (5.7)

in which $P_s = [x_s, y_s, z_s]^T$ is a point in the sensor frame and $P_c = [x_c, y_c, z_c]^T$ is a point in the common frame. From Eq. 5.7 we can obtain

$(P_c + P_s)_\times\, \eta - (t)_\times\, \eta - t = P_s - P_c$   (5.8)

where $(\cdot)_\times$ denotes a skew-symmetric matrix and $\eta = \tan(\theta/2)\, V$. Let $M_i \equiv P_s^i - P_s^{i+1}$ and $N_i \equiv P_c^i - P_c^{i+1}$; Eq. 5.8 then changes into

$(M_i + N_i)_\times\, \eta = M_i - N_i, \qquad i = 1, 2, \ldots, n-1$   (5.9)

in which n is the number of points in each data set. After η is solved, the rotation axis V and the rotation angle θ are obtained from

$V = \frac{\eta}{\|\eta\|}$   (5.10)

$\theta = 2\arctan\|\eta\|$   (5.11)

The rotation matrix $R_q$ and the translation vector t are then

$R_q = \begin{bmatrix} x_V^2\lambda_\theta + c_\theta & x_V y_V \lambda_\theta - z_V s_\theta & x_V z_V \lambda_\theta + y_V s_\theta \\ x_V y_V \lambda_\theta + z_V s_\theta & y_V^2\lambda_\theta + c_\theta & y_V z_V \lambda_\theta - x_V s_\theta \\ x_V z_V \lambda_\theta - y_V s_\theta & y_V z_V \lambda_\theta + x_V s_\theta & z_V^2\lambda_\theta + c_\theta \end{bmatrix}$   (5.12)

$t = P_c - R_q P_s$   (5.13)

where $\lambda_\theta = 1 - \cos\theta$, $c_\theta = \cos\theta$, and $s_\theta = \sin\theta$. To calibrate a rigid transformation $T_s^c$, the 3D coordinates of corresponding points in both the sensor frame s and the common frame c must be determined. The coordinates of points in the common frame c are predetermined by building a calibration gauge, and the points in the sensor frame are the measurement results of that gauge after the sensor has been calibrated. The correspondence between the two data sets, the data in the common frame and the data in the sensor frame, is established by the Iterative Closest Point (ICP) method.
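A minimal sketch of this two-step solution is given below, assuming corresponding point lists are already available as rows of two arrays; Eq. 5.9 is solved in a least-squares sense and the translation of Eq. 5.13 is evaluated on the centroids.

```python
import numpy as np

def skew(v):
    """Skew-symmetric matrix (v)_x such that (v)_x w = v x w."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def rigid_transform(P_s, P_c):
    """Estimate R, t of Eq. 5.7 from corresponding points (rows of P_s and P_c)."""
    P_s = np.asarray(P_s, dtype=float)
    P_c = np.asarray(P_c, dtype=float)
    # Eq. 5.9: stack (M_i + N_i)_x eta = M_i - N_i over consecutive point pairs.
    A_rows, b_rows = [], []
    for i in range(len(P_s) - 1):
        M = P_s[i] - P_s[i + 1]
        N = P_c[i] - P_c[i + 1]
        A_rows.append(skew(M + N))
        b_rows.append(M - N)
    eta, *_ = np.linalg.lstsq(np.vstack(A_rows), np.concatenate(b_rows), rcond=None)
    norm = np.linalg.norm(eta)
    if norm < 1e-12:                                        # (near-)identity rotation
        return np.eye(3), P_c.mean(axis=0) - P_s.mean(axis=0)
    theta = 2.0 * np.arctan(norm)                           # Eq. 5.11
    V = eta / norm                                          # Eq. 5.10
    lam, c, s = 1.0 - np.cos(theta), np.cos(theta), np.sin(theta)
    R = c * np.eye(3) + lam * np.outer(V, V) + s * skew(V)  # Eq. 5.12 (Rodrigues form)
    t = P_c.mean(axis=0) - R @ P_s.mean(axis=0)             # Eq. 5.13 on the centroids
    return R, t
```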
5.2.2 Iterative Closest Point (ICP) based Registration

According to the accuracy of the merged result, registration can be roughly divided into two stages: coarse alignment and fine adjustment. Coarse alignment derives the transformation matrix from a few feature points in the two data sets; its accuracy is not very high, so it is used for applications with low accuracy requirements, such as navigation, or as the initial alignment for the fine adjustment. The ICP fine alignment algorithm [49] [50] searches for pairs of nearest points in both data sets and estimates a coordinate transformation to align them; the process is executed iteratively until the registration error falls below a threshold, at which point the overall registration error is minimized.

The ICP algorithm is shown in figure 5.1 as a recursive process. The goal is to find the rigid transformation matrix $T_q^p$ between a set P with $N_p$ points and a set Q with $N_q$ points; $X_p$ and $X_q$ denote points from P and Q, respectively. The algorithm proceeds as follows:

Step 0) Generate the coarse alignment from a few corresponding features between the two data sets. These features can be defined from artificial markers, colors, or curvatures. Update the set: $T_0 Q \rightarrow Q_0$.

Each recursive loop of the ICP algorithm then consists of:

Step 1) Set up the corresponding point pairs by minimizing the cost function, the Euclidean distance between points of the two data sets. For an arbitrary point $X_Q \in Q$, the Euclidean distance to every point of P is calculated, and the point of P with the smallest distance is taken as the correspondence of $X_Q$. The algorithm runs through Q from the first point to the last and creates a list $L_k$ of $N_q$ correspondence pairs, where k is the current loop number.

Step 2) Derive the rigid transformation matrix $T_k$, consisting of the rotation matrix $R_k$ and translation $t_k$, via the quaternion method explained in the previous section.

Step 3) Update the data set $Q_k$ into $Q_{k+1}$ by the matrix multiplication $T_k Q_k = Q_{k+1}$.

Step 4) Calculate the average error $e_k$:

$e_k = \frac{1}{N_q} \sum_{i=1}^{N_q} \big\| T_k X_q(i) - X_p(i) \big\|_2^2$   (5.14)

Step 5) If the decrement between $e_{k-1}$ and $e_k$ is smaller than Threshold1, or $e_k$ is smaller than Threshold2, the algorithm outputs the results P, $Q_n$, and $T_f = T_n T_{n-1} \cdots T_0$, where n is the total number of loops and $T_f$ is the final transformation matrix; otherwise the algorithm returns to Step 1) and starts another loop. The algorithm has been proved to converge monotonically to a local minimum [49] [50].

Figure 5.1: The ICP algorithm.
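A compact sketch of this loop is given below, assuming the coarse alignment has already been applied to Q; the rigid-transform estimator (for example, the hypothetical `rigid_transform` sketched in the previous section) is passed in as an argument, and a KD-tree replaces the brute-force nearest-neighbor search of Step 1.

```python
import numpy as np
from scipy.spatial import cKDTree

def icp(P, Q, rigid_transform, max_loops=50, thr_delta=1e-6, thr_err=1e-4):
    """Minimal ICP loop (Steps 1-5). P, Q are Nx3 arrays; Q is moved onto P."""
    tree = cKDTree(P)                        # nearest-neighbor search structure on P
    T_f = np.eye(4)
    prev_err = np.inf
    for _ in range(max_loops):
        # Step 1: closest point in P for every point of Q.
        _, idx = tree.query(Q)
        matched_P = P[idx]
        # Step 2: rigid transformation from the current correspondences.
        R, t = rigid_transform(Q, matched_P)
        # Step 3: update Q and accumulate the total transformation.
        Q = Q @ R.T + t
        T_k = np.eye(4); T_k[:3, :3] = R; T_k[:3, 3] = t
        T_f = T_k @ T_f
        # Steps 4-5: average squared error and stopping test (Eq. 5.14).
        err = np.mean(np.sum((Q - matched_P) ** 2, axis=1))
        if err < thr_err or prev_err - err < thr_delta:
            break
        prev_err = err
    return T_f, Q
```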
Octree space labeling method and multi-resolution method are used to speed up the algorithm. f) Objected-oriented registration. The registration between the final point cloud and the CAD model is different from the traditional registration application. Conventional registration assumes that two data sets are from the same shape, and then an algorithm that minimizes Euclidean distance can be applied. The assumption is broken in current application. The final point cloud does has manufacturing errors compared with CAD model, and furthermore finding these errors is the purpose of the whole system. The registration is defined by the factory based on how the work piece is assembled to the others. The connection parts, such as holes of screws, which is usually call datum points, should have more weight in the registration process. The new registration that provides additional constraints and weight is developed. 5.2.3 System Calibration Procedure In figure 5.2, the painted windshield with white markers is used as the system registration calibration gauge. Each white dot marker’s center coordinate was measured by a CMM and was considered as the ground truth data for registration calibration. The rest part was painted into black to assure only white dots coordinates are inspected by RIS system. System registration calibration can be divided into four steps: 1) Obtain RIS measurement data: measure the system registration gauge, filter out the noise and save four measurement results in four sensor frames. Figure 5.3 is the measurement 107 Figure 5.2: System registration gauge. result of sensor one, the coordinates of the left top part of the system registration gauge. RIS system measures a cluster of points for each marker on the gauge. Figure 5.3: Sensor 1 measurement result. 2) Interpolate the coordinate of the marker center: ICP algorithm can not be directly applied in current case due to the limitation b) explained in last section. Closest points in both data sets may not represent the corresponding points, therefore the corresponding point 108 can the interpolation among the 3D point cloud in the second data set. In order to further increase the calibration accuracy, the interpolation is first applied after the data acquisition of RIS sensor. The radius of all the circular markers is 5 millimeters. The coordinate of the center of each marker was interpolated after obtaining the raw RIS measurement data. Figure 5.4 is the top view of the sensor 1 measurement result. It also displays the interpolated centers of each marker. The green dots denote the measurement points, and the red stars are the interpolated marker center. As we can easily see from figure5.4, the RIS sensor may not measure the center of the markers. Registration based on closest point can still generate a certain level of error. The interpolation point of each marker is able to represented the center and results better accuracy. The center of each cluster of green points was estimated by the conventional Hough transform. Figure 5.4: The center of each marker. 3) Registration process: registration process is divided into two parts: rough alignment (initial alignment) and fine alignment. Rough alignment is to match the RIS data to the CMM data in a low accurate level. Fine alignment applies the result from the rough alignment as initial position. ICP [49] [50] method is applied to obtain a high accurate result. 109 Figure 5.5 shows the rough alignment result for sensor one. Figure 5.6 shows the fine alignment result for it. 
Red dots denote the RIS sensor one result, and blue ones denote the CMM result. Red dots and blue dots are overlayed with each other in fine alignment. Figure 5.5: The rough alignment between sensor 1 and CMM data. 4) Repeat step 1) to 3) for the other 3 sensors. After system registration calibration is done, four individual point clouds successfully join into one in a common frame shown in figure 5.7. 110 Figure 5.6: The fine alignment between sensor 1 and CMM data. Figure 5.7: System calibration results. 5.3 Registration between the Final Point Cloud and CAD Model The last step of a quality inspection is to find the deviation between the final point cloud 111 and the CAD model. The color code information is used to evaluate the manufacturing error of the work piece. If the manufacturing error exceeds the tolerance, the work piece is considered as unqualified. In the registration step between point cloud and CAD, the fine alignment matrix is needed to match the measured data with CAD model. Current registration method can be directly applied for this type of application because: 1. Photogrammetry method is not feasible for each work piece on an assembly line. 2. Feature points do not exist on part surface. 3. The mechanic support can’t guarantee the same location and orientation of each work piece. 4. The online point cloud registration needs to be finished within limited time. 5.3.1 Octree Space labeling method and Multi-resolution Method based ICP Registration between the final point cloud and the CAD model needs to be finished within a limited time. Therefore, several algorithms to speed up the registration are applied to the registration algorithm in this thesis. Traditional ICP method searches the corresponding point through whole data set P for a point Pq in set Q, which is not necessary. After the rough alignment, the corresponding point in P stays within the neighborhood around point Pq . Searching a neighborhood around point Pq instead of the entire point cloud can greatly save the algorithm execution time. In order to limit the search region in set P , octree space labeling method is utilized. An octree is a tree data structure in which each internal node has exactly eight children. Octrees are most often used to partition a three dimensional space by recursively subdividing it into eight octants as shown in figure 5.8. The center of the space is considered as the fist node, and the space is divided into 8 parts according to the geometrical relationship with the node. Each subspace will be divided recursively into another 8 parts. The nodes of the octree is shown in figure 5.9. The first node in level one leads 8 children in level two, and each node in level two leads another 8 in level three. 112 o1 o1 o1 o1 o1 o1 o1 Figure 5.8: Space partition by the octree space method. Figure 5.9: The node tree of the octree method. In current application, both data sets, the point cloud and the CAD, are divided into 64 subsets by three levels of octree respectively. The subsets with same coding name in both data sets, such as the subset 000 in the CAD and the subset 000 in the final point, are considered as the corresponding subsets. The searching of corresponding points are conducted only 113 with in each corresponding subsets. Therefore, the execution time of algorithm can be cut approximately 63/64. The loop of the cctree algorithm for the CAD model is described figure 5.10. o1 o1 o1 o1 o1 o1 o1 Figure 5.10: Octree for CAD model. 
O_1 = [x_o1, y_o1, z_o1]^T, the geometric center of the data set, is first calculated by Eq. 5.15:

O_1 = (1/N) Σ_{i=1}^{N} P_CAD^i    (5.15)

The coordinates of a point P (x, y, and z) are compared with O_1 component by component. A 1 is returned if P's coordinate is not smaller than the corresponding coordinate of O_1; otherwise a 0 is returned. In this way, every point goes into one of 8 subsets: 000, 001, 010, 011, 100, 101, 110, and 111. The same loop is applied within each subset again and results in 8 × 8 = 64 small subsets representing the CAD.

The final point cloud is treated with a similar algorithm and is first divided into 8 parts in loop 1, as shown in figure 5.11. The only difference is that an offset value T is added to the separation criterion, the geometric center of the final point cloud. If the coordinate of P is smaller than O_1 + T, the algorithm returns 0; if the coordinate is bigger than O_1 − T, the algorithm returns 1. Points near the edge between two subsets can therefore go into both subsets.

Figure 5.11: Octree for the final point cloud.

The difference is illustrated in figure 5.12. Each subset in the point cloud is bigger than the corresponding subset in the CAD, and neighboring subsets have a certain region of overlap. A point near the edge of a subset in the CAD may find its corresponding point in the neighboring subset of the point cloud, so the corresponding subset in the final point cloud should be enlarged to cover this case. As shown in figure 5.12, the subsets in the CAD do not overlap, but the subsets in the point cloud do.

Figure 5.12: The difference between two algorithms.

The traditional ICP method also searches for corresponding points one by one in the final point cloud, which means the search resolution is a constant value of 1. At the beginning of the recursive algorithm, it is not necessary to search every point: searching every 4th or 8th point is enough to roughly align the two data sets. The resolution can therefore be a function of the iteration number n; as n increases, the search resolution comes back to 1. Usually the iteration stops after between 7 and 12 loops. Therefore, the resolution function used in this thesis is given in Eq. 5.16. The algorithm is also forced to run at least 4 loops to find the correct corresponding points.

Res = max(1, 2^(4−n))    (5.16)
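To make the two speed-ups above concrete, the sketch below labels both clouds with two levels of octants and then searches correspondences only inside matching octants, subsampling the measured cloud by the Res of Eq. (5.16). It is a rough numpy sketch under stated assumptions: the clouds are already roughly aligned, the names are illustrative, and the overlap margin standing in for the offset T is an arbitrary example value rather than the one used in the actual system.

```python
import numpy as np

def octree_labels(points, levels=2, margin=0.0):
    """Label points with octant codes by recursive splitting at the centroid.
    Two levels give the 8 x 8 = 64 subsets used in the text; margin > 0 lets a
    point near a splitting plane fall into both neighbouring octants, mimicking
    the overlap (offset T) applied to the measured cloud."""
    buckets = {(): np.arange(len(points))}
    for _ in range(levels):
        split = {}
        for code, idx in buckets.items():
            if idx.size == 0:
                continue
            ctr = points[idx].mean(axis=0)
            ge = points[idx] >= ctr - margin      # may take bit 1
            lt = points[idx] < ctr + margin       # may take bit 0
            for octant in np.ndindex(2, 2, 2):
                bits = np.array(octant, dtype=bool)
                keep = np.all(np.where(bits, ge, lt), axis=1)
                split[code + octant] = idx[keep]
        buckets = split
    return buckets

def correspondences(cad, cloud, iteration, margin=5.0):
    """Nearest-point pairs restricted to matching octants, subsampled by Eq. 5.16."""
    step = max(1, 2 ** (4 - iteration))           # multi-resolution schedule
    cad_b = octree_labels(cad, margin=0.0)        # CAD subsets do not overlap
    cloud_b = octree_labels(cloud, margin=margin) # margin plays the role of T
    pairs = []
    for code, q_idx in cloud_b.items():
        p_idx = cad_b.get(code)
        if p_idx is None or p_idx.size == 0:
            continue
        for qi in q_idx[::step]:
            d = np.linalg.norm(cad[p_idx] - cloud[qi], axis=1)
            pairs.append((int(qi), int(p_idx[np.argmin(d)])))
    return pairs
```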
5.3.2 Feature Weighted Registration

Usually, the measured work piece needs to be assembled with other parts. Therefore, several user-specified locations are used as connection links to the other parts. Those points or regions, once defined, significantly determine the transformation of the point cloud in its CAD frame. Points in the data sets should therefore carry different weights in the registration process; the traditional ICP practice of weighting all points equally is improper for error map registration.

Curvature Information

Curvature information is widely used in feature-based matching. Usually, mean curvature or Gaussian curvature information is used to find feature points in a 3D data set. Those feature points are helpful for setting up the correspondence since they can be easily identified; points carrying relatively large curvature, such as peak points or valley points, are chosen as feature points. The method in this thesis instead estimates the invariant points of both data sets from curvature information. In the quality inspection process, all work pieces share similar but not identical geometry: each work piece has some deformation from the designed shape. From the manufacturing process, it can be concluded that the simpler the part shape is, the lower the risk of manufacturing error. For instance, a flat shape with zero curvature carries the smallest manufacturing error, because almost no forming is involved. So flat regions can be viewed as the most likely invariant parts of the work piece and should be weighted more in the registration process. In contrast, geometrically complex parts are less trustworthy and carry less weight. If the industry does not provide the datum points / features, we consider the parts carrying the least curvature to carry the most weight in the registration process.

GDT and Measured Condition

GDT is short for Geometry Dimension and Tolerance. It is a standard method in industry to specify the geometric form and tolerance information of a work piece, and it is used in the quality inspection process. Only if the error map is within the acceptable tolerance can the work piece be considered qualified. The error map generation is tied to the registration process: without correct registration, the error map cannot reflect the part's quality, so registration based on the data features is critical in error map generation.

A check fixture is a properly designed support of the work piece, which guarantees the work piece's position and orientation. Figure 5.13 shows a check fixture for windshields. In order to avoid position error, three or more widely spaced datum point stoppers are measured and fixed on the check fixture. The windshield sits on the surface of the check fixture with all stopper bars touching the windshield edge. However, the final position and orientation between a windshield and the check fixture are judged by eye, which means neither accuracy nor precision can be guaranteed, and the same work piece can produce different error distribution maps. The second disadvantage of check fixtures is that each fixture only fits one kind of work piece; a different type of work piece requires a new check fixture, which is costly.

Figure 5.13: Check fixture and the stopper bar touching datum points of the work piece.

Straightness and flatness are perhaps the two most popular types of geometry to be used as datums, because the least manufacturing is involved in those regions. The installation points are chosen as datum points, since those points carry less curvature and less tolerance. In this thesis, the edges touching the datums on the check fixture are weighted more in the ICP registration. The position and orientation of the windshield relative to the check fixture is used as the initial condition of the modified ICP algorithm.

5.3.3 The Object-oriented Online Registration Model

Registration is the process of finding the rigid transformation T between two data sets A and B in two coordinate frames. The following parts concentrate on how to solve the equation {a_i} = T{b_i} for the transformation T. The total square error e² is defined as

e² = Σ_{i=1}^{N_A} (a_i − T b_i)²    (5.17)

where a_i is a point from the CAD model A, and b_i is the corresponding point in the measured data set B. In order to analyze and solve the online registration problem, the theoretical model is built up from the ideal case to the real case.

case 1: the ideal case, which is the easiest. Given two data sets A = {a_i(x_i, y_i, z_i), a_i ∈ R³} and B = {b_i(x_i, y_i, z_i), b_i ∈ R³} in two coordinate systems, shape A is equal to shape B; data set A is the CAD, B is the measured data. The registration matrix T between the two data sets needs to be found.
Solution: Since shape A equal shape B, the only error source for e is the registration error, which is named as eRegistration . The registration error for each pair of corresponding points in two data sets can be written in the Eq.5.18 ei = eRegistration,i = ai − T bi (5.18) Linear least square method is directly applied. By minimizing e2 , T can be solved by iteration. (ICP method) case 2 Random noise eRandom is added in to data set B. The unknown registration matrix T needs to find out between two coordinate systems. The error between each pair of corresponding points in two data sets has two parts: the registration error and the random error, so the equation can be written as Eq.5.19 ei = eRegistration,i + eRandom,i = ai − T bi (5.19) Linear least square method can still be directly applied in the equation. Since linear least square method is good at solving the random noise, T can be found by iteration. (ICP method) 119 case 3 The deformation error eDef orm is added into data set B. which can be viewed as the real case. In this case, the error equation is: ei = eRegistration,i + eRandom,i + eDef orm,i = ai − T bi (5.20) If linear least square method is still directly applied to equation by minimizing linear square of e to find out T . The total square e2 including the deformation error eDef orm will be minimized in the registration procedure, Registration process changes the deformation information, which makes the error map un-trustable. The solution: Since shape B is a deformation form from A, there exist some parts B1 that have same shape with the correspondent parts of A denoted as A1 ; there exist some parts B2 that have small deformation with the correspondent parts of A denoted as A2 ; and there exist some parts B3 that have medium or big deformation with the correspondent parts of A denoted as A3 . Now data sets A and B will be separated into three subclasses: A = {A1 , A2 , A3 }, B = {B1 , B2 , B3 }. 3.1 First assume region A1 in A (CAD) is known from the GDT information In CAD, and B1 is unknown in measured data B, and assume B1 always exists. The error e1 between A1 and B1 only have two parts: eRegistration + eRandom . Linear least square method is able to solve T between A1 and B1 . Based on the characteristics information carried between A1 and A, a filter can be build up to estimate the possible data set B1 P ossible of B1 from B, which has the following relation B ⊇ B1 P ossible ⊇ B1 . The filter is built by the curvature and moment information A1 carried in set A. 3.2 In this case, the existence of B1 can not be guaranteed. The assumption that A1 always has correspondent B1 may not always be true. CAD model provides the information that A1 ’s corresponding part B1 should has low tolerance, however the work piece could be unqualified due to the manufacture process. By adding this constraints, case 3.1’s solution 120 matrix needs to verify before registration. The solution matrix T was applied to calculate e1 . If e1 is bigger than the acceptable tolerance carried by CAD, which means B1 in fact does not satisfy the assumption, then this work piece will be considered as unqualified and thrown away. The post process’s verification result tells people any work piece challenges the assumption will be punished as an unqualified part to be thrown away, so that the solution strategy becomes un-challengeable. 3.3 A1 in data set A changed unknown too. None of any pre-knowledge about A and B is known. Solution: curvature information in A is used to estimate the A1 . 
And then A_1 is used to estimate B_1 in order to find T.

Assumptions: 1) During the manufacturing process, the more complex a part is to make, the more manufacturing error it carries; the less complex it is, the less manufacturing error and the more reliable it is. 2) Geometric complexity is related to the curvature information: the lower the mean curvature of a point a_i, the lower its geometric complexity and the more likely it is that a_i belongs to A_1, a_i ∈ A_1. Based on these assumptions, A_1^Possible ⊇ A_1 is estimated from curvature information. Judged by the curvature information, A_1^Possible, A_2^Possible, and A_3^Possible are estimated.

Similar to case 3.1, A_1 is replaced by A_1^Possible and B_1^Possible is estimated by the filter. After the equations e_Registration + e_Random = {a_i} − T{b_i} are solved, T_(k=1) is found. Similar to case 3.2, a post-process is carried out to calculate every point's e_i in A_1^Possible. If any point's error is bigger than the threshold, that point is removed from A_1^Possible. The iterative step k goes to k + 1, and a new B_1^Possible is re-calculated based on the new A_1^Possible. These iterative steps stop once A_1^Possible converges to A_1, judged by comparing the post-processing verification results with the accepted tolerance.

5.3.4 The Methodology in Current Task

Notation: before the error map generation, the transformation T which aligns the measured point cloud in the sensor frame with the CAD model in the CAD frame is needed. The CAD model in the CAD frame is denoted as:

CAD = {p_CAD,i (x_i, y_i, z_i) ∈ R³}    (5.21)

The datum points in the CAD model:

A_1 = {p_Datum,i (x_i, y_i, z_i) ∈ R³}    (5.22)

For a work piece j, its notation in the part frame is:

W_j^p = {p_{W_j,i} (x_i, y_i, z_i) ∈ R³}    (5.23)

For measurement time k, the location of the work piece is denoted by T_k, the transformation matrix from the part coordinate frame to the world frame. So the description of work piece j at location k in the world frame is:

W_{j,k}^w = T_k W_j^p    (5.24)

The measured point cloud of the work piece, first stored in the common sensor coordinate frame, can be denoted as:

W_{j,k}^s = ∪_{i=1}^{4} (T_w^s) W_{j,k}^w = ∪_{i=1}^{4} (T_w^s) T_k W_j^p    (5.25)

The term ∪_{i=1}^{4} (T_w^s) transfers the four individual sensor frame point clouds into a final one. The inspection system used here has four 3-D sensors, each of which measures a quarter of the work piece, so the final point cloud has to be merged from four individual sensor frames. The final registration process aligns the measured point cloud with the CAD model:

W_{j,k}^CAD = T W_{j,k}^s = T ∪_{i=1}^{4} (T_w^s) T_k W_j^p    (5.26)

In Eq. (5.26) there are two variables: W_j^p represents different work pieces, and T_k represents different locations of the work piece. The transformation matrix T needs to cover both of these variables. The algorithm starts by finding the possible datum point set B_1^Possible in the measured data by applying a filter function that searches through the measured data set.

5.3.5 Registration Process

The filter is designed from the collection of the datum points' characteristics in the CAD, including the curvature information and the moment information. After this information is captured, a threshold tolerance is added to the filter. The threshold tolerance controls how many points can pass the filter into the set B_1^Possible. The filter goes over the whole measured data set to obtain B_1^Possible.
F ilterf unctionf (pi ) = 1, Ki ∈ Kc ± τK Mi ∈ Mc ± τM 0, otherwise (5.27) In the Eq.(5.27), pi denotes the point i in measured data, Ki denotes the mean curvature of point i, Mi denotes moment of point i. τ is the threshold tolerance. If the points satisfy the curvature and geometric condition, the filter function equals to 1, otherwise filter function equals to 0. ICP is applied in the algorithm: Repeat the iteration process i = from 1 to iF inal , until the total least square error reaches the threshold. (a) Corresponding points: The CLOSEST points min[(xa1 − xb1 )2 + (ya1 − yb1 )2 + (za1 − zb1 )2 ] are considered as corresponding points P between A1 and B1 ossible . (b) Minimizing the total square error, in order to find the rotation matrix R and translation vector t. 123 (1) (c) Updating the set A1 (2) to A1 by T . (d) Continue step (a) to (c) until the average total square error under threshold value. 5.3.6 Error Map Generation After the point cloud is transferred into the CAD model frame, the distance D between point cloud and CAD surface is calculated along the normal direction. The deviation △ can be described by Eq. 5.28, Where A represents CAD model and B represents a measured point cloud. △ : (A, B) −→ E (5.28) The error map E includes a set of 3D points and their distances D in a metric space to the closest triangle in CAD model M, as seen in Eq. 5.29 D(bi , A) = min(d(pi , Tj ), i = 1, 2, ...m, j = 1, 2, ...n) (5.29) E = {< bi , D(bi , A) > |i = 1, 2, ..m} The methodology explained earlier is implemented step by step. The firs step is to calculate the datum points’ moment information and the characteristic information in CAD model. Then the estimated possible datum points in the measurement point cloud are found out. At last, ICP algorithm is applied between four datum points in CAD and the possible datum points sets in the measured point cloud. The registration matrix based on the invariance part of the windshield is finally derived. Figure 5.14 (a) is the CAD model surface with four datum points on it. Based on the four datum points and the methodology outlined before, the final alignment between CAD and point cloud is generated, shown in Figure 5.14 (b). The final error map for a windshield is displayed in figure 5.15. Red color means that the CAD is higher than the point cloud. Green color means that both sets have same heights, and blue means that CAD is lower than the point cloud. From the color bar in the left side of the figure 5.15, the error is from from −5 millimeters to 6 millimeters. 124 Figure 5.14: (a)CAD model with datum points.(b)the registration result based on datum points Figure 5.15: The color coded error distribution map for a piece of windshield. 5.4 Sensor Fusion The 3D point cloud is not the only way to display the measurement result. In the navigation application, the 3D point cloud is fused into the 2D image. The distance between the object and the camera is first translated into a color code and then the color code is fused into the display. An example is given in figure 5.16. Since the 3D point cloud is calculated through 125 the corresponding image pixel, the transformation between the 3D point cloud and the 2D image can be directly obtained. Figure 5.16: The fusion result for navigation application. 5.5 Chapter Summary After 3D point cloud is derived through the structured light sensor, there are usually several post processes. Registration or sensor fusion are two common tasks. 
Registration is to find out a rigid transformation matrix between two 3D frames and to put different point cloud into one frame. ICP based registration method is developed in this chapter. Through the system calibration, the final point cloud merges all the four individual point clouds and represents the entire work piece. Octree and multi-resolution is used to speed up the iteration speed. An object-oriented method is developed to register the final point cloud with the CAD model 126 and thus an error distribution map could be calculated and displayed. 127 CHAPTER 6 The development of an Omni-direction 3D camera for mobile robot 6.1 The Working Principle of the Omni-directional 3D Sensor The developed omnidirectional 3D camera consists of a projector, a camera, and two hyperbolic mirrors. Similar with traditional omnistereo systems, a projector replaced a camera and resulted in a structured light based system. The principle of the 3D camera is illustrated in figure 6.1. Under ideal condition, optical centers of the camera and the projector coincide 2 2 with the focal points Fm1 , Fm2 of the two hyperbolic mirrors, respectively. The projector is regarded as an inverse camera, that is, the projector maps a 2D pixel in the projector to a 3D array in the scene. The optical path of the 3D camera can be described as follows: a light ray from the 2 projector goes through the focal point Fm2 and then intersects with the hyperbolic mirrors at point Pm2 . The hyperbolic mirror has a useful property that any light ray going towards one of the focal points will be reflected through the other focal point. Hence, the incident 1 light ray will be reflected away from the other focal point Fm2 . Then, it intersects with the 128 observed target with diffuse reflection property so that part of the light ray will be reflected 1 to the focal point Fm1 of hyperbolic mirror 1; thus, the light ray will be further reflected 2 at point Pm1 of the hyperbolic mirror 1 to the camera through the other focal point Fm1 due to the property of the hyperbolic mirror. Therefore, under ideal condition, the relation between any 3D point in the scene and the corresponding 2D point in the image sensor can be represented by a uniform model with a single viewpoint [62]. However, the above ideal mathematical model requires the lens’ optical center to be coincide with focal point of the hyperbolic mirror. This requirement, however, is difficult to be completely satisfied, especially for the projector since it cannot view the scene directly. So the uniform model for every pixel of the sensor will cause the residual error, resulting in incorrect 3D reconstruction. To solve such a problem, a vector-based 3D reconstruction strategy which determines the ray vector for each pixel of the camera and projector in the scene was proposed and the corresponding look-up-tables (LUT) was established to calibrate the system. A light array Lp (up , vp ) from a projector pixel (up , vp ) intersects its corresponding light array from camera Lc (uc , vc ). The intersection point P is the reconstruction point. Instead of reconstructing the light array from the image pixel, light arrays are directly rebuilt in the task space: 1 Lp = Pw + UW p λp (6.1) Lc = Q1 + VW c λc w (6.2) 1 Pw is an arbitrary point on Lp and UW is a unit vector representing its direction. Q1 w and Vw in Eq. 6.2 have the similar meanings. 
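To make the vector-based reconstruction concrete, the following sketch intersects the two rays of Eqs. (6.1) and (6.2). Since noisy rays rarely meet exactly, it returns the midpoint of the shortest segment between them together with the residual gap, a common practical choice that is assumed here rather than stated in the text; the names are illustrative.

```python
import numpy as np

def intersect_rays(Pw, Uw, Qw, Vw):
    """Triangulate the point implied by Eqs. (6.1) and (6.2):
    Lp = Pw + lambda_p * Uw and Lc = Qw + lambda_c * Vw.
    Returns the midpoint of the shortest segment between the two rays
    and the length of that segment as a sanity check."""
    Pw, Uw, Qw, Vw = map(np.asarray, (Pw, Uw, Qw, Vw))
    # Solve Uw*lp - Vw*lc = Qw - Pw in the least-squares sense (3 equations, 2 unknowns).
    A = np.column_stack((Uw, -Vw))
    b = Qw - Pw
    (lp, lc), *_ = np.linalg.lstsq(A, b, rcond=None)
    p_on_lp = Pw + lp * Uw
    p_on_lc = Qw + lc * Vw
    return 0.5 * (p_on_lp + p_on_lc), np.linalg.norm(p_on_lp - p_on_lc)
```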
129 6.2 Calibration of the Omni-directional 3D Camera First, The calibration of the omnidirectional camera was discussed using the vector-based calibration strategy. As shown in figure 6.2, Uw is the incident vector in the world frame of a pixel Ic in the catadioptric camera, which is our calibration parameter for each pixel. By moving a predefined reference board (white flat board) with serval precise-located markers, 1 2 the view vector Uw can be specified by the two points Pw and Pw . Although the calibration concept is straightforward, it is a challenge to calibrate all the pixels simultaneously. To this end, an extra dioptric projector, as well as a dioptric camera, is introduced. The dioptric projector shoots two-dimensional encoded patterns onto the reference board; meanwhile, the dioptric camera records the resulting image of the board. In such a way, the phase value can be determined for each pixel in the dioptric camera. The origin of the reference board frame is set to coincide with one marker and the xy axes are parallel to reference board. Obviously, the z axis is perpendicular to the reference board. In this case, ′ the coordinate Pr in the reference board frame can be transformed to the corresponding Ic in the image plan frame of the dioptric camera by the homography matrix H: ′ P r = H × Ic (6.3) where H can be specified by using the markers on the reference board since the coordinates of these markers in the image plane frame and reference board frame are both known. It should be pointed out that the location Pr can be calculated within the sub-pixel by appropriate image processing. The desired location Pw of the point is measured in the world frame. Therefore, the location Pr in the reference board frame should be further transformed to Pw in the world frame: Pw = RPr + t (6.4) where the 3 × 3 rotation matrix R and 3 × 1 translation vector t can be obtained through the known markers. Markers are measured using a high accuracy instrument (e.g. the 130 measuring arm or laser tracker), which is kept static during the calibration procedure so that the instrument frame is considered as the world frame. Whereas, the dioptric projector and the dioptric camera are moved to optimal position in sequence when the reference board is placed at different sides. Furthermore, the illuminated patterns are simultaneously captured by the catadioptric 1 camera so that the corresponding coordinate Pw in the world frame can be determined for 2 every pixel in catadioptric camera. Similarly, the coordinate Pw can be obtained when the 1 reference board is moved to another location. Then, the reflected vector Uw and a point Pw for each pixel in the world frame can establish a LUT, whose size is equal to the resolution of the catadioptric camera. Without loss of generality, any point viewed in the catadioptric camera can be presented by: 1 Pw = Pw + αUw (6.5) 2 1 Pw − Pw . It should be noted that the point viewed within 2 1 Pw − Pw the pixel in the catadioptric camera can be realized by using bilinear interpolation with four where α is a scale factor; Uw = neighboring pixels. On the other hand, the calibration of the catadioptric projector can be achieved in a similar way except with the absence of the extra dioptric projector. For this technique, encoded patterns are directly shot from the catadioptric projector and the dioptric camera records the phase distribution and calculates the corresponding locations Q1 and Q2 in the r r reference board frame. 
Similarly, markers on the reference board are measured by the high accuracy instrument so that we can obtain the locations Q_w^1 and Q_w^2 in the world frame. Hence, any point on the incident light ray of the projector can be expressed by:

Q_w = Q_w^1 + β V_w    (6.6)

where β is a scale factor and V_w = (Q_w^2 − Q_w^1) / ‖Q_w^2 − Q_w^1‖. Eventually, the incident vector V_w and a point Q_w^1 for each pixel of the catadioptric projector can also establish a LUT, whose size is equal to the resolution of the catadioptric projector. Once the projector and camera are calibrated, the point on the observed target can be computed by the intersection of the two vectors of the projector and camera [76].

6.3 Error Analysis for the Designed 3D Camera

The accuracy of 3D reconstruction is bounded by the resolution of the imagers. The resolution of a camera is the distance that one pixel covers at a given range; any variation within that distance cannot be directly detected and thus bounds the accuracy of the designed system. Different from a traditional camera, the resolution of an omnidirectional camera is not a constant value. As shown in figure 6.4(a), the resolution around the edge of the image is much lower than at the center: the resolution decreases along the image radius. The reason is the curvature of the omnidirectional mirror; since the curvature of the mirror is not constant, the image reflected by the mirror shrinks at different ratios. Figure 6.4(b) illustrates the optical path of the omnidirectional camera. The wall is approximately 2.5 meters from the sensor, and point P moves along the wall; the corresponding pixel p moves along the radius toward the image center. Since the resolution differs from pixel to pixel, the detection accuracy of point P is highly related to its position. If point P is around the same height as the mirror, its resolution value is low and its detection accuracy is high. Once point P moves lower, the resolution value rises and the accuracy drops.

[x_I, y_I, 1]^T ∼ [ f_x  α  u_0  2e·u_0 ;  0  f_y  v_0  2e·v_0 ;  0  0  1  2e ] [λx, λy, λz, 1]^T ,
λ = b²(−ez − a√(x² + y² + z²)) / (b²z² − a²(x² + y²))    (6.7)

In the error simulation, P moves vertically from 300 mm to −1,500 mm, and its corresponding resolutions in the image frame and in the world frame are shown in figure 6.6(a) and figure 6.6(b), respectively. The ideal omnidirectional camera model of Eq. 6.7 is applied in the calculation. The parameters are the design values: a = 18.75 mm, b = 18.75 mm, e = 40.0 mm, f_x = f_y = 250, u_0 = 320, v_0 = 240, X = 2,500 mm, Y = 0, Z = −1,500 mm to 300 mm, and the camera resolution is 640 by 480 pixels. From the resolution curve in figure 6.6(a), the resolution is around 7 mm/pixel when the radius is 250 pixels and rises to 25 mm/pixel when the radius is 100 pixels. From the resolution curve in figure 6.6(b), the resolution reaches 7 mm/pixel when z is 300 mm high; as z decreases, the resolution value increases. The omnidirectional projector resolution curve follows the same working principle as the omnidirectional camera; the only difference is that the projector is mounted upside down.

The reconstruction error ∆d can be calculated from the resolution error. As shown in figure 6.5, AP_1 is the distance that one camera pixel covers, and BP_1 is the distance that one projector pixel covers. Since triangle ABP_2 is similar to triangle O_C O_P P_2, ∆d can be easily calculated: about 160 mm at the 2,500 mm distance considered above. When the object depth is around 4,000 mm, ∆d equals 256 mm.
If a 2K by 2K camera is used instead of the current one, ∆d equals 39mm when object distance is 2500mm. ∆d = 62mm when the object distance is 4000mm. 6.4 One-shot Surface Coding for the Sensor A robust projection pattern is critically important for achieving accurate correspondence for a structured light based measurement system, especially for a one-shot method. The design should satisfy the following constraints: (a) monochromatic light and (b) pre-warping to cancel mirror distortion and (c) invariance from scene variation. Taking the monochromatic light into account, the designed primitives should not contain color coding information. The background colors will not influence a monochromatic light based system. Furthermore, the color based design pattern could not be applied to an infrared light source, but the monochromatic based design could. The final design of the omnidirectional 3D camera would employe near infrared light source instead of normal one. 133 Traditional one-shot pattern only works fine in a normal structured light system, it could not be used in omnidirectional structured light system without pre-warping. The convex shape mirrors in the omnidirectional sensor could distort both the geometries and the orientations of the primitives. Compared with traditional monochromatic light based patterns, light arrays are warped by the hyperbolic mirror twice in an omnidirectional system. Hence the geometrical features based pattern is even more difficult to be calculated compared with the traditional one. The wanted projection image should be first un-warped in order to cancel the mirror distortion. Since a projector can not receive information, its pre-warp function is much more difficult than a camera’s. Second, even the un-warping function is calculated, the projected geometrical light features are still further arbitrarily distorted by the unknown environment. The designs of the primitives should also be an invariance from this distortion and then a correct correspondence can be linked between two image frames. 6.4.1 Image warping in an omnidirectional camera Both of the projector warp and the camera warp are the transformations between conic image to panoramic views [77]. Camera image warp has been discussed a lot in the past [77]. In essence, the warping function scans the conic image around the center and then horizontally allocates each part into a rectangle, shown in figure 6.7. The radius and the center location are needed to finish the warping process. The radius and the center can be easily obtained for an omnidirectional camera by image analysis, but the warp for the projector is much more difficult and has been discussed very little in literature. The center of the conic image can not be directly estimated, since a projector is not an information receiving device. Through the LUT created in the calibration process, the projector can ’view’ the scene by a calibrated camera, which is illustrated in figure 6.3. By treating the omni-projector as a pseudo omni-camera, its warping function can also be obtained by LUT. Basically, the designed projection marks are first calculated on the reference board in the world frame and then are transferred back to the projector image frame via LUT. Hence, the conic image 134 center and its radius can be interpolated. 
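A minimal sketch of the warping step described above: it resamples a conic omnidirectional image into a panoramic strip by sweeping the angle around the (already estimated) image center, assuming the center and the useful radius range are known. Nearest-neighbour sampling is used for brevity, and all names are illustrative rather than taken from the implementation.

```python
import numpy as np

def unwarp_conic(img, center, r_min, r_max, out_w=1024, out_h=256):
    """Resample a conic (donut shaped) omnidirectional image into a panoramic strip.
    Columns sweep the angle 0..2*pi around `center`; rows sweep the radius r_min..r_max."""
    cy, cx = center
    phi = np.linspace(0.0, 2.0 * np.pi, out_w, endpoint=False)
    r = np.linspace(r_min, r_max, out_h)
    rr, pp = np.meshgrid(r, phi, indexing="ij")      # (out_h, out_w) sampling grid
    x = np.clip((cx + rr * np.cos(pp)).round().astype(int), 0, img.shape[1] - 1)
    y = np.clip((cy + rr * np.sin(pp)).round().astype(int), 0, img.shape[0] - 1)
    return img[y, x]   # nearest-neighbour sampling; bilinear interpolation is a refinement
```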
6.4.2 One-shot projection pattern design for the omnidirectional 3D camera

The task in designing a one-shot projection is to assign each projected bright pixel / image primitive / marker a unique codeword that distinguishes it from the others. The unique codeword establishes the correspondence between the two image frames. A projection pattern with concentric circles is designed for the one-shot projection, shown in figure 6.8. After the center of the conic projector image is derived, concentric circles with different radii are used as the projection pattern. Each circle is assigned a different angular frequency and projection intensity in order to distinguish it from the others.

The intensity functions of the concentric circles are described by Eq. 6.8 using the inverse Fast Fourier Transform (FFT). The intensity function I_p(r, θ) is designed in polar coordinates, where r stands for the radius and θ is the angle; N is the number of points on the circle; X(r, k) is the assigned angular frequency for each circle; and I_r is the intensity coefficient for each circle. Within each circle, all of the bright pixels share the same angular frequency X(r, k). If the image is scanned from the center outwards along a radius, the circles are labeled by different angular frequencies and different projection intensities. If the image is scanned along the tangent direction of each circle, each bright pixel can be labeled by a different phase angle θ. The angular frequency, the projection intensity, and the phase angle θ together assign each bright pixel one unique codeword.

I_p(r, θ) = (1/N) Σ_{k=1}^{N} I_r X(r, k) ω_N^{−(Nθ/2π − 1)(k−1)} ,  if r = R_1, ..., R_10 ;  0, else    (6.8)

ω_N = e^{−2πi/N}    (6.9)

I_r = I_{R_1}, ..., I_{R_10} ,  when r = R_1, ..., R_10    (6.10)
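The snippet below sketches Eqs. (6.8) to (6.10) for a single ring, together with the matching frequency check of the decoding step detailed in the next subsection: one spectral line per ring is pushed through the inverse FFT to obtain the intensity samples along the circle. It is a toy sketch with illustrative names and parameter values; the real pattern is additionally pre-warped through the projector LUT before projection.

```python
import numpy as np

def ring_intensity(N, freq_k, I_r):
    """Intensity samples along one projection circle (Eqs. 6.8-6.10): a single
    angular frequency 1 <= freq_k < N/2 is placed in an N-point spectrum and taken
    back to the angular domain by the inverse FFT; the mirrored line keeps the
    signal real, and it is shifted to be non-negative so a projector can display it."""
    X = np.zeros(N, dtype=complex)
    X[freq_k] = N
    X[N - freq_k] = N
    samples = np.fft.ifft(X).real                 # cos(freq_k * theta) sampled N times
    return I_r * (0.5 + 0.5 * samples / np.abs(samples).max())

def ring_frequency(observed):
    """Recover the ring label (Eq. 6.11): the dominant angular frequency of the
    intensity observed along the matching circle in the camera image. The per-pixel
    part of the codeword is simply its polar angle, which the epipolar constraint
    keeps identical in the projector and camera frames."""
    X = np.fft.fft(observed - observed.mean())    # drop the DC term first
    return int(np.argmax(np.abs(X[1:len(X) // 2]))) + 1
```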
6.4.3 Epipolar constraints and the decoding method

The task of a one-shot pattern decoding algorithm is to extract the codeword so as to establish the correspondence between a projector pixel and a camera pixel. Most importantly, the extracted codeword should be invariant to the unknown environment distortion. The projected pattern is distorted by the unknown test scene, and that distortion is what is used to reconstruct the 3D coordinates: different object depths produce different distortions, and using such distortions is the fundamental working principle of triangulation based 3D reconstruction. The drawback of the distortion, however, is that it can also destroy the geometric shapes and the locations of the pixels / primitives / markers. The designed codeword should avoid such a case, so a robust encoding and decoding algorithm should rely on information that is invariant to the distortion.

In this work, epipolar constraints [55] are utilized to extract the codeword and to identify the invariant. As the object depths vary, the corresponding camera pixel changes its position; however, there are still certain rules that it must follow. As shown in figure 6.9, a projector pixel I_p(r_p, θ) emits a light ray, which intersects the scene at point P and reflects into the camera image plane at point I_c(r_c, θ). No matter where P is, I_p and I_c always share the same phase angle θ, so θ is the component that is invariant to the scene variation. In this work, each projection ring is labeled by a different projection intensity and angular frequency, and each pixel is assigned a unique codeword: the phase angle. Therefore, the proposed one-shot pattern is capable of linking correspondences with high accuracy. The intensity of each arc / circle can be easily detected by the camera, and its angular frequency can also be derived through the FFT:

X(r, k) = Σ_{j=1}^{N} I_c(r, j) ω_N^{(j−1)(k−1)}    (6.11)

Combined with its phase angle, the codeword of a bright pixel can be uniquely determined and used to set up the correspondence.

6.5 Chapter Summary

This chapter presents a structured light based omni-directional 3D environment reconstruction system using a projector, a camera, and hyperbolic mirrors. The corresponding vector-based calibration strategy is proposed to improve the system accuracy.

Figure 6.1: The working principle of the designed omni-direction system.
Figure 6.2: Calibration of the omni-directional camera.
Figure 6.3: Calibration of the omni-directional projector.
Figure 6.4: Received image from the world frame: (a) the received image; (b) illustration of the optical path.
Figure 6.5: The object depth error from the resolution limitation.
Figure 6.6: The omnidirectional camera resolution: (a) the resolution curve in the image frame; (b) the resolution curve in the camera frame.
Figure 6.7: Image warp for a panoramic view.
Figure 6.8: The designed one-shot projection pattern.
Figure 6.9: The epipolar geometry of the omnidirectional system.

CHAPTER 7
Current Experimental Results and Data Analysis

This chapter gives the experimental results and the data analysis based on them. According to the implementations of the systems, the measurement results are roughly separated into three categories: 1) the experimental results for the developed Rapid Inspection System (RIS) for free form surfaces; 2) the experimental results for the navigation system with a single viewing direction; 3) the experimental results for the omni-directional navigation system.

7.1 The Rapid Inspection System (RIS)

7.1.1 The Configuration of the System

The Rapid Inspection System (RIS) for freeform surfaces, designed and implemented in the Michigan State University Robotics and Automation Lab, is a 3D scanning system for manufacturing quality control. A 3D point cloud and the corresponding error distribution map against the Computer Aided Design (CAD) model are generated after the inspection process has been conducted. The configuration of the system is illustrated in figure 7.1.

Figure 7.1: RIS system configuration.

RIS is a multi-sensor system with four pairs of cameras and projectors on the top. Firm 3 inch × 3 inch aluminum T-slots are selected to build a frame structure that suppresses vibration from the ground. The sensors are mounted on the upper beams, and a workpiece, such as a windshield, is located on the lower holder. Each pair of sensors forms an active vision sensing system: the 3D point cloud in each sensor frame is constructed by calculating the deformation of the Gray Code Line Shifting (GCLS) pattern by the test surface. The LCD projectors on the beam shoot image patterns onto the tested object and the 2k × 2k high resolution cameras receive the reflected light. The field of view of each sensor is 2,100 millimeters (L) × 1,600 millimeters (W) × 600 millimeters (H). The red box at the top left of figure 7.1 highlights and zooms the cross view of sensor 1: an LCD projector projects the light patterns and a CMOS camera receives the reflected light.
The pixels in the projector frame and the camera frame are denoted as m_p: (u_p, v_p, 1)′ and m_c: (u_c, v_c, 1)′, respectively. The resulting point cloud P^{S_j} in each sensor frame can be written as:

P^{S_j} = { p_i^{S_j}(x_i, y_i, z_i) ∈ R³ } ,  j = 1, 2, 3, 4    (7.1)

P^{S_j} = f^j(m_p^j, m_c^j) ,  j = 1, 2, 3, 4    (7.2)

The function f^j() denotes the 3D reconstruction model of sensor S_j, which recovers the point cloud from corresponding pairs of pixels m_p^j and m_c^j in the two image frames. The registration that transfers the four point clouds into a common frame is defined as:

P^C = ∪_{j=1}^{4} T_{S_j}^C P^{S_j}    (7.3)

where T_{S_j}^C is the transformation matrix from sensor frame j to the common frame.

7.1.2 The Inspection Result via Single-shot Coding Algorithm

As explained in the previous chapters, the single-shot coding algorithm is capable of measuring a moving object. Assuming the work pieces are moving along the assembly line, the single-shot coding method is applied to inspect the parts. Since the single-shot method is sensitive to image noise, a pre-treated work piece is utilized: a sprayed pillar part from the automobile industry, shown in figure 7.2. Its corresponding inspection result is given in figure 7.3. Although the method is sensitive to image noise and can only obtain a relatively low point density, it still obtains a usable point cloud for a pre-sprayed work piece. The point density is much lower than that of a multi-shot coding method, but still higher than traditional CMM based inspection. Compared with the multi-projection method and the CMM, the single-shot method can measure a moving work piece and its acquisition time is less than 1 second.

Figure 7.2: The inspected pillar with surface pre-treatment.
Figure 7.3: The inspected point cloud of the pillar.

7.1.3 The Inspection Result via Multi-shot Coding Algorithm

The multi-projection coding method is more robust against image noise, so it can inspect work pieces without surface pre-treatment. Figure 7.4 gives the 3D point clouds of three work pieces measured by the inspection system. From the result, we can conclude that the system can inspect the work pieces and provide dense 3D point clouds regardless of the optical property: black, dark, and even shiny.

Figure 7.4: The 3D point clouds for the work pieces.

The major advantage of the proposed method is that it increases the completeness of the inspection results. The traditional inspection method cannot fully inspect optically-challenging objects without surface pre-treatment. In order to get a quantitative evaluation of the completeness, two additional experiments were conducted. In the first experiment, the work pieces were powder sprayed. The camera and projector settings, such as camera aperture, projection brightness, and projection contrast, were properly readjusted to avoid under- / over-exposure issues. The number of points in the resulting 3D point cloud was considered the ground truth: if any inspection result reached the same number, the inspection completeness was considered 100%. The comparison between our results and this first experiment shows what percentage of completeness we reached. The second experiment inspects the same work pieces using a naive structured light coding / decoding method, in which the patterns are projected without the adaptive mask. In this experiment the camera aperture cannot be adjusted perfectly: the black piece requires a big aperture, but the shiny piece needs a small aperture.
The aperture is set around the middle: well suited to the dark piece, but too big for the shiny part and too small for the black part. The comparison between our results and this second experiment shows how much the method improves. The results are listed in table 7.1; the experimental results using our method are listed as the third set of numbers.

Table 7.1: The number of the point clouds in different experiments

              1st        2nd               3rd
Black one     30434      14793 (48.6%)     29526 (97.0%)
Dark one      77433      71024 (91.3%)     76533 (98.8%)
Shiny one     147755     126486 (90.7%)    143497 (97.1%)

The numbers in the second column of table 7.1 are the results of the first experiment; their point counts are considered the ground truth and their completeness is defined as 100%. The numbers in the third column are the results of the naive inspection method. From these results, we can see that more than half of the black work piece fails to be reconstructed, and around one tenth of each of the other two pieces fails to be rebuilt as well. The reason the black piece performs poorly is that the aperture size is adjusted relatively small, so the inspection of the shiny part is relatively good but that of the black piece is poor. The numbers from the third experiment show the completeness of our method: all the parts can be inspected with over 97% completeness. Around 50% of the completeness is recovered for the black piece and around 7% for the other two. Because the projection mask dynamically adjusts the illumination, the system can simultaneously inspect both the shiny part and the black part.

7.1.4 The Registration Result from Four Pairs of Sensor Frames

A piece of painted windshield was inspected by the RIS system. Each pair of 3D sensors covers around a quarter of the windshield, and the final point cloud representing the windshield is the merged result of four individual point clouds in four sensor frames. The windshield was also measured by a CMM machine. Since the CMM has been used for years for its accuracy, the CMM data is taken as the ground truth, and the difference between the CMM measurement data and the RIS measurement data is used to evaluate the calibration result.

Figure 7.5: The comparison between CMM data and the RIS data.

Figure 7.5 shows both measurements. The upper three point clouds with triangulated surfaces are the RIS measurement result after alignment with the CAD model, shown from three different view angles, and the lower three point clouds with low density are the CMM measurement results from the same three view angles. The evaluation between these two sets employs the following formulas:

M = (1/N_CMM) Σ_{i=1}^{N_CMM} d_i = (1/N_CMM) Σ_{i=1}^{N_CMM} ‖P_RIS^i − P_CMM^i‖    (7.4)

M is the mean distance between the corresponding points of both sets. d_i, P_RIS^i, and P_CMM^i are defined as follows:

d_i = ‖P_RIS^i − P_CMM^i‖ = √( (X_RIS^i − X_CMM^i)² + (Y_RIS^i − Y_CMM^i)² + (Z_RIS^i − Z_CMM^i)² )    (7.5)

P_RIS^i = [X_RIS^i, Y_RIS^i, Z_RIS^i]^T ∈ M_RIS    (7.6)

P_CMM^i = [X_CMM^i, Y_CMM^i, Z_CMM^i]^T ∈ M_CMM    (7.7)

in which M_RIS and M_CMM stand for the measurement results from the RIS and the CMM, respectively. The corresponding points are found via the ICP method explained in the last chapter. The mean distance between the CMM data and the RIS data is about 0.15 millimeters. Taking the CMM data as the ground truth, the calibrated RIS system has around 0.15 millimeters of error. Backprojecting the 0.15 millimeter error into the camera frames is equivalent to a 0.14 pixel calibration error.
By comparison, conventional relatively calibration errors for a structured light system are around 0.02 pixel in camera and 0.29 pixel in projector respectively. Since in our sensor model projector’s intrinsic and extrinsic parameters do not participate in the calculation, 3D points in world frame are directly correspondence with projector pixels. Backprojected error in projector can be viewed as 0 pixel. The total errors in RIS can be viewed as 0.14 pixels. 153 The error map of the same piece of windshield from CMM is also applied to compare the calibration performance of the RIS system shown in figure 7.6. Figure 7.6: The CMM error map. Compared with the error map in last chapter 5.15, we can see that both error maps have similar distribution color coding. In the CMM error map, green color means that CMM measurement data has similar height with the CAD model. Blue color indicates that CMM data is lower than the CAD model. If CMM data is higher than the CAD, the color is red. The histogram in figure 7.6 shows that the error roughly ranges from −5 millimeters to 6 millimeters. Therefore, the magnitude of the error map is also similar with each other. 7.1.5 Inspection for Different Work Pieces In order to test the capability of the inspection system, work pieces with different dimensions, geometric shapes, and optic surface properties are measured by the system. Figure 7.7 shows another tested work piece and its corresponding CAD model is given in figure 7.8. Figure 7.9 and 7.10 are the point cloud and error distribution map respectively. From the result in 154 this section and previous sections, it is demonstrated that the system is able to inspect work pieces without pre-treatment via multi-projection surface coding method. Figure 7.7: The first tested work piece with bright color and more complex shape. 7.2 The Navigation System for Single Direction The structured light system could also be utilized into the navigation application via the single-shot coding method. This section will provide some experimental result for such a system. 155 Figure 7.8: The corresponding CAD model for the first work piece. 7.2.1 The Prototype of the Navigation System for Single Direction Figure 7.11 shows the prototype of the navigation system for single direction. The system consists of a camera, a projector and an infrared light filter. The system field of view is about 400 mm by 400 mm at the distance around 1000 mm. The projector shots the single-shot pattern into the tested scene and the infrared light filter eliminates the visible light. The near infrared light pattern is projected onto the scene and is reflected into the camera. 7.2.2 The Fusion Result for the System Figure 7.12 shows the tested objects in front of the sensor. The white object is a little further than the wooden one, and the wooden one forms a certain slop. The captured image from 156 Figure 7.9: The point cloud for the first work piece. the camera is shown in figure 7.13. Although people could not see the projected pattern, the camera could view both the near infrared light pattern and the scene at the same time. The experimental result is given in figure 7.14. The depth information is transferred into a color code and is fused into the image display. Conventional the operator could only get 2D information and the distance between the object and the camera has to be estimated. After the color-coded depth information is fused into the image, operator could control the mobile base more accurately. 
Furthermore, once the projection pattern is shotted onto a living object, such as a person or an animal, the infrared light pattern will not disturb them. 157 Figure 7.10: The error map for the first work piece. e 7.2.3 Integration the Single Direction Sensor with a Mobile Robot After tested the developed sensor, the second step is to integrate it into a mobile robot. Figure 7.15 shows that the single direction sensor is put in front to the mobile robot and there is a robot arm above the sensor. Once the sensor returns the fusion results, different operations can be achieved based on 3D and 2D fusion information. The whole system can be described as figure 7.16. The system mainly contains Segway mobile platform, SCHUNK 7DOF LWA manipulator, near-infrared 3D sensor, phantom force reflect joystick, as well as tele-operation control computer. The mobile robot system work on two mode: the active and passive mode. The robot can mobile platform moves under the control of joystick by teleoperation, called passive mode. Similarly, the 7DOF manipulator can autonomously grasp 158 Figure 7.11: The configuration of the navigation system for single direction. the target object in the workspace with the help of the 3D sensor perception at the active mode. The communication between the remote operator and robot system takes advantage of the internet. Figure 7.17 illustrate the result of the integrated system. The robot arm is initially at a different position. After the fusion sensor returns the 3D coordinates of the desired part, the robot picks the part and lifts it up. The whole process demonstrates the feasibility of fusion sensor based control. 7.3 The Omni-direction Navigation System In order to avoid the blind area, an omni-directional navigation system is also developed. Hyperbolic mirrors are added in front of the camera and the projector. Current experimental results are given in this section. The prototype of the omnidirectional sensor is implemented as illustrated in figure 7.18. The detail view of the developed omnidirectional 3D camera is in the right side of the figure. A projector associated with a hyperbolic mirror is used 159 Figure 7.12: The tested objects. as an omnidirectional structured light emitter. A CCD camera (model M056-ZA180-002, resolution 640 × 480) with a hyperbolic mirror is used as an omnidirectional camera to view the panoramic view. Both imaging devices are put vertical and form the designed 3D omni-directional camera. The whole omnidirectional perception system was mounted on the back of the mobile base and there is a robot arm for operator to do the manipulation tasks. The field of view (FOV) of the hyperbolic mirror is expected around 360-degree in horizontal plane with the upper elevation angle approximate 30-degree with respect to the horizontal plane. The mirror parameters a and b along the semimajor and semiminor axes 160 Figure 7.13: The captured image when projecting near infrared light pattern. are a2 = 1248.2060, b2 = 351.7940, respectively. In order to verify the performance of the designed 3D camera, the sensor is put into a room. The distance from the walls to the sensor is used to evaluate the developed 3D camera. The omnidirectional 3D camera was roughly located at the center of the room and obviously the room corners are further than the walls. Due to the limitation of the projector illumination the environment lights are turned off, when the structured light is projecting. The projected image is shown in figure 7.19. 
As explained in the previous chapter, several concentric circles are projected by the projector, and the radii of these light rings are used to perceive object depths. In figure 7.19, the projected light rings are almost parallel in the task space. The positions of the rings change according to the object distance: when the distance increases, the rings lift up; when the distance remains almost the same, the rings stay horizontal and parallel to each other. The radius distortions of the projected rings are used to rebuild object depths. From the figures, we can easily see that the rings lift up when they reach the corners.

Figure 7.14: The fusion result for the sensor.

The image received by the omnidirectional camera is shown in figure 7.20. The four walls are labeled by the four directions, from east to north. The frequencies and density variations of the light rings can easily be obtained to separate the individual rings, and the phase angles of corresponding pixels, preserved by the epipolar constraints, separate the pixels along each ring. The radii of the rings change according to the distances, and we can see how the ring radii change at the four corners of the room. Due to the low resolution of the camera, the radius changes around the south corners are not very large, while there are clear radius changes around the north corners, because the sensor is quite far from the north wall but close to the south wall. A second image is taken with the lights turned on, and the fusion result is shown in figure 7.21. Different colors are assigned to the light rings to indicate object depths: blue means the distance is far, around 4 meters, and red means near, around 2 meters.

Figure 7.15: The integrated mobile robot with the designed sensor.

A panoramic view is also provided in figure 7.22 for a better display. From the image, we can see that the north wall is farther away than the rest, since its color is assigned as blue. The ring patterns clearly rise at the corners, and the assigned colors likewise indicate that the corners are farther away than the walls. Because figure 7.21 does not fully cover the conic image, there are several black regions in the unwarped panoramic view (a minimal sketch of this unwarping step is given later in this chapter): one above the person's head in figure 7.22 and two at the top right and top left, respectively.

Figure 7.16: The description of the integrated system.

7.4 Chapter Summary

This chapter provides the experimental results of the developed measurement systems. The developed RIS system can use the single-shot surface coding method to inspect a surface pre-treated work piece and obtain a fairly dense point cloud. With the aid of the multi-shot coding method, the system can inspect optically challenging objects without surface pre-treatment and obtain a very dense point cloud. A multi-sensor strategy is applied to enlarge the system field of view; a painted windshield is employed to test this multi-sensor inspection system, and the final point cloud is merged from four individual ones. A comparison with the CMM result is used to verify the accuracy. Using the single-shot coding method and infrared light, a structured light based navigation sensor is developed. A fusion image with color-coded depth information is provided, so the operator can view the image and obtain the depth information at the same time. In order to avoid blind areas, an omni-directional structured light based measurement sensor is developed.
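The panoramic display referred to above (figure 7.22) is produced by unwarping the circular omnidirectional image into a rectangular strip. The following is a minimal nearest-neighbor sketch of that step: the mirror center, the useful radial band, and the output size are placeholder parameters, and the dissertation's actual unwarping may use the calibrated mirror model rather than this simple polar mapping.

```python
import numpy as np

def unwarp_panorama(omni_image, center, r_min, r_max, out_height=180, out_width=720):
    """Unwarp a circular omnidirectional image into a rectangular panorama.

    omni_image:   H x W x 3 array, the raw image seen through the mirror
    center:       (cx, cy) pixel coordinates of the mirror center
    r_min, r_max: radial band (in pixels) of the mirror's useful annulus
    Columns map to azimuth and rows to radius; pixels falling outside the
    captured conic image stay black, producing the dark regions noted above.
    """
    cx, cy = center
    pano = np.zeros((out_height, out_width, 3), dtype=omni_image.dtype)
    h, w = omni_image.shape[:2]
    for row in range(out_height):
        # Radius decreases with row so the outer ring appears at the top.
        r = r_max - (r_max - r_min) * row / (out_height - 1)
        for col in range(out_width):
            theta = 2.0 * np.pi * col / out_width
            x = int(round(cx + r * np.cos(theta)))
            y = int(round(cy + r * np.sin(theta)))
            if 0 <= x < w and 0 <= y < h:
                pano[row, col] = omni_image[y, x]
    return pano
```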
Experimental results for the prototype of the sensor are provided, including the unwarped fusion result. However, the accuracy of the measurement results still needs to be improved.

Figure 7.17: The result of the designed system.

Figure 7.18: The developed omnidirectional 3D sensor.

Figure 7.19: Examples of the projected pattern at the left corner of the room.

Figure 7.20: The received image by the omnidirectional camera.

Figure 7.21: The fused result with the object depth.

Figure 7.22: The unwarped fusion result.

CHAPTER 8
Conclusions

8.1 Conclusions

Based on the results and discussion presented in the previous chapters, the following conclusions can be made:

1. A 3D sensor model with fewer calibration parameters is developed. The traditional triangulation-based 3D reconstruction model employs the incident light from a projector and the reflected light to a camera to calculate their intersection point. Recovering both light rays requires the parameters of the two devices, 22 parameters in total, to reconstruct a 3D point. Because a projector is not an information-receiving device, projector calibration error becomes the major error source of a structured light system. The novel 3D model improves on the traditional model by reconstructing 3D points only in the camera frame, so the rotation and intrinsic parameters of the projector are avoided. The pixel deviation in the camera image frame between two measurements, one of a reference plane and one of the test object, is employed to reconstruct the 3D coordinate. Two important concepts of multi-view geometry, homography and epipolar geometry, are utilized in the 3D model optimization.

2. A large scale structured light based inspection system for industrial applications is developed and successfully implemented. The system is capable of inspecting a large moving diffusely reflective object with the one-shot algorithm, and of inspecting a large static object exhibiting a mixture of diffuse and specular reflection. The inspection volume of the system is much larger than that of traditional 3D optical sensors.

3. A multi-plane calibration strategy is developed to calibrate the new sensor model. In order to further increase the accuracy of the system, the residual function for each pixel is approximated by a second-order polynomial. The calibrated sensor can reconstruct a 3D point cloud that represents the 3D shape of the test objects.

4. New surface coding methods are developed: a one-shot algorithm for moving objects and a multi-shot algorithm for free-form objects.

5. Registration between each sensor frame and the common frame is developed based on a modified ICP algorithm. Four individual point clouds are finally merged into a single cloud representing the whole work piece. An innovative object-oriented registration between the final point cloud and the CAD model is developed to generate the error distribution map. Octree space labeling and a multi-resolution method are utilized to speed up the ICP-based fine registration.

6. Inspection results are compared with the CMM data, which represent the ground truth, to validate the new sensor model and its corresponding calibration. The mean Euclidean difference between the two 3D point clouds, the CMM data and the RIS data, is about 0.1 millimeter, and the CMM data and the RIS data exhibit similar error distribution maps for the same work piece (a minimal sketch of this point-cloud comparison is given after this list).

7. The navigation system for mobile robot control is developed via the one-shot surface coding method.
The system simultaneously provides the image fused with color-coded depth information, so the operator can control the mobile robot more accurately.

8. An omni-directional system is developed, together with its sensor model, calibration, error analysis, and coding method. Preliminary results verified the feasibility of the developed sensor.
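As referenced in conclusion 6, the comparison against the CMM data reduces to a mean nearest-neighbor distance between two registered point clouds. The sketch below is a minimal illustration of that computation, not the dissertation's code: it assumes both clouds are already expressed in a common frame after registration, uses scipy's cKDTree for the nearest-neighbor search, and the function name and the synthetic data in the usage example are hypothetical.

```python
import numpy as np
from scipy.spatial import cKDTree

def mean_euclidean_difference(reference_pts, test_pts):
    """Mean nearest-neighbor Euclidean distance from `test_pts` to
    `reference_pts` (both N x 3 arrays in the same, already registered frame).

    For the windshield experiment the reference would be the CMM data and
    the test cloud the RIS data; the reported figure is about 0.1 mm.
    """
    tree = cKDTree(reference_pts)
    distances, _ = tree.query(test_pts, k=1)
    return float(distances.mean())

# Illustrative usage with synthetic data (not measurement data):
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cmm = rng.uniform(0.0, 100.0, size=(5000, 3))          # synthetic "CMM" cloud
    ris = cmm + rng.normal(0.0, 0.05, size=cmm.shape)       # perturbed "RIS" cloud
    print(f"mean difference: {mean_euclidean_difference(cmm, ris):.3f} mm")
```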