DEVELOPMENT OF LARGE SCALE STRUCTURED LIGHT BASED MEASUREMENT SYSTEMS

By

Chi Zhang

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Electrical Engineering

2012

ABSTRACT

DEVELOPMENT OF LARGE SCALE STRUCTURED LIGHT BASED MEASUREMENT SYSTEMS

By

Chi Zhang

This study presents the development of large scale structured light based dimensional measurement systems. Two structured light systems are developed for two different applications: a 3D inspection system that emphasizes accuracy, and a fast environmental reconstruction system that emphasizes measurement speed.

Quality inspection is a process that evaluates the 3D shape of a manufactured work piece. Compared with a traditional Coordinate Measurement Machine (CMM), structured light based optical measurement offers fast speed, a high inspection sample rate, and overall simplicity of operation. The measurement result is further compared with the Computer Aided Design (CAD) model to evaluate the work piece quality. Three main issues arise when the system scale is enlarged. First, calibrating such a large system, with its long standoff distance, large Field Of View (FOV), deep Depth Of View (DOV), and multiple sensors, is a challenging task. In order to maintain high accuracy, an innovative 3D sensor model with fewer calibration parameters was developed. Instead of employing the incident light in the projector frame, 3D points are recovered within the camera frame via plane induced parallax information, so that the projector's intrinsic and orientation parameters are removed from the 3D model. Second, structured light systems are usually sensitive to surface optical properties. Materials exhibiting different color textures, different reflection ratios, and especially a mixture of specular and diffuse reflection prevent the optical sensor from correctly acquiring the incident light information, and the traditional structured light method is only valid for parts with diffuse reflection. A robust surface decoding method for industrial applications was therefore developed. Monochromatic light is used to cope with different object color textures, and the illumination of the projector is automatically adjusted based on the test material. Furthermore, an extrapolation model to solve the internal reflection problem and a sub-pixel interpolation model to increase the measurement accuracy are proposed. Finally, registration from each sensor frame to the common frame was achieved with a modified Iterative Closest Point (ICP) method. The final point cloud, joined from the individual point clouds, represents the 3D shape of a large work piece. Object-oriented on-line calibration based on ICP and Geometric Dimensioning and Tolerancing (GD&T) information was developed to register the final point cloud to the CAD model.

Environmental reconstruction for navigation is a process that quickly acquires the surroundings of a vision sensor and presents a fused 3D display to the operator. A traditional navigation system only employs a camera to obtain 2D information. A structured light system based on infrared light can quickly rebuild object depths and fuse them into the displayed images without influencing the operator's normal vision. The sensed points on the objects are highlighted by color codes, from red to blue, which indicate the object distances. In order to measure moving objects, a one-shot surface coding algorithm was developed.
Only one projection image is needed to acquire the 3D information; the codeword is determined within a single pattern because the code of each primitive depends on the values of the primitive and its neighbors. In order to reconstruct the environment without blind areas, an omni-directional panoramic structured light sensing system was developed to increase the FOV. Hyperbolic mirrors are placed in front of a projector and a camera, and a 3D reconstruction model associated with the hyperbolic mirror was built. Finally, a 360 degree image fused with depth information is produced by the designed system.

In summary, this study developed large scale structured light systems for two applications: accurate inspection for quality control and fast environmental reconstruction for robot navigation.

Copyright by
Chi Zhang
2012

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES

1 Introduction
1.1 Applications and Motivations
1.2 Background and Literature Review
1.2.1 Implementation of a Measurement System
1.2.2 Review of 3D Inspection Sensors Based on Optical Methods
1.2.3 Sensor Calibration
1.2.4 Surface Coding via Structured Light
1.2.5 3D Data Registration
1.2.6 Omni-directional 3D Reconstruction Vision Systems
1.3 Difficulties and Challenges
1.4 Objectives of the Work
1.5 Organization of the Study

2 Development of a 3D Sensor Based on Structured Light
2.1 Introduction of Imaging Devices
2.1.1 A Pinhole Model of an Image Device
2.1.2 Homography Matrix from Single View Geometry
2.1.3 Epipolar Geometry and Fundamental Matrix
2.1.4 Plane Induced Parallax in an Imaging System
2.2 Optimize the 3D Sensor Model via Parallax Information
2.2.1 Problem Statement
2.2.2 Recording the Reference Information with Homography Matrix
2.2.3 Correspondence between Measurements
2.2.4 Working Principle of the New Sensor Model
2.3 Chapter Summary

3 Sensor Calibration
3.1 Precision Analysis of the Given Sensing System
3.1.1 Uncertainty Analysis of an Imaging Device
3.1.2 Simulation of the System Error Model
3.1.3 System Precision Test Results
3.2 Sensor Calibration
3.2.1 Calibrate Homography Matrix between the Camera Image Plane and the Reference Plane
3.2.2 New Calibration Tasks of the Camera
3.2.3 The Choice of Non-singular Matrix H in New Calibration Tasks
3.2.4 Calibration of Camera Translation Vector Oc and Baseline Distance d
3.3 Chapter Summary

4 Surface Coding for a Structured Light System
4.1 Introduction of the Surface Coding Methods
4.2 Single-shot Coding Method
4.2.1 Pattern Generator
4.2.2 Primitive Design
4.2.3 Primitive Detection and Codeword Decode
4.3 Multi-shot Projection Coding Method
4.3.1 Problem Statement
4.4 Encoding and Decoding Strategy
4.4.1 Derivation of the Camera Photo Response Function for a Given Scene
4.4.2 Initial Guess of the Crosstalk Function
4.4.3 Advanced Projection Pattern Generation
4.4.4 Interpolation Model for Diffused Reflection Region and Extrapolation Model for Internal Reflection Region
4.5 Verification of the Projection Mask Performance
4.5.1 The Verification of the Illumination Adjustment via Projection Mask
4.5.2 The Verification of the Interpolation and Extrapolation Models
4.6 Chapter Summary

5 Registrations between Different Coordinates after 3D Reconstruction
5.1 Registration between Different 3D Coordinates
5.2 Off-line Registration Calibration from Sensor Frame to Common Frame
5.2.1 Calculation of Rotation Matrix via Quaternion Algorithm
5.2.2 Iterative Closest Point (ICP) based Registration
5.2.3 System Calibration Procedure
5.3 Registration between the Final Point Cloud and CAD Model
5.3.1 Octree Space Labeling Method and Multi-resolution Method based ICP
5.3.2 Feature Weighted Registration
5.3.3 The Object-oriented Online Registration Model
5.3.4 The Methodology in Current Task
5.3.5 Registration Process
5.3.6 Error Map Generation
5.4 Sensor Fusion
5.5 Chapter Summary

6 The Development of an Omni-direction 3D Camera for Mobile Robot
6.1 The Working Principle of the Omni-directional 3D Sensor
6.2 Calibration of the Omni-directional 3D Camera
6.3 Error Analysis for the Designed 3D Camera
6.4 One-shot Surface Coding for the Sensor
6.4.1 Image Warping in an Omnidirectional Camera
6.4.2 One-shot Projection Pattern Design for the Omnidirectional 3D Camera
6.4.3 Epipolar Constraints and the Decoding Method
6.5 Chapter Summary
7 Current Experimental Results and Data Analysis
7.1 The Rapid Inspection System (RIS)
7.1.1 The Configuration of the System
7.1.2 The Inspection Result via Single-shot Coding Algorithm
7.1.3 The Inspection Result via Multi-shot Coding Algorithm
7.1.4 The Registration Result from Four Pairs of Sensor Frames
7.1.5 Inspection for Different Work Pieces
7.2 The Navigation System for Single Direction
7.2.1 The Prototype of the Navigation System for Single Direction
7.2.2 The Fusion Result for the System
7.2.3 Integration of the Single Direction Sensor with a Mobile Robot
7.3 The Omni-direction Navigation System
7.4 Chapter Summary

8 Conclusions
8.1 Conclusions

BIBLIOGRAPHY

LIST OF TABLES

3.1 The standard deviation results for all sensors
4.1 Pixel binarization without the projection mask
4.2 Pixel binarization with the projection mask
7.1 The number of the point clouds in different experiments

LIST OF FIGURES

2.1 The pinhole model of a camera. (For interpretation of the references to color in this and all other figures, the reader is referred to the electronic version of this dissertation.)
2.2 (a) The homography matrix between the image plane and the world plane.
2.3 The epipolar geometry between two image planes.
2.4 The pencil of epipolar lines in each image plane on the epipole.
2.5 A plane W induced parallax in the camera image plane.
2.6 Traditional 3D reconstruction model.
2.7 Recording reference planar information.
2.8 Correspondence between reference recording and work piece inspection.
2.9 The optimized 3D model.
2.10 3D sensor model in the parallel configuration.
3.1 The difference between precision and accuracy.
3.2 (a) High accuracy with low precision; (b) low accuracy with high precision.
3.3 Camera error model.
3.4 Projector error model.
3.5 Error analysis results: height errors when the input error equals 0.1, 0.2, 0.3, 0.5 pixel.
3.6 (a) The illustration of homography matrix calculation; (b) the used chessboard.
3.7 The light ray passing two pinhole models.
3.8 Multi-plane calibration strategy.
3.9 The distribution of p_sr^c(h1, u, v) for each layer.
3.10 The second order polynomial to approximate the residual function for a pixel.
4.1 The pattern generation procedure.
4.2 An example of occlusion.
4.3 The comparison of different primitives.
4.4 The primitive value.
4.5 Image processing: (a) contour extraction; (b) location detection; and (c) principal axis identification.
4.6 Determination of the codeword for each primitive.
4.7 Traditional encoding and decoding.
4.8 The flow chart of the encoding and decoding strategy.
4.9 The captured picture when projecting an image with intensity 100.
4.10 The photo response curve for 5 points in the last figure.
4.11 The proper projection values in the camera frame.
4.12 (a) The homography mapping among the projector image plane, the camera image plane and the inspection table; (b) the correspondence pixel PP changes according to PW.
4.13 (a) The initial projection mask; (b) the dilated projection mask; (c) one example of a projection pattern.
4.14 (a) A received image of the projected pattern; (b) a received image for the inverse pattern; (c) interpolation of the edge location; (d) estimation of the edge location in a strong illumination condition.
4.15 The final projection mask.
4.16 The received image without the projection mask.
4.17 The received image with the initial mask.
4.18 The received image with the final mask.
4.19 (a) A selected detail view of the black part when projecting the pattern without the mask; (b) the same view when projecting the inverse pattern; (c) the intensity curve around point B / B′.
4.20 (a) A selected detail view of the shiny part when projecting the pattern without the mask; (b) the same view when projecting the inverse pattern; (c) the intensity curve around point C / C′.
4.21 (a) A selected detail view of the black part when projecting the pattern with the mask; (b) the same view when projecting the inverse pattern; (c) the intensity curve around point B / B′.
4.22 (a) A selected detail view of the shiny part when projecting the pattern with the mask; (b) the same view when projecting the inverse pattern; (c) the intensity curve around point C / C′.
5.1 ICP algorithm.
5.2 System registration gauge.
5.3 Sensor 1 measurement result.
5.4 The center of each marker.
5.5 The rough alignment between sensor 1 and CMM data.
5.6 The fine alignment between sensor 1 and CMM data.
5.7 System calibration results.
5.8 Space partition by the octree space method.
5.9 The node tree of the octree method.
5.10 Octree for the CAD model.
5.11 Octree for the final point cloud.
5.12 The difference between two algorithms.
5.13 Check fixture and the stopper bar touching datum points of the work piece.
5.14 (a) CAD model with datum points; (b) the registration result based on datum points.
5.15 The color coded error distribution map for a piece of windshield.
5.16 The fusion result for the navigation application.
6.1 The working principle of the designed omni-direction system.
6.2 Calibration of the omni-directional camera.
6.3 Calibration of the omni-directional projector.
6.4 Received image from the world frame: (a) the received image; (b) illustration of the optical path.
6.5 The object depth error from the resolution limitation.
6.6 The omnidirectional camera resolution: (a) the resolution curve in the image frame; (b) the resolution curve in the camera frame.
6.7 Image warp for a panoramic view.
6.8 The designed one-shot projection pattern.
6.9 The epipolar geometry of the omnidirectional system.
7.1 RIS system configuration.
7.2 The inspected pillar with surface pre-treatment.
7.3 The inspected point cloud of the pillar.
7.4 The 3D point clouds for the work pieces.
7.5 The comparison between CMM data and the RIS data.
7.6 The CMM error map.
7.7 The first tested work piece with bright color and more complex shape.
7.8 The corresponding CAD model for the first work piece.
7.9 The point cloud for the first work piece.
7.10 The error map for the first work piece.
7.11 The configuration of the navigation system for single direction.
7.12 The tested objects.
7.13 The captured image when projecting the near infrared light pattern.
7.14 The fusion result for the sensor.
7.15 The integrated mobile robot with the designed sensor.
7.16 The description of the integrated system.
7.17 The result of the designed system.
7.18 The developed omnidirectional 3D sensor.
7.19 Examples of the projected pattern on the left corner of the room.
7.20 The received image by the omnidirectional camera.
7.21 The fused result with the object depth.
7.22 The unwarped fusion result.

CHAPTER 1

Introduction

1.1 Applications and Motivations

The development of rapid and accurate measurement systems is an active research area. Among current 3D measurement techniques, structured light based 3D vision has made significant strides and has been utilized in many applications, such as manufacturing quality control, 3D profile reconstruction, reverse engineering, and environmental reconstruction. According to the required system performance, the applications can be roughly separated into two branches: accurate inspection and fast reconstruction.

Inspection for industrial applications usually emphasizes result accuracy. A structured light based inspection system acquires the light reflected from the test work pieces and rebuilds the 3D result as a set of points, named a point cloud. In quality control, the point cloud is further compared with the Computer Aided Design (CAD) model to find the manufacturing error distribution. The color-coded error distribution map is evaluated against the acceptable tolerance; work pieces exceeding the tolerance are considered unqualified.

Fast reconstruction, such as robot navigation, usually considers acquisition speed as the most important parameter. Its accuracy requirement is relatively lower than inspection's, but the navigation application is very sensitive to measurement speed: the system should be able to quickly rebuild a moving object. The developed system employs infrared light to sense the objects and uses a one-shot coding method to quickly rebuild the 3D information of the scene. The reconstruction result is further fused with the camera captured image so that the operator can perceive the 3D information.

Currently there are still some limitations preventing further application of the technology. First, a large scale sensing system is needed by the manufacturing industry to inspect big work pieces. Traditional inspection machines, such as CMMs, are very sensitive to the inspection scale: the price can increase dramatically once the scale increases. An imaging system with a large Field Of View (FOV), a long standoff distance, random image noise from the environment, and registration between different sensor coordinates greatly increases the difficulty of system implementation and calibration.

Second, vision based measurement systems always suffer from surface optical properties. Work pieces with various properties, such as bright, white, dark, or black parts, and especially surfaces exhibiting a mixture of specular and diffuse reflection, greatly influence the vision system. Manufacturers have to pre-treat the surface into a diffusely reflective one and then use the optical system to measure the work pieces, which increases the cost and the processing time. Therefore, a system that can handle optically challenging objects is needed for industrial application. Third, a fast on-line navigation system for mobile robots is still lacking.
Since the mobile base is moving all the time, the system should reconstruct the environment based on only one projection; traditional structured light systems that project multi-image patterns fail in this application. Furthermore, the results should simultaneously satisfy efficiency, reliability, high density, and accuracy requirements. Finally, an omni-directional 3D navigation sensor is needed for the operator to obtain 360 degree environmental information. Without blind areas in the vision system, the operator can control the mobile robot and view the surroundings more efficiently. The structured light system should simultaneously compensate the mirror distortion, sense a 360 degree view, reconstruct objects in the environment, and fuse the 3D information into the 2D image. Aiming at these demands, this study develops proper methodologies and reliable machine vision based systems for these applications.

1.2 Background and Literature Review

1.2.1 Implementation of a Measurement System

The development of a structured light based measurement system can be described by the following parts: the 3D recovery model for a traditional active vision system, calibration of the system sensor, encoding and decoding of the structured light, registration between different work frames, and the omni-directional system.

The imaging device, a camera for example, is usually modeled as a pinhole. Objects in the 3D world frame map into the image frame by a 3 × 4 perspective matrix. Only 2D information is preserved through the mapping, and the constraint on the last degree of freedom is missing: the recovery from an image pixel forms a light ray instead of a point. In order to accomplish the 3D reconstruction task, an additional constraint is needed to fix the last degree of freedom. The 3D recovery model, as the starting point in designing a structured light system, establishes the working principle of the system sensors. Aided by another device, such as a laser, a projector, or another camera, the third constraint equation can be built. The 3D recovery model controls the subsequent implementation steps of the system, such as configuration and calibration, so a simple, precise, and easily implemented 3D recovery model plays an important role in the system design.

After the 3D recovery model is fixed, the next step is to find a proper way to calibrate the system. A structured light system usually consists of a projector and a camera, and the calibration tasks usually involve both devices' intrinsic and extrinsic parameters. The intrinsic parameters control the mapping from each device's image frame to its own sensor frame, and the extrinsic parameters transfer the sensor frame into the world frame. Together there are 22 parameters for both devices. If the 3D model is properly designed, fewer parameters need to be calibrated.

A structured light system derives from a stereo vision system by replacing one camera with a projector. A stereo vision system suffers from the correspondence problem: only corresponding pixels in both imaging devices can build up 3D points. Without the aid of features, a stereo vision system cannot establish the correspondence, so only a few points with clear features can be recovered. A structured light system replaces one camera with a projector. The projected pattern is properly designed to encode a codeword into the incident light. The camera records the reflected light and extracts the codeword.
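For reference, the pinhole mapping described above can be written compactly in generic notation (these symbols are standard pinhole quantities, not the particular calibration variables defined later in this dissertation):

    s [u, v, 1]^T = K [R | t] [X, Y, Z, 1]^T,

where K is the 3 × 3 intrinsic matrix, [R | t] holds the extrinsic parameters, and s is an unknown scale. Inverting the mapping only constrains the world point to the ray

    X(λ) = O_c + λ R^T K^{-1} [u, v, 1]^T,   λ > 0,

through the optical center O_c; the codeword decoded at a camera pixel identifies the corresponding projector pixel, whose own ray (or stripe plane) supplies the missing constraint.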
Since the incident light and the reflected light preserve the codeword, pixels sharing the same codeword in the two imaging devices correspond to each other. A properly designed projection pattern can generate the correspondence without influence from the textures and geometries of the test objects. Therefore, a robust encoding and decoding strategy is critical for a structured light system. According to the number of projections, the encoding and decoding methods can be separated into two categories: one-projection (one-shot) methods and multi-projection methods. The acquisition time of a one-shot method is much shorter than that of a multi-projection method, so it can be used to measure moving objects; however, its point density and accuracy are lower. Thus, the one-shot method is usually used for fast 3D reconstruction, while the multi-projection method is mainly utilized to inspect stationary work pieces. With multiple projections, the method can inspect optically-challenging objects without pre-treatment, such as objects with different colors, different reflection ratios, or material exhibiting a mix of diffuse and specular reflection. The two encoding and decoding methods are introduced separately in this study.

After the 3D point cloud is recovered in each individual frame, some post-processing tasks are usually needed to obtain the final results. If the system contains more than one pair of sensors, registration between each sensor frame and a common frame is needed to merge all the small point clouds into the final one. For the quality control application, the final point cloud needs to be compared with the CAD model, so an additional registration between the common frame and the CAD frame is also needed. For the navigation application, the final point cloud should be fused into the 2D image display. The third dimension, the object depth (the distance from the objects to the camera), is fused into the image via color codes: distances from far to near are assigned different colors, from blue to red. Therefore, a registration from the 3D world frame to the image frame is needed to complete the fusion. Omni-directional systems usually have a different working principle from traditional structured light systems because of the additional convex mirrors in front of the cameras and projectors; their literature review is given in a separate section of this chapter.

From the view of applications, structured light systems can be roughly separated into two branches: accurate inspection and fast environment reconstruction. From the view of development, the difference between the two applications lies in the encoding and decoding strategy. Accurate inspection usually employs multi-shot projection to acquire the test object information and obtain a dense, accurate point cloud, whereas fast environment reconstruction on a mobile robot usually adopts single-shot projection to record the surrounding information and increase the sampling speed. Once the application scale increases, especially the FOV of the system, the two branches also utilize different methods to enlarge the detection range. In order to increase the FOV, accurate inspection uses multiple sensors to cover a big work piece. Traditional methods, such as enlarging the standoff distance, can also increase the FOV of the system; however, the resolution area of an imaging device, the area that one camera / projector pixel covers in the world frame, also increases.
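As a rough worked example of this effect (the numbers here are purely illustrative, not the specifications of the system described later): for a pinhole device with focal length f, pixel pitch p, and standoff distance Z, one pixel covers approximately Z·p/f on a fronto-parallel surface. With f = 16 mm, p = 7 µm, and Z = 1 m the footprint is about 0.44 mm; doubling the standoff distance to 2 m to enlarge the FOV doubles the footprint to about 0.88 mm.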
Deviations smaller than the resolution area cannot be detected, and thus the result accuracy is reduced. Therefore, the multi-sensor strategy can enlarge the FOV and maintain high accuracy at the same time. For the navigation application, there is limited space on the mobile robot, so it is very difficult to mount multiple sensors on the robot. Instead of multiple sensors, convex mirrors are placed in front of the projector and the camera to enlarge the FOV. The method can also cover a 360 degree view without blind areas within the limited space. Since the accuracy requirement of navigation is much lower than that of quality inspection, the error within the resolution area can be tolerated.

In the following parts of this chapter, the literature review mainly follows the development steps of a structured light system: the 3D recovery model, system calibration, encoding and decoding of the structured light, registration between different work frames, and finally the omni-directional system. Special attention is given to the two major applications: quality inspection and environment reconstruction.

1.2.2 Review of 3D Inspection Sensors Based on Optical Methods

Quality inspection is the process of finding the deviation between the work pieces and the CAD model according to a set of specifications [1]. The whole process can be divided into two parts: the measurement of the work pieces and the generation of the error distribution map. Basically, existing measurement techniques can be separated into two categories: contact based and non-contact based. A CMM is a traditional contact based system commonly used for dimensional inspection: a touching probe is employed to contact the work piece surface and obtain the coordinates of the points. Although the accuracy of a CMM is very high, the point-wise nature of the contact based method limits both the CMM's measurement speed and its capability of inspecting complicated work pieces. Non-contact measurement via machine vision [2], with the merits of fast speed, high inspection sample rate, and overall simplicity of operation, has become a popular alternative to the CMM. In many manufacturing applications, inspection of a large part requires multiple 3D vision sensors to cover the whole piece, in which case an additional registration from each sensor frame to the common frame is needed to complete the final point cloud. Finally, the measurement result, the point cloud representing the test work piece, is compared with the Computer Aided Design (CAD) model to find the error distribution map. The color-coded error distribution map is further evaluated against the required tolerance; only work pieces within the manufacturing tolerance can be delivered to the customers. Machine vision systems are also widely utilized in other industrial fields [3] [4].

The problem of retrieving 3D information from an object by vision methods starts from the basic imaging device: a camera. A camera, usually simulated by a pinhole model, presents a perspective mapping from the 3D world frame into the image frame. Each pixel in the image frame recovers a light ray passing through the optical center instead of a point in the world frame. In order to achieve 3D recovery, additional constraints are needed to limit the last degree of freedom along the ray. Stereo vision [5] was developed for 3D recovery based on two cameras viewing the same object from two different viewpoints. The corresponding pixels from the two cameras define two light rays under the pinhole approximation, and the intersection of the two corresponding rays recovers the 3D coordinates in the world frame. Sometimes, due to calibration error, the midpoint of the common perpendicular of the two corresponding rays is used as the reconstructed 3D point instead.
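A minimal sketch of this triangulation step, assuming the two corresponding rays have already been expressed in a common frame (the function and variable names are illustrative, not from this thesis):

import numpy as np

def triangulate_midpoint(o1, d1, o2, d2):
    """Midpoint of the common perpendicular of two rays.

    Each ray is given by an origin o (3-vector) and a direction d.
    Solves for the scalars s, t that minimize |(o1 + s*d1) - (o2 + t*d2)|.
    """
    d1 = d1 / np.linalg.norm(d1)
    d2 = d2 / np.linalg.norm(d2)
    b = o2 - o1
    a = d1 @ d2
    denom = 1.0 - a * a                # zero only for parallel rays
    if abs(denom) < 1e-12:
        raise ValueError("rays are (nearly) parallel")
    s = (d1 @ b - a * (d2 @ b)) / denom
    t = (a * (d1 @ b) - d2 @ b) / denom
    p1 = o1 + s * d1                   # closest point on ray 1
    p2 = o2 + t * d2                   # closest point on ray 2
    return 0.5 * (p1 + p2)             # reconstructed 3D point

# Example: two rays whose closest point is (0, 0, 1).
X = triangulate_midpoint(np.array([0., 0., 0.]), np.array([0., 0., 1.]),
                         np.array([0.1, 0., 0.]), np.array([-0.1, 0., 1.]))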
Stereo vision is also called passive vision, since none of the sensors emit any radiation; they rely instead on detecting the radiation reflected by the objects. Although it is widely used in computer vision, it presents several drawbacks for 3D reconstruction. The first is the correspondence problem: determining the relationship between pixels of different views is not a trivial step. The second is the resolution of the 3D points: the technique can only reconstruct points with unique features, so dense reconstructions are difficult to obtain. In order to overcome both drawbacks, two solutions have been developed for cases where dense reconstruction is required. The first is to add an active imaging device, such as a laser or a projector, projecting a group of pre-defined image patterns, which is also called structured light. The codeword carried by each pixel from the projector is designed to be unique, so the correspondence between the two cameras can be established. The limitations of this method are: a) the Field Of View (FOV) is the overlap of three imaging devices, b) the Depth Of View (DOV) is the minimum of the three imaging devices, and c) the additional device increases the cost of the system. Facing the limitations of the first solution, the second one simply replaces one camera with a projector in the stereo vision system. The projector provides both the correspondence and the incident light ray equation. Treating the projector as a pseudo camera, 3D reconstruction can again be achieved by intersecting corresponding light rays. Since the projector emits light, the system is also called a structured light based active vision system. In this study, 4 pairs of structured light based active vision sensors are employed to build a large scale measurement system as a prototype for an industrial quality control system.

1.2.3 Sensor Calibration

Calibration of the 3D sensor is crucial for a measurement system to obtain accurate results. Since a 3D sensor consists of a camera and a projector, its calibration can be roughly separated into two categories: camera calibration and projector calibration. The pinhole approximation of each imaging device adopts intrinsic and extrinsic parameters to construct the light ray equation from a given pixel. The intrinsic parameters transfer an image pixel into a light ray in the camera frame, and the extrinsic parameters transfer the camera frame into the world frame on the inspection table. Conventional calibration methods calculate the intrinsic and extrinsic parameters of a camera or a projector.

Camera calibration has been studied for decades. With the aid of the correspondence between a calibration gauge and pixel locations in the image, the parameters of a camera can be regressed. In the calibration process, the calibration gauge plays an important role, and a large scale 3-dimensional gauge with high accuracy is usually very expensive. Based on the dimension of the gauge, calibration techniques can be roughly classified into the following categories: 1) 3D reference object based calibration.
Camera calibration is performed by observing a gauge whose geometry in 3D space is known with high precision [6]. This kind of approach usually requires an expensive calibration apparatus, because it costs a lot to manufacture a gauge with accurate 3D coordinates, especially a large scale one. 2) 2D plane based calibration, which is also the most commonly used method. In order to reduce the cost of the gauge, the calibration algorithm is optimized to obtain the camera intrinsic and extrinsic parameters from planar objects. Tsai [7] adopted a planar gauge to obtain the camera parameters in two steps. In the initial step, constraints between the parameters, such as the orthonormality of the rotation matrix, are not enforced; instead, the Radial Alignment Constraint (RAC) provides a quantity that simplifies the analysis and reduces the task to linear equations. The solution of the linear equations is calculated by the least squares method and used as the starting value for the final optimization in step two. In the second step, the remaining parameters are obtained via the nonlinear Levenberg-Marquardt Algorithm (LMA) [8], which finds the best fit between the observed image points and the points predicted by the target model. Tsai's method has two known limitations: a) the solution procedure fails if the camera plane is nearly parallel to the calibration board. Tsai et al. [9] also devised an algorithm that handles the exactly parallel case; years later, Zhuang and Wu [10] extended Tsai's method to calibrate a camera in a near parallel configuration when the camera intrinsic parameters were known, and Luo et al. [11] proposed another algorithm to calibrate a camera when the intrinsic parameters were unknown. b) The second limitation is that the translation between the calibration planes is needed to regress the camera parameters, which results in an elaborate setup. Different from Tsai's method, Zhang [12] proposed a new planar calibration method in which knowledge of the plane motion is not necessary. The orthogonality constraints from the rotation matrix are utilized as the calibration conditions: the homography matrix between the image plane and the calibration plane is factorized into two parts, the intrinsic parameters and a matrix consisting of two columns of the rotation matrix and the translation vector. An analytical solution can be calculated as the initial condition, and the numerical solution can be found iteratively through LMA. 3) 1D calibration. In order to further reduce the requirements on the calibration gauge, camera calibration based on a 1D gauge has been proposed; the results can reach a fair level of accuracy. Zhang [13] first proposed calibrating a camera with a 1D object: the solution can be found if one point is fixed, and a closed-form solution is developed if six or more observations of such a 1D object are made. Qi et al. [14] extended Zhang's 1D calibration method and calibrated a camera with the aid of a gravity driven 1D object. 4) The last group of methods is called semi-calibration or self-calibration. This kind of calibration usually assumes that some of the parameters are known and can calibrate the rest without any reference gauge; therefore, it is also sometimes called 0D calibration. Semi-calibration usually adopts the epipolar geometry constraints of the same object seen from two views. The essential matrix was first mentioned by H. C. Longuet-Higgins [15].
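In generic notation (not the specific variables of later chapters), the epipolar relations used by these self-calibration methods are: for corresponding homogeneous image points x and x' in the two views,

    x'^T F x = 0,

and when the intrinsic matrices K and K' are known, the essential matrix

    E = K'^T F K = [t]_x R

encodes the relative rotation R and translation t between the views, with [t]_x the skew-symmetric matrix of t.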
If two camera’s intrinsic parameters are known and normalized, the extrinsic parameters can be solved up to 4 analytical solutions by SVD method. The essential matrix can be determined from eight-point algorithm [?]. If the transformation between two positions is known, a camera’s intrinsic parameter can estimated by fundamental matrix from epipolar constraints [16] [17]. The history of projector calibration is much shorter than camera calibration. After structured light based 3D reconstruction model was developed, people began to investigate different methods on projection calibration. Compared with a camera, the calibration of a projector is always more difficult and the process is more complicated, since it is not a information receiving device. The dimensions of the gauge can not be directly applied to the calibration process and the projected pattern needs to employee another imaging receiving device, such as a camera, to acquire. Therefore, the projector calibration error is usually bigger than the camera’s, and becomes the major source of errors in a structured light system. Drarei et al. [18] treated a projector as a pseudo camera with the aid from a calibrated camera and then utilized camera 2D calibration method to calibrate the projector. The homography matrix between a projector pixel location and 3D points regressed from the camera is used to regress the projector parameters. However, this kind of methods brings 10 the camera calibration error into the projector calibration, which double the final error. Zhang et al. [19] first adopted structured light to set up a link table between a projector and a camera, and then applied traditional 2D camera calibration methods to regress projector parameters. The 3D coordinates of the points on the gauge is not regressed from the camera, so the camera calibration error will not participate into the projector calibration. However, the calibration steps are very complicated. The planar gauge was placed around 20 different positions and 3D points on each position were recorded by horizontal and vertical Gray Code and Phase Shifting (GCPS) structured light. Since GCPS is not robust against image noise, the miss-linked location by GCPS also costs calibration error. Eipoplar constraints describes the internal geometric relationship between two views and be welled used in self-calibration. Homography matrix which is used to calibration a camera by 2D gauge also receives a big success in a camera calibration. Combining these two, epipolar constraints and homogrpahy matrix, Zhang et al. [20] calibrated the relative rotation and translation between a camera and a projector. But the algorithm requests the intrinsic parameters need to be known first. In order to reduce the error from the projector calibration, Shi [21] did not regress the pinhole model parameters of the projector but created a link table between camera light array and projector light array with respect to each level of height within the whole work space. The results are accurate, but the method is limited by the size of link table, which means it can not be used for large scale system. Convectional calibration methods of a structured light system calculate both equipments intrinsic and extrinsic parameters, leaves a challenge task to calibrate a projector. Although projector calibration can be done via a complex way, the solution either contains too much error or time consuming. 
This thesis brings the camera planar calibration method and the epipolar information into the 3D reconstruction model to reduce the number of parameters in the 3D model. The novel strategy is that the camera calibration information based on the planar method can be saved as reference information, and the acquired information of the test object is compared with the pre-saved reference. The corresponding pixel deviation between the reference and the test object in the camera image is used to reconstruct the 3D point cloud, so the projector's intrinsic and orientation parameters are avoided in the calculation. Therefore, projector calibration is no longer necessary for 3D reconstruction. The correspondence function provided by the structured light is not only used to set up the correspondence between the camera and the projector, but also establishes corresponding pairs of pixels in the camera between two measurements: the camera pixel recording the reference information and the corresponding camera pixel recording the tested surface information.

Error analysis is another important topic in the calibration of a given 3D surface profile measurement system. As early as the 1980s, when active vision systems were invented, the error analysis problem was pointed out, and various kinds of error analysis have been carried out to evaluate systems from different aspects and error sources. The error model developed by Trobina et al. [22] for an active range sensor concentrated mainly on the errors from the camera perspective geometry. Yang et al. [23] proposed a more detailed mathematical model for each calibrated parameter in the 3D reconstruction. A relative error model [24] was presented for the calibration of an active triangulation sensing model. Uncertainty analysis and its effects in 2D camera calibration [25] have also been discussed. In summary, past work concentrated mainly on error analysis related to device parameters, the sensor model, and the calibration methods. As the scale of a 3D profile sensing system increases, the consistency of the system under random image noise also needs to be evaluated. In practical applications, especially industrial applications, consistency decides whether the measurement results are reliable or not. When the FOV and standoff distance of the system grow to a scale at which the consistency error is no longer small enough to ignore, a consistency analysis and test need to be done before the calibration starts, and a poor consistency test result requires a redesign of the system. In this study, the consistency performance of the designed system is analyzed, simulated, and tested, and tradeoff values are chosen for the standoff distance and the inspection area by considering the consistency error.
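As an illustration of this kind of consistency evaluation, the following sketch perturbs the image measurements of an idealized two-device triangulation with random pixel noise and reports the spread of the recovered depth. The geometry, the noise level, and the simple Z = f·B/d relation are illustrative assumptions, not the error model or sensor model developed in Chapter 3.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative geometry (not the thesis's actual configuration):
f_pix  = 2000.0                      # focal length, in pixels
B      = 300.0                       # baseline between the two devices, in mm
Z_true = 2500.0                      # true standoff distance, in mm
disparity = f_pix * B / Z_true       # ideal disparity, in pixels

sigma = 0.3                          # assumed image-point noise, in pixels
samples = []
for _ in range(5000):
    # perturb the measured positions in both devices, then re-triangulate
    d_noisy = disparity + rng.normal(0.0, sigma) - rng.normal(0.0, sigma)
    samples.append(f_pix * B / d_noisy)

print("mean depth (mm):", np.mean(samples))
print("depth spread (mm, 1 sigma):", np.std(samples))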
1.2.4 Surface Coding via Structured Light

The encoding and decoding procedure of the structured light is one of the most critical factors for an optical measurement system. The biggest challenge is the capability of simultaneously satisfying the efficiency, reliability, and accuracy requirements. The projected patterns are deformed by the free-form test surface; therefore, the codeword is designed to withstand this distortion while the encoded pattern carries as many codewords as possible. The corresponding on-line decoding algorithm should dynamically calibrate the received image against the distortion and image noise to extract the codeword. Since these two procedures are always coupled with each other, the literature for both functions is reviewed together.

Several pattern strategies have been proposed to make a fast 3D inspection using a single pattern or a few patterns, which can be categorized into two types in terms of the test objects: one-shot patterns for moving objects and multi-pattern methods for static objects. Moving objects can only be inspected by a one-shot pattern algorithm; otherwise, the system may acquire incorrect pattern codes due to the movement of the parts. The major challenge for a one-shot pattern is the capability of compressing enough codewords. Different strategies have been employed for this, such as color and spatial-neighborhood patterns. In the color method, the codeword is determined in a single pattern because the colors of different primitives, such as dots, lines, and grids, are different; the decoding step can separate each codeword via color filters. Spatial-neighborhood patterns can be encoded because the code of each primitive (symbol) depends on the value of the primitive and its neighbors. Usually both strategies are combined to create a one-shot pattern. Patterns with round disks, small color dots on a black background, were developed in [26] [27]: the decoding function first separates each image primitive and then employs the color information to assign a unique codeword to each primitive. The codeword density can be increased via stripe designs [28] [29] or grid designs [30] [31]. The drawback of the color strategy is that the color contrast ratio can be affected by the color reflectivity, the inherent texture of the work piece, and ambient light, which results in less reliability than monochromatic (black-white) light. In this study, a new structured light pattern based on the spatial-neighborhood strategy is proposed [32]. The primitive of the pattern is based on the corner of a black-white checkerboard pattern, which is also called the X-point. The X-point provides a more robust and accurate position than a conventional pattern in which the primitive is based on symmetrical symbols such as disks, stripes, etc. The orientation of the X-point is used to encode the primitive of the pattern. Moreover, the point cloud density in different regions of the inspected part can be adjusted; this strategy is very useful for applications in which high point density is only required in some critical regions such as connection interfaces.

The multi-shot strategy is only available for stable objects, but it is more robust against image noise than the one-shot method. After detecting illumination problems, such as shadows, under- / over-exposure, and internal reflection, the projection pattern can be readjusted through a proper feedback. Recent research is mainly focused on how to improve the inspection accuracy and point cloud density. Among many strategies, Gray Code [33] combined with Phase Shifting, such as a sinusoid wave [34], achieves good results and has been utilized in many areas. After successfully inspecting diffusely reflecting material via multi-shot methods, people started to apply the multi-shot method to detect optically-challenging objects without pre-treatment. As a pioneer, Nayar [35] [36] built upon a shape-from-intensity method and iteratively approximated the 3D shape of the test objects, but the method can only model relatively simple geometries and is not suited for inspection. Later, people began to investigate more robust methods for optically-challenging objects.
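Before turning to those methods, a generic illustration of the spatial-neighborhood idea described earlier in this subsection may help; this is not the X-point scheme of Chapter 4, and the symbol values, grid, and window size here are arbitrary:

import numpy as np

def neighborhood_codewords(grid):
    """Assign each interior primitive a codeword built from the 3x3
    window of symbol values centred on it (row-major order)."""
    rows, cols = grid.shape
    codes = {}
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            window = grid[r - 1:r + 2, c - 1:c + 2].flatten()
            codes[(r, c)] = tuple(window)   # codeword for primitive (r, c)
    return codes

# A small pattern with 4 possible primitive symbols (0..3).
pattern = np.array([[0, 1, 2, 3, 0],
                    [2, 3, 0, 1, 2],
                    [1, 0, 3, 2, 1],
                    [3, 2, 1, 0, 3]])

codes = neighborhood_codewords(pattern)
# For decoding to be unambiguous, every codeword should occur only once:
assert len(set(codes.values())) == len(codes)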
The literature shows approximately four categories of approaches used to increase the measurement performance: direct coding methods, color invariance based methods, polarization based methods, and adaptive projection methods.

In direct coding methods, people try to develop encoding patterns that are robust against image noise. Instead of the phase shifting method, digitized Gray Code Line Shifting (GCLS) [37] [22] uses binary square waves as the projection patterns, and the binary wave achieves better performance than sinusoid waves. Different from the phase shifting method, the decoding process changes from analog sampling to digitized sampling: codeword 1 is encoded as intensity 255 and codeword 0 as intensity 0 in the projector frame, and a high image intensity is decoded as '1' while a low image intensity is considered '0' in the camera frame. However, the digitized direct coding method is still weak for shiny material: an unilluminated area can also produce high image intensity in the camera frame due to internal reflection, and a low reflection coefficient can cause low image intensity even when a high illumination pattern is projected. By combining binary patterns with color information [38], some shiny objects were successfully measured. The authors hypothesized that the effects of internal reflection and ambient light in the scanning process could be eliminated via color invariants, since the inter-reflection is not preserved under the generalized bas-relief (GBR) transformation. However, the method cannot inspect black material, which absorbs all incident colors, and the under- / over-exposure problem was not addressed in the study; the capability of the method is therefore limited. Based on the fact that specular reflection changes the polarization direction of the incident light, polarizers were added in front of the cameras / projectors to filter out the internally reflected light [39] [40], so shiny objects could be measured in those systems. However, without pre-knowledge of the test material's optical properties, the surface normals, and the exact number of internal reflections, the authors had to manually and carefully adjust the polarizers to a proper angle to filter out the internally reflected light. Once the position of the part changes, or a new part is put into the system, the polarizers have to be re-adjusted [41]; therefore, the system is not very flexible. Adaptive coding methods can dynamically modify the projection illumination based on the scene, which avoids the under- / over-exposure problem and can be further used to eliminate internal reflection issues. The approach is more robust than the polarization based method, since no modification is needed when a new part is inspected. However, the biggest challenge is the correspondence problem between the camera and the projector: the projector illumination needs to be adjusted, but the receiver is the camera, not the projector. The correspondence, usually called the crosstalk function, is needed to create the feedback from the camera to the projector. After the feedback information is obtained, the illumination of the structured light can be dynamically adjusted to measure optically-challenging objects. However, the crosstalk function is a dilemma for a structured light system: the structured light is used to find the correspondence, but the correspondence is needed to create a scene based structured light.
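A minimal sketch of the digitized decoding step described above, using the common pattern / inverse-pattern trick (which the verification figures of Chapter 4 also rely on): each camera pixel is classified by comparing the image of a bit pattern with the image of its inverse, giving a per-pixel threshold that is less sensitive to reflectance differences. The array names and the confidence margin here are illustrative, not the thesis's exact decoding rule.

import numpy as np

def decode_bit(img_pattern, img_inverse, margin=5):
    """Per-pixel binarization of one projected bit plane.

    img_pattern : camera image while the bit pattern is projected
    img_inverse : camera image while the inverted pattern is projected
    Returns 1 / 0 for decoded bits and -1 where the contrast is too low
    (shadow, very dark material, saturation) to decide reliably.
    """
    diff = img_pattern.astype(np.int32) - img_inverse.astype(np.int32)
    bits = np.where(diff > 0, 1, 0)
    bits[np.abs(diff) < margin] = -1    # mark unreliable pixels as undecided
    return bits

# Tiny example: a bright pixel, a dark pixel, and a shadowed pixel.
pattern = np.array([[220, 40, 12]], dtype=np.uint8)
inverse = np.array([[ 35, 180, 10]], dtype=np.uint8)
print(decode_bit(pattern, inverse))     # -> [[ 1  0 -1]]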
Koninckx et al. [42] proposed a scene adapted structured light system to avoid under- / over-exposure problems. Not only the reflection coefficient of the scene is considered, but also the nonlinear responses of both imaging devices are simulated. However, approximate knowledge of the geometry has to be known in advance to solve the dilemma; without this foreknowledge, the method is relatively inflexible for different work pieces. P. Sen [43] estimated the crosstalk function by turning on each projector pixel one at a time and applied an adaptive subdivision to speed up the algorithm. However, the authors only used the method to dynamically control the illumination condition, not to reconstruct the given scene, and the number of projection and recording loops is very large, so the efficiency is low. An adaptive correspondence algorithm was developed by Xu et al. [44] [45] to inspect objects with strong internal reflection. The authors classify the captured areas into certain areas and uncertain areas, iteratively peel off the certain areas, and enhance the illumination condition in the uncertain regions. Objects with strong internal reflection can be measured in their system without any pre-knowledge of the test geometry. The limitation is that the number of iterations is not fixed; it may take hours to inspect a scene [44] [45], so the algorithm efficiency is relatively low. In summary, a robust, flexible, and efficient inspection system for optically-challenging objects is needed for quality inspection in the industrial field. In this study, a robust encoding and decoding algorithm to measure optically-challenging objects is proposed. Monochromatic white-black stripes are applied to avoid the influence of the test objects' colors. A scene based projection mask is utilized to adjust the projection illumination so that the under- / over-exposure problem and the shadow problem can be eliminated. Combined with the projection mask, an extrapolation model is developed to estimate the edges between highly and weakly illuminated regions. Therefore, shiny objects can also be inspected in a structured light system.

1.2.5 3D Data Registration

Registration of point clouds is the task of finding the Euclidean motions among a set of point clouds, or between a point cloud and the model of a given part, with respect to a common reference frame. For a large scale inspection system, two categories of registration are needed: 1) registration of multiple point clouds into one common frame, and 2) registration between the final point cloud and the CAD model. The registration problem can be roughly divided into two parts: coarse alignment and fine registration. Coarse alignment determines an initial condition, and fine registration estimates a solution as close to the truth as possible. Without coarse alignment, fine registration is time consuming and often fails to find a solution. Methods for coarse alignment include principal component analysis (PCA) [46], point signatures [47], and image transformation [48], etc. One of the most famous fine registration methods is the Iterative Closest Point (ICP) algorithm [49] [50]. ICP searches for pairs of nearest points in the two data sets to estimate a coordinate transformation, and this iterative process executes until the result reaches a pre-defined threshold.
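A minimal sketch of the basic ICP loop just described, with nearest-neighbour matching followed by a closed-form rigid-body update via SVD; this is the standard textbook formulation, not the modified ICP variants developed later in this study:

import numpy as np
from scipy.spatial import cKDTree

def icp(source, target, iters=50, tol=1e-6):
    """Align 'source' (N x 3) to 'target' (M x 3); returns R, t, aligned points."""
    src = source.copy()
    R_total, t_total = np.eye(3), np.zeros(3)
    tree = cKDTree(target)
    prev_err = np.inf
    for _ in range(iters):
        dist, idx = tree.query(src)          # closest-point correspondences
        matched = target[idx]
        # Closed-form rigid transform between the matched sets (SVD).
        mu_s, mu_t = src.mean(0), matched.mean(0)
        H = (src - mu_s).T @ (matched - mu_t)
        U, _, Vt = np.linalg.svd(H)
        D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
        R = Vt.T @ D @ U.T
        t = mu_t - R @ mu_s
        src = src @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t
        err = dist.mean()
        if abs(prev_err - err) < tol:        # stop when the error settles
            break
        prev_err = err
    return R_total, t_total, src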
[51] adopted color information combined with Euclidean distance to set up correspondence in order to improve the registration results. Instead of searching the distance between corresponding points, Pottmann used the distance between the surfaces as a registration evaluation standard [52]. Gregory investigated the Euclidean invariant features in generalization ICP in order to decrease the probability of 17 being trapped in a local minimum [53]. Moreover, ICP method needs a lot of processing time to find the corresponding closest points. In order to deal with this problem, people tried to increase the speed of the algorithm. Timothee used multi-resolution way to speed up the ICP searching process [54]. ICP family is a good algorithm in fine registration, and the merit of this algorithm is that the total linear least square error will be minimized after the final alignment. In this study, ICP based method is utilized to solve the registration of multiple point clouds into one common frame. However, this merit will change into a drawback in error map registration between final point cloud and CAD. The error between work piece and CAD model should not be minimized by registration process. The reduced error map can not reflect the correct manufacture error, therefore, an object-oriented registration method based on ICP is proposed to solve the problem. 1.2.6 Omni-directional 3D Reconstruction Vision Systems The capability of vision based control is highly related with information provided to the operator / computer. An omnidirectional 3D camera is a system that provides rich 3D information with a wide rang viewing and thus becomes a valuable choice for such applications. An omnidirectional camera offers a 360 Field Of View (FOV) and a complete information about the environment surrounding the vision sensor. The corresponding realtime 3D perception algorithm could further extend the capability of the omnidirectional camera to measure the depth information. Applications, such as robot navigation, remote manipulation, auto-driving system, map reconstruction, and pedestrian detection, demand this kind of vision sensor to quickly and accurately collect necessary information. With the aid of an omnidirectional 3D camera, the navigated robot could be easily avoid collisions; the remote controlled robot arm could freely pick and place parts all around; map reconstruction for an unknown scene could be achieved within one pass, and the pedestrians approaching from the rear could also be detected. Aiming on this demand, we develop a low cost structured light 18 based omnidirectional 3D camera, consisting of a projector, a camera, and two hyperbolic mirrors. A panoramic view is provided around the sensor without any blind areas, and the depths from the sensor to the objects are measured within one-shot projection. Basically there were two ways to obtain omnidirectional images [55]. The first approach was to generate the omnidirectional images through the image mosaic method. The method either employed multiple cameras viewing different directions or utilized one rotating camera [56] to obtain multiple images of the environment. The final omni-directional image was the merged result using the mosaicing method. 3D reconstruction by such methods was usually achieved by rotating a stereo vision system consisting of two cameras [57] [58]. In each rotation angle, the system was considered as a normal stereo vision system and acquired the 3D coordinates. 
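Returning briefly to the ICP-based fine registration reviewed in Section 1.2.5, the basic point-to-point iteration is compact enough to sketch. The following is a minimal illustrative implementation in Python/NumPy (closest-point search plus a least-squares rigid motion); it is not the modified ICP or the object-oriented registration developed in this study, and correspondence rejection and coarse alignment are omitted.

```python
# Minimal point-to-point ICP sketch (illustrative only, not the modified ICP
# developed in this study). Assumes roughly pre-aligned NumPy point clouds.
import numpy as np
from scipy.spatial import cKDTree

def best_rigid_transform(src, dst):
    """Least-squares rotation R and translation t mapping src onto dst."""
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:          # guard against reflections
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = c_dst - R @ c_src
    return R, t

def icp(source, target, iters=50, tol=1e-6):
    """Iteratively align `source` (N x 3) to `target` (M x 3)."""
    tree = cKDTree(target)
    src = source.copy()
    R_total, t_total = np.eye(3), np.zeros(3)
    prev_err = np.inf
    for _ in range(iters):
        dist, idx = tree.query(src)            # closest-point correspondences
        R, t = best_rigid_transform(src, target[idx])
        src = src @ R.T + t                    # apply the incremental motion
        R_total, t_total = R @ R_total, R @ t_total + t
        err = dist.mean()
        if abs(prev_err - err) < tol:          # stop at a pre-defined threshold
            break
        prev_err = err
    return R_total, t_total, err
```

In practice a coarse alignment (for example the PCA-based approach cited above) would seed the source cloud before this iteration, since the nearest-neighbor step otherwise tends to converge to a local minimum.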
An omni-directional 3D rebuild was achieved by scanning the environment a full 360 degrees. For such a system, it could not build all the environment at same time and the mechanic rotation could also influence the accuracy of the result. The second approach is to use an omni-direction camera [59] [60] [61] to obtain panoramic vision. An omni-directional camera usually consists of a convex shape mirror, such as a hyperbolic, parabolic, conic, or sphere mirror, and a normal camera. By viewing the back image of the mirror, a panoramic vision can be obtained. The math models [62] [63] and the configurations [64] have been discussed a lot in the past. An omnistereo vision system could be formed to obtain 3D coordinates by employing two omni-direction cameras. Via vertically-aligning two omnidirectional cameras [65], an omnistereo system was successfully designed and was implemented. A 3D sample point was rebuild by the intersection of two corresponding light arrays from two calibrated omnidirectional cameras. Monocular vision could also rebuilt 3D information but less commonly compared with stereo vision method. In order to reconstruct an environment map, an omnidirectional camera was put onto a mobile robot [66] [67]. Via the estimation of the robot motion, the feature points in the scene could be reconstructed. However, both systems usually suffered from the correspondence problem and both measurement accuracy and result 19 density were limited by this correspondence issue. Facing on the correspondence problem, a structured light based omnidirectional vision sensor was developed [68] [69]. The authors put a laser facing to a conic mirror and then formed their structured light emitter. An omni-direction camera and such a light source were put vertically to form their omni-direction structured light based 3D sensor. The laser was split into a light ring by the conic mirror and the light ring was emitted to the environment. The omnidirectional camera captured the light ring and the radius of the light ring was utilized to calculate the depth information. Since there was only one light ring in the image, correspondence problem did not exist in their system. The scene points illuminated by the light ring cloud be detected in realtime. However, the system could only scan a thin ring from the scene and limited 3D information was provided. In order to sample more points from the scene, Ukida et al. [70] developed a structured light aided omnistereo vision sensor. Three projectors shooting in 3 different directions were added to an omnistereo vision system. With the aid from the structured light, a more accurate and more dense 3D reconstruction result was achieved. However, 3 projectors did not cover the whole FOV of the omnidirectional camera, several blind areas were failed to obtain 3D information. In short, there is a demand to design an omnidirectional 3D sensing system capable to fully measure a 360 degree surrounding environment with a high dense sample result. In this paper, a structured light based omnidirectional 3D camera, consisting of a projector, a camera, and two hyperbolic mirrors, was developed. The hyperbolic mirror and the projector formed an omnidirectional projector. The omnidirectional projector and the omnidirectional camera were aligned vertically and cloud sample dense 3D points around all 360 degree surroundings. The system can measure the environment without any blind areas. Besides the high density 3D sample results, the measurement speed is also very important. 
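As a concrete illustration of how a panoramic view is read off the mirror image mentioned above, the donut-shaped catadioptric image can be resampled into a cylindrical panorama with a simple polar-to-Cartesian remap. The sketch below is a simplified, assumption-laden version: it presumes the image center and the inner/outer radii of the mirror ring are known, and it applies no mirror model, so image rows only approximate elevation.

```python
# Unwarp a catadioptric (mirror) image into a cylindrical panorama by polar
# resampling. Simplified sketch: assumes the image center (cx, cy) and the
# inner/outer radii of the mirror ring are known; no mirror model is applied.
import numpy as np

def unwarp_panorama(img, cx, cy, r_in, r_out, out_w=1440, out_h=240):
    """img: HxW(x3) array; returns an out_h x out_w panorama."""
    theta = np.linspace(0.0, 2.0 * np.pi, out_w, endpoint=False)   # azimuth
    radius = np.linspace(r_out, r_in, out_h)                       # top row = outer ring
    rr, tt = np.meshgrid(radius, theta, indexing="ij")
    xs = np.clip((cx + rr * np.cos(tt)).round().astype(int), 0, img.shape[1] - 1)
    ys = np.clip((cy + rr * np.sin(tt)).round().astype(int), 0, img.shape[0] - 1)
    return img[ys, xs]                         # nearest-neighbor lookup
```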
Many applications, such as navigation, remote manipulation, and pedestrian detection, not only require measuring the environment but also need a realtime sampling method. Based on the number of projection images, structured light systems can be roughly separated into two categories: one-shot projection systems and multi-shot projection systems. Once the imaging sensor was moving or a moving object was approaching, multi-shot methods failed to handle such cases. The system [68] [69] that Gluckman et al. developed could measure the scene in realtime, but its point density was very limited. The system [70] that Ukida et al. developed applied several multi-projection methods to set up correspondence, so the measurement could not be finished in real time. In order to sample dense 3D points in realtime, a robust one-shot surface coding method was needed. In the past, one-shot coding methods were developed only for conventional structured light systems. The Kinect system developed by Microsoft could detect 3D in realtime and obtain dense samples, but its view angle was very limited compared with an omnidirectional system, so much less information was presented to the operator than with an omnidirectional 3D camera. Color based and spatial-neighborhood based coding methods were the two most common one-shot methods for a conventional structured light system. In the color method, the codeword was determined in a single pattern because the colors of different primitives were different. The drawback of the color strategy was that the color contrast ratio could be affected by the background colors, which resulted in lower reliability than monochromatic light, such as black-white light. Spatial-neighborhood coding [71] using monochromatic light was more robust than the color based method: the codeword of each primitive (symbol) depended on the value of the primitive and its neighbors, and each codeword was unique so that correspondence could be established. However, this method was not suitable for an omnidirectional sensor, because the geometric relationship can be destroyed by the convex shaped mirrors. The projection pattern is first warped by the hyperbolic mirror in front of the projector and then further warped by the hyperbolic mirror in front of the camera. Without knowing the two warping functions, the spatial-neighborhood method cannot be directly applied to the omnidirectional 3D sensor. Therefore, existing solutions for one-shot surface coding were not suitable for an omnidirectional 3D system. Besides structured light systems, a ladar system [72] can also obtain realtime 3D results from the scene. However, such a system can only acquire a limited number of 3D samples, and its cost is much higher than that of our system. In order to meet this demand, a one-shot surface coding method for a structured light based omnidirectional 3D system was developed to measure the environment in realtime. Light rings embedded with different angular frequencies and intensities were emitted from the omnidirectional projector to the environment through the hyperbolic mirror. Taking advantage of the epipolar constraints, the invariance of the phase angle and angular frequency was utilized to establish the correspondence, which was then triangulated to obtain the depth information.
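To make the ring-based one-shot idea concrete, the following sketch generates a pattern of concentric rings whose intensity is modulated along the angular direction, each ring carrying its own angular frequency. The ring count, frequencies, and intensity profile here are hypothetical placeholders for illustration, not the codeword design actually used in the developed system.

```python
# Illustrative ring-coded projector pattern: concentric rings whose intensity
# is a sinusoid of the polar angle, with a different angular frequency per
# ring. Parameters are made-up placeholders, not the developed codeword design.
import numpy as np

def ring_pattern(size=1024, n_rings=16, freqs=None):
    freqs = np.asarray(freqs) if freqs is not None else 4 + 2 * np.arange(n_rings)
    y, x = np.mgrid[0:size, 0:size]
    cx = cy = size / 2.0
    r = np.hypot(x - cx, y - cy)
    theta = np.arctan2(y - cy, x - cx)
    # Assign each pixel to a ring by its radius, then modulate along theta.
    ring_idx = np.minimum((r / (size / 2.0) * n_rings).astype(int), n_rings - 1)
    pattern = 0.5 + 0.5 * np.cos(freqs[ring_idx] * theta)
    pattern[r > size / 2.0] = 0.0          # outside the projector's circular field
    return (255 * pattern).astype(np.uint8)
```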
Compared with traditional one-shot method, the contribution was the one-shot design for an omnidirectional projector and the challenge was the pattern design against both mirror distortion and scene distortion. The codeword should be designed based on the invariance from scene distortion and the two warps that were affected by the hyperbolic mirrors were also needed to develop the pattern. In order to obtain an accurate 3D profile reconstruction, it was also of great importance to calibrate the geometrical relation between the optical components of omnidirectional stereo vision. Generally, a uniform theoretical model was used to present the geometric relationship between the 3D point in the scene and the corresponding 2D point in the image plane of the camera or projector. Then, the model parameters were estimated through several features with known locations in the scene [73] [74]. The calibration accuracy, however, was limited by the imprecise alignment of optical components and residual errors due to the manufacturing error of the off-the-shelf optical components. To improve the accuracy, the vector-based calibration strategy was proposed to uniformly calibrate both camera and projector, where an incident vector for a pixel of the projector and a reflected vector for that of the camera were assigned respectively. In summary, The development of the omnidirectional 3D camera contributed in the following aspects: 22 1) Developed and configured a structured light based omnidirectional 3D camera. It was capable to measure 360 degree environment with dense 3D samples. 2) Developed and implemented a one-shot algorithm for such a 3D camera. 3D samples could be measured in real time. It was essential when the sensor was located on a moving base. 3) Developed a vector based calibration for such a system. The proposed method was able to calibrate an omnidirectional projector, which people have not discussed yet based on my literature review. 1.3 Difficulties and Challenges The objective of this study is to develop large scale structured light based measurement systems, which include topics not only in the development of theoretical methodology but also application implementation. Although several 3D sensors have been developed in past, this study aims at the the development of large scale optic systems for accurate inspection and fast environment reconstruction. The main challenges involved in this study include: 1) Calibration tasks for a large scale structured light system. It is very difficult to maintain high inspection accuracy when the system scale increases. One pixel error, 3D reconstruction error cost by one image pixel, in a large scale system is much bigger than the error in a small one. A complete calibration methodology for a large scale structured light sensor, including 3D model optimization and precision analysis, needs to be developed. Both difficulty during the calibration implementation and the calibration requirement arise when the scale is enlarged. Traditional calibration tasks involves too much parameters in their 3D recovery model, 22 parameters including both imaging devices’ intrinsic and extrinsic parameters. Especially the projector parameters is the major source of calibration error. 2) Consistence analysis because of the large scale of the system. Conventional structured 23 light inspection system usually much less FOV. The influence costed by the random image noises can tolerated in small systems. 
However, for a large scale system, the random image noise begins to affect the consistence of the system, which is a new problem for this field. 3) A robust encoding and decoding algorithm is needed to inspect optically-challenging objects. Without pretreatment, the reflection ratios across a large work piece can vary a lot, so the incident illumination should be dynamically changed correspondingly. Otherwise, adjusting the camera aperture becomes a dilemma: the dark part requires a big aperture while the shiny part needs a small aperture. Optically challenging objects, ranging from shiny to dark and even blank, should be inspected simultaneously. Shiny objects can cause the mis-coding problem due to internal reflection: the reflected light from a point in the scene is related not only to its own incident light but also to the incident light in its neighborhood. An un-illuminated area can therefore still produce high intensity in the camera image because of internal reflection. 4) Registration from each sensor frame to the common frame. A large inspection system employs more than one pair of sensors to cover a work piece, and the point cloud saved in each individual sensor frame should be merged into a common frame. Sensor calibration error can enlarge the registration error and make the final result unacceptable. Since the test work piece may not have any features to establish correspondence, and each sensor frame shares only a very small overlap with the others, the registration is a challenging task for a multi-sensor inspection system. 5) On-line registration calibration between the final point cloud and the CAD model within limited time. The color-coded error distribution map is the final purpose of a quality inspection system, and it is based on the registration between the final point cloud and the CAD model. A valid registration reveals the distribution of manufacturing errors; failure to obtain a correct error map is expensive for manufacturing. 6) A robust one-shot encoding and decoding algorithm is needed for environment reconstruction. Since the system needs to measure moving objects, it is sensitive to the acquisition time: all necessary information for 3D reconstruction must be obtained within one projection and acquisition. The encoding and decoding algorithm should simultaneously satisfy the requirements of accuracy, robustness, and efficiency. 7) A 360 degree navigation system with realtime 3D perception ability should be built up in the limited space on a mobile robot. 1.4 Objectives of the Work 1) The development of a new 3D sensor model with fewer calibration parameters. Compared with camera calibration, projector calibration is more complicated and introduces more calibration errors. Therefore, a novel sensor model which avoids the projector intrinsic parameters and orientation should be developed. The 3D point is not calculated through the traditional triangulation method but through the plane induced parallax information. 2) Analysis, simulation, and verification of the precision of the system were done for the large scale structured light system. The effect of random image noise was analyzed and numerically evaluated. The precision of the designed and implemented system reached the system requirement. 3) A robust surface coding method was developed to inspect optically-challenging objects. Monochromatic White and Black (WB) stripes instead of color patterns are utilized to eliminate the effect of the material's texture and color.
A projection mask, dynamically adjusting the illumination of a projector, is applied to avoid under- / over-exposure issues: incident lights shooting onto the shiny or high reflection area are reduced and lights onto the shadow or low reflection area are enhanced. An extrapolation model combined the projection mask successfully solves the internal reflection problem. A pattern of WB stripes and its inverse are sequentially projected onto the test material. The edges between white and black stripes are estimated through an extrapolation model even internal reflection occurs. 4) A registration method was developed to merge multi-sensor frame into a common 25 frame, and thus a final point that cloud could represent the entire work piece was derived. Based on a pre-measured painted windshield, four individual sensor frame were transferred into a common frame. 5) A color-code error distribution map was created to evaluate a given work piece to finish the quality inspection task. People could easily find out whether a work piece is qualified or not. 6) A new structured light pattern based on the spatial-neighborhood strategy was proposed. The primitive of the pattern was based on the corner of the black-white checkerboard pattern, called the X-point. The X-point provides more robust and accurate position compared with a conventional pattern in which the primitive is based on symmetrical symbols. The orientation of the X-point is used to encode the primitive of the pattern. 7) Adding hyperbolic mirrors in front of the camera and the projector, an omni-directional 3D optical sensor for mobile robot navigation was developed. Its corresponding sensor model, coding method, and calibration were developed. 1.5 Organization of the Study The development of a structured light based 3D measurement system roughly consists of 3 steps: the working principle of the 3D sensor, calibration of the sensor, and surface coding method for the system. For an accurate inspection system, multiple sensors are utilized to enlarge the system. Therefore, additional registration is needed after each sensor rebuild each individual point cloud. For a fast navigation system, a convex shaped mirror based omni-direction imager is used to enlarge the system field of view. Therefore, omni-direction 3D reconstruction navigation system is explained after the introduction of a traditional single direction navigation system. Chapter 2 explains the working principle of the 3D sensor. The corresponding sensor calibration is given in chapter 3. Surface coding emphasizing single-shot or multi-shot al- 26 gorithms are introduced in chapter 4. Chapter 5 provides the registration algorithm to transfer data in different coordinates. Chapter 6 gives the development of omni-direction 3D rebuild system. Current experimental results are given in chapter 7. Chapter 8 presents the conclusions, discussions and further works. 27 CHAPTER 2 Development of a 3D Sensor Based on Structured Light The 3D shape of a work piece is conventionally measured by a Coordinate Measurement Machine (CMM) via adopting a touching probe onto the inspected surface and acquiring coordinates. The nature of point-wise contact based method limits both CMM’s measurement speed and the capability of inspecting complicated work pieces. Non-contact optic measurement method via machine vision, with the merits of fast speed, high inspection sample rate, and overall simplicity of operation, becomes a popular alternative inspection method for CMM. 
The implementation of a structured light based inspection system is highly related to its scale. A successfully designed and implemented system is capable of measuring large work pieces while maintaining high accuracy. A large scale structured light system with long standoff distance, large Field Of View (FOV), deep Depth Of View (DOV), and multiple sensors raises the inspection error, and thus raises the difficulty of its calibration. The final inspection result for such a system carries the sensor calibration error, random image noise, and multi-sensor registration error. The traditional 3D reconstruction model requires the intrinsic and extrinsic parameters of both the camera and the projector. In order to reduce the sensor calibration error, an optimized 3D reconstruction sensor model with fewer parameters is developed. By defining a reference plane in the sensor model, the intrinsic parameters and the orientation of the projector can be avoided in the 3D recovery calculation. This chapter describes this innovative 3D reconstruction sensor model; it solves a practical industrial problem arising from the large scale.

2.1 Introduction of Imaging Devices

A structured light based 3D scanner usually consists of a projector and a camera. The camera obtains a perspective mapping from the world frame to the image frame. The projector provides an additional constraint to fix the last degree of freedom in the perspective geometry and thus achieve the 3D reconstruction. In order to optimize the 3D recovery, background information on the imaging devices is given in this section: the pinhole model approximation, the homography matrix between two planes, the epipolar geometry between two pinhole models, parallax imaging, lens distortion, Field Of View (FOV), and Depth Of View (DOV).

2.1.1 A Pinhole Model of an Image Device

An imaging device, such as a projector or a camera, can be approximated by a pinhole model. Since there is no difference between a camera pinhole model and a projector pinhole model, this section introduces the camera pinhole model as an example. The geometric model of a pinhole camera consists of an optic point $O_c$ and an image frame $O_I uv$ on the focal plane $I$. The fundamental property of the perspective geometry is that an image point $p = [u_p, v_p, 1]^T$ is collinear with $O_c$ and its corresponding point $P = [x_P, y_P, z_P, 1]^T$. The point $O_c$ is called the optical center, and the camera frame is $O_c X_c Y_c Z_c$ with the axis $O_c Z_c$ perpendicular to the image plane $I$. $O_I$ is the intersection point between the axis $O_c Z_c$ and plane $I$, which is called the principal point. The rotation matrix $R_c$ and translation vector $t_c$ provide the geometric relationship between the world frame $O_W X_W Y_W Z_W$ and the camera frame $O_c X_c Y_c Z_c$.

Figure 2.1: The pinhole model of a camera. (For interpretation of the references to color in this and all other figures, the reader is referred to the electronic version of this dissertation.)

The light ray passing from any point in the world frame into the image plane can be described by a $3 \times 4$ perspective transformation matrix $C_c$:

\lambda_c \, p = C_c P \quad (2.1)

in which $\lambda_c$ is a scale factor. The matrix $C_c$ can be factored into three parts: the intrinsic parameter matrix $K_c$, the rotation matrix $R_c$, and the translation vector $t_c$ in the world frame, so that Eq. 2.1 can be written as:

\begin{bmatrix} u_p \\ v_p \\ 1 \end{bmatrix} \sim K_c \begin{bmatrix} R_c & t_c \end{bmatrix} \begin{bmatrix} x_P \\ y_P \\ z_P \\ 1 \end{bmatrix} \quad (2.2)

K_c = \begin{bmatrix} f_u & \alpha & u_0 \\ 0 & f_v & v_0 \\ 0 & 0 & 1 \end{bmatrix} \quad (2.3)

The symbol $\sim$ means the equation holds up to a scale factor.
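As a quick numerical check of Eqs. 2.1-2.3, the sketch below projects a world point through an assumed $K_c [R_c \; t_c]$; the intrinsic values and pose are made-up placeholders for illustration, not calibration results from this system.

```python
# Numerical sketch of the pinhole projection of Eqs. 2.1-2.3: p ~ Kc [Rc | tc] P.
# The intrinsic values and pose below are placeholders, not calibration results.
import numpy as np

def project(points_w, K, R, t):
    """points_w: N x 3 world points -> N x 2 pixel coordinates."""
    P = np.hstack([points_w, np.ones((len(points_w), 1))])   # homogeneous points
    C = K @ np.hstack([R, t.reshape(3, 1)])                  # 3 x 4 matrix Cc
    p = (C @ P.T).T
    return p[:, :2] / p[:, 2:3]                              # divide out the scale

K = np.array([[1200.0, 0.0, 640.0],      # fu, skew, u0
              [0.0, 1200.0, 512.0],      # fv, v0
              [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.array([0.0, 0.0, 1600.0])   # camera 1600 mm from the world origin
print(project(np.array([[100.0, 50.0, 0.0]]), K, R, t))
```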
fu and fv in Kc are the focus = lengths along the u direction and v direction in image frame respectively. α is the skew 30 factor of a camera. For a square pixel camera, fu = fv = f and α is 0. The original point of the camera frame Oc can also be described by rotation matrix Rc and tc . (2.4) tc = −Rc Oc Substituting Eq. 2.4 into Eq. 2.2, the perspective transformation of the pinhole model can be expressed as:   up   =  vp  ∼ Kc Rc 1 I, −Oc      xP yP zP 1      (2.5) Back projection from Eq. 2.5, each light array from a pinhole model can be denoted as Eq. 2.6. L denotes as the light array passing point P and λc is the scale factor. L = Oc + λc (Kc Rc )−1 pp (2.6) In order to reconstruct a light array from a given pixel pp of a pinhole model. The parameters in the linear equation: intrinsic parameters, rotation matrix and translation vector are needed. 2.1.2 Homopraphy Matrix from Single View Geometry Planar objects, as one of the most common entities, is widely utilized in machine vision. The link between the planar object and their corresponding image points under perspective projection is described in this section. Homography matrix describes the projective geometry of a camera image plane and a world plane. Image of points on the image plane are related to corresponding points on world plane by homography matrix as shown in figure 2.2. The planar surface normal is defined as the z axis of the world frame, and the z values of all points on the plane equal to 0. Light array passing the pixel p intersects the world plane at point P . The homography 31 matrix between two plane can be derived from the camera perspective mapping by Eq. 2.7. The ith column of the rotation matrix Rc is denoted by ri . Figure 2.2: (a) The homography matrix between the image plane and the world plane.   up   λ  vp  = Kc r1 r2 r3 tc 1     XP XP Y   P     = Kc r 1 r 2 t c  Y P   0  1 1 (2.7) From Eq. 2.7, the definition of the image plane and the world plane can be expressed as a 3 × 3 nonsingular matrix: H = Kc r1 r2 tc (2.8) From the definition in Eq. 2.8, homography matrix H can be factorized into three parts: the intrinsic parameters Kc , the first two columns r1 and r2 of the rotation matrix Rc and the translation vector tc . Utilizing the factorization information, homography matrix is usually used to calibrate a camera or a projector by regressing into the Kc , R and t ??. 2.1.3 Epipolar Geometry and Fundamental Matrix A typical structured light sensor consists of a camera and a projector. The epipolar geometry is the intrinsic projective property between two the views. The most important property 32 of the epipolar constraint is that it is independent of scene structure, and only depends on the imaging devices’ internal parameters and relative poses. The fundamental matrix F encapsulates such an intrinsic geometry. Fundamental matrix F is 3 × 3 matrix of rank 2. Figure 2.3: The epipolar geometry between two image planes. Consider the case shown in figure 2.3, P is a point on the tested surface; pc and pp is the projections of the point P on the image planes Ic and Ip respectively; Oc and Op are the optical centers of the camera and the projector. Point P , Oc and Op form a plane, which is defined as the epipolar plane. The intersection of the epipolar plane with Ic is the epipolar line lc , and the intersection between the epipolar plane with Ip is lp . 
The position of point P varies for different tested surface, but the correspondence point pp in Ip is always on the epipolar line lp , which is the most important properties of the epipolar geometry. Each focal point projecting on the other image plane forms point ec and ep , named epipolar or epipolar point. The point correspondence between pc and pp is: pc F pp = 0 , in which F presents the 3 × 3 fundamental matrix between two pinhole model. 33 (2.9) The correspondence between the pixel and its epipolar line is: lp = F p c (2.10) lc = F T p p (2.11) The epipolar lines in both image are related by: lp = F [ec ]× lc (2.12) lc = F T ep × lp (2.13) , in which [e]× is skew symmetric matrix of an epipolar e. The epipolar plane is formed by an arbitrary point from scene and two optical points of two pinhole models. Illustrated by figure 2.4, points P1 , P2 , and P3 forms three eqipolar 1 2 planes with the optical points Oc and Op . The epipolar planes intersect Ic at line lc , lc and 3 1 2 3 lc and intersect Ip at line lp , lp and lp . Since all the eqipolar planes share with the baseline Oc Op , and ec and ep are the intersection points of baseline with Ic and Ip respectively. Therefore all the epipolar lines in Ic intersections at point ec and all the epipolar lines in Ip intersections at point ep . ec and ep is the null space vector of fundamental matrix F and F T respectively. The eipolars in both images are: F ec = 0 (2.14) F T ep = 0 (2.15) The fundamental matrix F , presenting the epipolar geometry in two pinhole models, is widely used solving correspondence problem in stereo vision. It is also one of the easiest 34 Figure 2.4: The pencil of epipolar lines in each image plane on the epipole. parameters to be calibrated, since the calculation of fundamental matrix does not require any foreknowledge of the scene. No calibration gauge is needed. 2.1.4 Plane Induced Parallax in an Imaging System In the figure 2.5, Point P in the scene projects itself as pc and pp in image plane Ic and Ip . Oc and Op are the two optical points of two pinhole models. The epipolar in each image frame are ec and ep respectively. lc and lp are the corresponding epipolar lines in both image W planes. The ray P Op intersects the plane W at point Pp and ray P Oc intersects the plane W W at point Pc . The images of point P and Pp are coincident points at pp in the image plane W W c W Ip . pW = (Hc )−1 Pp = Hp pp is the projection of point Pp on the image plane Ic . The c vector between points pc and pW in the camera image plane is the parallax related with the c homography induced by the plane W . If the point P is in the other side of the plane W , and then the point pW will be on the other side of point pc along the epipolar line lc . c Point pc can be expressed by: c pc = pW + ρec = Hp pp + ρec c (2.16) Since pc , pc and ec are collinear on lc , the scalar ρ and ec form the parallax vector relative 35 Figure 2.5: A plane W induced parallax in camera image plane. c to the homography Hp induced by the plane W . ρ is viewed as depth information related with the plane W . If ρ = 0 then point P is on the plane W . The sign of ρ indicates the side between point P and plane W . if ρ is smaller than 0, and then point P is behind plane W . Otherwise, point P is before plane W . 2.2 Optimize the 3D Sensor Model via Parallax Information 2.2.1 Problem Statement Traditional 3D reconstruction model based on structured light utilized the intersection between incident light and reflection light in the epipolar plane. 
As shown in figure 2.6, this constraint is interpreted geometrically in terms of the rays in space corresponding to the two image points. pp and pc are two projection points from Pw in the projector plane Ip and camera plane Ic respectively. Op and Oc are optical points of the projector and the camera respectively. The incident light from pp is named as Lp and the Lc denotes the reflection light. From the epipolar constraints, pp , Op , Pw , Oc and pc are in the same plane. Pw is the 36 intersection point of Lp and Lc . Since pp , pc and Pw constructs a triangle in this plane, this method is called triangulation method. Figure 2.6: Traditional 3D reconstruction model. The triangulation 3D reconstruction model requires to generate the back projection equations for the correspondence pixels: Lc = Oc + λc (Kc Rc )−1 pc Lp = Op + λp Kp Rp −1 pp (2.17) (2.18) The solution of Eq. 2.17 and Eq. 2.18 is the coordinate of point Pw . Due to the calibration error or image noise, Lc and Lp may not be in the same plane resulting that there is no intersection point between this two rays. The middle point of common perpendicular is viewed as the solution in such a case. This traditional structured light 3D model is known at least two limitations: 37 1) The first one is the large amount of parameters to be calibrated. It includes the intrinsic and extrinsic parameters of both imaging devices, together 22 parameters. The more parameters to be calibrated, the more errors induced and the less accuracy is received in the result point cloud. 2) The second one it the projector calibration. Due to the property of the projector, not serving as an information receiving device, its calibration is more complicated than a camera and contains more errors than a camera. This thesis combines the camera planar calibration method into 3D reconstruction to optimize the number of unknowns in the equation. The novel strategy is that the camera calibration information based on planar method can be saved as the reference information. The acquisition information of the test object is used to compare with the pre-saved reference information. The corresponding deviation between reference and test object is applied to reconstruct the 3D point cloud, in which only few parameters need to be calibrated. Therefore, the projector calibration is not necessary for 3D reconstruction anymore. The correspondence function provided by the pattern design is not only used to set up correspondence between a camera and a projector, but also establishes corresponding pair of pixels in camera between two measurements: the camera pixel recording reference information and the corresponding camera pixel recording tested surface information. 2.2.2 Recording the Reference Information with Homography Matrix If the camera intrinsic Kc and extrinsic parameters, Rc and tc , are calibrated corresponding to plane W through Eq. 2.8, the light array vector from projector can be calculated without projector intrinsic parameters and its rotation matrix. In figure 2.7 Lp coming from the projector intersects reference plane W at point Pr and is reflected into the camera at pr == c r [ur , vc , 1]T by the light array Lr . tpc is the baseline distance between the projector and the c c camera. By vector relationship, each light ray from the projector can be represented by 38 Figure 2.7: Recoding reference planar information. 
reference point Pr and light array Lr : c −− −→ −− − → −−→ −− O P Pr = O C Pr − O C O P (2.19) Since the camera was already calibrated with the plane W via the homography matrix W W W Hc . Point Pr can calculated through the definition of Hc : Pr = Hc pr . Substitute Pr , c pr and camera calibration information into Eq. 2.6, the depth information λPr of point Pr c can be calculated by: λP r = |Pr − Oc | (Kc Rc )−1 pr c (2.20) −− −→ Once the camera’s parameters are well calibrated with the reference board, Oc Pr can be considered as known. Vector tpc is the only value to calculate the linear equation of vector 39 −− −→ Op Pr from the projector, which also results that the intrinsic and orientation parameters of the projector are avoided in reconstructing light vector coming out from a projector. The plane information was saved as reference information and replaced projector parameters in 3D reconstruction. 2.2.3 Correspondence between Measurements Figure 2.8: Correspondence between reference recording and work piece inspection. Image patterns designed in projector plane carry unique information from pixel to pixel. In the reference recording process, codeword carried in Lp establishes the correspondence with light Lr . In the test surface measurement, codeword carried in Lp establishes the c correspondence with light Ls . Combing with two measurements, the correspondence between c Lr and Ls is set up in two measurements as shown in figure 2.8. c c 40 In figure 2.8, a tested surface S in blue color is located on the reference board. Lp from the projector is shot onto the surface at point Ps instead of Pr on the reference board and s is reflected into the camera image plane Ic at point ps = [us , vc , 1]T by the light array c c Ls . From the codeword provided in the light Lp , the correspondence betweenLr and Ls c c c is established. The Correspondence pixels pr and ps , which are both in camera frame, are c c used to reconstruct point Ps . In contrast, traditional method constructs point Ps needs correspondence pixels in different image frames: projector frame Ip and Ic , which results calibration in both image devices is needed instead of one. 2.2.4 Working Principle of the New Sensor Model Figure 2.9: The optimized 3D model. 41 Shown in figure 2.9, the reference reflection light Lr can be written as: c Lr = Oc + (Kc Rc )−1 λr pr c c (2.21) The reflection reference point Pr on the reference W is: Pr = Oc + (Kc Rc )−1 λPr pr c (2.22) Kc is the camera intrinsic parameter. Rc is the rotation matrix, and Oc is the coordinate of the camera optic point. λr is the depth along the array Lr and λPr is the depth of point Pr c along Lr . c The surface reflection light Ls denotes: c Ls = Oc + (Kc Rc )−1 λs ps c c (2.23) The incident light Lp is: Lp = Op + Kp Rp −1 λ p pp (2.24) p The projection of projector optical point Op in camera image plane Ic at epipolar ec = [ue , ve , 1]T . The coordinate of Op can be expressed in camera frame by: p Op = Oc + (Kc Rc )−1 λOp ec (2.25) , in which λOp the depth along array Op Oc . Since point Op and Pr can be expressed in the camera frame, substitute Eq. 2.25 and Eq. 2.22 into the following: −− −→ Lp = Op + λ Op Pr (2.26) The incident light, which previously be expressed in the projector frame by Eq. 2.24, can be written in camera frame by: 42 p p Lp = Oc + (Kc Rc )−1 λOp ec + λ λPr pr − λOp ec c (2.27) Through Eq. 2.23 and Eq. 2.27, an equation of depth λs can be written by: p p λs ps = λOp ec + λ λPr pr − λOp ec c c (2.28) Extend Eq. 
2.28, we can get: λs ups = λOp ue + λ λPr upr − λOp ue (2.29) λs vps = λOp ve + λ λPr vpr − λOp ve (2.30) When the depth λs equals the following, the surface point Ps can be derived. λOp λPr ue vpr − ve upr λPr ups vpr − vps upr − λOp ups ve − vps ue Transfer the Eq. 2.31 into the following: (2.31) λs = λPs = λOp λPr λs = u e u pr ve vpr (2.32) u ps u pr u ps u e λP r − λOp vps ve vps vpr The Eq. 2.32 can be simplified into a determinant form: ue ve 1 λOp u pr vpr 1 λP r u ps vps 1 =0 λPs (2.33) Substitute λs into the Eq. 2.23, The point of Ps can be calculated: Ps = Oc + (Kc Rc )−1 λOp λPr ue vpr − ve upr λPr ups vpr − vps upr − λOp ups ve − vps ue 43 ps c (2.34) From Eq. 2.34, the 3D reconstruction is all done in the camera frame via the pixel location deviation between two measurements: reference recording and surface measurement. The p parameters that need to be calibrate are: Oc Kc , Rc , ec , λOp . The parameters λPr , upr and vpr are calculated though the reference recording process. Parallel Configuration In last section, 3D reconstruction has been achieved only the camera frame by adopting the pixel deviation between two measurement as the system input. Projector calibration can be avoided, resulting that less parameters is utilized in the 3D recovery model. If the baseline Op Oc is configured parallel with the reference plane W , the 3D recovery can be even more easier via similar triangle theory. Shown in figure 2.10, The base line Op Oc is parallel with the reference recording plane W , so translation vector tpc can be consider as one scale value as d. Both imaging devices, the projector and the camera, share with the same standoff distance s s S. Pr is the intersection point between plane W and extension of line Oc P s, and Pr can be consider the projection of Ps on plane W when viewing from the camera. hP s denotes the s height of point Ps . Since line Op Oc is parallel with Pr Pr , triangle Ps Op Oc is similar with s the triangle Ps Pr Pr . The ratio of similar triangles is constant: s hP s Pr Pr = S − hP s d (2.35) s s Pr Pr denotes the length between point Pr and Pr in the reference plane and it can be transferred into the camera image frame by: s s w w Pr Pr = |Pr − Pr | = |Hc (ps − pr )| = |Hc psr | c c c (2.36) The point Ps can calculated through the following equations: hPs = s w (Pr Pr ) S |Hc psr | S c = sP w d + Pr r d + |Hc psr | c 44 (2.37) Figure 2.10: 3D sensor model in the parallel configuration.       u ps f 0 u0 xPs       λ  vps  =  0 f v0  Rc I −Oc  yPs  0 0 1 hPs 1 2.3 (2.38) Chapter summary For a large scale structured light sensor with multi-sensor, large field of view, and long standoff distance, the work principle of a 3D measurement sensor is critical for the system. It could greatly affect the system performances. A novel 3D reconstruction model is developed. Compared with the traditional structured light sensors, the intrinsic and rotation of the projector are avoided in the 3D reconstruction model. Since the projector calibration error is the major error source, the developed 3D sensor cloud further reduce the calibration error and improve the system accuracy. 45 CHAPTER 3 Sensor calibration Sensor calibration is to regress of the parameters in the 3D reconstruction model. It usually consists of two parts: the consistence evaluation of the system and the calibration method to regress of parameters. The two problems left to be solved from Eq. 2.37 and Eq. 
2.38 in sensor calibration are summarized as: 1) error analysis and consistence evaluation of the input $p_c^{sr}$ based on the system configuration; 2) camera calibration and regression of the standoff distance $S$ and the baseline distance $d$.

3.1 Precision Analysis of the Given Sensing System

Precision is also called the consistence of the system. It is often confused with accuracy, but the two are different concepts. Precision is a value that evaluates the system's repeatability; accuracy is a value describing the difference between the ground truth and the mean of multiple measurements. The difference between these two concepts is shown in figure 3.1, where the y axis denotes the probability density and the x axis denotes the measurement value. Precision can be viewed as the standard deviation of multiple measurements, which represents the repeatability. Accuracy is the deviation between the reference value (ground truth) and the mean of multiple measurements. There is a significant difference between these two quantities: the accuracy of the system can be improved by calibration, but the precision cannot, because precision is mainly affected by the random image noise from the environment.

Figure 3.1: The difference between precision and accuracy.

Two examples are given in figure 3.2. In case (a), the system's accuracy is high but its precision is low. The difference between the ground truth (the center of the rings) and the mean of all the dots is small, so the system can be considered a well calibrated system with high accuracy. However, industry cannot use such a system, since no single measurement is able to provide correct information: the deviation from the ground truth is almost random, which means the system cannot be improved by calibration. In case (b), the system's accuracy is low but its precision is high. The difference between the ground truth and the mean of all the dots is big, so the system can be considered a poorly calibrated system with high precision. Industry is able to utilize such a system by simply adding an offset to the measurement result. Calibration cannot improve the precision; on the contrary, poor precision may significantly lower the calibration accuracy. The calibration process uses measurements of a gauge to regress the model parameters; if the measurements have too large a standard deviation, the calibrated parameters will contain big errors. In the calibration process, the mean of multiple measurements of a given calibration gauge is used to regress the parameters. In industrial applications, a proper precision result is critical for the system: the calibration accuracy is bounded by the system precision.

Figure 3.2: (a) High accuracy with low precision; (b) low accuracy with high precision.

3.1.1 Uncertainty Analysis of an Imaging Device

The consistence performance of a 3D inspection system is mainly determined by the random image noise from the surrounding environment. The random image noise increases with the standoff distance $S$: a larger inspection range results in a longer standoff distance. A traditional inspection field, ranging from 10,000 to 50,000 mm², has relatively low image noise, which can be ignored in the measurement process. When a sensor's Field Of View (FOV) reaches about 2,400,000 mm², the uncertainty caused by random noise needs to be considered. In proportion to the FOV, the standoff distance is almost 10 times larger than in the traditional application, resulting in large random image intensity noise from the environment. A larger FOV also means that each pixel covers a bigger area.
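The distinction drawn above between precision and accuracy can be illustrated with a toy numerical example: repeated measurements of a gauge of known height, where the standard deviation plays the role of precision and the offset of the mean from the ground truth plays the role of accuracy. The numbers below are invented for illustration only.

```python
# Toy illustration of precision (repeatability) versus accuracy (bias) using
# simulated repeated measurements of a gauge of known height; the numbers are
# invented for illustration only.
import numpy as np

rng = np.random.default_rng(0)
ground_truth = 10.0                                      # mm

case_a = ground_truth + rng.normal(0.0, 0.5, 10)         # unbiased but noisy
case_b = ground_truth + 0.8 + rng.normal(0.0, 0.05, 10)  # biased but repeatable

for name, m in [("high accuracy / low precision", case_a),
                ("low accuracy / high precision", case_b)]:
    bias = abs(m.mean() - ground_truth)   # accuracy error (can be calibrated out)
    std = m.std(ddof=1)                   # precision (cannot be calibrated out)
    print(f"{name}: bias = {bias:.3f} mm, std = {std:.3f} mm")
```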
The same level of sub-pixel variation in the camera image therefore gives a bigger 3D reconstruction error. In figure 3.3, the pattern is shot from a projector onto a flat object at point $X_i$. Pixel $I_i$ is the projection of the point $X_i$ in the world frame. Due to random noise, $I_i'$ is used instead of $I_i$ to construct the light ray, which intersects the planar object at point $X_i'$. The angle between $I_i X_i$ and $I_i' X_i'$ is $\theta$, the focal length is $f$, and $S$ denotes the standoff distance. The error between $X_i$ and $X_i'$ is defined as $e$:

e = |X_i' - X_i| \approx S \theta \quad (3.1)

Figure 3.3: Camera error model.
Figure 3.4: Projector error model.

By the same principle as the camera model, the error model of a projector is also approximately proportional to the standoff distance $S$, as shown in figure 3.4. A longer standoff distance gives worse system precision. For an inspection system with a given design standoff distance $S$, the precision needs to be measured to decide whether further adjustment is required.

3.1.2 Simulation of the System Error Model

From Eq. 2.37, once the input $p_c^{sr}$ carries a certain amount of uncertainty, the height error will be:

\Delta h_{P_s} = \frac{S}{1 + \dfrac{d}{|H_c^W p_c^{sr}|}} - \frac{S}{1 + \dfrac{d}{|H_c^W (p_c^{sr} + \Delta p_c^{sr})|}} \quad (3.2)

The homography matrix $H_c^W$ is the mapping from the image plane to the reference plane in Eq. 2.36. $P_r P_r^s$ is the distance in the reference plane, and $p_c^{sr}$ is the pixel distance in the camera image plane. In the current system, the camera is configured almost parallel with the reference plane to reach a big FOV, so the resolution of the camera can be assumed to be a constant value $k$ which converts the pixel length to the physical length on the reference plane. The simulation formula can then be approximately expressed as:

\Delta h_{P_s} = \frac{S}{1 + \dfrac{d}{k |p_c^{sr}|}} - \frac{S}{1 + \dfrac{d}{k |p_c^{sr}| + k |\Delta p_c^{sr}|}} \quad (3.3)

in which $k$ is the resolution scale factor converting pixels into millimeters. Assuming the system has 0.1, 0.2, 0.3, or 0.5 pixels of error, the resulting height errors are shown in figure 3.5. $S$ is designed as 1600 millimeters, $d$ is designed as 500 millimeters, and $k$ equals 0.7 mm/pixel. The y axis in figure 3.5 denotes the height error in millimeters; the x axis is the input value of the sensor model, $p_c^{sr}$, the deviation in the camera image frame between point $p_c^s$ and point $p_c^r$. As the pixel error increases from 0.1 to 0.5 pixels, the error in height increases. In each sub-figure of figure 3.5, the error curve drops as the input $p_c^{sr}$ increases. $p_c^{sr}$ in the sensor model is caused by the height deviation between point $P_s$ and the reference plane: a large $p_c^{sr}$ means that point $P_s$ has a big height above the reference plane in figure 2.10, so the standoff distance from the sensor to the point $P_s$ becomes shorter, and from Eq. 3.1 a shorter standoff distance results in a smaller error.

Figure 3.5: Error analysis results: height errors (in millimeters) versus the input $|p_c^{sr}|$ (in pixels) when the input error equals 0.1, 0.2, 0.3, 0.5 pixel.

Table 3.1: The standard deviation results for all sensors
Sensor one: 0.17 mm | Sensor two: 0.08 mm | Sensor three: 0.08 mm | Sensor four: 0.13 mm

3.1.3 System Precision Test Results

An inspection gauge is measured 10 times. The standard deviation is used to describe the precision of the system.
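The sensitivity predicted by Eq. 3.3 can be reproduced in a few lines using the design values quoted above (S = 1600 mm, d = 500 mm, k = 0.7 mm/pixel). The sketch below only mimics the simulation behind figure 3.5 and is not part of the system implementation.

```python
# Reproduce the height-error sensitivity of Eq. 3.3 for the design values
# quoted above (S = 1600 mm, d = 500 mm, k = 0.7 mm/pixel). Only a sketch of
# the simulation behind Figure 3.5.
import numpy as np

S, d, k = 1600.0, 500.0, 0.7          # mm, mm, mm/pixel

def height_error(m_pixels, dm_pixels):
    """|Delta h| of Eq. 3.3 for input deviation m (pixels) and pixel error dm."""
    h = lambda m: S / (1.0 + d / (k * m))
    return np.abs(h(m_pixels) - h(m_pixels + dm_pixels))

m = np.linspace(20, 300, 15)          # input deviation p_c^sr in pixels
for dm in (0.1, 0.2, 0.3, 0.5):
    # The error is largest at the smallest input deviation (longest standoff).
    print(f"dm = {dm} px: max height error = {height_error(m, dm).max():.2f} mm")
```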
The mean of the height map in each pixel is calculated by: 1 ¯ h(u, v) = n n hk (u, v) (3.4) k=1 , in which u and v stand for the column and row location of a pixel;n is the total measurement times, and k is the measurement order. The standard deviation of each pixel is: 1 n Std (u, v) = n 2 ¯ hk (u, v) − h (u, v) (3.5) k=1 The standard deviation for each couple of projector and camera can be calculated by: Stds = 1 nu · nv Std (u, v) , u = 1, ..., nu ; v = 1, ..., nv u (3.6) v where nu and nv are maximum row and column number of the camera resolution. In our application, they are 2044 and 2002, respectively. Current four precision results for four sensor are given in table ?? 52 3.2 3.2.1 Sensor Calibration Calibrate Homography Matrix between the Camera Image Plane and the Reference Plane Apply the definition of homography matrix in Eq. 2.7:   ui   W  vi  = Hc 1   Xi   T T  Yi  , Pi = [Xi , Yi , 1] ∈ W, pi = [ui , vi , 1] ∈ Ic , i = 1, ..., n 1 (3.7) , in which pi and Pi are a pair of corresponding pixel in image plane and point in reference plane. n is the number of corresponding pairs used to calculate the homography matrix. Convert the H matrix in vector form as h = (h11 , h12 , h13 , h21 , h22 , h23 , h31 , h32 , h33 )T , and substitute it into the Eq. 3.7. The homogeneous equations for n points become Ah = 0, with A the 2n × 9 matrix:  u1 v1 0 0    u2 v2  0 0  A=       un vn 0 0  −v1 X1 −X1 −v1 Y −Y1    −v2 X2 −X2   −v2 Y2 −Y2          1 0 0 0 −un Xn −vn Xn −Xn  0 un vn 1 −un Yn −vn Yn −Yn 1 0 0 0 u1 v1 1 0 0 0 u2 v2 0 −u1 X1 1 −u1 Y1 0 −u2 X2 1 −u2 Y2 (3.8) The solution of Ah = 0, can follow the standard procedure which can minimize the algebraic residuals |Ah|. Via singular value decomposition (SVD), the null space vector h is the solution which can minimize the residuals. Fig. 3.7 (a) shows the calibration method to calculate the homography matrix between the chessboard and the camera image plane. Fig. 3.7 (b) gives a top view of the chessboard. The camera frame is Oc xc yc zc . Oc is the optic center of the camera and zc axis is perpendicular to the image plane IC . The frame Ow xw yw zw is the world frame. The surface 53 normal of the chessboard is defined as the z direction of the world frame. Both frames follow the right-hand rule. The homography matrix can be viewed as a part of the perspective projection matrix Cc without the third column. Figure 3.6: (a) The illustration of homography matrix calculation; (b) The used chessboard. 3.2.2 New Calibration Tasks of the Camera The limitation of the traditional homography based calibration method lies in the regression process from homography matrix to the intrinsic and extrinsic parameters of the image devices. The regression solution goes into ill-conditioned when imaging devices are parallel with the inspection plane. The intrinsic and extrinsic parameters are used to construct a 54 light array from a given pixel such as the array in Eq. 2.17. If the light arrays can be expressed in different ways, the regression process can be avoided. 
Define a 3 × 3 non-singular matrix H, and multiply the pixel pc with (H −1 H), Eq.2.17 is changed into: Lc = Oc + λc (Kc Rc )−1 H −1H pc (3.9) The equation of the light Lc can be transferred into Lc = Oc + λc (HKc Rc )−1 (Hpc ) (3.10) The new calibration tasks of the Eq.3.10, which is mathematically not difference with Eq.2.17, are changed from finding Kc , Rc and Oc into: 1) Choose a proper 3X3 non-singular matrix H and find the matrix multiplication result of (HKc Rc ). 2) Find the translation vector Oc ; The forward transformation of Eq.3.10 can be expressed as:   uc   H  vc  ∼ HKc Rc = 1 I, −Oc      x y z 1      (3.11) ′ N N Defining p′ = (u′ , vc , 1)T = Hpc and Kc Rc = HKc Rc . Eq.3.11 is changed into the c c following:   u′ c  ′ ∼ N N  vc  = Kc Rc 1 I, −Oc      x y z 1      (3.12) N N , in which Kc is intrinsic parameter of a new virtual pinhole model, Rc is the new rotation matrix of the new pinhole model, and p′ is the new pixel location on the new image plane. c 55 Figure 3.7: The light array passing two pinhole models. In Fig. 3.7, Optic point of the camera denotes point O, image plane Ic has the distance ′ of f to the optic point. The virtual image plane denotes Ic , with distance f ′ to the optic ′ point. Homography matrix between image plane Ic to the Ic is the 3 × 3 matrix H. The N intrinsic parameter of both models are Kc and Kc respectively. The light array L can be both constructed from pixel (u, v) or by the pixel (u,′ v ′ ). If the homography matrix between two pinhole models is known, the light array equation can be reconstructed in both ways depending which one is easier to calculate. 3.2.3 The Choice of Non-singular Matrix H in New Calibration Tasks By factorization the homograhy matrix into the traditional format of Eq. 2.8, camera calibration [75] can be processed by iterations to find K, r1 , r2 , r3 = r1 × r2 , tc , and tc = −Rc Oc to fit parameters existed in the light array equation. Traditional calibration method stops facc torizing homograohy matrix here. However, the homography matrix Hw can be continually factorized: 56 c H w = Kc Rc i Rc j Rc −Oc (3.13) , in which i = [1, 0, 0]′ and j = [0, 1, 0]′ are two unit vectors along x and y directions in the world frame respectively. Oc = [xc , yc , zc ]′ is the coordinate of the optic point Oc in the world frame. Eq. 3.13 transfers into the following form:    1 0 −xc   c HW = Kc Rc 0 1 −yc  0 0 −zc (3.14)  1 0 −xc   matrix 0 1 −yc  seems a normal 3 × 3 matrix. In fact it is an inverse matrix of new 0 0 −zc pinhole model’s intrinsic parameter matrix:    −1 1 0 −xc −zc 0 xc     0 1 −yc  = (−z)  0 −zc yc  0 0 −zc 0 0 1 (3.15) Define a new pinhole model’s intrinsic parameter matrix as:   −zc 0 xc   K N =  0 −zc yc  0 0 1 (3.16) The final factorization of the homography matrix can go to:  −1 −zc 0 xc   c HW = Kc Rc (−z)  0 −zc yc  0 0 1 (3.17) The calibration board and the optic point form a new virtual pinhole model, in which new intrinsic parameters are expressed by the coordinate of the optic point Oc in world frame. The absolute value of zc can be considered as new focus length. The image center is [xc , yc ]T . The new pinhole model can be viewed as a ’square pixel camera’, since focus lengths along x direction and y direction are the same and skew factor equals to 0. Since the new pinhole model is defined in the world frame, the rotation matrix between the world 57 frame and the new camera frame is an 3 × 3 identity matrix. Substituting Eq. ?? 
into Eq. ??, the relationship between the image pixel and the point on the calibration board is expressed by the following equation:     u x ∼   NI  K  v  = Kc Rc  y  1 1 (3.18) w c Choose Hc = (Hw )−1 as the 3X3 non-singular matrix H in calibration tasks in section III and substitute it into Eq. 3.11:    u′ c ′ vc    −zc 0 xc ∼  −1 −1  =  0 −zc yc  R K KR 1 0 0 1 I, −tc The reconstruction of the light array changes into the final form:      x y z 1        −1   xc −zc 0 xc uc      w Lc =  yc  + λc  0 −zc yc  Hc  vc  zc 0 0 1 1  (3.19) (3.20) The forward transformation matrix becomes:           x x ′ uc uc −zc 0 xc 0         y  ′ y  w  Hc  vc  = v ′ c  =  0 −zc yc 0    = Cc   z  z  1 0 0 1 −zc 1 1 1 (3.21) ′ , in which −zc equals to the standoff distance S. The new perspective matrix Cc only w contains the translation vector Oc . Hc is the transformation from a pixel pc in the old image plane to a pixel p′ in the new image plane. c     u′ c uc   w  Hc  vc  = v ′ c  1 1 (3.22) w After the homography matrix Hc was chosen as the 3 × 3 nonsingular matrix in the W new calibration task, the multiplication of (Hc Kc Rc ) can be easily expressed by three 58 elements of Oc . And now the only unknown is Oc in the camera pinhole model equation. The traditional linear or nonlinear regression to find intrinsic and extrinsic parameters to rebuild the light array equation is avoided. The new image plane is mapped into a plane perfect parallel with the reference plane. 3.2.4 Calibration of Camera Translation Vector Oc and Baseline Distance d Figure 3.8: Multi-plane calibration strategy. A multi-plane pixel to pixel calibration strategy has been developed for current system. The chessboard shown in figure 3.6 is lifted up multiple times and its correspondingly position is recorded for calibration. In order to eliminate the random image noise in recording, all the recordings in the following calibration process are measured 5 times and the mean of measurement results is used in data regression. 1 ¯ I (u, v) = m m Ik (u, v), m = 5 (3.23) k=1 Ik (u, v) in Eq. 3.23 represents the intensity value of a pixel [u, v]T of the k th recording. The calibration procedure can be summarized as follow: 1) Place the chessboard as reference board underneath of a sensor, and the homography w r matrix Hc and lens distortion are calculated via the corners coordinates [xr , yw , 0, 1]T in w 59 r the world frame and their corresponding pixel locations ur , vc , 1 c T . 2) The Grey Code Line Shifting (GCLS) is projected onto the chessboard and is captured by the camera as the reference cording. 3) Lift the chessboard at height h1 called layer 1. Substitute the point Pw = w i i [xi , yw , hi , 1]T , the corresponding camera pixel [ui , vc , 1]T , and Hc into Eq. 3.21. c w 4) Project GCLS onto the broad and find the correspondence by the codeword explained in figure 2.8. The pixel distance psr (h1 , u, v) is calculated based on the correspondence in c each pixel u = 1, ..., nu , v = 1, ..., nv . Since every point on the board has the same height, psr (h1 ) is: c 1 nu nv u v , where nu and nv are the column number of the psr (h1 ) = c ph1 (h1 , u, v) c (3.24) camera 2044 and row number 2008, re- spectively. Each layers’s psr (h1, u, v) distribution is approximated as a Gauss curve shown c in figure 3.9, which is due to the random image noises and some implementation errors. 
3.2.4 Calibration of the Camera Translation Vector O_c and the Baseline Distance d

Figure 3.8: Multi-plane calibration strategy.

A multi-plane, pixel-to-pixel calibration strategy has been developed for the current system. The chessboard shown in figure 3.6 is lifted multiple times and its position at each height is recorded for calibration. To suppress random image noise, every recording in the following calibration process is measured 5 times and the mean is used in the data regression:

$\bar{I}(u,v) = \frac{1}{m}\sum_{k=1}^{m} I_k(u,v), \qquad m = 5$   (3.23)

where $I_k(u,v)$ is the intensity of pixel $[u,v]^T$ in the k-th recording. The calibration procedure can be summarized as follows:

1) Place the chessboard underneath the sensor as the reference board. The homography matrix $H_c^w$ and the lens distortion are calculated from the corner coordinates $[x_w^r, y_w^r, 0, 1]^T$ in the world frame and their corresponding pixel locations $[u_c^r, v_c^r, 1]^T$.

2) Project the Gray Code Line Shifting (GCLS) pattern onto the chessboard and capture it with the camera as the reference coding.

3) Lift the chessboard to height $h_1$ (layer 1). Substitute the points $P_w = [x_w^i, y_w^i, h_i, 1]^T$, the corresponding camera pixels $[u_c^i, v_c^i, 1]^T$, and $H_c^w$ into Eq. 3.21.

4) Project the GCLS pattern onto the board and find the correspondence via the codewords explained in figure 2.8. The pixel distance $p_c^{sr}(h_1, u, v)$ is calculated from the correspondence at each pixel $u = 1, \ldots, n_u$, $v = 1, \ldots, n_v$. Since every point on the board has the same height, $p_c^{sr}(h_1)$ is taken as the average

$p_c^{sr}(h_1) = \frac{1}{n_u n_v} \sum_u \sum_v p_c^{sr}(h_1, u, v)$   (3.24)

where $n_u = 2044$ and $n_v = 2008$ are the column and row counts of the camera, respectively. The distribution of $p_c^{sr}(h_1, u, v)$ within each layer is approximately Gaussian, as shown in figure 3.9, due to random image noise and implementation errors.

Figure 3.9: The distribution of $p_c^{sr}(h_1, u, v)$ for each layer.

5) Repeat steps 3) and 4) nine times, each time at a different height.

6) Solve Eq. 3.21 with the collected data by rearranging it into the homogeneous linear system

$\begin{bmatrix} 0^T & -P_w^{iT} & v'_i P_w^{iT} \\ P_w^{iT} & 0^T & -u'_i P_w^{iT} \\ -v'_i P_w^{iT} & u'_i P_w^{iT} & 0^T \end{bmatrix} \begin{bmatrix} c_c^{1T} \\ c_c^{2T} \\ c_c^{3T} \end{bmatrix} = 0$   (3.25)

where $c_c^i$ is the i-th row of the camera perspective matrix $C'_c$. Eq. 3.25 has the form $A c = 0$; for a set of n correspondence pairs, a $3n \times 12$ matrix $A$ is built by stacking Eq. 3.25 for each correspondence. The vector $c$ is the right null-space vector of $A$ and is solved by the SVD method.

7) Substitute $H_c^w$ and $p_c^{sr}(h_1)$ into Eq. 2.37 and rewrite it as

$\begin{bmatrix} h_1\, p_c^{sr\prime}(h_1) \\ \vdots \\ h_n\, p_c^{sr\prime}(h_n) \end{bmatrix} = \begin{bmatrix} -h_1 & p_c^{sr\prime}(h_1) \\ \vdots & \vdots \\ -h_n & p_c^{sr\prime}(h_n) \end{bmatrix} \begin{bmatrix} d \\ S \end{bmatrix}$   (3.26)

where $p_c^{sr\prime} = H_c^w p_c^{sr}$. The standoff distance $S$ and the baseline distance $d$ are solved by least squares.

8) Residual function compensation for each pixel. The sensor model requires the baseline to be parallel to the reference plane; implementation errors make the real setup deviate slightly from this ideal case. To further decrease the errors, a residual compensation function is approximated for each camera pixel by a second-order polynomial. The residual for a pixel is the difference between the ground truth height and the system measurement:

$e(u,v) = h_n(u,v) - \frac{S}{\,1 + \dfrac{d}{\left|H_c^w p_c^{sr}(h_n, u, v)\right|}\,}$   (3.27)

Since the residual is the difference between the ground truth and the measured value, it can be approximated by a Taylor expansion; a second-order polynomial is used for each pixel:

$e(u,v) = A(u,v)\,\left|H_c^w p_c^{sr}(h_n,u,v)\right|^2 + B(u,v)\,\left|H_c^w p_c^{sr}(h_n,u,v)\right| + C(u,v)$   (3.28)

Figure 3.10 illustrates the residual function approximation for the pixel $u = 80$, $v = 2$. The x axis is the pixel distance $|p_c^{sr}(u,v)|$ measured at different heights, and the y axis is the difference between the ground truth and the measurement (in millimeters). The blue curve is the residual between ground truth and measurement, and the red curve is its second-order polynomial approximation. The error reaches 1 millimeter before compensation and drops below 0.2 millimeter after compensation.

Figure 3.10: The second-order polynomial approximation of the residual function for one pixel.
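The compensation of Eq. 3.28 amounts to an independent second-order polynomial fit per camera pixel. A minimal sketch of that regression is given below; the pixel-distance and residual arrays are hypothetical placeholders standing in for one pixel's record over the calibration layers.

```python
import numpy as np

# Hypothetical record for one camera pixel (u, v): pixel distances |H_c^w p_c^sr|
# at the known layer heights, and the residuals of Eq. 3.27 (ground truth - measurement).
pixel_distance = np.array([12.1, 18.4, 24.9, 31.2, 37.8, 44.1, 50.7, 57.0, 63.5])
residual       = np.array([0.95, 0.71, 0.52, 0.31, 0.12, -0.05, -0.24, -0.41, -0.62])

# Eq. 3.28: e(u, v) = A d^2 + B d + C, fitted by least squares for this pixel.
A, B, C = np.polyfit(pixel_distance, residual, deg=2)

def compensate(raw_height, d):
    """Add the predicted residual (ground truth - measurement) back to a raw height."""
    return raw_height + (A * d**2 + B * d + C)

print(A, B, C)
```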
3.3 Chapter Summary

A complete calibration method has been developed for the structured light sensor, and all the parameters of the sensor model can be successfully regressed. To further increase accuracy, the residual function of each camera pixel is approximated by a second-order polynomial. The consistency of the developed system has been analyzed, simulated, and tested. The ill-conditioned problem in camera calibration is avoided by applying the homography mapping.

CHAPTER 4
Surface Coding for a Structured Light System

The encoded patterns, called structured light, are projected from the projector onto the test work pieces and recorded by the camera. The codewords carried by the reflected light are extracted in the camera frame during the decoding process. With the aid of the structured light, camera and projector pixels sharing the same codeword are corresponded to obtain the 3D coordinates of the test work piece. The measurement result, a point cloud representing the work piece, can then be used in different applications, such as quality inspection in industry and mobile robot navigation.

4.1 Introduction of the Surface Coding Methods

The task of the surface coding process is to set up the correspondence between two image frames: the camera frame and the projector frame. The projected codewords should be preserved against distortions caused by the test objects, such as those of their geometries, textures, and optical properties, and the decoding process should extract the codewords correctly. Efficiency, robustness, density, and accuracy are the most important characteristics for evaluating a coding method.

The efficiency of a coding method is usually evaluated by the system acquisition time: if the codewords can be encoded and decoded with fewer projection patterns, the system can finish the measurement in less time.

The robustness of a coding method is its ability to withstand the albedo effects of the tested objects. The light reflected from the tested objects into the camera depends strongly on the surface optical properties; textures, colors, and reflection coefficients can all disturb the decoding process. The encoding and decoding process should therefore be designed against the optical properties of the objects.

The density of the encoded codewords controls how dense the measurement point cloud is. More codewords sample more 3D points on the objects and thus capture more of the part's geometry. A dense sampling result is critical for industrial inspection, since manufacturing errors show up more readily in a high-density point cloud.

The codewords are encoded in the projector frame and decoded in the camera frame, so it is important to extract them accurately in both image frames. The image coordinates of the codewords are used to reconstruct the 3D measurement result; the accuracy of codeword extraction therefore determines the accuracy of the measurement.

According to acquisition time, coding methods can be roughly separated into two categories: multi-projection and single-projection. A multi-projection method usually takes more time than a single-projection method because it contains more projecting-recording loops. A single-projection method encodes the codewords within one projection pattern, while a multi-projection method creates the codewords across several patterns. The efficiency of a coding method determines what the system can measure: a moving object can only be measured by a single-projection method, whose robustness, density, and accuracy, however, are usually lower than those of a multi-projection method. Applications that emphasize measurement time rather than accuracy therefore prefer the single-projection method.

The multi-shot strategy is only applicable to stationary objects, but it is more robust against image noise than the one-shot method. After illumination problems such as shadows, under-/over-exposure, and internal reflection are detected, the projection pattern can be readjusted through a proper feedback. Since more projections are applied to the objects, more codewords can be encoded into the patterns. The advantages of the multi-projection method are therefore its robustness, density, and accuracy.
The multi-projection method is usually used in inspection applications for quality control in industry. Emphasizing their different performance factors, the coding methods are explained in two categories: the single-shot projection method and the multi-shot projection method.

4.2 Single-shot Coding Method

4.2.1 Pattern Generator

Patterns based on spatial neighborhoods can be generated as shown in figure 4.1. The generation procedure starts with a random 3 × 3 matrix at the top left. The window then moves to the next position, and the Hamming distances between the new primitive codeword and all previously generated codewords are calculated, where the Hamming distance is defined as

$h_{(uv,\,u'v')} = \sum_{k=1}^{9} \delta_k, \qquad \delta_k = \begin{cases} 0 & \text{if } p_{uv} = p_{u'v'} \\ 1 & \text{otherwise} \end{cases}$   (4.1)

As shown in figure 4.1, better results are obtained when the Hamming distance is greater than 3. If all the distances are no smaller than a designed threshold, the primitive is accepted and the next iteration is performed until all the gaps are filled; if not, a new primitive is assigned. The window is successively shifted from left to right with a step size of Δx pixels, and from top to bottom with a step size of Δy pixels. The shifting steps can be adjusted to satisfy the required reconstruction resolution as long as the primitives can be identified clearly in the camera image plane; for example, some critical regions of the inspected part may need a higher resolution than other regions.

Figure 4.1: The pattern generation procedure.
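A simplified, non-overlapping sketch of this generation loop is given below; the pattern size, window step, and Hamming threshold are assumed example values, not the parameters of the actual system.

```python
import numpy as np

rng = np.random.default_rng(0)
N_VALUES = 4          # primitive orientations 0..3 (0, 45, 90, 135 degrees)
H_MIN = 3             # minimum Hamming distance between 3x3 codewords (assumed)
STEP = 3              # window shift in primitives (assumed; the real system may overlap windows)
ROWS, COLS = 21, 21   # pattern size in primitives (assumed)

def hamming(a, b):
    """Eq. 4.1: number of differing primitives between two 3x3 windows."""
    return int(np.sum(a != b))

pattern = np.zeros((ROWS, COLS), dtype=int)
codewords = []
for r in range(0, ROWS - 2, STEP):
    for c in range(0, COLS - 2, STEP):
        for _ in range(1000):                  # retry until the window is distinctive enough
            win = rng.integers(0, N_VALUES, size=(3, 3))
            if all(hamming(win, w) >= H_MIN for w in codewords):
                pattern[r:r + 3, c:c + 3] = win
                codewords.append(win)
                break
        else:
            raise RuntimeError("no admissible primitive window found; relax H_MIN")
print(len(codewords), "codewords generated")
```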
4.2.2 Primitive Design

Designing a good pattern is critically important for achieving accurate correspondence with the optical triangulation technique, especially for a one-shot method. The primitive design should satisfy the following constraints: (a) monochromatic light and (b) robust and accurate detection.

Because of the monochromatic light constraint, the symbols should not carry color-coded information. Symbols with geometrical features are therefore adopted in this study instead of the traditional color-based coding patterns, and the image features must be extracted accurately. Center-symmetric primitives such as circles and disks are widely used in fringe patterns, and the intensity centroid of the symbol is taken as the symbol location; however, this is inaccurate when shadows and occlusions occur. As shown in figure 4.2, an inspected part with a step in the scene hides part of the projected pattern from the camera, so a partial occlusion shifts the centroid position. In this design, the symbol location is instead determined by the corner of a high-contrast checkerboard primitive (an X-point). Because it uses only the local information around the corner, a partially hidden pattern does not affect the location accuracy.

Figure 4.2: An example of occlusion.

Figure 4.3: The comparison of different primitives.

Figure 4.4: The primitive value.

Thus, portions of the X-point region hidden by smudges or occlusions are disregarded with no significant impact on accuracy, as shown in figure 4.2. Additionally, a disk in an image carries no geometrical information other than its center location, whereas an X-point has both a location and an orientation. This additional characteristic is used to discriminate the different primitives in the pattern. As shown in figure 4.4, the red arrow denotes the orientation of the primitive and its corresponding value. The line connecting neighboring primitives in the horizontal direction is taken as the base axis, and the angles between the principal axis of a primitive and the base axis are 0°, 45°, 90°, and 135°, with corresponding values 0, 1, 2, and 3, respectively.

4.2.3 Primitive Detection and Codeword Decoding

Once the pattern is projected onto the scene, a single frame is captured by the camera. The image processing is simple because the primitives are designed on a black background and are well separated, so there is a strong gradient in the image intensity around each primitive boundary. The contour of a primitive is therefore easily detected and can be extracted in real time, and contour detection is less sensitive to reflectivity variation and ambient illumination than threshold-based segmentation. The moment and principal axis method is then used to recognize the primitive. The moment of a geometrical primitive is

$M_{ij} = \int_{-\infty}^{\infty} x^i y^j f(x,y)\,dx\,dy$   (4.2)

The coordinates of the mass center are

$X_m = M_{10}/M_{00}, \qquad Y_m = M_{01}/M_{00}$   (4.3)

A central moment is the same as the ordinary moment except that x and y are displaced by the mean values:

$CM_{ij} = \int_{-\infty}^{\infty} (x - X_m)^i (y - Y_m)^j f(x,y)\,dx\,dy$   (4.4)

The angle between the principal axis and the x axis is

$\alpha = \frac{1}{2}\arctan\!\left(\frac{2\,CM_{11}}{CM_{20} - CM_{02}}\right)$   (4.5)

The mass center of the contour is detected by Eq. 4.3 and regarded as the initial rough location of the X-point. The final location of the X-point is then determined by the Harris corner detector within a local region, and the principal axis is extracted by Eq. 4.5. In fact, two perpendicular axes can be extracted, the long and the short axis; the long axis is taken as the principal axis. The orientations of the principal axis and the base axis are then compared to determine the value of the primitive: if the difference between them is below an experimentally predefined threshold, the primitive value is assigned. The image processing for a pattern with 5 × 5 primitives is shown in figure 4.5.

Figure 4.5: Image processing: (a) contour extraction; (b) location detection; and (c) principal axis identification.

Once each primitive value is determined, the next step is to find the eight neighboring primitives that determine its codeword. The closest primitives in the eight directions are easily found with the help of the direction of the long axis, and the codeword of a primitive then depends on its own value and its neighbors' values; an example of codeword determination is given in figure 4.6.

Figure 4.6: Determination of the codeword for each primitive.

When each primitive value on the camera image plane has been obtained, pixel matching is performed. The upper-left primitive on the camera image plane is selected as the matching window to find the corresponding primitive on the projector image plane. The matching windows on both the camera and the projector image planes are then shifted to the next primitive, from left to right and from top to bottom. The procedure is carried out by a recursive search algorithm until all corresponding primitives are found.
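A condensed sketch of the detection chain (contour, moments, principal axis) using OpenCV is given below. It is illustrative only: the blob-size threshold is an assumed value, the angle of Eq. 4.5 is computed in its arctan2 form, and the Harris sub-pixel refinement of the actual system is omitted.

```python
import cv2
import numpy as np

def detect_primitives(gray):
    """Return (x, y, angle_deg) for each primitive found in a grayscale camera image."""
    # Strong gradients around the primitives: an Otsu binarization is enough here.
    _, bw = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(bw, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    primitives = []
    for cnt in contours:
        m = cv2.moments(cnt)
        if m["m00"] < 50:                                  # reject tiny blobs (assumed threshold)
            continue
        xm, ym = m["m10"] / m["m00"], m["m01"] / m["m00"]  # Eq. 4.3: mass center
        # Eq. 4.5 (arctan2 form): principal-axis angle from the central moments.
        angle = 0.5 * np.arctan2(2.0 * m["mu11"], m["mu20"] - m["mu02"])
        primitives.append((xm, ym, np.degrees(angle) % 180.0))
    return primitives
```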
4.3 Multi-shot Projection Coding Method

4.3.1 Problem Statement

Although some commercial 3D area sensors have been developed and used in quality inspection, the technology still has to overcome several challenges before its applications can be extended. One of the biggest is its weakness against the optical properties of the test objects. Optically-challenging objects, such as painted parts of different colors, materials with different reflection coefficients, and shiny punched metal work pieces, have to be pre-treated into diffusely reflecting surfaces before inspection. The textures and colors of objects can corrupt a color-based structured light code. Differences in reflection coefficients cause the under-/over-exposure problem, which creates a dilemma when adjusting the camera aperture: a bright part requires a small aperture, while a dark area needs a large one. Shiny work pieces induce the internal reflection problem: the light reflected from a scene point is related not only to its own incident light but also to the incident light in its neighborhood, so a dimly illuminated area can be lit up by a bright area nearby and produce a wrong codeword.

In summary, there is a demand for a structured light system that is fully capable of measuring optically-challenging objects and coping with the following problems:

1) Color and texture problem: an object has different colors and textures, so a color-based codeword can be altered by the test object and decoded incorrectly.

2) Shadow problem: a directly illuminated area can appear darker than an indirectly illuminated one; as a consequence, a highly illuminated area may be interpreted as a low-intensity region and a dimly illuminated area as a high-intensity region.

3) Under-/over-exposure problem: the intensities of the light reflected from the same illumination can vary considerably because of different reflection coefficients. If the reflection coefficients of the work pieces differ significantly, adjusting the camera aperture becomes a dilemma.

4) Internal reflection: the light reflected from a point in the scene is related not only to its own incident light but also to the incident light in its neighborhood, so an unilluminated area can still produce high intensity in the camera image.

To solve these problems, this study develops a novel encoding and decoding method for a structured light based inspection system capable of sampling dense 3D point clouds from optically-challenging objects. To inspect such objects successfully, the developed structured light must simultaneously provide:

1) a color-invariant encoding and decoding algorithm;

2) dynamic control of the projector illumination to avoid the under-/over-exposure and shadow problems;

3) robust decoding against the internal reflection problem.

The symbols used in this part are as follows: I denotes a pixel intensity; the subscripts P and C denote the projector and the camera, respectively; (u, v) denotes a pixel location in the projector frame, and (u′, v′) a pixel location in the camera frame. For instance, $I_P(u, v)$ is the intensity value at pixel location (u, v) in the projector frame.
4.4 Encoding and Decoding Strategy

3D reconstruction with a structured light system requires establishing the correspondence table between two image devices, a projector and a camera, in which encoding and decoding play the key roles. Digitized binary monochromatic patterns, which are robust against image noise, are used as the encoded projection images. Traditional approaches assume that the inspected objects are perfectly diffusely reflecting. The encoding and decoding strategy is shown in figure 4.7. The first step is the structured light encoding in the projector frame: codeword 0 is encoded as intensity 0 and codeword 1 as intensity 255. During inspection the patterns are projected into the scene and reflected into the camera. In the decoding process, a low-intensity camera pixel is decoded as 0 and a high-intensity pixel as 1, and pixels in the two frames sharing the same codeword are corresponded.

Figure 4.7: Traditional encoding and decoding.

Once the inspected object is optically challenging, encoding and decoding become more complicated: the intensity of the projection pattern must compensate for the albedo effects of the object. The intensity-adjusted projection patterns are the combination of the GCLS patterns and an intensity mask matrix $M_P$. The intensity of a projected pixel $I_P(u, v)$ is

$I_P(u,v) = I_P^{GCLS}(u,v)\; M_P(u,v)$   (4.6)

where $I_P^{GCLS}(u,v)$ is the intensity of the traditional GCLS pattern at location (u, v) in the projector frame. $M_P(u,v)$ can be considered a weighting coefficient for the code $I_P^{GCLS}(u,v)$: if $M_P(u,v) = 0$ the pixel is turned off; if $0 < M_P(u,v) < 1$ the projected intensity is reduced; if $M_P(u,v) = 1$ it remains unchanged; and if $M_P(u,v) > 1$ it is enhanced. Internal reflection can be suppressed by reducing the corresponding incident light, and the camera can see a dark or shadowed region clearly if the incident light there is enhanced. In short, the projection mask works as a feedback function from the camera receiver to the projector.

To obtain the scene-adapted mask matrix $M_P$, two functions are needed: the camera photo response function $f_{ph}$ and the crosstalk function $f_{cr}$ from the camera frame to the projector frame. The photo response function $f_{ph}$ in Eq. 4.7 represents the relationship between the projected intensity $I_P(u,v)$ and the captured intensity $I_C(u',v')$, which is mainly shaped by the albedo of the scene. A proper projection intensity $I_P$ is first derived from the inverse photo response function so that the camera pixel (u′, v′) receives a suitable intensity. The crosstalk function $f_{cr}$ in Eq. 4.8 then allocates this intensity to the corresponding projector pixel (u, v), ensuring that the radiance from projector pixel (u, v) actually reaches camera pixel (u′, v′). After these two tasks are accomplished, the projection mask can be derived.

$I_C(u',v') = f_{ph}\big(I_P(u,v)\big)$   (4.7)

$(u',v') = f_{cr}(u,v)$   (4.8)

Considering the camera as the feedback receiver, the projected intensity $I_P(u,v)$ can be expressed as

$I_P\big(f_{cr}^{-1}(u',v')\big) = f_{ph}^{-1}\big(I_C(u',v')\big)$   (4.9)

In this study the projection-capture chain is normalized by the assumption that a projection of intensity 255 results in intensity 255 in the camera frame. Therefore the intensity mask equals

$M(u,v) = M\big(f_{cr}^{-1}(u',v')\big) = \frac{f_{ph}^{-1}(255)}{255}$   (4.10)

Since the intensity mask M(u, v) must be dynamically adjusted by the captured value $I_C(u',v')$, $f_{cr}$ is first used to map the camera pixel location (u′, v′) to the projector pixel location (u, v). $f_{ph}^{-1}(255)$ specifies the desired projection value that makes the received value 255, and the ratio between this desired projection value and 255 is the projection intensity mask value.
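A minimal sketch of Eq. 4.6 and Eq. 4.10 is given below, assuming the desired projector intensities $f_{ph}^{-1}(255)$ have already been resampled into the projector frame through the crosstalk function (that resampling is not shown here), and assuming a hypothetical projector resolution.

```python
import numpy as np

def intensity_mask(desired_projection):
    """Eq. 4.10: mask = f_ph^{-1}(255) / 255, already expressed in the projector frame."""
    return np.clip(desired_projection / 255.0, 0.0, None)

def masked_pattern(gcls_pattern, mask):
    """Eq. 4.6: weight each GCLS pixel by the mask and clip to the 8-bit range."""
    return np.clip(gcls_pattern.astype(float) * mask, 0, 255).astype(np.uint8)
```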
The flow chart of the inspection process is displayed in figure 4.8: the system first derives the intensity mask, then applies it to the structured light, and finally completes the 3D reconstruction.

Figure 4.8: The flow chart of the encoding and decoding strategy.

There are three steps in the inspection process:

(1) Initial projection to derive the camera photo response function, the initial guess of the crosstalk function, and the intensity mask.

(2) Advanced projection of the intensity-adjusted pattern to refine the crosstalk function and update the intensity mask.

(3) Final projection to obtain the correspondence and finish the 3D reconstruction.

The details of these steps are explained in the following sections.

4.4.1 Derivation of the Camera Photo Response Function for a Given Scene

Reflected light of high intensity can cause camera saturation: highly saturated pixels spill over and affect the intensity values of neighboring pixels, producing, for example, a shiny spot in the picture. On the other hand, reflected light of low intensity may not be captured by the camera at all. To capture the reflections from both highly and weakly reflective surfaces simultaneously, a proper feedback algorithm is needed. The photo response function describes the radiometric chain from the projector image frame to the camera image frame, and its inverse is used as the feedback function to eliminate scene albedo effects.

A photo response function consists of three parts: the radiance emitted by a projector pixel, $f_P$; the reflected radiance produced from the incident radiance, $f_R$; and the camera pixel intensity produced from the reflected radiance, $f_C$. In this study all three functions are approximated as linear. The radiance $R_P(u,v)$ from a projector pixel with intensity $I_P(u,v)$ is

$R_P(u,v) = f_P\big(I_P(u,v)\big) = k_1(u,v)\,I_P(u,v) + k_2(u,v), \qquad 0 \le I_P(u,v) \le 255$   (4.11)

The reflected radiance $R_C(x,y,z)$ at a scene point (x, y, z) produced by the incident radiance $R_P(u,v)$ is

$R_C(x,y,z) = f_R\big(R_P(u,v)\big) = k_3(x,y,z)\,R_P(u,v) + k_4(x,y,z)$   (4.12)

The camera intensity $I_C(u',v')$ produced by the reflected radiance $R_C(x,y,z)$ is

$I_C(u',v') = f_C\big(R_C(x,y,z)\big) = k_5(u',v')\,R_C(x,y,z) + k_6(u',v'), \qquad 0 \le I_C(u',v') \le 255$   (4.13)

where $k_1(u,v)$, $k_2(u,v)$, $k_3(x,y,z)$, $k_4(x,y,z)$, $k_5(u',v')$, and $k_6(u',v')$ are unknown coefficients of the linear functions; their subscripts show how the radiometric chain varies across projector pixels, scene points, and camera pixels. Substituting Eq. 4.11 and Eq. 4.12 into Eq. 4.13 and treating the composition as a single linear function gives

$I_C(u',v') = k_7(u',v')\,I_P(u,v) + k_8(u',v'), \qquad 0 \le I_C(u',v') \le 255, \quad 0 \le I_P(u,v) \le 255$   (4.14)

where $k_7(u',v')$ and $k_8(u',v')$ combine the previous coefficients. The radiometric chain is thus treated as a black box: only $k_7(u',v')$ and $k_8(u',v')$ are needed to adjust the projected intensity from the received intensity, and the coefficients $k_1$ through $k_6$ are not required for the photo response function.

To solve the photo response function in Eq. 4.14, two tasks need to be accomplished: (a) the coefficients $k_7(u',v')$ and $k_8(u',v')$, which determine what projection intensity is suitable for a camera pixel (u′, v′) to avoid under-/over-exposure; and (b) the crosstalk function between projector pixel (u, v) and camera pixel (u′, v′).
The crosstalk function determines where the suitable projection intensity should be placed in the projector frame. In short, the solution of the first task answers what the projection value should be, and the second answers where that value should go. Since it is difficult to solve both tasks simultaneously, task (a) is isolated first by projecting single-intensity images in which all projector pixels share the same value $I_P$. Because all pixel values in the projector frame are equal, the crosstalk can be ignored: no matter which projector pixel is crosstalked with the camera pixel (u′, v′), the intensity is always the same. Eq. 4.14 then becomes

$I_C(u',v') = k_7(u',v')\,I_P + k_8(u',v'), \qquad 0 \le I_C(u',v') \le 255, \quad 0 \le I_P \le 255$   (4.15)

Single-intensity images with values 0, 50, 100, 150, 200, and 250 are projected onto the test objects. Figure 4.9 gives an example in which an image of intensity 100 was projected. Five arbitrary points, A to E, are selected in the received image and marked in different colors to examine their intensity variations; their received intensities are given in figure 4.10, where the x axis is the projected intensity in the projector frame and the y axis the received intensity in the camera frame. The colored lines are the photo response curves of the different scene points. A is a point on a black object, so its received intensity is always very low: even when intensity 255 is projected onto it, the received value stays a little below 50. D and E are typical shiny points whose corresponding camera pixels saturate very easily. C can be viewed as a 'normal' diffuse point, neither too shiny nor too dark. B is a special case of a point on a black object: when the incidence and reflection angles satisfy a certain condition, a black object can also produce a mirror-like effect and become very sensitive to the incident light.

Figure 4.9: The captured picture when projecting an image with intensity 100.

Figure 4.10: The photo response curves of the five selected points.

The parameters $k_7$ and $k_8$ are regressed by substituting the projected and received intensities into Eq. 4.15. The resulting map of proper projection values in the camera frame is given in figure 4.11: much like film in a camera, the brighter the received value at a camera pixel, the darker its proper projection value.

Figure 4.11: The proper projection values in the camera frame.
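A minimal sketch of this per-pixel regression is given below; the captured stack is a random placeholder standing in for the averaged single-intensity recordings, and the image size is an assumed value.

```python
import numpy as np

# Placeholder stack of averaged captures: one HxW image per projected intensity.
projected = np.array([0.0, 50.0, 100.0, 150.0, 200.0, 250.0])
captured = np.random.default_rng(0).uniform(0, 255, size=(6, 480, 640))

# Per-pixel linear fit of Eq. 4.15: I_C = k7 * I_P + k8.
# (In practice, saturated samples with I_C = 255 should be excluded from the fit.)
A = np.stack([projected, np.ones_like(projected)], axis=1)     # 6 x 2 design matrix
flat = captured.reshape(6, -1)                                  # 6 x (H*W)
coeffs, *_ = np.linalg.lstsq(A, flat, rcond=None)               # 2 x (H*W)
k7 = coeffs[0].reshape(480, 640)
k8 = coeffs[1].reshape(480, 640)

# Desired projection value per camera pixel, f_ph^{-1}(255) = (255 - k8) / k7,
# clipped to the projector's 8-bit range (numerator of Eq. 4.10).
desired = np.clip((255.0 - k8) / np.maximum(k7, 1e-6), 0, 255)
```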
4.4.2 Initial Guess of the Crosstalk Function

The result given in figure 4.11 only determines how large the projection value should be; it does not determine where that value should be placed. $I_P$ is calculated from Eq. 4.14, but the indices u and v in the projector frame are still missing. The initial guess of the crosstalk function assumes that the test scene is a flat surface on the inspection table, so the crosstalk between the two image frames becomes a homography mapping:

$P_P = H_W^P P_W = H_W^P H_C^W P_C = H_C^P P_C$   (4.16)

As shown in figure 4.12 (a), $I_P$ and $O_P$ form the pinhole model of the projector, and $I_C$ and $O_C$ that of the camera. $P_P = (u_P, v_P, 1)$, $P_W = (x_W, y_W, 1)$, and $P_C = (u_C, v_C, 1)$ are points on the projector image plane, the inspection table, and the camera image plane, respectively; $H_W^P$, $H_C^W$, and $H_C^P$ are the 3 × 3 homography matrices between pairs of these planes.

Figure 4.12: (a) The homography mapping among the projector image plane, the camera image plane, and the inspection table; (b) the corresponding pixel $P_P$ changes according to $P_W$.

4.4.3 Advanced Projection Pattern Generation

Combining the initial guess of the crosstalk function with the camera photo response function, the initial projection mask is derived via Eq. 4.10; the result is shown in figure 4.13 (a). In reality, however, the scene differs from the flat-surface assumption, so the corresponding projector pixel will not be exactly at $P_P = (u_P, v_P)$ but somewhere near it. As shown in figure 4.12 (b), when the real scene is higher than assumed, the real location is $P_{P1}$; if it is lower, the real location is $P_{P2}$. Assuming that the probability of the real location follows a Gaussian distribution around pixel $(u_P, v_P)$, the projection mask is dilated by a certain radius, as displayed in figure 4.13 (b). A combination of a GCLS pattern with the projection mask is shown in figure 4.13 (c).

Figure 4.13: (a) The initial projection mask; (b) the dilated projection mask; (c) an example of a projection pattern.

4.4.4 Interpolation Model for the Diffuse Reflection Region and Extrapolation Model for the Internal Reflection Region

Both the crosstalk function and the correspondence for 3D recovery require binarizing the received images into codewords 1 and 0. The difference between the two binarizations lies in the accuracy requirement: pixel-level accuracy is enough for the crosstalk function that controls the illumination, but the correspondence for 3D recovery needs sub-pixel accuracy. On the other hand, the illumination conditions when finding the correspondence are better than when finding the crosstalk function, because the illumination has already been improved once the crosstalk function is known. In either case, a robust binarization method is critical for measuring optically-challenging objects. An extrapolation model for the internal reflection region and an interpolation model for the diffuse reflection region are therefore proposed.

Binarization methods for the received image can be roughly classified into three branches:

1) Naive binarization, based on a fixed intensity threshold.

2) Adaptive binarization, based on a different threshold for each pixel, determined by a weighted average of its neighborhood.

3) Dual-projection binarization, in which a pixel's binary value depends on whether the pattern or its inverse is brighter [34] [45].

The third method is more robust against image noise and has been widely used in 3D measurement. In this study, a modified dual-projection binarization method is used to establish the crosstalk function. Instead of using the stripes of the line-shifting pattern directly, the edge between every two stripes is employed; a sub-pixel interpolation of the edge location is added to increase measurement accuracy, and an estimation algorithm for the edge location is added for the cases where camera saturation or internal reflection occurs. As shown in figures 4.14 (a) and (b), images of a stripe pattern and its inverse are used for the binarization analysis; the black and white stripes are the projected pattern, and the highlighted blue lines indicate the edges.

Figure 4.14: (a) A received image of the projected pattern; (b) the received image of the inverse pattern; (c) interpolation of the edge location; (d) estimation of the edge location under strong illumination.
An edge generally falls inside a camera pixel, so a sub-pixel interpolation model is needed to improve the measurement performance. Figure 4.14 (c) illustrates the binarization and edge interpolation for a diffusely reflecting surface: the y axis is the received intensity in the camera frame, and the x axis the image coordinate in the vertical direction. The red line is the projected pattern and the blue line its inverse. A, B, C, and D are the four intensity values used for binarization and edge interpolation. Since intensity A is larger than its inverse B by a certain threshold, pixel n is assigned 1; for the same reason, pixel n + 1 is decoded as 0. Assuming that the intensity change within one pixel is linear, the location of the edge X can be interpolated from the intensities A, B, C, and D. Although the linear assumption may not hold perfectly in all cases, the error remains bounded within one pixel. The model in figure 4.14 (c) is used both for calculating the crosstalk and for the 3D correspondence, since it reaches sub-pixel accuracy.

$X = n + \frac{A - B}{(C - B) - (D - A)}$   (4.17)

In the advanced projection, the over-exposure problem and the internal reflection problem cannot be avoided entirely; in those cases the calculation of the edge location changes from interpolation to extrapolation, as shown in figure 4.14 (d). Under over-exposure, the incident intensities of the white stripes exceed 255, so the blue curve rises from point B and reaches the maximum intensity 255 before reaching point C, and the red curve likewise drops from the dashed line to the solid line; point A becomes A′ and C becomes C′. BC′ and DA′ then do not intersect until their extensions meet. If internal reflection occurs, things become more complicated: point C′ illuminates its neighborhood C′′ through internal reflection, moving line C′B backwards to C′′B′, and point A′ illuminates point D, moving line A′D forward to A′′D′. The step distance S between points B′ and D′ therefore grows beyond one pixel. Viewed in the received image, the white stripes erode the dark stripes: the white stripes become wider and the dark stripes narrower. The extrapolation model assumes that the intensity change is still linear over several pixels, so the stripe edge X can be derived via

$X = n + S\,\frac{A'' - B'}{(C'' - B') - (D' - A'')}$   (4.18)

The assumption behind the extrapolation model is stronger than that of the interpolation model: linearity is assumed not just within one pixel but across several pixels. In practice, a linear approximation within one pixel introduces little error, but across several pixels it can produce a noticeable error, and the estimated location X becomes less accurate as the width S increases. Since the error of the extrapolation model cannot be bounded within one pixel, this model is used only for approximating the crosstalk function, not for the correspondence calculation.
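A minimal sketch of the two edge estimators is given below, assuming the four (or four primed) intensity samples have already been read from the captured pattern and inverse-pattern images; the worked example reuses the intensities reported later in Section 4.5.2.

```python
def edge_interpolation(n, A, B, C, D):
    """Eq. 4.17: sub-pixel edge location between pixels n and n+1 on a diffuse
    surface. A, B are the pattern/inverse intensities at pixel n; D, C are the
    pattern/inverse intensities at pixel n+1."""
    return n + (A - B) / ((C - B) - (D - A))

def edge_extrapolation(n, S, A2, B1, C2, D1):
    """Eq. 4.18: estimated edge location when over-exposure or internal reflection
    widens the bright stripe by S pixels. A2, C2, B1, D1 stand for A'', C'', B', D'
    in figure 4.14 (d)."""
    return n + S * (A2 - B1) / ((C2 - B1) - (D1 - A2))

# Worked example with the intensities reported in Section 4.5.2:
# pattern 61/29 and inverse 34/64 at pixels 899/900 give an edge near 899.4.
print(edge_interpolation(899, A=61, B=34, C=64, D=29))   # ~899.44
```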
Figure 4.15 shows the final projection mask after the crosstalk function has been derived. The intensity mask controls the projection intensity so that the received intensity stays between under-exposure and over-exposure, and it also minimizes internal reflection by reducing the incident intensity on the shiny areas. From the figure we can also see that the projection mask contains a lot of noise in the blind areas, for instance in the zoomed parts (a) and (b). Comparing the same parts with the original image ?? (a) shows that these are blind areas: either the projector cannot illuminate these locations or the camera cannot view the projected patterns there. Since one of the two image devices always fails to observe a blind area, no corresponding pixels can be generated there to rebuild 3D points, and the projection mask error in these areas does not affect the 3D results.

Figure 4.15: The final projection mask.

4.5 Verification of the Projection Mask Performance

As shown in figure ?? (a), three work pieces, a black part, a dark part, and a shiny part, are used to test the designed system. To verify the ideas of this study, two aspects of the received results are given and discussed in this section: (a) verification of the projection mask, and (b) verification of the interpolation and extrapolation models.

4.5.1 Verification of the Illumination Adjustment via the Projection Mask

In this study a dynamic, scene-based projection mask is proposed to adjust the incident light intensity on optically-challenging parts. To verify this idea and validate the mask performance, the received images before and after applying the mask are provided. Figures 4.16, 4.17, and 4.18 illustrate the effect of the projection mask on the captured images of the scene. Figure 4.16 is the received image without the projection mask: several areas still suffer from over-exposure and internal reflection, and since the camera aperture is opened to its maximum, almost all the work pieces look shiny. Figure 4.17 is the captured image after the initial projection mask is applied; because the flat-scene assumption is too strong, the mask cannot adjust the incident light perfectly and some areas remain shiny. Figure 4.18 is the result after the final projection mask is applied: all the incident lights are dynamically adjusted and the shiny artifacts disappear.

4.5.2 Verification of the Interpolation and Extrapolation Models

Figures 4.16, 4.17, and 4.18 only show the general performance of the projection mask. More detailed intensity information is given to illustrate the following:

1) Verification of the linear interpolation model used to obtain sub-pixel accuracy. The interpolation model uses the intersection point to locate the intensity edges, and the model needs to be confirmed by real received data.

2) Demonstration of the extrapolation model proposed to explain the internal reflection and over-exposure issues. The real intensity distribution in an internal reflection region should support the extrapolation model.

3) Verification of the result of the extrapolation model. The extrapolation model is not only used to explain internal reflection but also to find the stripe edges in shiny regions, so the corresponding experimental results need to be provided.

4) The study proposes that the adaptive projection mask can reduce or eliminate internal reflection; the received intensity distributions before and after applying the projection mask need to be compared to demonstrate this.

Figure 4.16: The received image without the projection mask.

Figure 4.17: The received image with the initial mask.
Figure 4.18: The received image with the final mask.

Figures 4.19(a) and 4.19(b) are two detailed views of the black part when the stripe pattern and its inverse are projected without the projection mask. Both the internal reflection and the reflection coefficient of the black work piece are low, so these detail views are selected to verify the sub-pixel linear interpolation model. The widths of the white/black stripes AB equal the widths of the black/white stripes A′B′ at 39 pixels. The highly illuminated stripes do not erode the weakly illuminated ones, which indirectly confirms that the internal reflection is small and the surface can be treated as diffusely reflecting.

The pixels along the vertical lines marked in figures 4.19(a) and 4.19(b) are scanned, and the binarization results of the selected pixels are given in table 4.1. The intensity curves around points B / B′ along the vertical direction are shown in figure 4.19(c) to verify the interpolation model: the red curve is the intensity distribution along the red line in figure 4.19(a), and the blue curve the distribution along the blue line in figure 4.19(b). If the red curve is above the blue curve by a certain threshold, the pixels are decoded as "high" (1); otherwise they are classified as "low" (0). An intersection point marks an edge between high and low illuminated regions. The pixel intensities at location 899 are 61 and 34, and at location 900 they are 29 and 64; applying the model of Eq. 4.17, the intersection point (x-axis coordinate) lies at 899.4. Comparing the intensity distributions in figure 4.19(c) with the curves in figure 4.14 (c) verifies the interpolation model.

Figure 4.19: (a) A selected detail view of the black part when projecting the pattern without the mask; (b) the same view when projecting the inverse pattern; (c) the intensity curve around point B / B′.

Table 4.1: Pixel binarization without the projection mask
Pixel range: 821-860.1 | 860.1-899.4 | 899.4-938.7 | 938.7-975
Codeword:    Low       | High        | Low         | High

Figure 4.20: (a) A selected detail view of the shiny part when projecting the pattern without the mask; (b) the same view when projecting the inverse pattern; (c) the intensity curve around point C / C′.

To examine the over-exposure and internal reflection phenomena, detail views of the shiny part are displayed in figures 4.20(a) and 4.20(b). As explained for the extrapolation model, when internal reflection and over-exposure occur, the white stripe erodes the black stripe, so the white stripe becomes wider than the black one: the pixel distance CD is 33, while C′D′ is 38. This width difference is the major harm caused by internal reflection and over-exposure. Clearer information can be found in figure 4.20(c): instead of a single intersection point, the red and blue curves share a short common line segment. The pixels within this common segment are always highly illuminated and cannot be classified by traditional methods, which is what makes sampling 3D coordinates on shiny regions fail. The extrapolation sub-pixel model in figure 4.14 (d) fits this situation.
The intensity values at pixels 1292 and 1293 on the blue curve are 226 and 255, and the intensity values at pixels 1294 and 1295 on the red curve are 255 and 155. Applying Eq. 4.18, the stripe edge is estimated at location 1293.7 by extrapolation. The distributions in figure 4.20(c) thus confirm the extrapolation model of figure 4.14 (d): the extrapolation result successfully separates the high and low illuminated regions, which are decoded as 1 and 0 respectively.

As a comparison, the same views with the projection mask applied are shown in figures 4.21 and 4.22. Figure 4.21 verifies that the illumination variation caused by the projection mask does not affect pattern edge detection in the diffuse reflection region, while figure 4.22 shows that it reduces or eliminates the internal reflection problem in the shiny region.

Figures 4.21(a) and 4.21(b) are the two detail views of the black part when the adaptive mask is used. The stripe width AB still equals A′B′ at 39 pixels after the projection mask is applied. The binarization results are shown in table 4.2 and are almost identical to table 4.1. Figure 4.21(c) shows the intensity curves around points B / B′: the pixel intensities at location 899 are 80 and 28, and at location 900 they are 25 and 62. Applying the model of Eq. 4.17 again, the intersection lies at pixel location 899.6. Compared with the results of figure 4.19, we conclude that the illumination variation caused by the projection mask does not affect decoding in the diffuse reflection region.

Figure 4.21: (a) A selected detail view of the black part when projecting the pattern with the mask; (b) the same view when projecting the inverse pattern; (c) the intensity curve around point B / B′.

Figures 4.22(a) and 4.22(b) are two detail views of the shiny part when the stripe pattern and its inverse are projected with the mask. The 'W' and 'B' stripe widths are now the same: CD = C′D′ = 35. Compared with the result of figure 4.20, we conclude that the internal reflection has been eliminated. The same conclusion follows from figure 4.22(c): after the illumination is adjusted by the mask, the two curves intersect, so the interpolation model can be applied instead of the extrapolation model to find the edge. The intensities at pixel 1293 are 133 and 80, and at pixel 1294 they are 83 and 141; the intersection point found by Eq. 4.17 is 1293.5.

Figure 4.22: (a) A selected detail view of the shiny part when projecting the pattern with the mask; (b) the same view when projecting the inverse pattern; (c) the intensity curve around point C / C′.

Table 4.2: Pixel binarization with the projection mask
Pixel range: 821-860.2 | 860.2-899.6 | 899.6-938.7 | 938.7-975
Codeword:    Low       | High        | Low         | High

In summary, the illumination variation caused by the projection mask eliminates internal reflection in the shiny region without any adverse effect on the diffuse region.

4.6 Chapter Summary

In this chapter, two surface coding methods were introduced: a single-shot algorithm and a multi-shot algorithm. The single-shot algorithm greatly reduces the acquisition time and thus gains the ability to measure a moving object. The difficulty of this method is to encode all the codewords within one image.
A spatial-neighborhood method was developed for the single-shot approach: the mass center and orientation of each primitive serve as its value, and each primitive's value together with its eight neighbors' values forms a codeword. Compared with traditional color-based codewords, this method is more robust because it removes the color effects of the scene. The disadvantages of the single-shot method are its low density and its weakness against the optical properties of the test objects.

A multi-shot method was developed to inspect optically-challenging objects without surface pre-treatment. Compared with the single-shot method, it is more robust and denser, but less efficient. A projection mask that dynamically compensates for the albedo effects of the scene was developed, with a feedback loop controlling the illumination of the projector, so that shiny and black materials can be measured simultaneously. An extrapolation model combined with the projection mask was applied to avoid internal reflection issues. Compared with previous methods, the proposed method is more robust, efficient, and flexible.

CHAPTER 5
Registrations between Different Coordinates after 3D Reconstruction

5.1 Registration between Different 3D Coordinate Frames

The previous chapters described the sensor working principle and its calibration. The measured data are usually given as a "point cloud", which can be compared with the designed surface shape, such as the Computer Aided Design (CAD) model, for error analysis. The deviation between the point cloud and the CAD model can be displayed as a color-coded error map, and this error distribution map is used to evaluate part quality against the acceptable Geometry Dimension and Tolerance (GDT). To finish the quality inspection task, two questions remain:

1) Registration between the sensor frames and a common frame. Each point cloud is obtained in an individual sensor frame, and the clouds must be stitched together to represent the whole work piece; that is, all point clouds must be expressed in one common frame. The transformation matrix from each sensor frame to the common frame is first calculated and then applied to update each point cloud. The updated point clouds merge into one entire point cloud representing the whole work piece, which can then be compared with the CAD model to obtain the color-coded manufacturing error map. Since the sensor frames are fixed with respect to each other, the transformation from a sensor frame to the common frame does not change after the registration calibration; this registration is therefore also called off-line registration calibration.

2) Error map generation. This requires a mathematical definition of the deviation between the two data sets, the triangulated CAD model and the point cloud. The deviation along the surface normal of each triangulated CAD face to the closest point of the point cloud is used to describe the error distribution. A color code from blue to red visualizes the distribution, where blue denotes that the CAD surface is lower than the point cloud and red that it is higher. Error map generation requires (a) the on-line registration between the final point cloud and the CAD model, and (b) the definition of the deviation between the two data sets. Since the position of the work piece is not fixed, the registration matrix between the final point cloud and the CAD model changes from measurement to measurement.
The registration matrix must therefore be calculated after each inspection within a limited time, and this step is called on-line registration calibration. The deviation between the final point cloud and the CAD model is defined by the distance between the closest point of the point cloud and the triangulated surface of the CAD model: if the triangulated CAD surface lies above the closest point of the final point cloud, the deviation is negative and shown in blue; otherwise it is positive and shown in red.

5.2 Off-line Registration Calibration from the Sensor Frames to the Common Frame

The task of the off-line registration calibration is to find the rigid transformation matrices that bring all point clouds from their individual sensor frames into a common frame. The point cloud $P^{S_j}$ in sensor frame j can be written as

$P^{S_j} = \big\{\, p_i^{S_j}(x_i, y_i, z_i) \in \mathbb{R}^3 \,\big\}, \qquad j = 1, 2, 3, 4$   (5.1)

The registration that transfers the four point clouds into the common frame is defined as

$P^C = \bigcup_{j=1}^{4} T_{S_j}^C\, P^{S_j}$   (5.2)

where $T_{S_j}^C$ is the transformation matrix from sensor frame j to the common frame. The task is to find $T_{S_j}^C$ for each sensor frame. To solve for $T_{S_j}^C$ in Eq. 5.2, three tasks must be accomplished:

1) The two point sets $P^{S_j}$ and $P^C$ must first be obtained. $P^C$ can be determined from a pre-measured gauge, and $P^{S_j}$ is the sensor measurement result.

2) Correspondence pairs between the two data sets are needed. The well-known ICP algorithm is used to obtain the correspondence after a rough alignment: the closest points are assumed to be corresponding pairs, and the final alignment is found iteratively.

3) The rigid transformation matrix $T_{S_j}^C$ is derived after the correspondence is established. The rigid transformation consists of two parts: the rotation matrix R and the translation vector t. R is a unit orthogonal matrix with three degrees of freedom, and its derivation is somewhat involved because the unit, orthogonality, and three-degree-of-freedom constraints must all be respected.

5.2.1 Calculation of the Rotation Matrix via the Quaternion Algorithm

A quaternion is an extension of a complex number, defined as

$q = q_0 + i q_1 + j q_2 + k q_3$   (5.3)

where i, j, and k are three orthogonal 3D vectors forming the basis of 3D Euclidean space, with $i^2 = j^2 = k^2 = ijk = -1$. A unit quaternion q has norm one and can be expressed as

$q = \cos(\theta/2) + \sin(\theta/2)\, V$   (5.4)

in which θ is the rotation angle and $V = [x_V, y_V, z_V]^T$ is a unit vector representing the rotation axis; a unit quaternion q thus represents an arbitrary rotation with parameters V and θ in 3D space. The quaternion-based rotation matrix is

$R_q = \begin{bmatrix} q_0^2 + q_1^2 - q_2^2 - q_3^2 & 2(q_1 q_2 - q_0 q_3) & 2(q_1 q_3 + q_0 q_2) \\ 2(q_1 q_2 + q_0 q_3) & q_0^2 - q_1^2 + q_2^2 - q_3^2 & 2(q_2 q_3 - q_0 q_1) \\ 2(q_1 q_3 - q_0 q_2) & 2(q_2 q_3 + q_0 q_1) & q_0^2 - q_1^2 - q_2^2 + q_3^2 \end{bmatrix}$   (5.5)

A point $P = [x, y, z]^T$ can be considered a pure quaternion ($q_0 = 0$), and its rotation can be expressed as

$R_q P = q \circ P \circ q^*$   (5.6)

where ◦ denotes quaternion multiplication and $q^*$ is the conjugate of the quaternion q.
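As a quick illustration of Eq. 5.4 and Eq. 5.5 (not part of the registration pipeline), the sketch below builds $R_q$ from an axis-angle pair and applies it to a point; the axis and angle are arbitrary example values.

```python
import numpy as np

def quat_from_axis_angle(axis, theta):
    """Eq. 5.4: unit quaternion (q0, q1, q2, q3) for a rotation of theta about axis."""
    v = np.asarray(axis, dtype=float)
    v /= np.linalg.norm(v)
    return np.concatenate(([np.cos(theta / 2.0)], np.sin(theta / 2.0) * v))

def rotation_from_quat(q):
    """Eq. 5.5: 3x3 rotation matrix from a unit quaternion."""
    q0, q1, q2, q3 = q
    return np.array([
        [q0*q0 + q1*q1 - q2*q2 - q3*q3, 2*(q1*q2 - q0*q3),             2*(q1*q3 + q0*q2)],
        [2*(q1*q2 + q0*q3),             q0*q0 - q1*q1 + q2*q2 - q3*q3, 2*(q2*q3 - q0*q1)],
        [2*(q1*q3 - q0*q2),             2*(q2*q3 + q0*q1),             q0*q0 - q1*q1 - q2*q2 + q3*q3]])

# Example: 30 degrees about the z axis rotates the x axis as expected.
q = quat_from_axis_angle([0, 0, 1], np.deg2rad(30))
print(rotation_from_quat(q) @ np.array([1.0, 0.0, 0.0]))   # ~[0.866, 0.5, 0.0]
```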
The rigid transformation between different frames can then be solved in two steps via quaternion algebra:

$\begin{bmatrix} x_c \\ y_c \\ z_c \end{bmatrix} = R_q \begin{bmatrix} x_s \\ y_s \\ z_s \end{bmatrix} + t = q \circ \begin{bmatrix} x_s \\ y_s \\ z_s \end{bmatrix} \circ q^* + t$   (5.7)

in which $P_s = [x_s, y_s, z_s]^T$ is a point in the sensor frame and $P_c = [x_c, y_c, z_c]^T$ is a point in the common frame. From Eq. 5.7 we can obtain

$(P_c + P_s)_\times\, \eta - (t)_\times\, \eta - t = P_s - P_c$   (5.8)

where $(\cdot)_\times$ denotes a skew-symmetric matrix and $\eta = \tan(\theta/2)\, V$. Let $M_i \equiv P_s^i - P_s^{i+1}$ and $N_i \equiv P_c^i - P_c^{i+1}$; Eq. 5.8 then changes into

$(M_i + N_i)_\times\, \eta = M_i - N_i, \qquad i = 1, 2, \ldots, n-1$   (5.9)

in which n is the number of points in each data set. After η is solved, the rotation axis V and the rotation angle θ are obtained from

$V = \frac{\eta}{\|\eta\|}$   (5.10)

$\theta = 2\arctan\|\eta\|$   (5.11)

The rotation matrix $R_q$ and the translation vector t are then

$R_q = \begin{bmatrix} x_V^2\lambda_\theta + c_\theta & x_V y_V \lambda_\theta - z_V s_\theta & x_V z_V \lambda_\theta + y_V s_\theta \\ x_V y_V \lambda_\theta + z_V s_\theta & y_V^2\lambda_\theta + c_\theta & y_V z_V \lambda_\theta - x_V s_\theta \\ x_V z_V \lambda_\theta - y_V s_\theta & y_V z_V \lambda_\theta + x_V s_\theta & z_V^2\lambda_\theta + c_\theta \end{bmatrix}$   (5.12)

$t = P_c - R_q P_s$   (5.13)

where $\lambda_\theta = 1 - \cos\theta$, $c_\theta = \cos\theta$, and $s_\theta = \sin\theta$. To calibrate a rigid transformation $T_s^c$, the 3D coordinates of corresponding points in both the sensor frame s and the common frame c must be determined. The coordinates of points in the common frame c are predetermined by building a calibration gauge, and the points in the sensor frame are the measurement results of that gauge after the sensor has been calibrated. The correspondence between the two data sets, the data in the common frame and the data in the sensor frame, is established by the Iterative Closest Point (ICP) method.
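A minimal sketch of this two-step solution is given below, assuming corresponding point lists are already available as rows of two arrays; Eq. 5.9 is solved in a least-squares sense and the translation of Eq. 5.13 is evaluated on the centroids.

```python
import numpy as np

def skew(v):
    """Skew-symmetric matrix (v)_x such that (v)_x w = v x w."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def rigid_transform(P_s, P_c):
    """Estimate R, t of Eq. 5.7 from corresponding points (rows of P_s and P_c)."""
    P_s = np.asarray(P_s, dtype=float)
    P_c = np.asarray(P_c, dtype=float)
    # Eq. 5.9: stack (M_i + N_i)_x eta = M_i - N_i over consecutive point pairs.
    A_rows, b_rows = [], []
    for i in range(len(P_s) - 1):
        M = P_s[i] - P_s[i + 1]
        N = P_c[i] - P_c[i + 1]
        A_rows.append(skew(M + N))
        b_rows.append(M - N)
    eta, *_ = np.linalg.lstsq(np.vstack(A_rows), np.concatenate(b_rows), rcond=None)
    norm = np.linalg.norm(eta)
    if norm < 1e-12:                                        # (near-)identity rotation
        return np.eye(3), P_c.mean(axis=0) - P_s.mean(axis=0)
    theta = 2.0 * np.arctan(norm)                           # Eq. 5.11
    V = eta / norm                                          # Eq. 5.10
    lam, c, s = 1.0 - np.cos(theta), np.cos(theta), np.sin(theta)
    R = c * np.eye(3) + lam * np.outer(V, V) + s * skew(V)  # Eq. 5.12 (Rodrigues form)
    t = P_c.mean(axis=0) - R @ P_s.mean(axis=0)             # Eq. 5.13 on the centroids
    return R, t
```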
5.2.2 Iterative Closest Point (ICP) based Registration

According to the accuracy of the merged result, registration can be roughly divided into two stages: coarse alignment and fine adjustment. Coarse alignment derives the transformation matrix from a few feature points in the two data sets; its accuracy is not very high, so it is used for applications with low accuracy requirements, such as navigation, or as the initial alignment for the fine adjustment. The ICP fine alignment algorithm [49] [50] searches for pairs of nearest points in both data sets and estimates a coordinate transformation to align them; the process is executed iteratively until the registration error falls below a threshold, at which point the overall registration error is minimized.

The ICP algorithm is shown in figure 5.1 as a recursive process. The goal is to find the rigid transformation matrix $T_q^p$ between a set P with $N_p$ points and a set Q with $N_q$ points; $X_p$ and $X_q$ denote points from P and Q, respectively. The algorithm proceeds as follows:

Step 0) Generate the coarse alignment from a few corresponding features between the two data sets. These features can be defined from artificial markers, colors, or curvatures. Update the set: $T_0 Q \rightarrow Q_0$.

Each recursive loop of the ICP algorithm then consists of:

Step 1) Set up the corresponding point pairs by minimizing the cost function, the Euclidean distance between points of the two data sets. For an arbitrary point $X_Q \in Q$, the Euclidean distance to every point of P is calculated, and the point of P with the smallest distance is taken as the correspondence of $X_Q$. The algorithm runs through Q from the first point to the last and creates a list $L_k$ of $N_q$ correspondence pairs, where k is the current loop number.

Step 2) Derive the rigid transformation matrix $T_k$, consisting of the rotation matrix $R_k$ and translation $t_k$, via the quaternion method explained in the previous section.

Step 3) Update the data set $Q_k$ into $Q_{k+1}$ by the matrix multiplication $T_k Q_k = Q_{k+1}$.

Step 4) Calculate the average error $e_k$:

$e_k = \frac{1}{N_q} \sum_{i=1}^{N_q} \big\| T_k X_q(i) - X_p(i) \big\|_2^2$   (5.14)

Step 5) If the decrement between $e_{k-1}$ and $e_k$ is smaller than Threshold1, or $e_k$ is smaller than Threshold2, the algorithm outputs the results P, $Q_n$, and $T_f = T_n T_{n-1} \cdots T_0$, where n is the total number of loops and $T_f$ is the final transformation matrix; otherwise the algorithm returns to Step 1) and starts another loop. The algorithm has been proved to converge monotonically to a local minimum [49] [50].

Figure 5.1: The ICP algorithm.
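A compact sketch of this loop is given below, assuming the coarse alignment has already been applied to Q; the rigid-transform estimator (for example, the hypothetical `rigid_transform` sketched in the previous section) is passed in as an argument, and a KD-tree replaces the brute-force nearest-neighbor search of Step 1.

```python
import numpy as np
from scipy.spatial import cKDTree

def icp(P, Q, rigid_transform, max_loops=50, thr_delta=1e-6, thr_err=1e-4):
    """Minimal ICP loop (Steps 1-5). P, Q are Nx3 arrays; Q is moved onto P."""
    tree = cKDTree(P)                        # nearest-neighbor search structure on P
    T_f = np.eye(4)
    prev_err = np.inf
    for _ in range(max_loops):
        # Step 1: closest point in P for every point of Q.
        _, idx = tree.query(Q)
        matched_P = P[idx]
        # Step 2: rigid transformation from the current correspondences.
        R, t = rigid_transform(Q, matched_P)
        # Step 3: update Q and accumulate the total transformation.
        Q = Q @ R.T + t
        T_k = np.eye(4); T_k[:3, :3] = R; T_k[:3, 3] = t
        T_f = T_k @ T_f
        # Steps 4-5: average squared error and stopping test (Eq. 5.14).
        err = np.mean(np.sum((Q - matched_P) ** 2, axis=1))
        if err < thr_err or prev_err - err < thr_delta:
            break
        prev_err = err
    return T_f, Q
```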
Octree space labeling method and multi-resolution method are used to speed up the algorithm. f) Objected-oriented registration. The registration between the final point cloud and the CAD model is different from the traditional registration application. Conventional registration assumes that two data sets are from the same shape, and then an algorithm that minimizes Euclidean distance can be applied. The assumption is broken in current application. The final point cloud does has manufacturing errors compared with CAD model, and furthermore finding these errors is the purpose of the whole system. The registration is defined by the factory based on how the work piece is assembled to the others. The connection parts, such as holes of screws, which is usually call datum points, should have more weight in the registration process. The new registration that provides additional constraints and weight is developed. 5.2.3 System Calibration Procedure In figure 5.2, the painted windshield with white markers is used as the system registration calibration gauge. Each white dot marker’s center coordinate was measured by a CMM and was considered as the ground truth data for registration calibration. The rest part was painted into black to assure only white dots coordinates are inspected by RIS system. System registration calibration can be divided into four steps: 1) Obtain RIS measurement data: measure the system registration gauge, filter out the noise and save four measurement results in four sensor frames. Figure 5.3 is the measurement 107 Figure 5.2: System registration gauge. result of sensor one, the coordinates of the left top part of the system registration gauge. RIS system measures a cluster of points for each marker on the gauge. Figure 5.3: Sensor 1 measurement result. 2) Interpolate the coordinate of the marker center: ICP algorithm can not be directly applied in current case due to the limitation b) explained in last section. Closest points in both data sets may not represent the corresponding points, therefore the corresponding point 108 can the interpolation among the 3D point cloud in the second data set. In order to further increase the calibration accuracy, the interpolation is first applied after the data acquisition of RIS sensor. The radius of all the circular markers is 5 millimeters. The coordinate of the center of each marker was interpolated after obtaining the raw RIS measurement data. Figure 5.4 is the top view of the sensor 1 measurement result. It also displays the interpolated centers of each marker. The green dots denote the measurement points, and the red stars are the interpolated marker center. As we can easily see from figure5.4, the RIS sensor may not measure the center of the markers. Registration based on closest point can still generate a certain level of error. The interpolation point of each marker is able to represented the center and results better accuracy. The center of each cluster of green points was estimated by the conventional Hough transform. Figure 5.4: The center of each marker. 3) Registration process: registration process is divided into two parts: rough alignment (initial alignment) and fine alignment. Rough alignment is to match the RIS data to the CMM data in a low accurate level. Fine alignment applies the result from the rough alignment as initial position. ICP [49] [50] method is applied to obtain a high accurate result. 109 Figure 5.5 shows the rough alignment result for sensor one. Figure 5.6 shows the fine alignment result for it. 
Red dots denote the RIS sensor one result, and blue ones denote the CMM result. Red dots and blue dots are overlayed with each other in fine alignment. Figure 5.5: The rough alignment between sensor 1 and CMM data. 4) Repeat step 1) to 3) for the other 3 sensors. After system registration calibration is done, four individual point clouds successfully join into one in a common frame shown in figure 5.7. 110 Figure 5.6: The fine alignment between sensor 1 and CMM data. Figure 5.7: System calibration results. 5.3 Registration between the Final Point Cloud and CAD Model The last step of a quality inspection is to find the deviation between the final point cloud 111 and the CAD model. The color code information is used to evaluate the manufacturing error of the work piece. If the manufacturing error exceeds the tolerance, the work piece is considered as unqualified. In the registration step between point cloud and CAD, the fine alignment matrix is needed to match the measured data with CAD model. Current registration method can be directly applied for this type of application because: 1. Photogrammetry method is not feasible for each work piece on an assembly line. 2. Feature points do not exist on part surface. 3. The mechanic support can’t guarantee the same location and orientation of each work piece. 4. The online point cloud registration needs to be finished within limited time. 5.3.1 Octree Space labeling method and Multi-resolution Method based ICP Registration between the final point cloud and the CAD model needs to be finished within a limited time. Therefore, several algorithms to speed up the registration are applied to the registration algorithm in this thesis. Traditional ICP method searches the corresponding point through whole data set P for a point Pq in set Q, which is not necessary. After the rough alignment, the corresponding point in P stays within the neighborhood around point Pq . Searching a neighborhood around point Pq instead of the entire point cloud can greatly save the algorithm execution time. In order to limit the search region in set P , octree space labeling method is utilized. An octree is a tree data structure in which each internal node has exactly eight children. Octrees are most often used to partition a three dimensional space by recursively subdividing it into eight octants as shown in figure 5.8. The center of the space is considered as the fist node, and the space is divided into 8 parts according to the geometrical relationship with the node. Each subspace will be divided recursively into another 8 parts. The nodes of the octree is shown in figure 5.9. The first node in level one leads 8 children in level two, and each node in level two leads another 8 in level three. 112 o1 o1 o1 o1 o1 o1 o1 Figure 5.8: Space partition by the octree space method. Figure 5.9: The node tree of the octree method. In current application, both data sets, the point cloud and the CAD, are divided into 64 subsets by three levels of octree respectively. The subsets with same coding name in both data sets, such as the subset 000 in the CAD and the subset 000 in the final point, are considered as the corresponding subsets. The searching of corresponding points are conducted only 113 with in each corresponding subsets. Therefore, the execution time of algorithm can be cut approximately 63/64. The loop of the cctree algorithm for the CAD model is described figure 5.10. o1 o1 o1 o1 o1 o1 o1 Figure 5.10: Octree for CAD model. 
O_1 = [x_o1, y_o1, z_o1]^T, the geometric center of the data set, is first calculated by Eq. 5.15:

O_1 = (1/N) Σ_{i=1}^{N} P_CAD^i    (5.15)

The coordinates of a point P (x, y, and z) are compared with O_1 component by component. A 1 is returned if P's coordinate is not smaller than the corresponding coordinate of O_1; otherwise a 0 is returned. In this way, every point goes into one of 8 subsets: 000, 001, 010, 011, 100, 101, 110, and 111. The same loop is applied within each subset again and results in 8 × 8 = 64 small subsets representing the CAD.

The final point cloud is treated with a similar algorithm and is first divided into 8 parts in loop 1, as shown in figure 5.11. The only difference is that an offset value T is added to the separation criterion, the geometric center of the final point cloud. If the coordinate of P is smaller than O_1 + T, the algorithm returns 0; if the coordinate is bigger than O_1 − T, the algorithm returns 1. Points near the edge between two subsets can therefore go into both subsets.

Figure 5.11: Octree for the final point cloud.

The difference is illustrated in figure 5.12. Each subset in the point cloud is bigger than the corresponding subset in the CAD, and neighboring subsets have a certain region of overlap. A point near the edge of a subset in the CAD may find its corresponding point in the neighboring subset of the point cloud, so the corresponding subset in the final point cloud should be enlarged to cover this case. As shown in figure 5.12, the subsets in the CAD do not overlap, but the subsets in the point cloud do.

Figure 5.12: The difference between two algorithms.

The traditional ICP method also searches for corresponding points one by one in the final point cloud, which means the search resolution is a constant value of 1. At the beginning of the recursive algorithm, it is not necessary to search every point: searching every 4th or 8th point is enough to roughly align the two data sets. The resolution can therefore be a function of the iteration number n; as n increases, the search resolution comes back to 1. Usually the iteration stops after between 7 and 12 loops. Therefore, the resolution function used in this thesis is given in Eq. 5.16. The algorithm is also forced to run at least 4 loops to find the correct corresponding points.

Res = max(1, 2^(4−n))    (5.16)
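To make the two speed-ups above concrete, the sketch below labels both clouds with two levels of octants and then searches correspondences only inside matching octants, subsampling the measured cloud by the Res of Eq. (5.16). It is a rough numpy sketch under stated assumptions: the clouds are already roughly aligned, the names are illustrative, and the overlap margin standing in for the offset T is an arbitrary example value rather than the one used in the actual system.

```python
import numpy as np

def octree_labels(points, levels=2, margin=0.0):
    """Label points with octant codes by recursive splitting at the centroid.
    Two levels give the 8 x 8 = 64 subsets used in the text; margin > 0 lets a
    point near a splitting plane fall into both neighbouring octants, mimicking
    the overlap (offset T) applied to the measured cloud."""
    buckets = {(): np.arange(len(points))}
    for _ in range(levels):
        split = {}
        for code, idx in buckets.items():
            if idx.size == 0:
                continue
            ctr = points[idx].mean(axis=0)
            ge = points[idx] >= ctr - margin      # may take bit 1
            lt = points[idx] < ctr + margin       # may take bit 0
            for octant in np.ndindex(2, 2, 2):
                bits = np.array(octant, dtype=bool)
                keep = np.all(np.where(bits, ge, lt), axis=1)
                split[code + octant] = idx[keep]
        buckets = split
    return buckets

def correspondences(cad, cloud, iteration, margin=5.0):
    """Nearest-point pairs restricted to matching octants, subsampled by Eq. 5.16."""
    step = max(1, 2 ** (4 - iteration))           # multi-resolution schedule
    cad_b = octree_labels(cad, margin=0.0)        # CAD subsets do not overlap
    cloud_b = octree_labels(cloud, margin=margin) # margin plays the role of T
    pairs = []
    for code, q_idx in cloud_b.items():
        p_idx = cad_b.get(code)
        if p_idx is None or p_idx.size == 0:
            continue
        for qi in q_idx[::step]:
            d = np.linalg.norm(cad[p_idx] - cloud[qi], axis=1)
            pairs.append((int(qi), int(p_idx[np.argmin(d)])))
    return pairs
```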
5.3.2 Feature Weighted Registration

Usually, the measured work piece needs to be assembled with other parts. Therefore, several user-specified locations are used as connection links to the other parts. Those points or regions, once defined, significantly determine the transformation of the point cloud in its CAD frame. Points in the data sets should therefore carry different weights in the registration process; the traditional ICP practice of weighting all points equally is improper for error map registration.

Curvature Information

Curvature information is widely used in feature-based matching. Usually, mean curvature or Gaussian curvature information is used to find feature points in a 3D data set. Those feature points are helpful for setting up the correspondence since they can be easily identified; points carrying relatively large curvature, such as peak points or valley points, are chosen as feature points. The method in this thesis instead estimates the invariant points of both data sets from curvature information. In the quality inspection process, all work pieces share similar but not identical geometry: each work piece has some deformation from the designed shape. From the manufacturing process, it can be concluded that the simpler the part shape is, the lower the risk of manufacturing error. For instance, a flat shape with zero curvature carries the smallest manufacturing error, because almost no forming is involved. So flat regions can be viewed as the most likely invariant parts of the work piece and should be weighted more in the registration process. In contrast, geometrically complex parts are less trustworthy and carry less weight. If the industry does not provide the datum points / features, we consider the parts carrying the least curvature to carry the most weight in the registration process.

GDT and Measured Condition

GDT is short for Geometry Dimension and Tolerance. It is a standard method in industry to specify the geometric form and tolerance information of a work piece, and it is used in the quality inspection process. Only if the error map is within the acceptable tolerance can the work piece be considered qualified. The error map generation is tied to the registration process: without correct registration, the error map cannot reflect the part's quality, so registration based on the data features is critical in error map generation.

A check fixture is a properly designed support of the work piece, which guarantees the work piece's position and orientation. Figure 5.13 shows a check fixture for windshields. In order to avoid position error, three or more widely spaced datum point stoppers are measured and fixed on the check fixture. The windshield sits on the surface of the check fixture with all stopper bars touching the windshield edge. However, the final position and orientation between a windshield and the check fixture are judged by eye, which means neither accuracy nor precision can be guaranteed, and the same work piece can produce different error distribution maps. The second disadvantage of check fixtures is that each fixture only fits one kind of work piece; a different type of work piece requires a new check fixture, which is costly.

Figure 5.13: Check fixture and the stopper bar touching datum points of the work piece.

Straightness and flatness are perhaps the two most popular types of geometry to be used as datums, because the least manufacturing is involved in those regions. The installation points are chosen as datum points, since those points carry less curvature and less tolerance. In this thesis, the edges touching the datums on the check fixture are weighted more in the ICP registration. The position and orientation of the windshield relative to the check fixture is used as the initial condition of the modified ICP algorithm.

5.3.3 The Object-oriented Online Registration Model

Registration is the process of finding the rigid transformation T between two data sets A and B in two coordinate frames. The following parts concentrate on how to solve the equation {a_i} = T{b_i} for the transformation T. The total square error e² is defined as

e² = Σ_{i=1}^{N_A} (a_i − T b_i)²    (5.17)

where a_i is a point from the CAD model A, and b_i is the corresponding point in the measured data set B. In order to analyze and solve the online registration problem, the theoretical model is built up from the ideal case to the real case.

case 1: the ideal case, which is the easiest. Given two data sets A = {a_i(x_i, y_i, z_i), a_i ∈ R³} and B = {b_i(x_i, y_i, z_i), b_i ∈ R³} in two coordinate systems, shape A is equal to shape B; data set A is the CAD, B is the measured data. The registration matrix T between the two data sets needs to be found.
Solution: Since shape A equal shape B, the only error source for e is the registration error, which is named as eRegistration . The registration error for each pair of corresponding points in two data sets can be written in the Eq.5.18 ei = eRegistration,i = ai − T bi (5.18) Linear least square method is directly applied. By minimizing e2 , T can be solved by iteration. (ICP method) case 2 Random noise eRandom is added in to data set B. The unknown registration matrix T needs to find out between two coordinate systems. The error between each pair of corresponding points in two data sets has two parts: the registration error and the random error, so the equation can be written as Eq.5.19 ei = eRegistration,i + eRandom,i = ai − T bi (5.19) Linear least square method can still be directly applied in the equation. Since linear least square method is good at solving the random noise, T can be found by iteration. (ICP method) 119 case 3 The deformation error eDef orm is added into data set B. which can be viewed as the real case. In this case, the error equation is: ei = eRegistration,i + eRandom,i + eDef orm,i = ai − T bi (5.20) If linear least square method is still directly applied to equation by minimizing linear square of e to find out T . The total square e2 including the deformation error eDef orm will be minimized in the registration procedure, Registration process changes the deformation information, which makes the error map un-trustable. The solution: Since shape B is a deformation form from A, there exist some parts B1 that have same shape with the correspondent parts of A denoted as A1 ; there exist some parts B2 that have small deformation with the correspondent parts of A denoted as A2 ; and there exist some parts B3 that have medium or big deformation with the correspondent parts of A denoted as A3 . Now data sets A and B will be separated into three subclasses: A = {A1 , A2 , A3 }, B = {B1 , B2 , B3 }. 3.1 First assume region A1 in A (CAD) is known from the GDT information In CAD, and B1 is unknown in measured data B, and assume B1 always exists. The error e1 between A1 and B1 only have two parts: eRegistration + eRandom . Linear least square method is able to solve T between A1 and B1 . Based on the characteristics information carried between A1 and A, a filter can be build up to estimate the possible data set B1 P ossible of B1 from B, which has the following relation B ⊇ B1 P ossible ⊇ B1 . The filter is built by the curvature and moment information A1 carried in set A. 3.2 In this case, the existence of B1 can not be guaranteed. The assumption that A1 always has correspondent B1 may not always be true. CAD model provides the information that A1 ’s corresponding part B1 should has low tolerance, however the work piece could be unqualified due to the manufacture process. By adding this constraints, case 3.1’s solution 120 matrix needs to verify before registration. The solution matrix T was applied to calculate e1 . If e1 is bigger than the acceptable tolerance carried by CAD, which means B1 in fact does not satisfy the assumption, then this work piece will be considered as unqualified and thrown away. The post process’s verification result tells people any work piece challenges the assumption will be punished as an unqualified part to be thrown away, so that the solution strategy becomes un-challengeable. 3.3 A1 in data set A changed unknown too. None of any pre-knowledge about A and B is known. Solution: curvature information in A is used to estimate the A1 . 
And then A_1 is used to estimate B_1 in order to find T.

Assumptions: 1) During the manufacturing process, the more complex a part is to make, the more manufacturing error it carries; the less complex it is, the less manufacturing error and the more reliable it is. 2) Geometric complexity is related to the curvature information: the lower the mean curvature of a point a_i, the lower its geometric complexity and the more likely it is that a_i belongs to A_1, a_i ∈ A_1. Based on these assumptions, A_1^Possible ⊇ A_1 is estimated from curvature information. Judged by the curvature information, A_1^Possible, A_2^Possible, and A_3^Possible are estimated.

Similar to case 3.1, A_1 is replaced by A_1^Possible and B_1^Possible is estimated by the filter. After the equations e_Registration + e_Random = {a_i} − T{b_i} are solved, T_(k=1) is found. Similar to case 3.2, a post-process is carried out to calculate every point's e_i in A_1^Possible. If any point's error is bigger than the threshold, that point is removed from A_1^Possible. The iterative step k goes to k + 1, and a new B_1^Possible is re-calculated based on the new A_1^Possible. These iterative steps stop once A_1^Possible converges to A_1, judged by comparing the post-processing verification results with the accepted tolerance.

5.3.4 The Methodology in Current Task

Notation: before the error map generation, the transformation T which aligns the measured point cloud in the sensor frame with the CAD model in the CAD frame is needed. The CAD model in the CAD frame is denoted as:

CAD = {p_CAD,i (x_i, y_i, z_i) ∈ R³}    (5.21)

The datum points in the CAD model:

A_1 = {p_Datum,i (x_i, y_i, z_i) ∈ R³}    (5.22)

For a work piece j, its notation in the part frame is:

W_j^p = {p_{W_j,i} (x_i, y_i, z_i) ∈ R³}    (5.23)

For measurement time k, the location of the work piece is denoted by T_k, the transformation matrix from the part coordinate frame to the world frame. So the description of work piece j at location k in the world frame is:

W_{j,k}^w = T_k W_j^p    (5.24)

The measured point cloud of the work piece, first stored in the common sensor coordinate frame, can be denoted as:

W_{j,k}^s = ∪_{i=1}^{4} (T_w^s) W_{j,k}^w = ∪_{i=1}^{4} (T_w^s) T_k W_j^p    (5.25)

The term ∪_{i=1}^{4} (T_w^s) transfers the four individual sensor frame point clouds into a final one. The inspection system used here has four 3-D sensors, each of which measures a quarter of the work piece, so the final point cloud has to be merged from four individual sensor frames. The final registration process aligns the measured point cloud with the CAD model:

W_{j,k}^CAD = T W_{j,k}^s = T ∪_{i=1}^{4} (T_w^s) T_k W_j^p    (5.26)

In Eq. (5.26) there are two variables: W_j^p represents different work pieces, and T_k represents different locations of the work piece. The transformation matrix T needs to cover both of these variables. The algorithm starts by finding the possible datum point set B_1^Possible in the measured data by applying a filter function that searches through the measured data set.

5.3.5 Registration Process

The filter is designed from the collection of the datum points' characteristics in the CAD, including the curvature information and the moment information. After this information is captured, a threshold tolerance is added to the filter. The threshold tolerance controls how many points can pass the filter into the set B_1^Possible. The filter goes over the whole measured data set to obtain B_1^Possible.
F ilterf unctionf (pi ) = 1, Ki ∈ Kc ± τK Mi ∈ Mc ± τM 0, otherwise (5.27) In the Eq.(5.27), pi denotes the point i in measured data, Ki denotes the mean curvature of point i, Mi denotes moment of point i. τ is the threshold tolerance. If the points satisfy the curvature and geometric condition, the filter function equals to 1, otherwise filter function equals to 0. ICP is applied in the algorithm: Repeat the iteration process i = from 1 to iF inal , until the total least square error reaches the threshold. (a) Corresponding points: The CLOSEST points min[(xa1 − xb1 )2 + (ya1 − yb1 )2 + (za1 − zb1 )2 ] are considered as corresponding points P between A1 and B1 ossible . (b) Minimizing the total square error, in order to find the rotation matrix R and translation vector t. 123 (1) (c) Updating the set A1 (2) to A1 by T . (d) Continue step (a) to (c) until the average total square error under threshold value. 5.3.6 Error Map Generation After the point cloud is transferred into the CAD model frame, the distance D between point cloud and CAD surface is calculated along the normal direction. The deviation △ can be described by Eq. 5.28, Where A represents CAD model and B represents a measured point cloud. △ : (A, B) −→ E (5.28) The error map E includes a set of 3D points and their distances D in a metric space to the closest triangle in CAD model M, as seen in Eq. 5.29 D(bi , A) = min(d(pi , Tj ), i = 1, 2, ...m, j = 1, 2, ...n) (5.29) E = {< bi , D(bi , A) > |i = 1, 2, ..m} The methodology explained earlier is implemented step by step. The firs step is to calculate the datum points’ moment information and the characteristic information in CAD model. Then the estimated possible datum points in the measurement point cloud are found out. At last, ICP algorithm is applied between four datum points in CAD and the possible datum points sets in the measured point cloud. The registration matrix based on the invariance part of the windshield is finally derived. Figure 5.14 (a) is the CAD model surface with four datum points on it. Based on the four datum points and the methodology outlined before, the final alignment between CAD and point cloud is generated, shown in Figure 5.14 (b). The final error map for a windshield is displayed in figure 5.15. Red color means that the CAD is higher than the point cloud. Green color means that both sets have same heights, and blue means that CAD is lower than the point cloud. From the color bar in the left side of the figure 5.15, the error is from from −5 millimeters to 6 millimeters. 124 Figure 5.14: (a)CAD model with datum points.(b)the registration result based on datum points Figure 5.15: The color coded error distribution map for a piece of windshield. 5.4 Sensor Fusion The 3D point cloud is not the only way to display the measurement result. In the navigation application, the 3D point cloud is fused into the 2D image. The distance between the object and the camera is first translated into a color code and then the color code is fused into the display. An example is given in figure 5.16. Since the 3D point cloud is calculated through 125 the corresponding image pixel, the transformation between the 3D point cloud and the 2D image can be directly obtained. Figure 5.16: The fusion result for navigation application. 5.5 Chapter Summary After 3D point cloud is derived through the structured light sensor, there are usually several post processes. Registration or sensor fusion are two common tasks. 
Registration is to find out a rigid transformation matrix between two 3D frames and to put different point cloud into one frame. ICP based registration method is developed in this chapter. Through the system calibration, the final point cloud merges all the four individual point clouds and represents the entire work piece. Octree and multi-resolution is used to speed up the iteration speed. An object-oriented method is developed to register the final point cloud with the CAD model 126 and thus an error distribution map could be calculated and displayed. 127 CHAPTER 6 The development of an Omni-direction 3D camera for mobile robot 6.1 The Working Principle of the Omni-directional 3D Sensor The developed omnidirectional 3D camera consists of a projector, a camera, and two hyperbolic mirrors. Similar with traditional omnistereo systems, a projector replaced a camera and resulted in a structured light based system. The principle of the 3D camera is illustrated in figure 6.1. Under ideal condition, optical centers of the camera and the projector coincide 2 2 with the focal points Fm1 , Fm2 of the two hyperbolic mirrors, respectively. The projector is regarded as an inverse camera, that is, the projector maps a 2D pixel in the projector to a 3D array in the scene. The optical path of the 3D camera can be described as follows: a light ray from the 2 projector goes through the focal point Fm2 and then intersects with the hyperbolic mirrors at point Pm2 . The hyperbolic mirror has a useful property that any light ray going towards one of the focal points will be reflected through the other focal point. Hence, the incident 1 light ray will be reflected away from the other focal point Fm2 . Then, it intersects with the 128 observed target with diffuse reflection property so that part of the light ray will be reflected 1 to the focal point Fm1 of hyperbolic mirror 1; thus, the light ray will be further reflected 2 at point Pm1 of the hyperbolic mirror 1 to the camera through the other focal point Fm1 due to the property of the hyperbolic mirror. Therefore, under ideal condition, the relation between any 3D point in the scene and the corresponding 2D point in the image sensor can be represented by a uniform model with a single viewpoint [62]. However, the above ideal mathematical model requires the lens’ optical center to be coincide with focal point of the hyperbolic mirror. This requirement, however, is difficult to be completely satisfied, especially for the projector since it cannot view the scene directly. So the uniform model for every pixel of the sensor will cause the residual error, resulting in incorrect 3D reconstruction. To solve such a problem, a vector-based 3D reconstruction strategy which determines the ray vector for each pixel of the camera and projector in the scene was proposed and the corresponding look-up-tables (LUT) was established to calibrate the system. A light array Lp (up , vp ) from a projector pixel (up , vp ) intersects its corresponding light array from camera Lc (uc , vc ). The intersection point P is the reconstruction point. Instead of reconstructing the light array from the image pixel, light arrays are directly rebuilt in the task space: 1 Lp = Pw + UW p λp (6.1) Lc = Q1 + VW c λc w (6.2) 1 Pw is an arbitrary point on Lp and UW is a unit vector representing its direction. Q1 w and Vw in Eq. 6.2 have the similar meanings. 
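To make the vector-based reconstruction concrete, the following sketch intersects the two rays of Eqs. (6.1) and (6.2). Since noisy rays rarely meet exactly, it returns the midpoint of the shortest segment between them together with the residual gap, a common practical choice that is assumed here rather than stated in the text; the names are illustrative.

```python
import numpy as np

def intersect_rays(Pw, Uw, Qw, Vw):
    """Triangulate the point implied by Eqs. (6.1) and (6.2):
    Lp = Pw + lambda_p * Uw and Lc = Qw + lambda_c * Vw.
    Returns the midpoint of the shortest segment between the two rays
    and the length of that segment as a sanity check."""
    Pw, Uw, Qw, Vw = map(np.asarray, (Pw, Uw, Qw, Vw))
    # Solve Uw*lp - Vw*lc = Qw - Pw in the least-squares sense (3 equations, 2 unknowns).
    A = np.column_stack((Uw, -Vw))
    b = Qw - Pw
    (lp, lc), *_ = np.linalg.lstsq(A, b, rcond=None)
    p_on_lp = Pw + lp * Uw
    p_on_lc = Qw + lc * Vw
    return 0.5 * (p_on_lp + p_on_lc), np.linalg.norm(p_on_lp - p_on_lc)
```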
129 6.2 Calibration of the Omni-directional 3D Camera First, The calibration of the omnidirectional camera was discussed using the vector-based calibration strategy. As shown in figure 6.2, Uw is the incident vector in the world frame of a pixel Ic in the catadioptric camera, which is our calibration parameter for each pixel. By moving a predefined reference board (white flat board) with serval precise-located markers, 1 2 the view vector Uw can be specified by the two points Pw and Pw . Although the calibration concept is straightforward, it is a challenge to calibrate all the pixels simultaneously. To this end, an extra dioptric projector, as well as a dioptric camera, is introduced. The dioptric projector shoots two-dimensional encoded patterns onto the reference board; meanwhile, the dioptric camera records the resulting image of the board. In such a way, the phase value can be determined for each pixel in the dioptric camera. The origin of the reference board frame is set to coincide with one marker and the xy axes are parallel to reference board. Obviously, the z axis is perpendicular to the reference board. In this case, ′ the coordinate Pr in the reference board frame can be transformed to the corresponding Ic in the image plan frame of the dioptric camera by the homography matrix H: ′ P r = H × Ic (6.3) where H can be specified by using the markers on the reference board since the coordinates of these markers in the image plane frame and reference board frame are both known. It should be pointed out that the location Pr can be calculated within the sub-pixel by appropriate image processing. The desired location Pw of the point is measured in the world frame. Therefore, the location Pr in the reference board frame should be further transformed to Pw in the world frame: Pw = RPr + t (6.4) where the 3 × 3 rotation matrix R and 3 × 1 translation vector t can be obtained through the known markers. Markers are measured using a high accuracy instrument (e.g. the 130 measuring arm or laser tracker), which is kept static during the calibration procedure so that the instrument frame is considered as the world frame. Whereas, the dioptric projector and the dioptric camera are moved to optimal position in sequence when the reference board is placed at different sides. Furthermore, the illuminated patterns are simultaneously captured by the catadioptric 1 camera so that the corresponding coordinate Pw in the world frame can be determined for 2 every pixel in catadioptric camera. Similarly, the coordinate Pw can be obtained when the 1 reference board is moved to another location. Then, the reflected vector Uw and a point Pw for each pixel in the world frame can establish a LUT, whose size is equal to the resolution of the catadioptric camera. Without loss of generality, any point viewed in the catadioptric camera can be presented by: 1 Pw = Pw + αUw (6.5) 2 1 Pw − Pw . It should be noted that the point viewed within 2 1 Pw − Pw the pixel in the catadioptric camera can be realized by using bilinear interpolation with four where α is a scale factor; Uw = neighboring pixels. On the other hand, the calibration of the catadioptric projector can be achieved in a similar way except with the absence of the extra dioptric projector. For this technique, encoded patterns are directly shot from the catadioptric projector and the dioptric camera records the phase distribution and calculates the corresponding locations Q1 and Q2 in the r r reference board frame. 
Similarly, markers on the reference board are measured by the high accuracy instrument so that we can obtain the locations Q_w^1 and Q_w^2 in the world frame. Hence, any point on the incident light ray of the projector can be expressed by:

Q_w = Q_w^1 + β V_w    (6.6)

where β is a scale factor and V_w = (Q_w^2 − Q_w^1) / ‖Q_w^2 − Q_w^1‖. Eventually, the incident vector V_w and a point Q_w^1 for each pixel of the catadioptric projector can also establish a LUT, whose size is equal to the resolution of the catadioptric projector. Once the projector and camera are calibrated, the point on the observed target can be computed by the intersection of the two vectors of the projector and camera [76].

6.3 Error Analysis for the Designed 3D Camera

The accuracy of 3D reconstruction is bounded by the resolution of the imagers. The resolution of a camera is the distance that one pixel covers at a given range; any variation within that distance cannot be directly detected and thus bounds the accuracy of the designed system. Different from a traditional camera, the resolution of an omnidirectional camera is not a constant value. As shown in figure 6.4(a), the resolution around the edge of the image is much lower than at the center: the resolution decreases along the image radius. The reason is the curvature of the omnidirectional mirror; since the curvature of the mirror is not constant, the image reflected by the mirror shrinks at different ratios. Figure 6.4(b) illustrates the optical path of the omnidirectional camera. The wall is approximately 2.5 meters from the sensor, and point P moves along the wall; the corresponding pixel p moves along the radius toward the image center. Since the resolution differs from pixel to pixel, the detection accuracy of point P is highly related to its position. If point P is around the same height as the mirror, its resolution value is low and its detection accuracy is high. Once point P moves lower, the resolution value rises and the accuracy drops.

[x_I, y_I, 1]^T ∼ [ f_x  α  u_0  2e·u_0 ;  0  f_y  v_0  2e·v_0 ;  0  0  1  2e ] [λx, λy, λz, 1]^T ,
λ = b²(−ez − a√(x² + y² + z²)) / (b²z² − a²(x² + y²))    (6.7)

In the error simulation, P moves vertically from 300 mm to −1,500 mm, and its corresponding resolutions in the image frame and in the world frame are shown in figure 6.6(a) and figure 6.6(b), respectively. The ideal omnidirectional camera model of Eq. 6.7 is applied in the calculation. The parameters are the design values: a = 18.75 mm, b = 18.75 mm, e = 40.0 mm, f_x = f_y = 250, u_0 = 320, v_0 = 240, X = 2,500 mm, Y = 0, Z = −1,500 mm to 300 mm, and the camera resolution is 640 by 480 pixels. From the resolution curve in figure 6.6(a), the resolution is around 7 mm/pixel when the radius is 250 pixels and rises to 25 mm/pixel when the radius is 100 pixels. From the resolution curve in figure 6.6(b), the resolution reaches 7 mm/pixel when z is 300 mm high; as z decreases, the resolution value increases. The omnidirectional projector resolution curve follows the same working principle as the omnidirectional camera; the only difference is that the projector is mounted upside down.

The reconstruction error ∆d can be calculated from the resolution error. As shown in figure 6.5, AP_1 is the distance that one camera pixel covers, and BP_1 is the distance that one projector pixel covers. Since triangle ABP_2 is similar to triangle O_C O_P P_2, ∆d can be easily calculated: about 160 mm at the 2,500 mm distance considered above. When the object depth is around 4,000 mm, ∆d equals 256 mm.
If a 2K by 2K camera is used instead of the current one, ∆d equals 39mm when object distance is 2500mm. ∆d = 62mm when the object distance is 4000mm. 6.4 One-shot Surface Coding for the Sensor A robust projection pattern is critically important for achieving accurate correspondence for a structured light based measurement system, especially for a one-shot method. The design should satisfy the following constraints: (a) monochromatic light and (b) pre-warping to cancel mirror distortion and (c) invariance from scene variation. Taking the monochromatic light into account, the designed primitives should not contain color coding information. The background colors will not influence a monochromatic light based system. Furthermore, the color based design pattern could not be applied to an infrared light source, but the monochromatic based design could. The final design of the omnidirectional 3D camera would employe near infrared light source instead of normal one. 133 Traditional one-shot pattern only works fine in a normal structured light system, it could not be used in omnidirectional structured light system without pre-warping. The convex shape mirrors in the omnidirectional sensor could distort both the geometries and the orientations of the primitives. Compared with traditional monochromatic light based patterns, light arrays are warped by the hyperbolic mirror twice in an omnidirectional system. Hence the geometrical features based pattern is even more difficult to be calculated compared with the traditional one. The wanted projection image should be first un-warped in order to cancel the mirror distortion. Since a projector can not receive information, its pre-warp function is much more difficult than a camera’s. Second, even the un-warping function is calculated, the projected geometrical light features are still further arbitrarily distorted by the unknown environment. The designs of the primitives should also be an invariance from this distortion and then a correct correspondence can be linked between two image frames. 6.4.1 Image warping in an omnidirectional camera Both of the projector warp and the camera warp are the transformations between conic image to panoramic views [77]. Camera image warp has been discussed a lot in the past [77]. In essence, the warping function scans the conic image around the center and then horizontally allocates each part into a rectangle, shown in figure 6.7. The radius and the center location are needed to finish the warping process. The radius and the center can be easily obtained for an omnidirectional camera by image analysis, but the warp for the projector is much more difficult and has been discussed very little in literature. The center of the conic image can not be directly estimated, since a projector is not an information receiving device. Through the LUT created in the calibration process, the projector can ’view’ the scene by a calibrated camera, which is illustrated in figure 6.3. By treating the omni-projector as a pseudo omni-camera, its warping function can also be obtained by LUT. Basically, the designed projection marks are first calculated on the reference board in the world frame and then are transferred back to the projector image frame via LUT. Hence, the conic image 134 center and its radius can be interpolated. 
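A minimal sketch of the warping step described above: it resamples a conic omnidirectional image into a panoramic strip by sweeping the angle around the (already estimated) image center, assuming the center and the useful radius range are known. Nearest-neighbour sampling is used for brevity, and all names are illustrative rather than taken from the implementation.

```python
import numpy as np

def unwarp_conic(img, center, r_min, r_max, out_w=1024, out_h=256):
    """Resample a conic (donut shaped) omnidirectional image into a panoramic strip.
    Columns sweep the angle 0..2*pi around `center`; rows sweep the radius r_min..r_max."""
    cy, cx = center
    phi = np.linspace(0.0, 2.0 * np.pi, out_w, endpoint=False)
    r = np.linspace(r_min, r_max, out_h)
    rr, pp = np.meshgrid(r, phi, indexing="ij")      # (out_h, out_w) sampling grid
    x = np.clip((cx + rr * np.cos(pp)).round().astype(int), 0, img.shape[1] - 1)
    y = np.clip((cy + rr * np.sin(pp)).round().astype(int), 0, img.shape[0] - 1)
    return img[y, x]   # nearest-neighbour sampling; bilinear interpolation is a refinement
```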
6.4.2 One-shot projection pattern design for the omnidirectional 3D camera

The task in designing a one-shot projection is to assign each projected bright pixel / image primitive / marker a unique codeword that distinguishes it from the others. The unique codeword establishes the correspondence between the two image frames. A projection pattern with concentric circles is designed for the one-shot projection, shown in figure 6.8. After the center of the conic projector image is derived, concentric circles with different radii are used as the projection pattern. Each circle is assigned a different angular frequency and projection intensity in order to distinguish it from the others.

The intensity functions of the concentric circles are described by Eq. 6.8 using the inverse Fast Fourier Transform (FFT). The intensity function I_p(r, θ) is designed in polar coordinates, where r stands for the radius and θ is the angle; N is the number of points on the circle; X(r, k) is the assigned angular frequency for each circle; and I_r is the intensity coefficient for each circle. Within each circle, all of the bright pixels share the same angular frequency X(r, k). If the image is scanned from the center outwards along a radius, the circles are labeled by different angular frequencies and different projection intensities. If the image is scanned along the tangent direction of each circle, each bright pixel can be labeled by a different phase angle θ. The angular frequency, the projection intensity, and the phase angle θ together assign each bright pixel one unique codeword.

I_p(r, θ) = (1/N) Σ_{k=1}^{N} I_r X(r, k) ω_N^{−(Nθ/2π − 1)(k−1)} ,  if r = R_1, ..., R_10 ;  0, else    (6.8)

ω_N = e^{−2πi/N}    (6.9)

I_r = I_{R_1}, ..., I_{R_10} ,  when r = R_1, ..., R_10    (6.10)
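The snippet below sketches Eqs. (6.8) to (6.10) for a single ring, together with the matching frequency check of the decoding step detailed in the next subsection: one spectral line per ring is pushed through the inverse FFT to obtain the intensity samples along the circle. It is a toy sketch with illustrative names and parameter values; the real pattern is additionally pre-warped through the projector LUT before projection.

```python
import numpy as np

def ring_intensity(N, freq_k, I_r):
    """Intensity samples along one projection circle (Eqs. 6.8-6.10): a single
    angular frequency 1 <= freq_k < N/2 is placed in an N-point spectrum and taken
    back to the angular domain by the inverse FFT; the mirrored line keeps the
    signal real, and it is shifted to be non-negative so a projector can display it."""
    X = np.zeros(N, dtype=complex)
    X[freq_k] = N
    X[N - freq_k] = N
    samples = np.fft.ifft(X).real                 # cos(freq_k * theta) sampled N times
    return I_r * (0.5 + 0.5 * samples / np.abs(samples).max())

def ring_frequency(observed):
    """Recover the ring label (Eq. 6.11): the dominant angular frequency of the
    intensity observed along the matching circle in the camera image. The per-pixel
    part of the codeword is simply its polar angle, which the epipolar constraint
    keeps identical in the projector and camera frames."""
    X = np.fft.fft(observed - observed.mean())    # drop the DC term first
    return int(np.argmax(np.abs(X[1:len(X) // 2]))) + 1
```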
6.4.3 Epipolar constraints and the decoding method

The task of a one-shot pattern decoding algorithm is to extract the codeword so as to establish the correspondence between a projector pixel and a camera pixel. Most importantly, the extracted codeword should be invariant to the unknown environment distortion. The projected pattern is distorted by the unknown test scene, and that distortion is what is used to reconstruct the 3D coordinates: different object depths produce different distortions, and using such distortions is the fundamental working principle of triangulation based 3D reconstruction. The drawback of the distortion, however, is that it can also destroy the geometric shapes and the locations of the pixels / primitives / markers. The designed codeword should avoid such a case, so a robust encoding and decoding algorithm should rely on information that is invariant to the distortion.

In this work, epipolar constraints [55] are utilized to extract the codeword and to identify the invariant. As the object depths vary, the corresponding camera pixel changes its position; however, there are still certain rules that it must follow. As shown in figure 6.9, a projector pixel I_p(r_p, θ) emits a light ray, which intersects the scene at point P and reflects into the camera image plane at point I_c(r_c, θ). No matter where P is, I_p and I_c always share the same phase angle θ, so θ is the component that is invariant to the scene variation. In this work, each projection ring is labeled by a different projection intensity and angular frequency, and each pixel is assigned a unique codeword: the phase angle. Therefore, the proposed one-shot pattern is capable of linking correspondences with high accuracy. The intensity of each arc / circle can be easily detected by the camera, and its angular frequency can also be derived through the FFT:

X(r, k) = Σ_{j=1}^{N} I_c(r, j) ω_N^{(j−1)(k−1)}    (6.11)

Combined with its phase angle, the codeword of a bright pixel can be uniquely determined and used to set up the correspondence.

6.5 Chapter Summary

This chapter presents a structured light based omni-directional 3D environment reconstruction system using a projector, a camera, and hyperbolic mirrors. The corresponding vector-based calibration strategy is proposed to improve the system accuracy.

Figure 6.1: The working principle of the designed omni-direction system.
Figure 6.2: Calibration of the omni-directional camera.
Figure 6.3: Calibration of the omni-directional projector.
Figure 6.4: Received image from the world frame: (a) the received image; (b) illustration of the optical path.
Figure 6.5: The object depth error from the resolution limitation.
Figure 6.6: The omnidirectional camera resolution: (a) the resolution curve in the image frame; (b) the resolution curve in the camera frame.
Figure 6.7: Image warp for a panoramic view.
Figure 6.8: The designed one-shot projection pattern.
Figure 6.9: The epipolar geometry of the omnidirectional system.

CHAPTER 7
Current Experimental Results and Data Analysis

This chapter gives the experimental results and the data analysis based on them. According to the implementations of the systems, the measurement results are roughly separated into three categories: 1) the experimental results for the developed Rapid Inspection System (RIS) for free form surfaces; 2) the experimental results for the navigation system with a single viewing direction; 3) the experimental results for the omni-directional navigation system.

7.1 The Rapid Inspection System (RIS)

7.1.1 The Configuration of the System

The Rapid Inspection System (RIS) for freeform surfaces, designed and implemented in the Michigan State University Robotics and Automation Lab, is a 3D scanning system for manufacturing quality control. A 3D point cloud and the corresponding error distribution map against the Computer Aided Design (CAD) model are generated after the inspection process has been conducted. The configuration of the system is illustrated in figure 7.1.

Figure 7.1: RIS system configuration.

RIS is a multi-sensor system with four pairs of cameras and projectors on the top. Firm 3 inch × 3 inch aluminum T-slots are selected to build a frame structure that suppresses vibration from the ground. The sensors are mounted on the upper beams, and a workpiece, such as a windshield, is located on the lower holder. Each pair of sensors forms an active vision sensing system: the 3D point cloud in each sensor frame is constructed by calculating the deformation of the Gray Code Line Shifting (GCLS) pattern by the test surface. The LCD projectors on the beam shoot image patterns onto the tested object and the 2k × 2k high resolution cameras receive the reflected light. The field of view of each sensor is 2,100 millimeters (L) × 1,600 millimeters (W) × 600 millimeters (H). The red box at the top left of figure 7.1 highlights and zooms the cross view of sensor 1: an LCD projector projects the light patterns and a CMOS camera receives the reflected light.
The pixels in the projector frame and the camera frame are denoted as m_p: (u_p, v_p, 1)′ and m_c: (u_c, v_c, 1)′, respectively. The resulting point cloud P^{S_j} in each sensor frame can be written as:

P^{S_j} = { p_i^{S_j}(x_i, y_i, z_i) ∈ R³ } ,  j = 1, 2, 3, 4    (7.1)

P^{S_j} = f^j(m_p^j, m_c^j) ,  j = 1, 2, 3, 4    (7.2)

The function f^j() denotes the 3D reconstruction model of sensor S_j, which recovers the point cloud from corresponding pairs of pixels m_p^j and m_c^j in the two image frames. The registration that transfers the four point clouds into a common frame is defined as:

P^C = ∪_{j=1}^{4} T_{S_j}^C P^{S_j}    (7.3)

where T_{S_j}^C is the transformation matrix from sensor frame j to the common frame.

7.1.2 The Inspection Result via Single-shot Coding Algorithm

As explained in the previous chapters, the single-shot coding algorithm is capable of measuring a moving object. Assuming the work pieces are moving along the assembly line, the single-shot coding method is applied to inspect the parts. Since the single-shot method is sensitive to image noise, a pre-treated work piece is utilized: a sprayed pillar part from the automobile industry, shown in figure 7.2. Its corresponding inspection result is given in figure 7.3. Although the method is sensitive to image noise and can only obtain a relatively low point density, it still obtains a usable point cloud for a pre-sprayed work piece. The point density is much lower than that of a multi-shot coding method, but still higher than traditional CMM based inspection. Compared with the multi-projection method and the CMM, the single-shot method can measure a moving work piece and its acquisition time is less than 1 second.

Figure 7.2: The inspected pillar with surface pre-treatment.
Figure 7.3: The inspected point cloud of the pillar.

7.1.3 The Inspection Result via Multi-shot Coding Algorithm

The multi-projection coding method is more robust against image noise, so it can inspect work pieces without surface pre-treatment. Figure 7.4 gives the 3D point clouds of three work pieces measured by the inspection system. From the result, we can conclude that the system can inspect the work pieces and provide dense 3D point clouds regardless of the optical property: black, dark, and even shiny.

Figure 7.4: The 3D point clouds for the work pieces.

The major advantage of the proposed method is that it increases the completeness of the inspection results. The traditional inspection method cannot fully inspect optically-challenging objects without surface pre-treatment. In order to get a quantitative evaluation of the completeness, two additional experiments were conducted. In the first experiment, the work pieces were powder sprayed. The camera and projector settings, such as camera aperture, projection brightness, and projection contrast, were properly readjusted to avoid under- / over-exposure issues. The number of points in the resulting 3D point cloud was considered the ground truth: if any inspection result reached the same number, the inspection completeness was considered 100%. The comparison between our results and this first experiment shows what percentage of completeness we reached. The second experiment inspects the same work pieces using a naive structured light coding / decoding method, in which the patterns are projected without the adaptive mask. In this experiment the camera aperture cannot be adjusted perfectly: the black piece requires a big aperture, but the shiny piece needs a small aperture.
The aperture is set around the middle: well suited to the dark piece, but too big for the shiny part and too small for the black part. The comparison between our results and this second experiment shows how much the method improves. The results are listed in table 7.1; the experimental results using our method are listed as the third set of numbers.

Table 7.1: The number of the point clouds in different experiments

              1st        2nd               3rd
Black one     30434      14793 (48.6%)     29526 (97.0%)
Dark one      77433      71024 (91.3%)     76533 (98.8%)
Shiny one     147755     126486 (90.7%)    143497 (97.1%)

The numbers in the second column of table 7.1 are the results of the first experiment; their point counts are considered the ground truth and their completeness is defined as 100%. The numbers in the third column are the results of the naive inspection method. From these results, we can see that more than half of the black work piece fails to be reconstructed, and around one tenth of each of the other two pieces fails to be rebuilt as well. The reason the black piece performs poorly is that the aperture size is adjusted relatively small, so the inspection of the shiny part is relatively good but that of the black piece is poor. The numbers from the third experiment show the completeness of our method: all the parts can be inspected with over 97% completeness. Around 50% of the completeness is recovered for the black piece and around 7% for the other two. Because the projection mask dynamically adjusts the illumination, the system can simultaneously inspect both the shiny part and the black part.

7.1.4 The Registration Result from Four Pairs of Sensor Frames

A piece of painted windshield was inspected by the RIS system. Each pair of 3D sensors covers around a quarter of the windshield, and the final point cloud representing the windshield is the merged result of four individual point clouds in four sensor frames. The windshield was also measured by a CMM machine. Since the CMM has been used for years for its accuracy, the CMM data is taken as the ground truth, and the difference between the CMM measurement data and the RIS measurement data is used to evaluate the calibration result.

Figure 7.5: The comparison between CMM data and the RIS data.

Figure 7.5 shows both measurements. The upper three point clouds with triangulated surfaces are the RIS measurement result after alignment with the CAD model, shown from three different view angles, and the lower three point clouds with low density are the CMM measurement results from the same three view angles. The evaluation between these two sets employs the following formulas:

M = (1/N_CMM) Σ_{i=1}^{N_CMM} d_i = (1/N_CMM) Σ_{i=1}^{N_CMM} ‖P_RIS^i − P_CMM^i‖    (7.4)

M is the mean distance between the corresponding points of both sets. d_i, P_RIS^i, and P_CMM^i are defined as follows:

d_i = ‖P_RIS^i − P_CMM^i‖ = √( (X_RIS^i − X_CMM^i)² + (Y_RIS^i − Y_CMM^i)² + (Z_RIS^i − Z_CMM^i)² )    (7.5)

P_RIS^i = [X_RIS^i, Y_RIS^i, Z_RIS^i]^T ∈ M_RIS    (7.6)

P_CMM^i = [X_CMM^i, Y_CMM^i, Z_CMM^i]^T ∈ M_CMM    (7.7)

in which M_RIS and M_CMM stand for the measurement results from the RIS and the CMM, respectively. The corresponding points are found via the ICP method explained in the last chapter. The mean distance between the CMM data and the RIS data is about 0.15 millimeters. Taking the CMM data as the ground truth, the calibrated RIS system has around 0.15 millimeters of error. Backprojecting the 0.15 millimeter error into the camera frames is equivalent to a 0.14 pixel calibration error.
By comparison, conventional relatively calibration errors for a structured light system are around 0.02 pixel in camera and 0.29 pixel in projector respectively. Since in our sensor model projector’s intrinsic and extrinsic parameters do not participate in the calculation, 3D points in world frame are directly correspondence with projector pixels. Backprojected error in projector can be viewed as 0 pixel. The total errors in RIS can be viewed as 0.14 pixels. 153 The error map of the same piece of windshield from CMM is also applied to compare the calibration performance of the RIS system shown in figure 7.6. Figure 7.6: The CMM error map. Compared with the error map in last chapter 5.15, we can see that both error maps have similar distribution color coding. In the CMM error map, green color means that CMM measurement data has similar height with the CAD model. Blue color indicates that CMM data is lower than the CAD model. If CMM data is higher than the CAD, the color is red. The histogram in figure 7.6 shows that the error roughly ranges from −5 millimeters to 6 millimeters. Therefore, the magnitude of the error map is also similar with each other. 7.1.5 Inspection for Different Work Pieces In order to test the capability of the inspection system, work pieces with different dimensions, geometric shapes, and optic surface properties are measured by the system. Figure 7.7 shows another tested work piece and its corresponding CAD model is given in figure 7.8. Figure 7.9 and 7.10 are the point cloud and error distribution map respectively. From the result in 154 this section and previous sections, it is demonstrated that the system is able to inspect work pieces without pre-treatment via multi-projection surface coding method. Figure 7.7: The first tested work piece with bright color and more complex shape. 7.2 The Navigation System for Single Direction The structured light system could also be utilized into the navigation application via the single-shot coding method. This section will provide some experimental result for such a system. 155 Figure 7.8: The corresponding CAD model for the first work piece. 7.2.1 The Prototype of the Navigation System for Single Direction Figure 7.11 shows the prototype of the navigation system for single direction. The system consists of a camera, a projector and an infrared light filter. The system field of view is about 400 mm by 400 mm at the distance around 1000 mm. The projector shots the single-shot pattern into the tested scene and the infrared light filter eliminates the visible light. The near infrared light pattern is projected onto the scene and is reflected into the camera. 7.2.2 The Fusion Result for the System Figure 7.12 shows the tested objects in front of the sensor. The white object is a little further than the wooden one, and the wooden one forms a certain slop. The captured image from 156 Figure 7.9: The point cloud for the first work piece. the camera is shown in figure 7.13. Although people could not see the projected pattern, the camera could view both the near infrared light pattern and the scene at the same time. The experimental result is given in figure 7.14. The depth information is transferred into a color code and is fused into the image display. Conventional the operator could only get 2D information and the distance between the object and the camera has to be estimated. After the color-coded depth information is fused into the image, operator could control the mobile base more accurately. 
Furthermore, once the projection pattern is shotted onto a living object, such as a person or an animal, the infrared light pattern will not disturb them. 157 Figure 7.10: The error map for the first work piece. e 7.2.3 Integration the Single Direction Sensor with a Mobile Robot After tested the developed sensor, the second step is to integrate it into a mobile robot. Figure 7.15 shows that the single direction sensor is put in front to the mobile robot and there is a robot arm above the sensor. Once the sensor returns the fusion results, different operations can be achieved based on 3D and 2D fusion information. The whole system can be described as figure 7.16. The system mainly contains Segway mobile platform, SCHUNK 7DOF LWA manipulator, near-infrared 3D sensor, phantom force reflect joystick, as well as tele-operation control computer. The mobile robot system work on two mode: the active and passive mode. The robot can mobile platform moves under the control of joystick by teleoperation, called passive mode. Similarly, the 7DOF manipulator can autonomously grasp 158 Figure 7.11: The configuration of the navigation system for single direction. the target object in the workspace with the help of the 3D sensor perception at the active mode. The communication between the remote operator and robot system takes advantage of the internet. Figure 7.17 illustrate the result of the integrated system. The robot arm is initially at a different position. After the fusion sensor returns the 3D coordinates of the desired part, the robot picks the part and lifts it up. The whole process demonstrates the feasibility of fusion sensor based control. 7.3 The Omni-direction Navigation System In order to avoid the blind area, an omni-directional navigation system is also developed. Hyperbolic mirrors are added in front of the camera and the projector. Current experimental results are given in this section. The prototype of the omnidirectional sensor is implemented as illustrated in figure 7.18. The detail view of the developed omnidirectional 3D camera is in the right side of the figure. A projector associated with a hyperbolic mirror is used 159 Figure 7.12: The tested objects. as an omnidirectional structured light emitter. A CCD camera (model M056-ZA180-002, resolution 640 × 480) with a hyperbolic mirror is used as an omnidirectional camera to view the panoramic view. Both imaging devices are put vertical and form the designed 3D omni-directional camera. The whole omnidirectional perception system was mounted on the back of the mobile base and there is a robot arm for operator to do the manipulation tasks. The field of view (FOV) of the hyperbolic mirror is expected around 360-degree in horizontal plane with the upper elevation angle approximate 30-degree with respect to the horizontal plane. The mirror parameters a and b along the semimajor and semiminor axes 160 Figure 7.13: The captured image when projecting near infrared light pattern. are a2 = 1248.2060, b2 = 351.7940, respectively. In order to verify the performance of the designed 3D camera, the sensor is put into a room. The distance from the walls to the sensor is used to evaluate the developed 3D camera. The omnidirectional 3D camera was roughly located at the center of the room and obviously the room corners are further than the walls. Due to the limitation of the projector illumination the environment lights are turned off, when the structured light is projecting. The projected image is shown in figure 7.19. 
As explained in the previous chapter, several concentric circles are projected by the projector, and the radii of these light rings are used to perceive object depths. In figure 7.19, the projected light rings are almost parallel in the task space. The positions of the rings change according to the object distance: when the distance increases, the rings lift up; when the distance remains almost the same, the rings stay horizontal and parallel to each other. The radius distortions of the projected rings are used to rebuild object depths. From the figures, we can easily see that the rings lift up when they reach the corners.

Figure 7.14: The fusion result for the sensor.

The image received by the omnidirectional camera is shown in figure 7.20. The four walls are labeled by the four directions, from east to north. The frequencies and density variations of the light rings can easily be obtained to separate the individual rings, and the phase angles of corresponding pixels, preserved by the epipolar constraints, separate the pixels along each ring. The radii of the rings change according to the distances, and we can see how the ring radii change at the four corners of the room. Due to the low resolution of the camera, the radius changes around the south corners are not very large, while there are clear radius changes around the north corners, because the sensor is quite far from the north wall but close to the south wall. A second image is taken with the lights turned on, and the fusion result is shown in figure 7.21. Different colors are assigned to the light rings to indicate object depths: blue means the distance is far, around 4 meters, and red means near, around 2 meters.

Figure 7.15: The integrated mobile robot with the designed sensor.

A panoramic view is also provided in figure 7.22 for a better display. From the image, we can see that the north wall is farther away than the rest, since its color is assigned as blue. The ring patterns clearly rise at the corners, and the assigned colors likewise indicate that the corners are farther away than the walls. Because figure 7.21 does not fully cover the conic image, there are several black regions in the unwarped panoramic view (a minimal sketch of this unwarping step is given later in this chapter): one above the person's head in figure 7.22 and two at the top right and top left, respectively.

Figure 7.16: The description of the integrated system.

7.4 Chapter Summary

This chapter provides the experimental results of the developed measurement systems. The developed RIS system can use the single-shot surface coding method to inspect a surface pre-treated work piece and obtain a fairly dense point cloud. With the aid of the multi-shot coding method, the system can inspect optically challenging objects without surface pre-treatment and obtain a very dense point cloud. A multi-sensor strategy is applied to enlarge the system field of view; a painted windshield is employed to test this multi-sensor inspection system, and the final point cloud is merged from four individual ones. A comparison with the CMM result is used to verify the accuracy. Using the single-shot coding method and infrared light, a structured light based navigation sensor is developed. A fusion image with color-coded depth information is provided, so the operator can view the image and obtain the depth information at the same time. In order to avoid blind areas, an omni-directional structured light based measurement sensor is developed.
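The panoramic display referred to above (figure 7.22) is produced by unwarping the circular omnidirectional image into a rectangular strip. The following is a minimal nearest-neighbor sketch of that step: the mirror center, the useful radial band, and the output size are placeholder parameters, and the dissertation's actual unwarping may use the calibrated mirror model rather than this simple polar mapping.

```python
import numpy as np

def unwarp_panorama(omni_image, center, r_min, r_max, out_height=180, out_width=720):
    """Unwarp a circular omnidirectional image into a rectangular panorama.

    omni_image:   H x W x 3 array, the raw image seen through the mirror
    center:       (cx, cy) pixel coordinates of the mirror center
    r_min, r_max: radial band (in pixels) of the mirror's useful annulus
    Columns map to azimuth and rows to radius; pixels falling outside the
    captured conic image stay black, producing the dark regions noted above.
    """
    cx, cy = center
    pano = np.zeros((out_height, out_width, 3), dtype=omni_image.dtype)
    h, w = omni_image.shape[:2]
    for row in range(out_height):
        # Radius decreases with row so the outer ring appears at the top.
        r = r_max - (r_max - r_min) * row / (out_height - 1)
        for col in range(out_width):
            theta = 2.0 * np.pi * col / out_width
            x = int(round(cx + r * np.cos(theta)))
            y = int(round(cy + r * np.sin(theta)))
            if 0 <= x < w and 0 <= y < h:
                pano[row, col] = omni_image[y, x]
    return pano
```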
Experimental results for the prototype of the sensor are provided, including the unwarped fusion result. However, the accuracy of the measurement results still needs to be improved.

Figure 7.17: The result of the designed system.

Figure 7.18: The developed omnidirectional 3D sensor.

Figure 7.19: Examples of the projected pattern at the left corner of the room.

Figure 7.20: The received image by the omnidirectional camera.

Figure 7.21: The fused result with the object depth.

Figure 7.22: The unwarped fusion result.

CHAPTER 8
Conclusions

8.1 Conclusions

Based on the results and discussion presented in the previous chapters, the following conclusions can be made:

1. A 3D sensor model with fewer calibration parameters is developed. The traditional triangulation-based 3D reconstruction model employs the incident light from a projector and the reflected light to a camera to calculate their intersection point. Recovering both light rays requires the parameters of the two devices, 22 parameters in total, to reconstruct a 3D point. Because a projector is not an information-receiving device, projector calibration error becomes the major error source of a structured light system. The novel 3D model improves on the traditional model by reconstructing 3D points only in the camera frame, so the rotation and intrinsic parameters of the projector are avoided. The pixel deviation in the camera image frame between two measurements, one of a reference plane and one of the test object, is employed to reconstruct the 3D coordinate. Two important concepts of multi-view geometry, homography and epipolar geometry, are utilized in the 3D model optimization.

2. A large scale structured light based inspection system for industrial applications is developed and successfully implemented. The system is capable of inspecting a large moving diffusely reflective object with the one-shot algorithm, and of inspecting a large static object exhibiting a mixture of diffuse and specular reflection. The inspection volume of the system is much larger than that of traditional 3D optical sensors.

3. A multi-plane calibration strategy is developed to calibrate the new sensor model. In order to further increase the accuracy of the system, the residual function for each pixel is approximated by a second-order polynomial. The calibrated sensor can reconstruct a 3D point cloud that represents the 3D shape of the test objects.

4. New surface coding methods are developed: a one-shot algorithm for moving objects and a multi-shot algorithm for free-form objects.

5. Registration between each sensor frame and the common frame is developed based on a modified ICP algorithm. Four individual point clouds are finally merged into a single cloud representing the whole work piece. An innovative object-oriented registration between the final point cloud and the CAD model is developed to generate the error distribution map. Octree space labeling and a multi-resolution method are utilized to speed up the ICP-based fine registration.

6. Inspection results are compared with the CMM data, which represent the ground truth, to validate the new sensor model and its corresponding calibration. The mean Euclidean difference between the two 3D point clouds, the CMM data and the RIS data, is about 0.1 millimeter, and the CMM data and the RIS data exhibit similar error distribution maps for the same work piece (a minimal sketch of this point-cloud comparison is given after this list).

7. The navigation system for mobile robot control is developed via the one-shot surface coding method.
The system simultaneously provides the image fused with color-coded depth information, so the operator can control the mobile robot more accurately.

8. An omni-directional system is developed, together with its sensor model, calibration, error analysis, and coding method. Preliminary results verified the feasibility of the developed sensor.
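As referenced in conclusion 6, the comparison against the CMM data reduces to a mean nearest-neighbor distance between two registered point clouds. The sketch below is a minimal illustration of that computation, not the dissertation's code: it assumes both clouds are already expressed in a common frame after registration, uses scipy's cKDTree for the nearest-neighbor search, and the function name and the synthetic data in the usage example are hypothetical.

```python
import numpy as np
from scipy.spatial import cKDTree

def mean_euclidean_difference(reference_pts, test_pts):
    """Mean nearest-neighbor Euclidean distance from `test_pts` to
    `reference_pts` (both N x 3 arrays in the same, already registered frame).

    For the windshield experiment the reference would be the CMM data and
    the test cloud the RIS data; the reported figure is about 0.1 mm.
    """
    tree = cKDTree(reference_pts)
    distances, _ = tree.query(test_pts, k=1)
    return float(distances.mean())

# Illustrative usage with synthetic data (not measurement data):
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cmm = rng.uniform(0.0, 100.0, size=(5000, 3))          # synthetic "CMM" cloud
    ris = cmm + rng.normal(0.0, 0.05, size=cmm.shape)       # perturbed "RIS" cloud
    print(f"mean difference: {mean_euclidean_difference(cmm, ris):.3f} mm")
```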