This is to certify that the dissertation entitled

MARKOV RANDOM FIELD CONTEXTUAL MODELS IN COMPUTER VISION

presented by

Sateesha Gopalakrishna Nadabar

has been accepted towards fulfillment of the requirements for the PhD degree in Computer Science.

Major professor

Date

MARKOV RANDOM FIELD CONTEXTUAL MODELS IN COMPUTER VISION

By

Sateesha Gopalakrishna Nadabar

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Computer Science Department

1992

ABSTRACT

MARKOV RANDOM FIELD CONTEXTUAL MODELS IN COMPUTER VISION

By

Sateesha Gopalakrishna Nadabar

Many low-level vision processes need to incorporate contextual information to counteract the effects of sensor noise and incomplete knowledge about the sensing process and the scene. Context represents a priori assumptions about the physical world such as continuity and smoothness. This research explores the use of Markov random field (MRF) models to specify the contextual information in low-level vision processes. Two important low-level processes considered here are: edge detection in range images and fusion of intensity and range images.

We propose an MRF model-based algorithm to detect edges in range images. The edge-based segmentation is integrated with a region-based segmentation scheme, resulting in a hybrid algorithm for surface segmentation. Results of edge detection and segmentation experiments presented on several real range images demonstrate the superior performance of the MRF model-based approach.

The MRF prior model used in our edge detection algorithm needs the specification of several parameters. The main difficulty in estimation of these parameters is that sufficient ground truth information is not easily available. We develop a new scheme for the estimation of MRF line process parameters which uses geometric CAD models of the 3D objects expected to be present in the scene. A canonical representation of clique potentials is used to reduce the number of model parameters. This parameter estimation scheme is further extended to estimate the parameters of the MRF prior model for the problem of edge detection in intensity images.
Edge detection results using real range and intensity images indicate that performance with estimated parameters is as good as with the "best" parameters selected using a trial-and-error procedure.

The thesis also explores the use of MRF models to capture contextual information in the problem of detection and labeling of edges by fusing information from registered intensity and range images. The key idea in this approach is that fusion of information from range and intensity images can be carried out by using a single physically meaningful edge label for each edge location. Experimental results on several real range-intensity image pairs are presented. The issue of computational requirements of the algorithm is also discussed, and the computational performance of two parallel computer implementations of the algorithm on a Connection Machine CM-2 is studied. The results of edge detection and fusion experiments demonstrate the effectiveness of Markov random fields as models of contextual information for low-level vision problems.

To my parents and my wife, Bhagya

ACKNOWLEDGMENTS

I wish to express my deep gratitude to my thesis advisor, Professor Anil Jain. Without his guidance, support, encouragement, and the occasional tough talks when my progress was slow, this work would not have been completed. He always provided just the right amount of criticism and technical help. My many thanks to him for patiently reading through many a half-baked draft over the last four years. For me, he represents the role model of a successful scientist, and it has been an honor having him as my thesis advisor. I would like to express my sincere thanks to Professors Richard Dubes, Mihran Tuceryan, and Connie Page for serving on my committee. Professors Dubes and Tuceryan have made many useful suggestions that have tremendously improved the thesis. I would also like to acknowledge the many fruitful discussions I had with Professor George Stockman. I thank Dr. Anbalagan of Innovision Corporation for helping me get a good understanding of the practical aspects of vision systems.

I would like to thank former members of the PRIP group at Michigan State University, Chaur-Chin Chen, Sei-Wang Chen, Patrick Flynn, Joseph Miller, and Farshid Farrokhnia, for their advice during my beginning years. Patrick Flynn, in particular, developed many general utilities which have contributed to improving the working environment in the PRIP laboratory.

I have been extremely fortunate in being blessed with good friends and colleagues Greg Lee, Narayan Raja, and Debby Trytten. All four of us took our comprehensive examinations at the same time as a Gang-of-Four. The last year of work would have been extremely tedious but for their support and company.

My special thanks to Chitra Dorai for proofreading the thesis. Many other PRIPpies have helped in making this work interesting and enjoyable. I would like to thank Qian Huang, Sushil Bhattacharjee, Sally Howden, Hans Dulimarta, Philippe Ohanian, Sharathchandra Pankanti, Marie-Pierre Dubuisson, Steve Walsh, Jian-Chang Mao, Timothy Newman, Philippe Ballard, Chen Yao, John Courtney, and Jinlong Chen.
I would also like to thank our lab manager John Lees for doing such a superb job of coordinating and maintaining the computing facilities. I appreciate the contribution of many of our student managers who have put in a lot of effort in providing a good software environment in the lab.

Finally, I am deeply indebted to my wife Bhagya Nadabar for her support and understanding during the preparation of this thesis. Not only did she put up with my long hours of stay at the lab, but she also contributed directly by proofreading this thesis.

Access to the Connection Machine parallel computer used for experiments in Chapter 7 of this thesis was provided by the Argonne National Laboratories. Financial support for this research was provided in part by the National Science Foundation grants IRI-8901513 and ECS-8603541 and by a Fellowship from Innovision Corporation.

TABLE OF CONTENTS

LIST OF TABLES

LIST OF FIGURES

1 Introduction
  1.1 Regularization and the Role of Constraints
  1.2 Mechanical and Probabilistic Regularization Approaches
  1.3 Object Recognition
  1.4 Range Image Acquisition
  1.5 Contributions of the Thesis
  1.6 Organization of the Thesis

2 Markov and Gibbs Random Fields
  2.1 Markov and Gibbs Random Fields
    2.1.1 Neighborhood System
    2.1.2 Cliques
    2.1.3 Markov Random Field (MRF)
    2.1.4 Gibbs Random Field (GRF)
    2.1.5 MRF-GRF Equivalence
  2.2 MRFs as Models of Spatial Context: Bayesian Formulation
  2.3 Estimators for a posteriori Labels
  2.4 Markov Random Fields on Edge Labels: Line Process
  2.5 Coupled Markov Random Fields
  2.6 Multivariate Markov Random Fields
  2.7 Summary

3 Survey: MRF Models in Computer Vision
  3.1 Textured Image Analysis
    3.1.1 Texture Classification
    3.1.2 Texture Synthesis
    3.1.3 Texture Segmentation
  3.2 Gray Level Image Analysis
    3.2.1 Image Segmentation
    3.2.2 Image Restoration
    3.2.3 Edge Detection
  3.3 Range Image Analysis
  3.4 Multi-Sensor Image Analysis
    3.4.1 Color Image Analysis
    3.4.2 Multispectral Image Analysis
    3.4.3 Fusion of Range and Intensity Images
    3.4.4 Integration of Early Vision Modules
  3.5 Miscellaneous Applications of MRF
  3.6 Related Approaches
  3.7 Summary

4 Edge Detection in Range Images
  4.1 Edge Detection as Bayesian Estimation
  4.2 Computation of Edge Likelihoods
    4.2.1 Jump Edge Statistic
    4.2.2 Crease Edge Statistic
    4.2.3 Difficulties with Gradients and Discontinuities
    4.2.4 Combining Jump and Crease Edge Likelihoods
  4.3 Edge Detection Algorithm
  4.4 Clique Potential Functions
  4.5 Experimental Results
  4.6 Hybrid Method for Segmentation of Range Images
  4.7 Evaluation of the Edge Detector
  4.8 Conclusions

5 Parameter Estimation of Line Process MRF
  5.1 Estimation of Line Process Parameters from Object Models
  5.2 Least Squares Formulation
  5.3 A Canonical MRF Representation: Normalized Potential Functions
  5.4 Experimental Results
  5.5 Edge Detection in Intensity Images
    5.5.1 Edge Detection Approach
    5.5.2 Estimation of Parameters from 2D Object Models
    5.5.3 Experimental Results
  5.6 Conclusions

6 Edge detection and labeling by fusion of intensity and range images
  6.1 Edge Labeling as Bayesian Estimation
  6.2 TPM and HCF Algorithms for Fusion
  6.3 Parameter Estimation of Coupled MRFs
  6.4 Experimental Results
  6.5 Conclusions

7 Implementation of Energy Minimization Algorithms on the Connection Machine
  7.1 The Connection Machine
  7.2 Parallel TPM Algorithm
  7.3 Edge Labeling Performance of Parallel TPM and Parallel HCF Algorithms
  7.4 Conclusions

8 Conclusions and Directions for Future Research
  8.1 Conclusions
  8.2 Directions for Future Research

A List of Parameters

BIBLIOGRAPHY

LIST OF TABLES

1.1 Some early vision processes and corresponding constraints.
4.1 Visual comparison of segmentation results.
5.1 Conditional probabilities under two representations.
5.2 Visual comparison of edge detection results.
7.1 Execution time of the parallel TPM algorithm in seconds.
A.1 The parameters used in Chapter 4.
A.2 The parameters used in Chapter 5.
A.3 The parameters used in Chapter 6.
A.4 The parameters used in Chapter 7.

LIST OF FIGURES

1.1 Two images of a scene at different noise levels.
1.2 Intensity values at different noise levels.
1.3 Block diagram of a typical vision system.
1.4 A typical (a) range image and (b) the corresponding intensity image.
1.5 Technical Arts 100X range image scanner.
1.6 Range image database.
2.1 A hierarchical ordering of neighbors for site t.
2.2 Cliques for the first-order neighborhood of a site t.
2.3 HCF site labeling algorithm for discrete labels.
2.4 Pixel and line (edge) sites.
2.5 Second-order neighbors of edge sites.
2.6 Neighborhoods of pixel and line sites.
2.7 Multivariate MRF: the label and measurement variables.
3.1 A classification of MRF application areas.
3.2 Several texture patterns from the Brodatz album.
3.3 Texture patterns generated using a second-order MRF.
4.1 Edge clique types for a second-order neighborhood.
4.2 Pixel numbering in the neighborhood of a vertical edge site.
4.3 Distribution of J under the alternative hypothesis.
4.4 Cliques and their ad hoc potential function values.
4.5 Edge detection results for the GRNBLK3 image.
4.6 Edge detection results for the HARRISCUP image.
4.7 Edge detection results for the COLUMN1/GRNBLK3 image.
4.8 Edge detection results for the AGPART2 image.
4.9 Edge detection results for the COLUMN2/ELL image.
4.10 Hybrid segmentation results for the GRNBLK3 image.
4.11 Hybrid segmentation results for the HARRISCUP image.
4.12 Hybrid segmentation results for the COLUMN1/GRNBLK3 image.
4.13 Hybrid segmentation results for the AGPART2 image.
4.14 Hybrid segmentation results for the COLUMN2/ELL image.
4.15 Segmentation results for the GRNBLK3 image.
4.16 Segmentation results for the HARRISCUP image.
4.17 Segmentation results for the COLUMN1/GRNBLK3 image.
4.18 Segmentation results for the AGPART2 image.
4.19 Segmentation results for the COLUMN2/ELL image.
4.20 Segmentation results for the BIGWYE4 image.
4.21 Segmentation results for the ADAPTER/CURVBLOCK image.
4.22 Segmentation results for the BALL2/PROPANE image.
4.23 Segmentation results for the BLOCK4/HALFSPH image.
4.24 Segmentation results for the CAP/BOX2INCH image.
4.25 Segmentation results for the COL2/BALL image.
5.1 (a) Synthetic range image and (b) corresponding edge map.
5.2 Neighborhood numbering scheme.
5.3 Neighborhood for a simple MRF.
5.4 Clique configurations for the simple MRF.
5.5 Clique configurations with non-zero canonical potentials.
5.6 Canonical clique potential values estimated from 3D CAD models.
5.7 Synthetic range images of objects used for estimating clique parameters.
5.8 Edge detection results for the GRNBLK3 image.
5.9 Edge detection results for the HARRISCUP image.
5.10 Edge detection results for the COLUMN1/GRNBLK3 image.
5.11 Edge detection results for the AGPART2 image.
5.12 Edge detection results for the COLUMN2/ELL image.
5.13 Edge detection results for the BIGWYE4 image.
5.14 Edge detection results for the ADAPTER/CURVBLOCK image.
5.15 Edge detection results for the BALL2/PROPANE image.
5.16 Edge detection results for the BLOCK4/HALFSPH image.
5.17 Edge detection results for the CAP/BOX2INCH image.
5.18 Edge detection results for the COL2/BALL image.
5.19 Edge map synthesized using 2D model of an industrial part.
5.20 2D objects in the database.
5.21 Canonical clique potentials estimated from 2D models.
5.22 Sampling of MRF line process using the Gibbs sampler.
5.23 Step edge detection results for the INDUSTRIAL PARTS image.
5.24 Step edge detection results for the SCENE image.
6.1 Relationship between different edge types and information sources.
6.2 Pixel and edge site neighborhoods.
6.3 Pixel cliques and the corresponding potential functions.
6.4 Ad hoc edge clique potential function values.
6.5 TPM algorithm for fusion.
6.6 HCF algorithm for fusion.
6.7 Neighborhood of a vertical edge site in the simplified MRF model.
6.8 Plot of Equation (6.10).
6.9 Edge labeling results using TPM (sigma_R = 0.003 inches).
6.10 Edge labeling results using TPM (sigma_R = 0.03 inches).
6.11 Edge labeling results for the BLOCKS image.
6.12 Edge labeling results for the JIG image.
6.13 Edge labeling results for the FUNNEL image.
6.14 Edge labeling results for the BLOCK4/COLUMN1 image.
6.15 Edge labeling results for the AGPART image.
7.1 A coding scheme for parallel updating of variables.
7.2 Combined neighborhood dependency of pixel and line variables.
7.3 Parallel TPM algorithm for edge labeling with fusion.
7.4 Speedup of parallel TPM algorithm over sequential algorithm.
7.5 HCF algorithm for edge labeling with fusion.
7.6 Speedup of the parallel HCF implementation over Sparcstation 2.
7.7 Edge labeling results on the JIG image.

CHAPTER 1

Introduction

Computer vision deals with machine processing of visual information. Visual sensory data are usually obtained in the form of two-dimensional images of the physical world and consist of measurements which depend on the imaging geometry, illumination, and structures present in the scene. The goal of any vision system is to recognize familiar structures in the system's environment and obtain concise symbolic descriptions of unfamiliar structures using visual sensory data. The task of a vision system, then, can be described as data reduction in order to extract meaningful descriptions from huge amounts of input visual data.

The task of reducing vast amounts of visual data to meaningful descriptions is not simple. The description obtained from the visual data should remain the same for certain changes in the scene or sensing environment [10]. This property is known as invariance of the description and is illustrated in Figures 1.1-1.2.
Figures 1.1(a) and (b) both show the same scene, but the noise level in the two images is different, corresponding, for example, to sensors with different sensitivity. Figure 1.2 shows the actual gray values in a small portion of the images in Figure 1.1. Notice that even though the human eye assigns the same interpretation to the scene in both cases, the actual gray values are quite different. This is because the data reduction mechanism in the human visual system is invariant to the sensor noise, within reasonable limits.

Figure 1.1. Two images of a scene at different noise levels.

Invariance is essential for a vision system since the visual system 'lives' in an environment in which sensory data could change, even though the structures in the world remain the same. For a fairly general computer vision system, invariance with respect to the following factors is required [10]:

1. Sensor noise
2. Optical distortion
3. Viewpoint
4. Perspective distortion
5. Variations in photometric conditions.

Figure 1.3 shows a block diagram of a computer vision system. The task of the early vision modules is to recover the physical or intrinsic properties at each image location (pixel) of the corresponding 3D world point being sensed, and it represents the first step in obtaining invariant descriptions.

    (a)  171 156 126  86  71        (b)  163 158 127 103  57
         173 160 129  86  65             173 157 124  96  82
         173 157 128  85  66             167 153 139  84  57
         175 159 126  87  68             183 155 129  97  64
         171 157 122  77  64             173 154 128  80  72

Figure 1.2. Intensity values at different noise levels for the same 5 x 5 portion of the two images in Figure 1.1.

[Figure 1.3. Block diagram of a typical vision system: sensory data feed the early visual processes, which produce intrinsic images, from which invariant descriptions and invariant features are derived for recognition of structure and learning; image predictions and feature predictions flow back down.]

That is to say, early vision modules attempt to invert the world-image mapping. However, the world-image mapping is a many-to-one mapping and is not invertible because, typically, there are many world configurations which result in the same sensory data. In addition to dealing with this non-invertible mapping problem, early vision processes need to deal effectively with the ubiquitous sensor noise. Therefore, early vision problems are ill-posed in the following sense [45, 84]. A problem is well-posed if (i) a unique solution exists, and (ii) the solution depends continuously on the data (stability against small perturbations). If either of these conditions is not satisfied, the problem is said to be ill-posed.

1.1 Regularization and the Role of Constraints

Both biological and computer vision systems take the same approach to solving these ill-posed problems: using constraints. The set of possible solutions is restricted by various constraints such that there is a unique stable solution to the ill-posed problem. The addition of these constraints converts an ill-posed problem into a well-posed one. Such methods have been called regularization methods [84]. Table 1.1 shows some of the early vision processes and the constraints applicable to these processes.
The type of constraint used depends on the environment in which the vision system operates and the type of information that it seeks in the environment. One can identify two classes of constraints [21]: natural constraints and artificial constraints. Natural constraints (also called generic constraints) arise from the specific environment and eliminate configurations which rarely occur in that environment. Artificial constraints incorporate restrictions arising from the use of higher-level knowledge. Some generic constraints that vision researchers have tried to incorporate in early vision modules are: (i) neighboring pixels are likely to have similar properties; (ii) close parallel edges are unlikely; (iii) isolated edge elements are improbable; (iv) physical surfaces are smooth.

Table 1.1. Some early vision processes and corresponding constraints.

    Early Vision Module     | Constraints applicable
    ------------------------|-------------------------------
    Edge detection          | Piecewise continuous edges
    Surface reconstruction  | Piecewise continuous surfaces
    Shape from shading      | Lambertian surfaces
    Structure from motion   | Rigidity

The constraints presented above can all be specified as the dependency of an attribute at a pixel in the image on the attributes in the pixel's spatial neighborhood. For example, a smoothness constraint on surfaces can be specified by restricting the surface depth value at a pixel to be close to the surface depth values at its neighbors. This type of neighborhood dependency is called contextual dependency (or contextual information), and the associated constraints are referred to as contextual or spatial constraints. This thesis concentrates on modeling contextual dependency for the purpose of detecting edges in range and intensity images.

1.2 Mechanical and Probabilistic Regularization Approaches

Two main approaches to regularization have been proposed in the computer vision literature: (i) the mechanical viewpoint [10, 94] and (ii) the probabilistic viewpoint [6, 20, 38, 74]. In the mechanical viewpoint, continuity or piecewise continuity constraints are used to reduce the class of solutions that are acceptable. The ill-posed problem is reformulated as an optimization problem by associating a cost with each possible solution. The cost of a solution is taken to be the sum of the various energy components in an analogous mechanical system (splines under tension), so the optimal solution corresponds to the minimum energy state of a mechanical system [10]. Because of the analogy between the cost function and the energy of a mechanical system, this approach is known as the energy function minimization approach. In 'standard regularization' methods, the energy function corresponds to the energies of generalized splines [84]. This means that the solution is too smooth to be faithful at locations where discontinuities exist in the image. However, changes in surface geometry, surface composition, and illumination all give rise to discontinuities in the intrinsic characteristics of the scene, and therefore it is important to retain the location of discontinuities in the solution.
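To make the energy function minimization viewpoint concrete, consider the following one-dimensional illustration (a generic example in the spirit of [10, 84], not a formula taken from later chapters). A membrane energy for reconstructing a signal u from data d is

    E(u) = \sum_t (u_t - d_t)^2 + \lambda \sum_t (u_{t+1} - u_t)^2,

and a weak-continuity variant introduces a binary line variable l_t that can break the smoothness term at a cost \alpha:

    E(u, l) = \sum_t (u_t - d_t)^2 + \lambda \sum_t (1 - l_t)(u_{t+1} - u_t)^2 + \alpha \sum_t l_t.

Minimizing the first energy smooths across discontinuities; in the second, setting l_t = 1 is advantageous exactly when the squared jump (u_{t+1} - u_t)^2 exceeds \alpha / \lambda, so large discontinuities are preserved rather than blurred.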
Blake and Zisserman [10] use 'weak continuity constraints', which can occasionally break, to incorporate discontinuities into the cost function. A similar approach is also taken by Terzopoulos [94], using the so-called continuity control functions.

In the probabilistic viewpoint, the structure in the image is modeled as a stochastic process, the a priori knowledge is modeled by a Markov random field (MRF), and statistical estimation techniques are used to obtain a solution to the ill-posed problem. A Markov random field associates a probability value with each possible solution to the ill-posed problem according to a Gibbs distribution, which is well known in statistical physics [38]. What is more important, a significant portion of the structures (regularities) likely to be present in the scene are locally specified as contextual dependencies in the form of statements such as "points that are close together in the image plane usually belong to the same surface". Such local structures can be specified by a locally dependent MRF.

Szeliski [92] has shown the relationship between the mechanical and probabilistic viewpoints for regularization. It is shown that by selecting an appropriate MRF model, the constraints specified by the mechanical viewpoint can be exactly specified using the probabilistic viewpoint. We prefer the probabilistic approach for the following reasons.

1. The mechanical viewpoint does not allow the specification of spatially nonhomogeneous interactions, which are useful for modeling top-down constraints. These constraints can be captured using a nonhomogeneous MRF model [64].

2. The mechanical viewpoint does not naturally allow the specification of mutual constraints in a multiple-source information fusion problem. MRF models can be easily extended to multivariate observations.

3. The probabilistic viewpoint has the advantage that random variation in the data can be incorporated into the model (e.g., use of statistical estimates of mean and covariance matrices in multispectral classification [103]).

4. The mechanical viewpoint can incorporate only Gaussian degradation (noise) processes. The probabilistic viewpoint allows modeling of complicated (other than Gaussian) image degradation processes.

Even though we take the probabilistic viewpoint here, we do make use of the mechanical viewpoint at times. Specifically, when smoothness constraints are to be specified, it is easier to take a mechanical viewpoint since these constraints are better specified as spline energies. It is, however, straightforward to convert the energies to probabilities [92, 36, 88].

Markov random fields (and regularization in general) constitute an intuitively appealing framework in which to cast early vision problems. However, there are two fundamental difficulties that need to be addressed before this approach can be widely used in computer vision: (i) there are no good methods to estimate parameters of the prior probability density which represents the contextual information, and (ii) search techniques for maximizing the a posteriori density are computationally very expensive.
An important advantage of the probabilistic approach, namely the ability to estimate parameters of the contextual model, has not been realized so far because of the difficulty in obtaining true realizations (ground truth) of the contextual model. This thesis proposes an engineering solution to the problem of obtaining the ground truth information. The thesis advocates the use of geometric models of "typical" scene configurations (scenes consisting of "typical" objects) to generate ground truth information. Specifically, we relate the geometric and probabilistic models by synthesizing images using the geometric model and treating the synthesized images as ground truth information for estimating the parameters of the probabilistic model.

The second difficulty in using MRF models, namely the computational burden, has been addressed by several researchers [10, 20, 90, 36, 79, 94]. Reasonable deterministic approximations to stochastic optimization algorithms have been suggested in order to reduce the computational time. These deterministic approximations still require several minutes of computation time on a sequential machine and therefore are slow for most vision applications. It is generally acknowledged that massively parallel computers are necessary to run these algorithms at speeds required for real-world vision tasks [38, 79]. This thesis implements two MRF-based algorithms on a massively parallel computer (Connection Machine CM-2) in order to study the improvement in computational performance of these algorithms over traditional single-CPU computers.

1.3 Object Recognition

Automatic recognition of objects is an important task which has received considerable attention in the computer vision literature [7]. The bulk of this interest is due to the importance of object recognition in the machine vision industry for automation of certain tasks that are either (i) too routine, (ii) too expensive, or (iii) too dangerous for humans to perform. Recognition involves identification of objects present in the scene and determining their position and orientation with respect to a reference coordinate system (pose estimation).

Most 3D object recognition systems depend on the availability of 3D shape (surface) measurements of the objects as input. Therefore, much of the effort in early vision processes is directed towards obtaining the 3D shape of the objects in the scene from one or more intensity images. Some of the well-known approaches among these are: (i) shape-from-stereo [89], (ii) shape-from-shading [42], (iii) shape-from-texture [11], and (iv) shape-from-contour [98]. With the exception of shape-from-stereo techniques, which have been used for outdoor navigation systems [89], none of the methods mentioned above has been used in any complete vision system, because of high computational requirements and restrictive model assumptions.
In recent years, there has been an increasing interest in an engineering approach to the 3D shape measurement problem, where direct depth measurements of 3D objects are obtained [59, 85]. The resulting depth map can be represented as a 2D image, known as a range image. In a range image, each pixel value represents the range or distance of the corresponding scene point from the camera. In contrast, in an intensity image, each pixel value represents the amount of light reflected from the corresponding scene point towards the camera. Figure 1.4 illustrates this difference between range and intensity images of a scene.

Figure 1.4. A typical (a) range image and (b) the corresponding intensity image.

The task of recognizing objects directly from either intensity or range measurements is extremely difficult because of the many factors mentioned in the beginning of this chapter (sensor noise, viewpoint changes, etc.) that affect these measurements. Therefore, most object recognition systems use more than one stage of processing, resulting in one or more intermediate representations. Segmentation plays a major role in obtaining these intermediate representations.

To illustrate the importance of segmentation, consider the images shown in Figures 1.1-1.2. If the pixel values in Figure 1.2(a) and (b) are compared directly, then it is difficult to determine that they are sensed from the same surface points; on the other hand, if the vision system uses the fact that there is a large gradient (or edge) between the third and fourth columns in the tables, then the similarity between the two images becomes clearer. Therefore, if the vision system relies on an intermediate representation such as edges in the image, then small changes in the measurements do not affect the performance of the vision system drastically. Several studies in the computer vision literature have recognized the importance of detecting edges in range images [57, 12]. We explore the use of Markov random field models to capture the contextual information in the problem of detecting edges in range images. Recently, there has been growing interest in using multiple sensors to increase the robustness of vision systems [21]. This thesis also examines the use of Markov random field models to fuse information from range and intensity images.

1.4 Range Image Acquisition

The range images used in this thesis were acquired using a Technical Arts 100X Scanner [93]. This scanner uses a structured light mechanism [52] to infer the depth (range) values of the surface points in the scene. A thin laser stripe illuminates surface points corresponding to a single row of pixels in the range image. A camera placed such that its optical axis is at approximately 45 degrees to the plane of the laser light senses the intensity values. Figure 1.5 shows the setup of the scanner at the Michigan State University PRIP laboratory.

Figure 1.5. Technical Arts 100X range image scanner.

The surface points that are illuminated by the laser light can be easily separated from those that are not, using a simple thresholding technique. The range values for these pixels are inferred from the displacement of the laser stripe in the intensity image. The range values for other rows in the range image are obtained by moving the object using a motorized stage and repeating the procedure outlined above for each row position. The acquisition time for a 240 x 240 range image is about 3 minutes.
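The triangulation idea behind this procedure can be sketched as follows. This is a minimal illustration of recovering one image row of range values from the stripe displacement, not the scanner's actual calibration procedure; the flat reference position, the function and parameter names, and the defaults are assumptions made for the example.

    import numpy as np

    def stripe_to_range(image, ref_row, threshold=128,
                        pixel_size=0.04, cam_angle_deg=45.0):
        # Recover one row of range values from a single laser-stripe image.
        # ref_row is the stripe's row (in pixels) when imaging a reference
        # plane at zero depth.
        n_cols = image.shape[1]
        depth = np.full(n_cols, np.nan)          # NaN marks missing data
        for c in range(n_cols):
            lit = np.nonzero(image[:, c] > threshold)[0]
            if lit.size == 0:
                continue    # point shadowed from the laser or the camera
            stripe_row = lit.mean()              # centroid of lit pixels
            displacement = (stripe_row - ref_row) * pixel_size
            # Simple triangulation: depth is proportional to the stripe's
            # displacement; with the camera at 45 degrees, tan(angle) = 1.
            depth[c] = displacement / np.tan(np.deg2rad(cam_angle_deg))
        return depth

Sweeping the object under the stripe with the motorized stage and stacking the recovered rows yields the full range image.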
The scanner has a spatial resolution of 0.04 inches and a depth resolution of 0.001 inches. We use the 11 range images shown in Figure 1.6 to test our edge detection algorithm.

Figure 1.6. Range image database: (a) GRNBLK3 image, (b) HARRISCUP image, (c) COLUMN1/GRNBLK3 image, (d) AGPART2 image, (e) COLUMN2/ELL image, (f) BIGWYE4 image.

Figure 1.6 (cont'd). Range image database: (g) ADAPTER/CURVBLOCK image, (h) BALL2/PROPANE image, (i) BLOCK4/HALFSPH image, (j) CAP/BOX2INCH image, (k) COLUMN2/BALL image.

The test images were selected such that they represent a reasonable mix of planar and curved surfaces. Following is a brief description of each image.

GRNBLK3: 122 x 226 image of a wooden block.

HARRISCUP: 106 x 164 image of a coffee cup.

COLUMN1/GRNBLK3: 194 x 212 image of a pair of wooden blocks.

AGPART2: 83 x 156 image of a rotationally symmetric industrial object.

COLUMN2/ELL: 240 x 240 image of another pair of wooden blocks.

BIGWYE4: 185 x 225 image of a Y-shaped object.

ADAPTER/CURVBLOCK: 197 x 187 image of a plastic adapter sitting on top of a wooden block.

BALL2/PROPANE: 215 x 190 image of a rubber ball placed next to a cylinder.

BLOCK4/HALFSPH: 129 x 171 image of a hemispherical section of a rubber ball placed next to a rhombus-shaped wooden block.

CAP/BOX2INCH: 128 x 146 image of a plastic cap placed on top of a 2-inch cardboard box.

COLUMN2/BALL: 217 x 197 image of another wooden block placed next to a rubber ball.

The images are not all of the same size because, in some of the images, background pixels were deleted to reduce the amount of unnecessary processing. Some of the images in Figure 1.6 have missing data, because some points in the scene are not simultaneously visible from the direction of the laser light and the CCD camera. All the images in Figure 1.6, except the COLUMN2/ELL image, were obtained from the MSU range image database [34].

1.5 Contributions of the Thesis

This thesis explores the use of Markov random field models to specify the contextual information in low-level vision processes. Two important low-level processes considered here are: edge detection in range images and fusion of intensity and range images. The thesis proposes an MRF model-based algorithm to detect edges in range images. The thesis also explores the use of MRF models to capture contextual information in the problem of detection and labeling of edges by fusing information from registered intensity and range images.

The main contributions of this thesis, in order of importance, are:

1. Development of a new scheme for the estimation of parameters of an MRF edge process by relating geometric and probabilistic models of the domain (Section 5.1);

2. Use of a canonical potential representation to reduce the number of MRF model parameters (Section 5.3);

3. Application of an MRF model for edge detection in range images (Chapter 4);

4. Application of MRF models to obtain labeled physical discontinuities by fusing range and intensity images (Section 6.1);
5. Study of computational aspects of two parallel algorithms for the sensor fusion problem on the Connection Machine parallel computer (Chapter 7).

1.6 Organization of the Thesis

The thesis is organized as follows. Chapter 2 gives some notations and definitions used throughout the rest of the thesis. Chapter 3 is a review of existing literature on Markov random fields and two related research topics: relaxation labeling and anisotropic diffusion. Chapter 4 presents the range image edge detection algorithm and justifies the need for a parameter estimation scheme. This chapter also describes a method for combining edge-based and region-based segmentations to overcome the problem of fragmented boundaries.

Chapter 5 presents the scheme for estimation of parameters of a Markov random field defined on edge sites. The estimated parameters are used with the algorithm in Chapter 4 for range image edge detection. The performance of the edge detection scheme with estimated MRF model parameters is compared with that of empirically chosen parameters. This chapter also extends the parameter estimation method to the problem of edge detection in intensity images.

Chapter 6 presents an algorithm for fusion of range and intensity images for the purpose of edge detection and labeling. The algorithm uses Markov random fields to incorporate contextual information as well as a framework to fuse information from the two images.

Chapter 7 presents details of the implementation of the fusion algorithm presented in Chapter 6 on a Connection Machine parallel computer.

Finally, Chapter 8 gives the conclusions drawn from the research in the thesis and outlines some directions for future research.

CHAPTER 2

Markov and Gibbs Random Fields

This chapter introduces some notations and definitions used throughout this thesis. Section 2.1 presents some basic concepts of Gibbs and Markov random fields. Section 2.2 develops the Bayesian formulation for labeling problems in which the spatial prior information is represented using an MRF. Section 2.3 describes estimators for the a posteriori labels. Section 2.4 gives notation for MRFs defined on the edge lattice instead of the familiar pixel lattice. Sections 2.5 and 2.6 present the notation for coupled MRFs and multivariate MRFs.

2.1 Markov and Gibbs Random Fields

Let X = {X_1, X_2, ..., X_M} be the M-tuple random vector representing the colors or labels at the M sites of an image lattice, S. We will represent realizations of random variables by the corresponding lower case symbols. For example, we represent a realization of the random vector X by a vector x. A subset of the random variables X defined on sites in B ⊂ S is denoted by X_B. Although the image is usually a 2-dimensional array, a linear ordering of the pixels (sites) of the image is assumed for notational convenience. Each X_t takes a value from the set of labels, Λ. The range of labels could be either discrete or continuous: for discrete variables Λ = {1, ..., G}, and for continuous variables Λ = R, the set of real numbers. In order to avoid duplication, we give definitions only for the discrete case.
We use P(.) to denote probabilities of named events. For a random variable X, the density function will be denoted by f_X, and the conditional density with respect to a random variable Y is denoted by f_{X|Y}. Let the set of all possible labelings be Ω.

2.1.1 Neighborhood System

A neighborhood system, N, on a lattice, S, is a collection of subsets, N_t, of S, where N_t, the neighborhood of site t, satisfies the following two criteria.

1. The site t is not a neighbor of itself: t ∉ N_t, ∀t.

2. If s is a neighbor of t, then t is a neighbor of s: if s ∈ N_t then t ∈ N_s, ∀(s, t).

Some commonly used neighborhood systems in image processing are shown as a hierarchy in Figure 2.1. The number at each pixel represents the order of the neighborhood system in which it first appears as a neighbor of t. Notice that this hierarchy is not the only possible hierarchy. For example, one can create another hierarchy by naming the neighbors according to their city-block distance from the site of interest.

Figure 2.1. A hierarchical ordering of neighbors for site t.

Neighbors of sites near the boundary of the lattice need to be defined separately. Two different definitions are possible depending on whether the boundaries are considered to be periodic or free boundaries. In the periodic boundary model, sites near the boundary of the lattice have as many neighbors as interior sites. In the free boundary model, these sites have only the sites inside the lattice boundary as their neighbors, so they have fewer neighbors than interior sites. We use the free boundary model for the edge detection problem in Chapter 4 and a periodic boundary model for the fusion problem in Chapter 6. The choice of the boundary model was determined by implementation considerations.

2.1.2 Cliques

Consider the graph defined by {S, N}. Here, the sites form the vertices of the graph, and an edge is placed between two vertices if the corresponding sites are neighbors of each other. A clique is a subset c ⊂ S in which every pair of sites are neighbors of each other. That is, the subgraph formed by the sites in a clique is a completely connected graph. Let C denote the set of all cliques in {S, N}. For example, for the first- and second-order neighborhoods in Figure 2.1, the number of cliques in C are 5 and 21, respectively. Figure 2.2 shows the five cliques for the first-order neighborhood of a site t.

Figure 2.2. Cliques for the first-order neighborhood of a site t.
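As a concrete illustration of these definitions, the following sketch builds the first- and second-order neighborhoods on a free-boundary lattice (an illustrative helper, not code from the thesis; the function name and conventions are assumptions of the example).

    def neighbors(t, shape, order=1):
        # Return the neighbors of site t = (r, c) on a free-boundary
        # lattice. order=1 gives the 4 horizontal/vertical neighbors;
        # order=2 adds the 4 diagonal neighbors. Sites outside the
        # lattice are simply dropped, so boundary sites have fewer
        # neighbors (the free boundary model).
        r, c = t
        rows, cols = shape
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
        if order >= 2:
            offsets += [(-1, -1), (-1, 1), (1, -1), (1, 1)]
        return [(r + dr, c + dc) for dr, dc in offsets
                if 0 <= r + dr < rows and 0 <= c + dc < cols]

For an interior site, neighbors((5, 5), (10, 10), order=2) returns eight sites; for the corner site (0, 0) it returns only three. One can also check that the symmetry criterion holds: s is in neighbors(t, shape) exactly when t is in neighbors(s, shape).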
2.1.3 Markov Random Field (MRF)

A Markov random field is a probability distribution on X which satisfies the following properties.

1. Positivity: P(x) > 0 for all x ∈ Ω.

2. Markovianity: P(x_t | x_{S \setminus \{t\}}) = P(x_t | x_{N_t}), ∀t, where \setminus is the set difference operator.

The positivity property states that every configuration in Ω has a nonzero probability of occurrence. The second property, Markovianity, is an intuitive extension from Markov chain theory and states that the distribution of values (or colors) at a site t depends only on the values at the neighbors of site t and not on the values at the remaining sites. The word neighbor is used here to denote any site in N_t, and not necessarily a spatial neighbor of the site t, although in image processing it makes more sense to use spatial neighbors. For example, Geman et al. [37] have proposed a model with both spatial and random neighbors.

2.1.4 Gibbs Random Field (GRF)

While a Markov random field specifies the local properties of the image in terms of conditional probabilities P(x_t | x_{N_t}), a Gibbs random field specifies the global properties of the image directly.

A Gibbs random field on a lattice S, for a given neighborhood system N, is a joint distribution which is specified by the following probability measure on the sample space Ω for the lattice:

    P(x) = e^{-U(x)} / Z,                                   (2.1)

where

    U(x) = \sum_{c \in C} V_c(x),                           (2.2)

    Z = \sum_{x \in \Omega} e^{-U(x)},                      (2.3)

and V_c(x) is the potential function (alternatively, clique function) associated with the clique c. The quantity Z is a normalizing constant known as the partition function. The summation in the definition of Z should be replaced by an integral when x is continuous. The computation of Z is extremely tedious, except in trivial cases, due to the extremely large cardinality of the set Ω. The power of the GRF models lies in the fact that by controlling the relatively few functions V_c(.), the form of the joint distribution of X can be completely specified.

2.1.5 MRF-GRF Equivalence

One of the major difficulties with using MRFs is to verify whether a given conditional distribution is indeed valid, that is, satisfies the definition of an MRF [5]. This validity problem makes it very difficult to specify an MRF in terms of conditional distributions. The Hammersley-Clifford theorem, stated below, allows the specification of an MRF in terms of clique potentials [5].

Theorem 2.1: For a given neighborhood system, a unique MRF exists for every GRF and vice versa.

A typical use of the theorem is to model the image globally by a GRF using the associated clique potentials, and then use the local dependence property of the MRF for computational advantage, since only a small number of variables in the neighborhood of a site need to be used to compute the locally dependent conditional distribution at that site [38].
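The following sketch makes this computational use concrete for a simple homogeneous pairwise model (an illustration only; the Potts-style potential that contributes -beta when the two labels of a pair clique agree, and the hypothetical neighbors() helper above, are assumptions of the example, not the models used later in the thesis).

    import math

    def local_conditional(x, t, labels, shape, beta=1.0):
        # P(x_t = l | x_{N_t}) for a pairwise MRF with Potts potentials.
        # Only the cliques containing site t are needed, so the partition
        # function over all of Omega is never computed: this is the
        # computational advantage given by the MRF-GRF equivalence.
        energies = []
        for l in labels:
            # Sum of pair-clique potentials involving site t: -beta per
            # neighbor whose current label agrees with l.
            u = sum(-beta for s in neighbors(t, shape) if x[s] == l)
            energies.append(u)
        z_t = sum(math.exp(-u) for u in energies)   # local normalizer Z_t
        return [math.exp(-u) / z_t for u in energies]

Here x is a dictionary (or array) of current labels indexed by site, and the returned list gives the conditional probability of each label in labels at site t.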
2.2 MRFs as Models of Spatial Context: Bayesian Formulation

Markov random field models have been successfully used to represent contextual information in many 'site' labeling problems [6, 20, 22, 29, 30, 38, 74]. A site labeling problem involves classification of each site (pixel, edge element, region) into a certain number of classes based on an observed value (or vector) at each site. For example, in image segmentation problems, pixels are labeled based on observed value(s) at each pixel, where the number of labels corresponds to the number of "true" segments or regions. We assume that the desired labeling is the "true" image. The goal of the site labeling problem, then, is to recover the true image from the noisy observed image. Contextual information plays an important role here because in many site labeling problems, the true label of a site is compatible in some sense with the labels of the neighboring sites. Context represents our a priori assumptions about the physical world such as continuity and smoothness. Markov random fields are appropriate models of context because they can be used to specify the spatial dependency or spatial prior. We present below the procedure for deriving an optimal labeling in the Bayesian framework by modeling the a priori knowledge about site labels as a Markov random field.

Let X = {X_1, X_2, ..., X_M} be the M-tuple random vector representing the 'true' labels at each of the M sites. The vector Y = {Y_1, Y_2, ..., Y_M} represents the observations (e.g., gray values) at the M sites. Thus we have two types of information about the site labels. One is in the form of measurements Y, and the other is in the form of spatial dependencies among the set of labels X, represented by the prior probability distribution P(x). These two types of information can be combined optimally using Bayes' rule. The a posteriori probability of X given Y is given by

    P(x|y) = \frac{f_{Y|X}(y|x) P(x)}{f_Y(y)}.              (2.4)

Assumption 1: Prior probability distribution

Contextual information is incorporated through a Markov random field model of the statistical dependence among the labels on neighboring sites in X, which is equivalent to the Gibbs process [5]. The potential functions V_c(.) of the GRF model (see Equation (2.2)) encode our a priori knowledge about the spatial dependence of labels at neighboring sites. For site labeling problems, natural constraints such as smoothness and continuity are enforced using V_c(.). We emphasize that the random field model is used here only to incorporate spatial context into the site labeling problem and is not expected to be an accurate model of the true image.

Assumption 2: Class conditional density

For computational reasons, it is assumed that the observations y_t are conditionally independent of each other [6]. Therefore, the density for Y, given the true labeling, is

    f_{Y|X}(y|x) = \prod_{t=1}^{M} f(y_t|x_t).              (2.5)
neighbors of pi Where 2! is a . where the sur 2.3 Es The site 1 , (1 H‘ 5’, estimate t ”to St termS ( ._ . ‘Mdklmum a 24 process, we refer the reader to [38]. Under assumptions (1) and (2), the a posteriori probability distribution for the site labels X, given the observations Y = y also has the form of a Gibbs random field P(xly) = e-U‘X'yt/Zy, (2.6) where Z), is a normalizing constant, and the corresponding energy function is A! U(X|Y) = Zl-1n(f(ytlri))l+ Z V.(x). (2-7) i=1 CEC The local characteristics of a Markov random field can be derived from the Gibbs random field. Let XArt be a (vector) random variable representing the labels of neighbors of pixel t. The conditional probability of Xt can be written as follows [5]: P(aixwi = e‘U'W’Wszt, (2.8) where Z: is a normalizing constant, and Utll‘thNtaYl = ‘lnfffytlxtll + Z chxA/tla (2-9) czt€c where the summation is over all the cliques c to which the site t belongs. 2.3 Estimators for a posteriori Labels The site labeling problem can now be stated as follows: given the observation vector y, estimate the label vector x. Contextual information is represented by the right- most terms (ZCEC Vc(x) and Zonal/Ax“) ) in Eqs. (2.7) and (2.9). The MAP (Maximum a posteriori) estimate is the vector 5‘: which maximizes P(xly) With re- man a: the 11. cornput Sew labeinrs ofan er Segnien' ofthe ; also t};t each an. Carlo n Pesterio llarg,h(, uPdated algorit hi Ellen [n ('1 (.‘Ol’npn Preperm, Chou COIIfidt 71‘ and C0,“; 25 spect to x. Maximizing this a posteriori probability mass function which is a function of M variables is a formidable task since for a 256x256 image, M = 65,536. Ge- man and Geman [38] have used simulated annealing to obtain an approximation to the MAP estimate in the context of image restoration, but their procedure is also computationally expensive. Several alternative estimates have been suggested for obtaining the a posteriori labeling. Marroquin et al. [74] have suggested that minimizing the expected value of an error functional is a better criterion for some problems. For example, for the segmentation problem an appropriate criterion is to minimize the expected value of the percentage of pixels misclassified. For this criterion, the optimal labeling is also the labeling that maximizes the a posteriori marginal probability (MPM) at each site. They propose a method of computing the MPM estimate using a Monte Carlo method. W’hen rt is continuous, the optimal estimate is the mean of the posterior marginal distribution at each site and is called the Thresholded Posterior Marginals (TPM) estimate [74]. Besag [6] proposed a deterministic iterative scheme called Iterated Conditional Modes (ICM), in which the labeling of a site is iteratively updated so as to maximize the local conditional probabilities P(szva). This algorithm finds a solution that corresponds to a local minimum of the energy function given in Eq. ( 2.7). Two important advantages of the ICM algorithm are: (i) it is a computationally attractive scheme, and (ii) it avoids the undesirable large scale properties of the MRF prior (see [6] for details). Chou et al. [20] have proposed another estimation algorithm known as the Highest Confidence First (HCF) algorithm. The HCF algorithm, like ICM, is deterministic and computationally attractive. Because of these advantages we use the HCF algo- rithm in this thesis. The HCF algorithm is similar to the ICM algorithm except for the order in which the sites are visited. 
Additionally, HCF starts by initially assigning a NULL (non-committal) label to each site. A site commits to a label only if it gains enough confidence in its decision. Cliques with NULL labels contribute zero potential. The HCF algorithm for discrete labels is summarized in Figure 2.3.

HCF Algorithm for Discrete Labels

1. Initialize all sites to the NULL label.
2. Compute a stability measure for each site t as follows.
   - If site t has a non-NULL label l_t, then stability = U_t(g, x_{N_t}, y) - U_t(l_t, x_{N_t}, y).
   - If site t has the NULL label, then stability = U_t(g, x_{N_t}, y) - U_t(i, x_{N_t}, y),
   where
       g = argmin_{l in Λ} U_t(l, x_{N_t}, y),
       i = argmin_{l in Λ, l ≠ g} U_t(l, x_{N_t}, y),
   and Λ is the set of labels.
3. Repeat
   (a) Select the site with minimum stability.
   (b) Change its label to the one which corresponds to the lowest energy.
   (c) Recompute the stability for the site and its neighbors.
   until (stability ≥ 0) for all sites.

Figure 2.3. HCF site labeling algorithm for discrete labels.

Figure 2.4. Pixel and line (edge) sites. (Line sites lie on the dual lattice, between horizontally and vertically adjacent pixel sites.)

Figure 2.5. Second-order neighbors of edge sites: (a) neighborhood of a horizontal edge site; (b) neighborhood of a vertical edge site.

2.4 Markov Random Fields on Edge Labels: Line Process

Geman and Geman [38] introduced the notion of line variables (sites) to augment the pixel variables (sites) in order to model spatial discontinuities in the image. The introduction of this notion initiated a great deal of research in the area of MRF modeling for context [74, 10, 87, 20, 81]. The line variables are placed on a dual lattice (vertical and horizontal sites between the pixel sites) as shown in Figure 2.4. The stochastic process defined by the line variables is usually referred to as a line process or edge process.

Consider the image lattice shown in Figure 2.4. An edge site in the lattice indicates whether an edge exists between the two pixels it separates. Let 𝓛 be the lattice consisting of only the edge sites, and let L = {L_1, L_2, ..., L_K} be the K-tuple random vector representing the labels at the K sites of the edge lattice 𝓛. Note that while a pixel process is measurable (observable), a line process is not. It is common to use indirect measurements for edge sites in terms of measurements at pixels in the neighborhood of the edge site [19, 58]. We denote the indirect measurements at the K edge sites by the random vector W = {W_1, W_2, ..., W_K}. The neighborhood that we use for the edge process to specify edge constraints in Chapters 4 and 5 is shown in Figure 2.5.

The line process is very useful for edge detection because it represents natural assumptions about the structure of physical edges, such as continuity and smoothness [58]. The expressions for the a posteriori probability distribution of the edge labels and the corresponding energy function can be derived analogously to those in Section 2.2.
The a posteriori energy function is given by

\[
U(\mathbf{l} \mid \mathbf{w}) = \sum_{t=1}^{K} \bigl[ -\ln f(w_t \mid l_t) \bigr] + \sum_{c \in C} V_c(\mathbf{l}). \tag{2.10}
\]

The functions V_c(l) represent the contextual information in the edge labels. The edge cliques and the constraints they impose are discussed in detail in Chapter 4.

2.5 Coupled Markov Random Fields

A Markov random field defined on the coupled lattice of pixel and line variables shown in Figure 2.4 is known as a coupled Markov random field [38]. A commonly used neighborhood system for defining coupled Markov random fields is shown in Figure 2.6 [38, 74, 19].

Figure 2.6. Neighborhoods of pixel and line (edge) sites for (a) a pixel site, (b) a horizontal edge site, and (c) a vertical edge site.

Coupled Markov random fields help specify the pixel and edge constraints simultaneously. The addition of the line process helps to specify the following constraints: (i) the values of the line variables can be used to avoid undesirable smoothing across discontinuities, and (ii) the cliques of the line process can specify constraints on the structure of the discontinuities.

The lattices of pixel sites and line sites will be denoted by S and 𝓛, respectively. We will denote the coupled process by a pair (X, L), where X represents the pixel process and L represents the line process. A point in the sample (solution) space is represented by (x, l). Usually, only the pixel process is measurable (observable). The line process is not observable and only helps to specify the constraints on the solution space. We denote the measurements at the pixel sites by a vector y. The expression for the a posteriori energy function for the coupled process is similar in form to Equation (2.7) and is given by

\[
U((\mathbf{x}, \mathbf{l}) \mid \mathbf{y}) = \sum_{t \in S} \bigl[ -\ln f(y_t \mid x_t) \bigr] + \sum_{c \in C'} V_c(\mathbf{x}, \mathbf{l}). \tag{2.11}
\]

For the coupled process, the cliques in the set C' contain both pixel and line sites. In the expression above, the line process only helps in specifying contextual constraints, because measurements y are present only at the pixel sites.
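To illustrate how the line process enters Eq. (2.11), here is a minimal sketch under assumptions of our own choosing (a first-order pixel neighborhood, Gaussian noise, quadratic smoothness, and binary line variables, none of which is prescribed by the text): an active line variable switches off the smoothness penalty across a discontinuity, at a fixed cost that discourages spurious edges.

```python
import numpy as np

def coupled_energy(x, lines_h, lines_v, y, sigma=1.0, alpha=1.0, gamma=2.0):
    """Toy U((x,l)|y) for a coupled MRF on a 2-D grid.
    Data term : Gaussian -ln f(y_t|x_t), up to a constant.
    Prior term: quadratic smoothness between 4-neighbors, suppressed
    wherever the separating line site is on; each active line site
    pays a penalty gamma. lines_h[i,j] sits between x[i,j] and
    x[i,j+1]; lines_v[i,j] sits between x[i,j] and x[i+1,j]."""
    energy = np.sum((x - y) ** 2) / (2.0 * sigma ** 2)
    # horizontal pixel pairs, gated by the edge site between them
    energy += np.sum((1 - lines_h) * alpha * (x[:, 1:] - x[:, :-1]) ** 2)
    # vertical pixel pairs
    energy += np.sum((1 - lines_v) * alpha * (x[1:, :] - x[:-1, :]) ** 2)
    # cost of turning line variables on
    energy += gamma * (np.sum(lines_h) + np.sum(lines_v))
    return energy
```

This is the essential mechanism behind constraint (i) above: where a line variable is 1, no smoothing is charged across that pair of pixels.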
2.6 Multivariate Markov Random Fields

When multiple data sources (sensors) are used to measure the scene properties, we will have a multivariate observation at each site. The corresponding pixel label process could be either multivariate or univariate, depending on the labeling problem. For example, in multispectral image classification problems the pixel labels are univariate, corresponding to the different categories, whereas in the problem of restoring range and intensity images the pixel labels are multivariate, with one label each for the restored range and intensity values at the corresponding site. Figure 2.7 shows the relationship between the label and measurement variables for the problem of fusing range and intensity images.

Figure 2.7. Multivariate MRF: the label and measurement variables. (Each pixel site carries an intensity label and a range label, with the corresponding intensity and range measurements; edge sites carry the labels of the coupled line process. The labels are denoted (X, L) and the measurements Y.)

A Markov random field in which each site has more than one label is known as a multivariate Markov random field [71]. The definitions and the estimation methods described in Sections 2.1-2.3 extend easily to the multivariate case. We will only mention here our notation for such MRFs. The set of variables on which the MRF is defined will still be represented by (X, L), as in the case of the simple coupled process in Section 2.5. For the purposes of this thesis, it is sufficient to use only a single (univariate) line process L. We will denote the kth measurement from all sources by Y^k, and the kth label at all locations by X^k. We defer a detailed derivation of the a posteriori probability distribution to Chapter 6.

2.7 Summary

In this chapter, we first presented definitions of univariate Markov and Gibbs random fields and gave our notation for coupled and multivariate MRFs. We also reviewed the Bayesian formulation for combining sensory measurements with a spatial contextual prior distribution and described some commonly used estimators for the site labeling problem. The HCF algorithm, which is used in Chapter 4 for edge detection, was also reviewed.

CHAPTER 3

Survey: MRF Models in Computer Vision

The computer vision literature contains several problems for which Markov random field models have been used to obtain a solution. These problems can, by and large, be grouped into two categories: (i) the MRF is used to model contextual information to enhance decision making algorithms, and (ii) the MRF is used to describe the image using a few parameters so that the image can be modeled and synthesized. In this thesis, we are mainly concerned with modeling contextual information. Another useful classification, which we will use here to describe significant work in the field, is based on the image domain in which the MRF has been put to use [60]: textured images, gray level images, range images, and multi-sensor images. A pictorial depiction of this classification is shown in Figure 3.1.

Figure 3.1. A classification of MRF application areas:
- Textured images: texture classification, texture synthesis, texture segmentation.
- Gray level images: image restoration, image segmentation, edge detection.
- Range images: image restoration, edge detection.
- Multi-sensor images: color images, multispectral images, range and intensity images.
- Miscellaneous: image interpretation, speckle image segmentation.

3.1 Textured Image Analysis

Texture analysis is an important and very difficult problem. The fundamental difficulty in texture modeling is that no one model of texture formation can account for the large variability of texture types. Figure 3.2 shows some texture patterns from the Brodatz album [14] to illustrate the variety of texture patterns encountered in the real world.

Figure 3.2. Several texture patterns from the Brodatz album [14]: (a) reptile skin, (b) expanded mica, (c) pressed calf leather, and (d) water.

Approaches to texture analysis can broadly be classified as feature-based, structural, and model-based [3].
Feature-based approaches extract numerical attributes or features from a textured image and use statistical pattern classification techniques to classify or segment the texture regions [46, 61]. Structural approaches view a textured image as generated by a set of rules or a formal grammar which specifies how texture primitives are arranged [3]. Structural approaches are suitable only for textures with a high degree of regularity. Model-based approaches provide a mathematical model to capture the statistical dependence among the pixels in a local neighborhood [25]. Unlike feature-based approaches, which are descriptive only, model-based approaches can both describe and generate textures. MRF models have been extensively used in the following texture analysis tasks.

3.1.1 Texture Classification

Texture classification deals with the problem of classifying an image consisting of a single texture into one of several possible categories of textures. As is typical in any classification problem, this involves the selection of a feature vector and a classification rule. A textured image can be modeled by an MRF, and the estimated model parameters are then used as features for classification [29]. Textural features have been derived from both continuous MRFs [23] and discrete MRFs [17]. Some of the other texture features used in the literature are gray level co-occurrence features [46], fractal features [65], and Gabor filter features [61].

3.1.2 Texture Synthesis

The goal in texture synthesis is to generate (synthesize) an image with desired textural properties using only a few parameters. Texture synthesis is useful in computer graphics and animation. MRF models have been suggested for the synthesis of textured images [47, 25, 27, 17]. Figure 3.3 shows examples of textures generated using second-order Derin-Elliott discrete MRF models [27].

Figure 3.3. Texture patterns generated using a second-order discrete MRF model (4 gray levels).

The idea of using MRF models for texture synthesis is appealing because the problem of synthesizing a texture translates to the problem of sampling an MRF distribution, which is fairly straightforward. However, the usefulness of MRF models for real-world texture synthesis problems such as virtual reality and data compression has not yet been demonstrated.

3.1.3 Texture Segmentation

In texture segmentation, the given image is assumed to contain several homogeneous regions with different textures. The problem here is to separate and classify pixels into different regions; it is more difficult than texture classification. MRF models have been used in two different capacities to attack this problem: (i) to model the spatial distribution of region labels, and (ii) to model the distribution of gray levels within a textured region. Derin and Elliott [27] and Manjunath et al. [70] use a hierarchical approach in which a top-level MRF incorporates spatial continuity of the segmentation labels and a separate MRF models the gray level distribution of each textured region. Bouman and Liu [13] developed a multi-resolution segmentation algorithm based on the same approach. Cohen and Cooper [22] partitioned the image into many regions, and each region was tested for homogeneity using estimated model parameters. They report very good segmentation results on several natural textures using this approach.
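Since Section 3.1.2 notes that synthesizing a texture reduces to sampling an MRF distribution, the following is a minimal Gibbs-sampler sketch for a generic discrete MRF. The Potts-like pair potential here is our own illustrative choice, not the Derin-Elliott model: each sweep resamples every pixel from its local conditional distribution, Eq. (2.8) with no data term.

```python
import numpy as np

rng = np.random.default_rng(0)

def gibbs_texture(shape=(64, 64), levels=4, beta=1.0, sweeps=100):
    """Sample a texture from a simple 4-neighbor MRF in which
    V_c = -beta when the two pixels of a pair clique agree, 0 otherwise."""
    h, w = shape
    img = rng.integers(0, levels, size=shape)
    for _ in range(sweeps):
        for i in range(h):
            for j in range(w):
                nbrs = [img[i2, j2]
                        for i2, j2 in ((i - 1, j), (i + 1, j),
                                       (i, j - 1), (i, j + 1))
                        if 0 <= i2 < h and 0 <= j2 < w]
                # local energy of each candidate label given the neighbors
                energies = np.array([-beta * sum(n == l for n in nbrs)
                                     for l in range(levels)], dtype=float)
                p = np.exp(-energies)
                img[i, j] = rng.choice(levels, p=p / p.sum())
    return img
```

Larger beta values produce larger homogeneous patches; different clique potentials (e.g., direction-dependent ones) yield oriented textures like those in Figure 3.3.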
3.2 Gray Level Image Analysis

Three of the most important tasks in the analysis of gray level images (without texture) are restoration, edge detection, and segmentation. All these problems are instances of the general pixel labeling problem. MRF models are used to enhance the performance of these algorithms by incorporating contextual information into the labeling process.

3.2.1 Image Segmentation

The problem of image segmentation is to partition an image into homogeneous regions. The homogeneity property could involve the pixel gray levels, texture, or some other property. Here, we will restrict the discussion to partitioning an image based on homogeneity of gray levels. This problem has been cast as a pixel labeling problem, where the desired labeling is the true region map. The segmentation problem then is to obtain an estimate of the true region map from the observed image. Contextual information is incorporated in the prior term of the Bayesian formulation introduced in Section 2.2 (see Equation (2.7)) and is modeled by a Markov random field. Various estimates of the true labeling (MAP, MPM) have been defined using the a posteriori probability distribution of the labels [38, 74]. MRF model-based segmentation algorithms perform well on regions of uniform intensity but fail when the intensity in a region has a gradient, as is often the case in real images [30].

3.2.2 Image Restoration

Image restoration methods attempt to recover the "true" image from an observed noisy or degraded image. A mathematical model of the degradation phenomenon is assumed to be available. The contextual information is present in the form of spatial dependency among gray levels: nearby pixels tend to have similar gray levels. Markov random fields are appropriate models for this contextual information. Image restoration is another instance of the pixel labeling problem. One significant difference between image segmentation and image restoration is that, in restoration, the number of labels in the recovered image is equal to or greater than the number of gray levels in the corrupted image, whereas in segmentation, there are fewer labels than gray levels. A more fundamental difference is that segmentation labels are symbolic, whereas labels in a restoration problem are numeric values corresponding to the gray levels in the image.

Discrete MRFs have been used for restoration by Geman and Geman [38], Besag [6], and Marroquin [74]. Geman and Geman used a hierarchical model consisting of the usual intensity process indexed by pixels and an unobservable line process indexed by imaginary edge sites between pixels, as discussed in Section 2.5. Jeng and Woods [62] have extended this work to continuous MRF models. Continuous MRFs are more appropriate for restoration because the undegraded pixel values are usually continuous in the real world. A popular class of continuous MRFs is the compound Gauss-Markov random field model used by Jeng and Woods as well as by Simchony et al. [88]. This model has the advantage that sampling the conditional distribution is straightforward.
Some impressive image restoration results have been obtained with MRF models on images of man-made objects [38, 88, 10, 74]. However, they do not have any apparent advantage over methods such as Wiener filtering [40] for natural scenes, because of the less significant role played by edges in natural scenes.

3.2.3 Edge Detection

MRF models can be used for edge detection in one of two ways: (i) as a by-product of image restoration, or (ii) as a direct labeling problem. In both cases, the edge sites are defined to lie between two adjacent pixels, as shown in Figure 2.4. Daily [26] has argued in favor of hexagonal grids for restoration because of the apparent reduction in distortion for edges of arbitrary orientations.

When edge labeling is obtained as a by-product of image restoration, the model usually consists of a hierarchical or coupled MRF with an observable pixel process and an unobservable line process [38]. The labeling problem is posed as the joint estimation of the pixel labels and the edge labels. When edge detection is posed as a direct problem, the prior distribution of the edge labels is modeled by a Markov random field. Since there is no direct observation at the edge sites, support for the edge labels is derived from the intensity (or any other image attribute whose discontinuity we seek) values in the neighborhood of the edge site and expressed in the form of a likelihood of that label. Geman and Geman [38] obtained edges as a by-product of restoration using a hierarchical MRF model. They experimented with two edge labels {edge, no edge} and four edge labels {straight edge, left diagonal, right diagonal, no edge} on some simple but noisy images and obtained impressive results. Simchony et al. [88] used a simple line process with only singleton cliques and a Gaussian MRF model for the pixel process for edge detection, and showed that this is equivalent to Blake's [10] weak membrane formulation.

Chou et al. [20] posed edge detection as a direct labeling problem and used a step edge model to compute the likelihoods of edge labels (f(w_t | edge) and f(w_t | no-edge) in Equation (2.10)) at each line site. The same model was also used by Swain et al. [90]. The direct labeling problem is easier, since the search space is much smaller than for the restoration problem, which reduces the computational burden. On the other hand, this approach has the disadvantage that, because the pixel process is not included, the model may not specify the constraints accurately.

3.3 Range Image Analysis

Range images require processing techniques different from those for intensity image analysis [59]. This is primarily due to the physical interpretation of the measurements. For example, range image segmentation techniques aim at separating different surfaces from one another, unlike the case of intensity images where, by and large, segmentation is based on uniformity of the pixel values, with the hope that they correspond to uniformity in some physical property of the real world. Most range image operations are dictated by the need to preserve the properties of geometric surfaces, which include the accurate specification of their boundaries. For many applications, surfaces are assumed to have zeroth-order (continuity of depth values) and first-order (continuity of surface orientation) continuities.
Discontinuities of these two types are known as jump and crease edges, respectively. Range image edge detection, therefore, necessarily involves the detection of both jump and crease edges. Range image restoration should ideally take both these discontinuity types into account, so that the surfaces are not smoothed across these edges.

MRF models have not been applied widely to range image analysis problems. They have been used only for sparse and dense depth restoration problems with jump edges [74, 20], although the model is sufficiently general to deal with crease edges as well. Bilbro and Snyder [9] have used an MRF model with a large neighborhood for range image restoration. The model does not incorporate discontinuities directly, although the large neighborhood and the clique potential functions used are claimed to preserve the edges during restoration.

3.4 Multi-Sensor Image Analysis

Multi-sensor image analysis techniques deal with the fusion of information from different sources. Data fusion allows better decision making by relying more on sensory information and less on a priori assumptions, which may or may not be accurate. By incorporating diverse sources of sensory information, data fusion reduces the uncertainty in the decisions.

A variety of methods exist for achieving data fusion. These methods depend on the nature of the information sources, the relation between these sources, and the desired output. For example, the type of processing required to combine two independent intensity images of the same scene to obtain a segmentation into uniform intensity regions is much different from that required for combining thermal and visual images of a scene to obtain a segmentation in terms of uniform heat flux regions [82]. Clark and Yuille [21] give a good survey of various information fusion methods.

Markov random field models can model the contextual information for image data fusion. A site is assumed to have a vector of observations, as shown in Figure 2.7. MRFs can be easily generalized to handle vector observations. In the following subsections we describe some specific fusion problems which have been solved using MRF models.

3.4.1 Color Image Analysis

Color images are formed by sensing the visible electromagnetic spectrum at three primary wavelengths corresponding to red, green, and blue (650, 530, and 410 nanometers). Each of these three images consists of the responses of filters whose characteristics are designed to be close to those of the corresponding photoreceptors (cones) in human eyes [86].

The key idea in color image analysis is to use the correlation between different color planes (or component images) to aid in decision making. It is well known that humans use both color and intensity information to discriminate objects more effectively than would be possible with intensity alone [86]. Wright [100] uses a simple correlation model which penalizes configurations in which the labels in the three images at the same edge site are different. This model does not consider the correlations that exist among the red-green-blue components at a single pixel. Nevertheless, very good restoration results were obtained by Wright [100] on synthetic images. One can also take the correlation between different sensor values at a pixel into account, in addition to those at the discontinuities. Daily [26] incorporates pixel correlations implicitly by using a color difference measure that purports to enhance discriminability.
A multivariate MRF model with a coupled univariate line process (see Section 2.6) is used to model the joint prior distribution of pixel and edge site labels. Smoothing of adjacent pixel values is suspended in all three color planes when an edge is present at an edge site. Daily [26] obtained very good restoration and edge detection results using this model on natural images.

3.4.2 Multispectral Image Analysis

Multispectral images differ from color images in that the range of wavelengths sensed is broader than the range of visible light wavelengths, and the specific frequencies usually do not correspond to the red, green, and blue channels used in color images. The goal of multispectral imaging is the segmentation and classification of pixels. It is a very useful imaging technique for remote sensing applications. Early methods for multispectral image segmentation/classification ignored the spatial dependency of segmentation labels and simply used the different spectral observations at each pixel to classify the pixel. Switzer [91] incorporated some spatial dependency, but the relaxation method used ignores the pixel observations once initial classifications are obtained independently at each pixel, and from then on uses only the spatial information of the labels. Zhang et al. [103] used a multivariate MRF model for segmenting multispectral images. They used a simple model without a coupled line process, where the correlation between the different values at a single pixel was accounted for by assuming a multivariate normal distribution for the noise. The mean vector and covariance matrix for each class were estimated using training samples. Note that for many remote sensing applications, the additional constraints that can be enforced by using a line process may be of little use because of the lack of structure in the edges (this is not true for the image of a city or a farm, however). The images used in Zhang et al. had no apparent structure in the edges. They obtained a 5-10% improvement in classification accuracy over a non-contextual classifier on four real images.

3.4.3 Fusion of Range and Intensity Images

Combining range or depth measurements and intensity measurements of a scene can be useful in two ways: (i) it can provide a robust solution, and (ii) it can provide information which cannot be derived from either image alone. Robustness is provided by redundancy. It is very likely that an actual discontinuity in the physical world produces discontinuities in both the range and intensity images, so the effect of noise can be minimized by using both images simultaneously in image analysis. A range image provides only depth measurements of the scene and may not be sufficient to differentiate, for example, a Pepsi can from a Coca-Cola can, since the depth measurements are the same for both objects. An intensity image provides additional features, such as the printed characters on the objects, which enable accurate recognition of the objects. An intensity image is sensitive to ambient light, shadows, and highlights. Therefore, a robust and accurate recognition system can be achieved by combining the two types of information.

Chou [19] used a coupled MRF model for combining intensity and (possibly sparse) depth observations to reconstruct depth. Chou used intensity edges as partial support for the existence of a depth discontinuity at the corresponding location in the depth map.
This is achieved by deviating from the usual coupled MRF approach, in which line sites are assumed to be hidden with no direct observation: Chou derived the likelihood of edge labels for each site from the intensity image as the support for edge labels in the coupled MRF corresponding to the depth image. This innovative use of the edge likelihoods allowed working with only a single coupled MRF model, rather than the two coupled models (one each for the intensity and depth images) or the multivariate coupled MRF model that would have been necessary otherwise. The integration of intensity and range data has been shown to produce good surface reconstruction even when the range data is sparse. The Bayesian formalism performs better in combining different sources of information than methods that use ad hoc rules to combine the individual segmentations [43].

3.4.4 Integration of Early Vision Modules

Understanding a complex information processing task such as vision calls for a modular approach [73]. This is partly because there is psychological evidence that biological vision also is somewhat modular [96], owing perhaps to the evolution of different perceptual capabilities at different times, and partly because it seems impossible to understand the complex process of vision as a whole. Therefore, the bulk of computer vision research has concentrated on specialized modules (shape, texture, etc.) that provide a certain capability of the vision system. It is obvious that somewhere along the information flow path, information from these different modules must be combined for a more complete interpretation of the scene [97].

Gamble et al. [35] experimented with integrating results from stereo, motion, color, and (intensity) edge detection modules. Markov random fields model the mutual constraints between the different modules. A separate MRF lattice for each module is coupled to the intensity image at the location of discontinuities. The coupling is in the form of a term in the energy function of each MRF which is proportional to the brightness change in the intensity image at that location. The idea is that a discontinuity in any module usually results in large intensity changes. The coupling term seems to have been chosen heuristically. Results are presented for only two images, and they seem reasonable. Gamble et al. also propose a more general framework which labels the discontinuities in terms of their physical origin, but no experimental results are presented, and not enough detail is given to judge the feasibility of this approach.

3.5 Miscellaneous Applications of MRF

Some Markov random field applications do not come under any of the above categories. Notable exceptions are: (i) modeling of complex data in speckle images [28]; (ii) image interpretation [76]; and (iii) segmentation and recognition of tinker toy descriptions [24].

Derin et al. [28] used a hierarchical MRF model, similar to those used for segmentation of gray level and textured images, to segment a speckled image. Speckle noise is observed in synthetic aperture radar (SAR) and laser images and is characterized by reflections from a rough surface illuminated by coherent light. The received signal is represented by two components, an in-phase component and a quadrature component, which together form the complex amplitude data at any pixel. The intensity image corresponding to this complex data is the squared magnitude at each pixel, and this is the data that is commonly used for image analysis.
The intensity image in the presence of speckle noise exhibits a granular appearance, hence the name speckle images. Derin et al. show that the phase part of the complex data, contrary to its visual appearance, carries useful information for segmentation purposes. The top level in their hierarchical approach models the spatial continuity of regions. The speckle within a region is modeled by a bivariate Markov random field. The joint distribution of the complex amplitude data in a region is shown to be a second-order bivariate Gaussian Markov random field. They demonstrated the performance of their method on many synthetic images as well as a few real SAR images. The segmentation results are very good for all the test images, and the use of complex data results in significantly better results than when only amplitude data is used.

Modestino and Zhang [76] used an MRF defined on an adjacency graph of segmented regions for the problem of image interpretation. Image interpretation is concerned with assigning interpretation labels, I = {I_1, ..., I_N}, to each of the N disjoint regions found in a segmented image. Each I_i takes one of the possible interpretation labels {L_1, ..., L_M}. The MRF model is justified here because the interpretation attached to a region depends largely on the interpretations of the regions that are adjacent to it, and less so on regions that are not adjacent to it. Modestino and Zhang obtain the MAP estimate of the interpretation labels using simulated annealing. The interpretation results are reasonable, but results on only two images have been reported.

3.6 Related Approaches

In addition to the regularization approach mentioned in Chapter 1, various other paradigms have been proposed in the literature to incorporate contextual dependency. Two important ones are relaxation labeling [54] and anisotropic diffusion [83].

Relaxation labeling is an iterative technique for propagating local constraints on the labels of neighboring sites. Associated with each site t are weights p_t(l) in [0, 1], one for each possible label l. Contextual knowledge is represented by compatibility functions r_{ts}(l_1, l_2), which quantify the compatibility of label l_1 at site t with label l_2 at a neighboring site s. The weights for the labels at each site are updated in a locally parallel fashion, based on the current weights and the compatibility functions, according to the following equation [101]:

\[
p_t^{(K+1)}(l) = \frac{p_t^{(K)}(l)\,\bigl[1 + q_t^{(K)}(l)\bigr]}{\sum_{l'} p_t^{(K)}(l')\,\bigl[1 + q_t^{(K)}(l')\bigr]},
\qquad
q_t^{(K)}(l) = \sum_{s \in N_t} \sum_{l'} r_{ts}(l, l')\, p_s^{(K)}(l'),
\]

where K is the iteration number and q_t^{(K)}(l) is the net support that the current weights at the neighbors of t lend to label l. The iterations are stopped when the weight for some label is 1 (or is close to 1) at all sites. The sites are labeled with the label for which the associated weight is the highest among all the labels at that site.

There are two notable drawbacks to this procedure. First, the weights are difficult to interpret, since they have no semantics attached to them (even if they have one initially, it will be lost after a few iterations). Secondly, after the initial assignment of weights, the measurements at each site are ignored, and so it is likely that later iterations lead the solution farther away from the observations. In other words, the two forms of evidence are not combined consistently, and there is a possibility of ignoring one form of evidence. Notice that Markov random field contextual models, when used in a Bayesian framework, deal effectively with this problem by using the Bayes' rule to combine evidence.
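A minimal sketch of one parallel sweep of the update rule above (the array layout and the assumption that the compatibilities keep 1 + q positive are ours, purely for illustration):

```python
import numpy as np

def relaxation_sweep(p, r, neighbors):
    """One parallel relaxation labeling update.
    p[t, l]        : current weight of label l at site t
    r[t, s, l, l2] : compatibility of label l at site t with l2 at site s
    neighbors[t]   : list of neighbor site indices of site t
    Assumes compatibilities are scaled so 1 + q stays positive."""
    p_new = np.empty_like(p)
    for t in range(p.shape[0]):
        # support q_t(l): compatibility-weighted sum over neighbor labels
        q = sum(r[t, s] @ p[s] for s in neighbors[t])
        unnorm = p[t] * (1.0 + q)
        p_new[t] = unnorm / unnorm.sum()
    return p_new
```

Note that the observations enter only through the initial p; subsequent sweeps use the weights alone, which is exactly the second drawback discussed above.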
Anisotropic diffusion is a paradigm used for computing a multiscale representation [99] of images. It is achieved by computing a series of images indexed by a scale parameter, t, using the anisotropic diffusion equation [83]

\[
\frac{\partial}{\partial t} I(x, y) = \operatorname{div}\bigl(c(x, y, t)\, \nabla I(x, y)\bigr) = c(x, y, t)\, \Delta I(x, y) + \nabla c \cdot \nabla I(x, y),
\]

where div is the divergence operator, ∇ and Δ are the gradient and Laplacian operators, c(x, y, t) is the conduction coefficient, and I_t is the image attribute at scale t. An important requirement for multiscale representations is that the edges and their locations at small scale (high resolution) should be preserved at larger scales (low resolution). This is equivalent to enforcing smoothness constraints across the image except at the locations of discontinuities. Perona and Malik let the conduction coefficient c(x, y, t) depend on the brightness gradient at each location, so diffusion (Gaussian smoothing) is discouraged between pixels that have large brightness gradients between them, by decreasing the conduction coefficient between the two pixels. They have obtained good edge detection results using this method. They show that anisotropic diffusion also minimizes an energy function and, therefore, is a special case of the MRF model. The limitation of this method is that it is applicable to a very restricted class of contextual models [83].

3.7 Summary

In this chapter we looked at two categories of computer vision problems in which MRF models have been applied. In the first category, MRF models are used to mathematically describe image textures using only a few parameters. Only limited success has been achieved in modeling image textures using MRF models. The issues in texture modeling using MRFs are: (i) specification of the model in terms of neighborhood size and clique parameters, (ii) robust estimation of the parameters, and (iii) testing the goodness-of-fit between the data and the model. Several studies which use MRF parameters for texture classification report good classification results on a limited class of textures. However, for texture synthesis and texture segmentation problems, it is not clear whether MRF models result in superior performance compared to related approaches.

MRF models have also been used to model spatial a priori information, or contextual information, in order to enhance the performance of decision making algorithms in a wide range of image analysis problems such as restoration, segmentation, edge detection, and multi-sensor fusion. Performance results reported using MRF model-based methods for image segmentation and edge detection range from reasonable on real images to impressive on synthetic images. The relationships between MRF models and two other paradigms that incorporate contextual information, namely relaxation labeling and anisotropic diffusion, were briefly examined, and the generality of the MRF model-based approach was discussed.

CHAPTER 4

Edge Detection in Range Images

There are two common approaches for segmenting range images into surface patches: region-based [50, 8] and edge-based [31, 75]. Region-based segmentation schemes group "similar" pixels to form surface patches. Edge-based segmentation schemes utilize the dissimilarity in the properties of adjacent surfaces to compute the surface boundary map of the image; closed boundaries then define a surface patch. Jump and crease edges have been widely used for segmenting range images. A jump edge in a range image is caused by a discontinuity in depth values, whereas a crease edge is formed by a discontinuity of the surface normals. Crease edges are more difficult to detect than jump edges [32].
Region-based schemes assume that regions or surfaces in range images can be globally approximated by an analytic function [8]. Edge-based schemes need only to approximate the surfaces near the discontinuities. Region-based methods guarantee closed boundaries for regions, a feature that is missing in edge-based methods. Edge-based schemes have the advantage of relying only on local information to separate surfaces and, therefore, need not approximate the entire surface by a global analytic function. By suitably combining edge-based and region-based approaches, one could realize the advantages of both methods [102, 58].

In this chapter we develop an MRF model-based approach to detect jump and crease edges in range images. The edge detection algorithm has been combined with Hoffman and Jain's [50] region-based scheme, resulting in a hybrid algorithm for segmentation. Our segmentation scheme is compared with the Hoffman-Jain segmentation scheme using several range images.

4.1 Edge Detection as Bayesian Estimation

We view the problem of edge detection in range images as a site labeling problem. This approach differs from edge relaxation methods [3] in the way contextual information is combined with measurements. In the site labeling approach, these two types of information are combined consistently using the Bayes' rule, whereas in edge relaxation methods, the data values are ignored after an initial labeling is obtained, and therefore the final edge labeling may not be consistent with the data. Our approach to edge detection in range images is similar to the one used by Chou et al. [20] for edge detection in intensity images. Each edge site can be labeled as either EDGE or NON-EDGE.

An important difference between site labeling problems that deal with pixel sites and those that deal with edge sites is that there is no direct measurement or observation available at an edge site, since the edge sites are defined to lie between two adjacent pixels. We circumvent this problem by using indirect measurements in the form of edge strengths computed from the range values at pixels in the neighborhood of the edge site. As mentioned in Section 2.4, we let w = {w_1, w_2, w_3, ..., w_K} represent the indirect measurements and l = {l_1, l_2, l_3, ..., l_K} represent the edge labels at the K edge sites. The neighborhood shown in Figure 2.5 is used here to define the a priori distribution. Figure 4.1 shows all possible clique types for this neighborhood. To completely specify the a posteriori distribution (see Section 2.2), we need the conditional distribution f(w_t | l_t) and the clique potential functions V_c(.). For the case of binary edge labels {EDGE, NON-EDGE}, we only need the likelihood ratio f(w_t | EDGE)/f(w_t | NON-EDGE) at each line site and the clique potential functions V_c(.) [20].

Figure 4.1. Edge clique types for a second-order neighborhood.

4.2 Computation of Edge Likelihoods

The edge strengths are derived from the range values observed at pixels in the neighborhood of the edge site. We define a statistic which indicates the edge strength at an edge site and derive its distribution. Obviously, two such statistics will be required, one each for jump edges and crease edges.

4.2.1 Jump Edge Statistic

We propose the absolute value of the difference in the average depth values on the two sides of an edge site as a statistic for testing the hypothesis that there is a jump edge at that site.
Figure 4.2. Pixel numbering in the neighborhood of a vertical edge site. A similar numbering scheme is used for horizontal edge sites. (Pixels 1 through N/2 form region R1 on one side of the edge site; pixels N/2 + 1 through N form region R2 on the other side.)

If the N depth values in the neighborhood of an edge site t are numbered as in Figure 4.2, then the jump edge statistic J_t at edge site t is given by

\[
J_t = \left| \frac{2}{N} \left( \sum_{i=1}^{N/2} d_i - \sum_{i=N/2+1}^{N} d_i \right) \right|, \tag{4.1}
\]

where d_i is the depth value at pixel i (see Figure 4.2) and |.| is the absolute value operator. A very large value of N reduces the localization accuracy, whereas a very small value increases the sensitivity to noise. We used two pixels on either side of an edge site to compute the jump edge statistic, so N = 4. The null hypothesis is that no jump edge exists at a site, and the alternative hypothesis states that a jump edge exists at that site.

1. Distribution under the null hypothesis: Assuming independence of the depth measurements on the two sides of an edge site, the distribution of the statistic J under the null hypothesis, f_{J_0}, is a t-distribution [78] with N/2 degrees of freedom, modified such that negative values are not allowed. For computational reasons, we have approximated the t-distribution by a normal distribution. The density of J can then be computed using the corresponding z statistic for a standard normal distribution, given by

\[
z = \frac{J_t}{\sqrt{(4/N)\,\mathrm{variance}(J_t)}}. \tag{4.2}
\]

The variance of J_t can be estimated either from a sample range map from the sensor, or from the given range image itself.

2. Distribution under the alternative hypothesis: The computation of the distribution of the statistic J under the alternative hypothesis (f_{J_A}) is complicated by the composite nature of the alternative hypothesis. Based on intuition about how physical objects are arranged, we assume that any depth difference greater than a specified minimum depth difference (taken here to be the standard deviation of J, sd(J)) is equally likely. Figure 4.3 shows the assumed distribution of J given that a jump edge exists at the edge site. We have scaled the depth values to lie between 0 and 255, so the value of max J is 255 for the jump edge statistic.

Figure 4.3. Distribution of J under the alternative hypothesis. (The density f(J | EDGE) rises to a knee at sd(J) and remains flat out to max J.)

The jump edge likelihood ratio L_J(t) at edge site t can then be computed as

\[
L_J(t) = \frac{f_{J_A}(j_t)}{f_{J_0}(j_t)}, \tag{4.3}
\]

where j_t is the observed value of the statistic J_t.

4.2.2 Crease Edge Statistic

The change in direction of the surface normal determines the likelihood of a crease edge at an edge site. Surface normals were computed by using only four of the eight neighbors closest in depth value to the center pixel, along with the center pixel, in order to reduce the undesirable effect of using pixels that do not belong to a single surface. We define the crease edge statistic C_t at edge site t, using the pixel numbering scheme given in Figure 4.2, as

\[
C_t = \left\| \frac{2}{N} \left( \sum_{i=1}^{N/2} \mathbf{n}_i - \sum_{i=N/2+1}^{N} \mathbf{n}_i \right) \right\|, \tag{4.4}
\]

where n_i is the unit vector in the direction of the surface normal at pixel i and ||.|| denotes the magnitude operator. The value N = 2 was used to compute the crease edge likelihoods because larger values of N tend to encourage the detection of spurious crease edges inside homogeneous curved surfaces.

1. Distribution under the null hypothesis: As in the case of the jump edge statistic, we approximate the distribution of C under the null hypothesis, f_{C_0}, by a normal distribution. This approximation is necessary only for computational simplicity.
2. Distribution under the alternative hypothesis: Intuition suggests that the distribution of C under the alternative hypothesis, f_{C_A}, is similar in form to that of f_{J_A}(J); only the maximum and knee values in Figure 4.3 are different and are set to max C and sd(C), respectively, where sd(C) is the standard deviation of the statistic C. The value of max C is 2.0 for the crease edge statistic. The crease edge likelihood ratio L_C(t) at edge site t can then be computed, similar to L_J(t) in Eq. (4.3), as

\[
L_C(t) = \frac{f_{C_A}(c_t)}{f_{C_0}(c_t)}, \tag{4.5}
\]

where c_t is the observed value of the statistic C_t.

4.2.3 Difficulties with Gradients and Discontinuities

The jump and crease edge statistics described above have been defined under the assumption that, in the neighborhood of an edge site (R1 and R2 in Figure 4.2), the observations are i.i.d. Gaussian. This assumption can be violated in two ways: (i) there could be discontinuities within R1 and R2, as in the case of an edge site close to a boundary between two surfaces; (ii) the true depth values (or surface normal directions) may not be constant in R1 and R2 (e.g., slanted surfaces for jump edges and curved surfaces for crease edges). These violations lead to undesirable effects such as classifying a thick strip of edge sites near a boundary as edges, or identifying edge sites within regions with high depth or surface normal gradients as edges. Similar problems are encountered in regularization schemes which use weak constraints [10].

We incorporate two modifications into each of the two statistics so that the modified statistics can be used in a hypothesis test derived under the i.i.d. Gaussian assumption. We first replace the statistics J and C by their deviates from local averages computed in a 1 x 5 (or 5 x 1 for vertical edge sites) window of edge sites parallel to the center site. For edge sites away from a boundary, the local average and the individual values are close, so the modified statistic takes small values at these sites. For edge sites on a boundary, the individual values are higher than the local averages, and the modified statistics take larger values. The second modification performs nonmaxima suppression in the same window by suppressing the statistic value at the center site if it is neither the largest nor the next largest value in the window. The second largest value in the window is not suppressed, to account for the parallel edges that are very often formed near corners.

4.2.4 Combining Jump and Crease Edge Likelihoods

At each edge site we have two likelihood ratios, one for the jump edge statistic and the other for the crease edge statistic. How should we combine this information to obtain the edge labeling? At one extreme, one could have two separate labeling processes, one each for jump and crease edges, which make use of the corresponding likelihood ratios only. This approach does not account for the interaction between the two processes and makes it difficult to enforce continuity constraints. At the other extreme, one could use the two likelihood ratios as components of an observation vector, specify appropriate cliques and potential functions, and use an MRF-based labeling framework. However, the specification of cliques and potential functions is a difficult problem which is further compounded by a vector of observed values. As a compromise, we have allowed limited interaction between the jump and crease edge observations. At each edge site, we replace the two likelihood ratios by their maximum, allowing us to work with Markov random fields with scalar observations.
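A minimal sketch of the per-site observation developed in Sections 4.2.1-4.2.4 follows. The Gaussian null density matches the approximation above; the flat alternative density is a crude stand-in for the ramp-and-plateau shape of Figure 4.3, and the default standard deviations are the values estimated in Section 4.5. Everything else is illustrative rather than the thesis's implementation.

```python
import math

def likelihood_ratio(value, sd, max_value):
    """L(t) = f_A(value) / f_0(value): zero-mean Gaussian null density,
    alternative approximated as uniform between the knee sd and max_value
    (near zero below the knee)."""
    f0 = math.exp(-0.5 * (value / sd) ** 2) / (sd * math.sqrt(2 * math.pi))
    fA = 1.0 / max_value if value >= sd else 1e-6
    return fA / max(f0, 1e-12)

def edge_likelihood_ratio(jump_stat, crease_stat, sd_j=1.05, sd_c=0.05):
    """Combine jump and crease evidence as in Section 4.2.4:
    keep the single stronger likelihood ratio at each edge site."""
    L_j = likelihood_ratio(jump_stat, sd_j, max_value=255.0)
    L_c = likelihood_ratio(crease_stat, sd_c, max_value=2.0)
    return max(L_j, L_c)
```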
4.3 Edge Detection Algorithm

Our edge detection algorithm can be summarized as follows.

1. For each edge site:
   (a) Compute the jump edge likelihood ratio, L_J.
   (b) Compute the crease edge likelihood ratio, L_C.
   (c) Set the edge likelihood ratio L_E = max(L_J, L_C).

2. Use the HCF algorithm [20] given in Figure 2.3 to obtain the desired edge labeling.

The parameters to be specified in this algorithm are: the neighborhood size, the potential function values, and the standard deviations of the jump and crease edge statistics.

Figure 4.4. Cliques and their potential function values determined using an ad hoc method. (Legend: solid segment = edge site labeled EDGE; open segment = edge site labeled NON-EDGE.)

4.4 Clique Potential Functions

The cliques of interest depend on the size of the neighborhood, while the potential function values depend on the labeling problem at hand. Clique potentials are chosen based on the spatial structure or constraints that a clique symbolizes. For example, if both edge sites of clique (b) in Figure 4.1 are labeled EDGE, the clique potential should be assigned a small positive or a negative value, since this configuration represents continuity of edges. If only one of the two sites is labeled EDGE, a high positive potential value should be assigned to discourage the formation of such configurations, since they indicate a break in the boundary. In other words, if the labels in the clique do not agree with the a priori constraints of continuity and smoothness, then that labeling is discouraged by assigning a high potential value to that clique. However, these considerations only provide some guidelines for setting the initial values of the parameters.

We used the potential values used by Chou et al. [20] as the initial values of the parameters. The actual values of the parameters were arrived at by manually altering these parameters based on the performance of the edge detection algorithm on the GRNBLK3 and HARRISCUP images. The following procedure was used to select the parameters. At each step of the procedure, only one parameter value was changed. For the selected parameter, a new value is assigned based on the edge detection errors (broken edges, parallel edges, missing edges, spurious edges) on the test images. This procedure was repeated over all the parameters until there was no visible change in the edge detection result. This procedure results in a locally "best" set of parameter values. Following Chou et al. [20], we set the potential values to zero for those clique configurations which either do not symbolize any important spatial constraint or whose associated constraints are already incorporated by a larger clique configuration.

While determining these clique parameters empirically, it was observed that the edge detection results are most sensitive to the parameters of cliques with single edge sites and least sensitive to the parameters of cliques with four edge sites. Chou et al. [20] explain this behavior by noting that the single edge site cliques control the first-order statistics, while cliques of larger size control higher-order statistics. It is known that higher-order statistics are less important than first-order statistics in characterizing images [63]. The cliques used in our experiments are given in Figure 4.1, and the configurations with non-zero potentials are shown in Figure 4.4.
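Putting the likelihood ratios and clique potentials together, the local energy that HCF minimizes at an edge site (Eq. (2.9) specialized to binary edge labels) can be sketched as follows; clique_potentials is a hypothetical lookup keyed by the label configuration of each clique containing the site, and the data-term rescaling is a standard simplification, not a quotation from the text.

```python
import math

EDGE, NON_EDGE = 1, 0

def local_edge_energy(label, site, edge_ratio, cliques_of,
                      clique_potentials, labels):
    """U_t(l_t, l_{N_t}, w) = -ln f(w_t|l_t) + sum of clique potentials.
    With binary labels only the likelihood *ratio* matters, so we may
    charge -ln L_E(t) to the EDGE label and 0 to NON-EDGE: adding the
    constant ln f(w_t|NON-EDGE) to both labels leaves the argmin
    unchanged."""
    data_term = -math.log(edge_ratio[site]) if label == EDGE else 0.0
    prior_term = 0.0
    for clique in cliques_of[site]:
        config = tuple(label if s == site else labels[s] for s in clique)
        prior_term += clique_potentials.get(config, 0.0)
    return data_term + prior_term
```

This local energy is exactly what the stability measure in Figure 2.3 compares across candidate labels.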
4.5 Experimental Results

Edge detection experiments were performed using images from the range image database described in Section 1.4. We report results on five different range images. The standard deviations of the jump and crease edge statistics (sd(J), sd(C)) were estimated from a homogeneous area of an arbitrary test image (HARRISCUP). The estimated standard deviations were 1.05 for the jump edge statistic and 0.05 for the crease edge statistic. The same cliques and potential function values were used for all five images. The edge detection results for these five images are presented in Figures 4.5-4.9. Each figure shows the following details.

- The original range image in part (a).
- The range image as a pseudo-intensity image in part (b). The pseudo-intensity image highlights the edges and surfaces in the original image. The pseudo-intensity value at each pixel is proportional to the difference in angle between the surface normal at that pixel and the viewing direction.
- The edge detection result using the algorithm of Section 4.3, with the clique parameters from Figure 4.4, in part (c).

Figure 4.5(a) shows the GRNBLK3 image. This is a fairly simple image to segment, and as shown in Figure 4.5(c), the algorithm detects all the crease and jump edges present in the image. Figure 4.6 shows the results for the HARRISCUP image. Our algorithm detects some spurious edges near the bottom of the cup, but all the other edges in the scene are detected correctly. Figure 4.7 shows the results for the COLUMN1/GRNBLK3 image. Again, all significant edges (edges of length more than a few pixels) have been found correctly. The crease edge near the top corner is broken in the edge map, indicating the need for postprocessing when surface segmentation is desired. Figure 4.8 shows the results for the AGPART2 image, and Figure 4.9 shows the results for the COLUMN2/ELL image. All the jump and crease edges have been detected correctly by the edge detection algorithm.

Figure 4.5. Edge detection results for the GRNBLK3 image: (a) range image; (b) pseudo-intensity image; (c) edges detected using ad hoc clique potentials.

Figure 4.6. Edge detection results for the HARRISCUP image: (a) range image; (b) pseudo-intensity image; (c) edges detected using ad hoc clique potentials.

Figure 4.7. Edge detection results for the COLUMN1/GRNBLK3 image: (a) range image; (b) pseudo-intensity image; (c) edges detected using ad hoc clique potentials.

Figure 4.8. Edge detection results for the AGPART2 image: (a) range image; (b) pseudo-intensity image; (c) edges detected using ad hoc clique potentials.

Figure 4.9. Edge detection results for the COLUMN2/ELL image: (a) range image; (b) pseudo-intensity image; (c) edges detected using ad hoc clique potentials.

4.6 Hybrid Method for Segmentation of Range Images

Our edge detection method, like all other edge-based segmentation methods, does not always result in closed boundaries. Closed boundaries are needed to extract surface patches. An obvious solution is to apply some heuristics to convert the edge map with broken boundaries into closed contours which represent surface boundaries [31]. However, this reduces the robustness of the segmentation method significantly, because small spurious edges may be treated as boundaries if excessive relaxation (continuation) is used to close the broken edges.
To overcome this problem, we have developed a hybrid segmentation method which uses our MRF-based edge finding algorithm for discontinuity detection and a region-based method for continuity detection. The outputs of the two algorithms are integrated using a simple approach.

We have used the clustering-based range image segmentation method of Hoffman and Jain [50]. Their method uses six features per pixel in a squared-error clustering algorithm: the 3 pixel coordinate values and the 3 components of the surface normal at the pixel. The pixel coordinates incorporate spatial information, and the surface normals provide surface continuity and smoothness information. The clustering algorithm is unable to identify the correct number of surfaces; in other words, there is no one-to-one correspondence between the clusters and the physical object surfaces present in the image. The Hoffman and Jain method oversegments the image by specifying a desired number of clusters which is larger than the expected number of surfaces present in the scene. A merging step then merges adjacent surface patches or clusters which are of the same surface type and which are not separated by a jump or crease edge. This algorithm gives reasonable segmentation results in most cases, but on some images it fails to merge surface patches belonging to the same physical surface [50].

Our hybrid segmentation algorithm works as follows. The Hoffman-Jain region-based segmentation algorithm provides an initial segmentation (only the initial clustering step is used). The oversegmented solution produced at this stage has the desirable characteristic that the resulting clusters or surface patches are almost always contained in a single physical surface in the scene. Therefore, nearly all true edges are preserved in the oversegmented solution to within a couple of pixels' accuracy. The boundary segments in the oversegmented solution are validated using evidence from the edges found by the MRF-based edge detection algorithm.

Consider the boundary between an arbitrary pair of adjacent surface patches, p1 and p2, detected by the clustering algorithm. Let N_T be the total number of edge sites along this boundary. Refer to the edge sites on this boundary as candidate sites. For each candidate site, we verify whether a corresponding edge site in the edge-based segmentation is marked as an EDGE. In order to allow for localization disparities between the two segmentations, the matching is relaxed to include the two nearest sites parallel to the current site in the edge-based segmentation. Consider a candidate site i on the boundary of interest. We mark candidate site i as having a match in the edge-based segmentation if any one of the three corresponding edge sites mentioned above has been labeled EDGE. Let N_M be the total number of candidate sites that have a match in the edge-based segmentation. The surface patches p1 and p2 are merged if

\[
\frac{N_M}{N_T} < t,
\]

where t is a positive threshold, 0 < t < 1. We tried several values of the threshold t and found that the segmentation results were the same over a wide range of values (t in (0.40, 0.80)). We used t = 0.60 for all the results reported in this chapter.
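The merge decision just described is a simple ratio test; a minimal sketch follows, with the boundary matching reduced to precomputed triples (a data layout of our own choosing, not specified in the text).

```python
def should_merge(boundary_sites, edge_sites, t=0.60):
    """boundary_sites: list of (site, parallel_1, parallel_2) triples along
    the shared boundary of two adjacent surface patches; edge_sites: set
    of sites labeled EDGE by the MRF edge detector. Merge the patches
    when the matched fraction N_M / N_T falls below the threshold t."""
    n_total = len(boundary_sites)
    n_match = sum(any(s in edge_sites for s in triple)
                  for triple in boundary_sites)
    return n_match / n_total < t
```

Intuitively, a boundary that is well supported by detected edges (a high match fraction) is kept, while a boundary with little edge support is dissolved by merging the two patches.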
Comparing the edge-based and region-based segmentations for these images, we see that: (i) surface patches do not cross over jump or crease edges, and the number of surface patches matches the number of visible surfaces on the object in the scene; and (ii) broken edges in the edge-based segmentation are connected in the final segmentations. For example, the broken crease edge in the wedge of the polyhedron in Figure 4.12(b) is connected in Figure 4.12(d). The final segmentations for all the test images show that all but very small surface patches have been grouped correctly and their shape retained; this is an important trait for a segmentation algorithm meant to be used as a module in an object recognition system.

4.7 Evaluation of the Edge Detector

The performance of our MRF-based edge detector is evaluated by comparing the performance of the hybrid segmentation algorithm with the Hoffman-Jain region-based segmentation algorithm. This is reasonable because the edges detected by the edge detection algorithm determine the location of the boundaries between surfaces in the final surface segmentation.

The two segmentation algorithms were compared based on their performance on 11 range images. The final segmentations for the two algorithms are shown in Figures 4.15-4.25. We compared the two segmentations by visually checking whether or not the segmented surfaces match the physical surfaces. Visual assessment is justified here because of the lack of a widely accepted numerical measure of performance. The author's visual assessment of the two segmentations is shown in Table 4.1. It is seen from the table that the hybrid algorithm outperforms the Hoffman-Jain segmentation method on 3 out of the 11 test images, and the two algorithms are comparable in performance on 7 scenes. The Hoffman-Jain algorithm gives better segmentation only on the BIGWYE4 image.

Table 4.1. Visual comparison of segmentation results on the 11 test scenes (GRNBLK3, HARRISCUP, COLUMN1/GRNBLK3, AGPART2, COLUMN2/ELL, BIGWYE4, ADAPTER/CURVBLOCK, BALL2/PROPANE, BLOCK4/HALFSPH, CAP/BOX2INCH, COL2/BALL). The assessments in column 2 are as follows. ">": the hybrid algorithm gives better segmentation results than the Hoffman-Jain algorithm. "≈": the segmentations produced by the two algorithms are comparable. "<": the Hoffman-Jain algorithm gives better segmentation results than the hybrid algorithm.

Figure 4.10. Hybrid segmentation results for the GRNBLK3 image: (a) range image; (b) edge-based segmentation; (c) initial region-based segmentation; (d) result of combining region-based and edge-based segmentations.

Figure 4.11. Hybrid segmentation results for the HARRISCUP image: (a) range image; (b) edge-based segmentation; (c) initial region-based segmentation; (d) result of combining region-based and edge-based segmentations.

Figure 4.12. Hybrid segmentation results for the COLUMN1/GRNBLK3 image: (a) range image; (b) edge-based segmentation; (c) initial region-based segmentation; (d) result of combining region-based and edge-based segmentations.

Figure 4.13. Hybrid segmentation results for the AGPART2 image: (a) range image; (b) edge-based segmentation; (c) initial region-based segmentation; (d) result of combining region-based and edge-based segmentations.

Figure 4.14. Hybrid segmentation results for the COLUMN2/ELL image: (a) range image; (b) edge-based segmentation; (c) initial region-based segmentation; (d) result of combining region-based and edge-based segmentations.
Further, the Hoffman-Jain segmentation algorithm requires on average three times the computational time required by the hybrid algorithm. Since the initial clustering step is common to both algorithms, this savings in computational time is evidently due to the efficiency of the MRF model-based edge detector in comparison to the merging step used in the Hoffman-Jain algorithm.

Figure 4.15. Segmentation results for the GRNBLK3 image using (a) the Hoffman-Jain algorithm, (b) the hybrid algorithm.

Figure 4.16. Segmentation results for the HARRISCUP image using (a) the Hoffman-Jain algorithm, (b) the hybrid algorithm.

Figure 4.17. Segmentation results for the COLUMN1/GRNBLK3 image using (a) the Hoffman-Jain algorithm, (b) the hybrid algorithm.

Figure 4.18. Segmentation results for the AGPART2 image using (a) the Hoffman-Jain algorithm, (b) the hybrid algorithm.

Figure 4.19. Segmentation results for the COLUMN2/ELL image using (a) the Hoffman-Jain algorithm, (b) the hybrid algorithm.

Figure 4.20. Segmentation results for the BIGWYE4 image using (a) the Hoffman-Jain algorithm, (b) the hybrid algorithm.

Figure 4.21. Segmentation results for the ADAPTER/CURVBLOCK image using (a) the Hoffman-Jain algorithm, (b) the hybrid algorithm.

Figure 4.22. Segmentation results for the BALL2/PROPANE image using (a) the Hoffman-Jain algorithm, (b) the hybrid algorithm.

Figure 4.23. Segmentation results for the BLOCK4/HALFSPH image using (a) the Hoffman-Jain algorithm, (b) the hybrid algorithm.

Figure 4.24. Segmentation results for the CAP/BOX2INCH image using (a) the Hoffman-Jain algorithm, (b) the hybrid algorithm.

Figure 4.25. Segmentation results for the COL2/BALL image using (a) the Hoffman-Jain algorithm, (b) the hybrid algorithm.

4.8 Conclusions

This chapter developed a new algorithm for detection of jump and crease edges in range images. The algorithm uses a Markov random field model to quantify contextual information in edge labeling. Edge detection experiments using several real range images indicate that the edges detected by the algorithm indeed correspond to the jump and crease edges in the input range images. It is, in general, difficult to choose potential function values which optimize performance in the sense of being closest to the human grouping mechanism. The quantification of various prior constraints and Gestalt principles in the form of clique potentials is a formidable problem. Nevertheless, as we have shown here, for specific problems it is possible to utilize contextual information and other constraints in the form of MRF models.

The edges detected by the edge detection algorithm are not always closed contours. Therefore, the edge-based segmentation was integrated with a region-based segmentation using a simple technique, resulting in a hybrid segmentation algorithm. The results of experiments using the hybrid segmentation algorithm showed that the surface patches detected by the algorithm match the physical surfaces in the scene. Further, comparison of the hybrid segmentation algorithm with the Hoffman-Jain region-based segmentation algorithm showed that the hybrid algorithm gives visually better surface segmentations than the Hoffman-Jain algorithm in some cases. Also, the hybrid algorithm is computationally more efficient than the Hoffman-Jain algorithm.
Since the location of the edges in the hybrid segmentation is determined based on the MRF model-based edge detector output, the superior performance of the hybrid algorithm is interpreted as evidence for the usefulness of MRF contextual models for the problem of edge detection in range images.

CHAPTER 5

Parameter Estimation of Line Process MRF

An MRF prior model is specified in terms of clique parameters. One important advantage of the MRF models of context vis-à-vis other regularization methods [84, 10] is that the parameters of the MRF model can be determined from knowledge of the local conditional distribution of image labels. However, in most of the previous work using MRF models, the parameters of the prior model have been determined in an ad hoc fashion (the "best" parameters are found using a trial and error method), similar to the method used in Chapter 4. While the parameters specified using these methods seem to give reasonable results for some images and some image analysis tasks, it is not clear how close they are to the "best" set of parameters in terms of Gestalt principles of grouping. Another significant limitation in specifying these parameters empirically is that, as the neighborhood size and consequently the number of parameters increases, the parameter space to be explored becomes unwieldy. Therefore, a systematic method for specification or estimation of these parameters is of significant interest.

Several schemes have been proposed in the computer vision literature to estimate the parameters of an MRF [5, 27]. For the case of MRFs defined on pixel sites, these schemes have been applied with considerable success, particularly when MRF models are used to model image texture. For MRFs defined on edge sites, no parameter estimation technique is available. Silverman et al. [87] have achieved limited success with an alternative approach, where an analytical model is used to compute the penalty for edges.

In this chapter, we propose a new method to estimate MRF model parameters when these models are used to find object boundaries in range or intensity images. We assume that mechanical (geometric) models of the 2D or 3D objects which are expected to be present in the scene are available. The models of the objects are used to generate multiple images of the objects from different viewpoints. The frequency of local edge configurations estimated from these synthesized edge maps is then used to estimate the parameters of the MRF. This estimation scheme has been applied to detect edges in range and intensity images and its performance is demonstrated on several images.

5.1 Estimation of Line Process Parameters from Object Models

We face the following difficulties in estimating model parameters for MRFs defined on edge variables.

1. Line variables are unobservable and, therefore, the true line process configurations are seldom available.

2. Even when such a true edge map is available, estimating parameters of an MRF from a single labeling is difficult because some edge configurations occur very infrequently.

3. Techniques of simultaneous parameter estimation and labeling [67] are also not suitable here because the local joint probabilities estimated from the given, possibly incorrect, labeling are very unreliable.

Because of these difficulties, the most commonly used method of specifying the parameters of the line process has been trial and error. We propose a method to overcome these difficulties.
We are primarily interested in site labeling problems with the intent of generating a surface (or edge) segmentation for a model-based object recognition system [34, 57]. We assume that a geometric model is available for every object expected in the scene. Further, it is assumed that all objects in the database are equally likely to appear in the scene and that all viewpoints are equally likely. These two statements represent the prior knowledge about the scene.

Our approach is to take a random sample (a set of edge maps) from the population of edge maps derived from all possible viewpoints of the given object models. It is important to note that a random sample here means an edge map from a randomly chosen viewpoint of a randomly chosen object model. We emphasize that the advantage of using the object models to synthesize the desired edge maps, instead of using an estimated edge map as in simultaneous parameter estimation and labeling schemes, is the availability of "true" labels at the edge sites. For example, Figure 5.1(b) shows a synthetic edge map indicating the exact location of edges, obtained from the CAD model of the object shown in Figure 5.1(a). By generating images corresponding to several random viewpoints of object models, we essentially obtain samples of edge maps which can be used for estimation of the MRF model parameters.

Figure 5.1. (a) Synthetic range image of an object and (b) corresponding edge map obtained from the CAD model of the object.

Figure 5.2. Neighborhood numbering scheme for (a) horizontal edge site, (b) vertical edge site.

5.2 Least Squares Formulation

For a second-order neighborhood, let \(l_{N_t}\) represent the set of labels assigned to the neighbors of edge site t. The neighbors of both horizontal and vertical sites will be treated in the same way by using the neighborhood numbering scheme shown in Figure 5.2. Since we are interested in edge detection, each site can be assigned one of two possible labels: {EDGE, NON-EDGE}. Let \(\Theta = [\theta_1, \ldots, \theta_p]^T\) be the parameter vector that defines the clique function. The number of parameters p depends on the particular parameterization used to define the clique potentials. For example, for an isotropic and homogeneous model with the second-order neighborhood shown in Figure 5.2, the number of parameters is 35 when all possible clique configurations are used as parameters. When the parameters are specified in an ad hoc fashion, however, a portion of the possible clique configurations are assigned zero potentials in order to reduce the number of free parameters of the model. In Section 5.3, we show that there exists a parameterization with only 8 parameters for a binary process with 8 neighbors.

The following formulation is based on the treatment by Derin and Elliott [27]. Let \(U(l_t, l_{N_t}, \Theta)\) be the sum of the potential functions for all cliques to which the edge site t belongs:

\[ U(l_t, l_{N_t}, \Theta) = \sum_{c:\, t \in c} V_c(l) = \Phi^T(l_t, l_{N_t})\,\Theta = \sum_{i=1}^{p} n_i \theta_i, \tag{5.1} \]

where \(\Phi(l_t, l_{N_t}) = [n_1, \ldots, n_p]^T\) and \(n_i\) represents the number of clique configurations of type i in the neighborhood of site t. The local conditional probability at site t can be written as

\[ P(l_t \mid l_{N_t}) = \frac{e^{-U(l_t,\, l_{N_t},\, \Theta)}}{Z_t(l_{N_t}, \Theta)}, \tag{5.2} \]

where \(Z_t(l_{N_t}, \Theta) = \sum_{l_t} e^{-U(l_t,\, l_{N_t},\, \Theta)}\). Substituting \(l_t = \text{EDGE}\) and \(l_t = \text{NON-EDGE}\) in Eq. (5.2) and combining the two resulting equations, we get

\[ e^{\,U(\text{EDGE},\, l_{N_t},\, \Theta) - U(\text{NON-EDGE},\, l_{N_t},\, \Theta)} = \frac{P(\text{NON-EDGE}, l_{N_t})}{P(\text{EDGE}, l_{N_t})}. \]
Taking logarithms on both sides and substituting for \(U(l_t, l_{N_t}, \Theta)\) from Eq. (5.1), we have

\[ \left( \Phi(\text{EDGE}, l_{N_t}) - \Phi(\text{NON-EDGE}, l_{N_t}) \right)^T \Theta = \log \frac{P(\text{NON-EDGE}, l_{N_t})}{P(\text{EDGE}, l_{N_t})}. \tag{5.3} \]

For the second-order neighborhood under consideration, there are 256 (\(2^8\)) possible values for the vector \(l_{N_t}\). The ratio \(P(\text{NON-EDGE}, l_{N_t}) / P(\text{EDGE}, l_{N_t})\) can be estimated from the edge maps synthesized from object models by counting the number of local edge configurations of type (NON-EDGE, \(l_{N_t}\)) and dividing by the number of local edge configurations of type (EDGE, \(l_{N_t}\)). To avoid the ill-conditioning of Eq. (5.3) due to very small estimated values of \(P(\text{NON-EDGE}, l_{N_t})\) and \(P(\text{EDGE}, l_{N_t})\), we modify Eq. (5.3) to

\[ \left( \Phi(\text{EDGE}, l_{N_t}) - \Phi(\text{NON-EDGE}, l_{N_t}) \right)^T \Theta = \log \frac{P(\text{NON-EDGE}, l_{N_t}) + \epsilon}{P(\text{EDGE}, l_{N_t}) + \epsilon}, \tag{5.4} \]

where \(\epsilon\) is a small positive number. By substituting each value of \(l_{N_t}\) in Eq. (5.4), we obtain 256 (the number of possible neighborhood configurations) equations in p unknowns. This overdetermined linear system of equations can be solved by the method of least squares. This parameter estimation can be carried out off-line, so there is no computational overhead at run time.

5.3 A Canonical MRF Representation: Normalized Potential Functions

For a specific Markov or Gibbs random field, there could be many equivalent potential functions which specify the same distribution [66]. So an important question to ask is: "Is there an equivalent MRF representation, for a given neighborhood, with the minimal number of free parameters?" It turns out that, for the case of binary processes, such a representation exists, and the associated potential functions are called the normalized potential functions, which are defined as follows [41].

Figure 5.3. Neighborhood for a simple MRF used to demonstrate the existence of a unique canonical representation.

Figure 5.4. Clique configurations for the MRF with the neighborhood shown in Figure 5.3 and the corresponding potentials under the enumerative and canonical representations: the pair cliques (0,0), (1,0), and (1,1) and the singleton cliques (0) and (1) have enumerative potentials a, b, c, d, and e, respectively, and canonical potentials 0, 0, f, 0, and g (a configuration has non-zero canonical potential only when all of its sites are labeled 1).

Definition: A potential function \(V = \{V_A : A \subseteq \mathcal{L}\}\) is a normalized potential if \(V_A(l) = 0\) whenever \(l_t = \text{NON-EDGE}\) for some \(t \in A\).

The choice of the label NON-EDGE in the above definition was arbitrary. For a given choice of the label in the definition, it is known that a normalized potential is unique for every MRF [41]. Because of their uniqueness property, normalized potential functions are also known as canonical potentials. We give below a simple example to demonstrate the existence of the canonical potential representation.

Consider a binary MRF pixel process with pixel labels {0, 1} defined using the neighborhood shown in Figure 5.3. Figure 5.4 lists all possible clique configurations for this neighborhood under the isotropic assumption, along with their potentials under the enumerative and canonical parameterizations. Table 5.1 shows the conditional distributions corresponding to the two parameterizations.

Table 5.1. Conditional probabilities under the two representations, for neighborhood configurations xyz (y is the center site, x and z its neighbors).

\[
\begin{array}{llll}
xyz & P(y \mid xz) & \text{Enumerative} & \text{Canonical} \\
000 & P(0 \mid 00) & \dfrac{e^{-(2a+d)}}{e^{-(2a+d)} + e^{-(2b+e)}} & \dfrac{1}{1 + e^{-g}} \\
001 & P(0 \mid 01) & \dfrac{e^{-(a+b+d)}}{e^{-(a+b+d)} + e^{-(b+c+e)}} & \dfrac{1}{1 + e^{-(g+f)}} \\
100 & P(0 \mid 10) & \dfrac{e^{-(a+b+d)}}{e^{-(a+b+d)} + e^{-(b+c+e)}} & \dfrac{1}{1 + e^{-(g+f)}} \\
101 & P(0 \mid 11) & \dfrac{e^{-(2b+d)}}{e^{-(2b+d)} + e^{-(2c+e)}} & \dfrac{1}{1 + e^{-(g+2f)}} \\
010 & P(1 \mid 00) & \dfrac{e^{-(2b+e)}}{e^{-(2a+d)} + e^{-(2b+e)}} & \dfrac{e^{-g}}{1 + e^{-g}} \\
011 & P(1 \mid 01) & \dfrac{e^{-(b+c+e)}}{e^{-(a+b+d)} + e^{-(b+c+e)}} & \dfrac{e^{-(g+f)}}{1 + e^{-(g+f)}} \\
110 & P(1 \mid 10) & \dfrac{e^{-(b+c+e)}}{e^{-(a+b+d)} + e^{-(b+c+e)}} & \dfrac{e^{-(g+f)}}{1 + e^{-(g+f)}} \\
111 & P(1 \mid 11) & \dfrac{e^{-(2c+e)}}{e^{-(2b+d)} + e^{-(2c+e)}} & \dfrac{e^{-(g+2f)}}{1 + e^{-(g+2f)}} \\
\end{array}
\]
By equating the two expressions for conditional probabilities corresponding to the same neighborhood configuration (this can be done because, the two conditional distributions represent the same MRF), it is straightforward to see that the parameters of the unique canonical representation for this MRF are given by f = a—‘Zb—d g = 2(b—a)+e—d This example shows that enumerative representation with 5 parameters (a, b, c, d, e) is equivalent to the canonical representation with only 2 parameters (f, g). The clique configurations with non—zero canonical potentials for the neighborhood of Figure 5.2 are shown in Figure 5.5. As a consequence of the above definition, the potentials for all configurations in which at least one of the site labels is NON-EDGE is set to zero. Therefore, only eight parameters are needed to represent the MRF defined on the second-order neighborhood corresponding to the eight configurations shown in Figure 5.5. The number of parameters that need to be estimated is thus reduced considerably (from 35 to 8). 5.4 Experimental Results The proposed method for estimating clique potentials was tested using a model database of 10 3D objects [33]. The object models used in our experiments consist of mechanical CAD models of 3D objects built using the GEOMOD solid modeling system [56]. Several of these objects were used by Flynn and Jain in their 3D object recognition system called BONSAI [33]. Figure 5.7 shows one range image for each of these ten objects. For parameter estimation, the number of objects and the number of 91 - : Edge site with label EDGE Figure 5.5. Clique configurations with non-zero canonical potentials. _ _ _ _— - 21.6 -19.46 49,34 1.33 l _ _ I 28.5 0.99 -41.28 -2.95 _ : Edge site with label EDGE Figure 5.6. Canonical clique potential values estimated from 3D CAD models by the least squares method. 92 Figure 5.7. Synthetic range images of objects used for estimating clique parameters; only one view per object is shown here. views (images) of each object should be sufficiently large so that the frequency of local edge configurations computed from these images are consistent with the probability with which these configurations appear in a range image from an arbitrary viewpoint. We generated 100 edge maps corresponding to 100 random views for each of the 10 model objects and computed the local conditional probabilities of edge labels using a straightforward histogramming technique. The parameters of the canonical MRF representation can be estimated using the least squares formulation by setting the number of parameters p = 8. The estimated values of these 8 parameters from the local characteristics obtained from CAD models by setting 6 = 1/256 in Eq. (5.4) are shown in Figure 5.6. The parameter estimates are sensitive to the value of 6 used. However, changing the value of 6 does not have the same effect on the edge detection results. Presumably, the constraints specified by the parameter values estimated with different 6 values do 93 not change drastically, even though the numerical values of the individual parameters change considerably. Notice that the estimated value for the same clique configuration is drastically different from that selected in an ad-hoc fashion (compare the values in Figures 4.4 and 5.6). We must point out that these two sets of parameter values are not directly comparable, because the number of parameters in the two representations are different. 
Edge detection experiments were performed using our database of range images shown in Figure 1.6. Here we report results on all 11 images in our database. The algorithm presented in Section 4.3 was used for edge detection. The same cliques and potential function values were used for all the images. The edge detection results for these images are presented in Figures 5.8-5.18. In all these figures, the following details are shown.

• The original range image is shown in part (a).

• The range image is shown as a pseudo-intensity image in part (b).

• The edge detection result using the algorithm of Section 4.3 with the estimated clique parameters given in Figure 5.6 is shown in part (c).

• The edge detection result using the algorithm of Section 4.3 with the clique parameters given in Figure 4.4 is shown in part (d).

Figure 5.8 shows the results for the GRNBLK3 image. With estimated clique potentials, the crease edge near the top of the wooden block is broken, but this crease edge is correctly detected using the ad hoc clique potentials. Also, one of the jump edges is not detected cleanly with the estimated clique potentials.

Figure 5.8. Edge detection results for the GRNBLK3 image: (a) range image; (b) pseudo-intensity image; (c) edges detected using estimated clique potentials; (d) edges detected using ad hoc clique potentials.

Figure 5.9. Edge detection results for the HARRISCUP image: (a) range image; (b) pseudo-intensity image; (c) edges detected using estimated clique potentials; (d) edges detected using ad hoc clique potentials.

Figure 5.10. Edge detection results for the COLUMN1/GRNBLK3 image: (a) range image; (b) pseudo-intensity image; (c) edges detected using estimated clique potentials; (d) edges detected using ad hoc clique potentials.

Figure 5.11. Edge detection results for the AGPART2 image: (a) range image; (b) pseudo-intensity image; (c) edges detected using estimated clique potentials; (d) edges detected using ad hoc clique potentials.

Figure 5.12. Edge detection results for the COLUMN2/ELL image: (a) range image; (b) pseudo-intensity image; (c) edges detected using estimated clique potentials; (d) edges detected using ad hoc clique potentials.

Figure 5.13. Edge detection results for the BIGWYE4 image: (a) range image; (b) pseudo-intensity image; (c) edges detected using estimated clique potentials; (d) edges detected using ad hoc clique potentials.

Figure 5.14. Edge detection results for the ADAPTER/CURVBLOCK image: (a) range image; (b) pseudo-intensity image; (c) edges detected using estimated clique potentials; (d) edges detected using ad hoc clique potentials.

Figure 5.15. Edge detection results for the BALL2/PROPANE image: (a) range image; (b) pseudo-intensity image; (c) edges detected using estimated clique potentials; (d) edges detected using ad hoc clique potentials.

Figure 5.16. Edge detection results for the BLOCK4/HALFSPH image: (a) range image; (b) pseudo-intensity image; (c) edges detected using estimated clique potentials; (d) edges detected using ad hoc clique potentials.

Figure 5.17. Edge detection results for the CAP/BOX2INCH image: (a) range image; (b) pseudo-intensity image; (c) edges detected using estimated clique potentials; (d) edges detected using ad hoc clique potentials.
Figure 5.18. Edge detection results for the COL2/BALL image: (a) range image; (b) pseudo-intensity image; (c) edges detected using estimated clique potentials; (d) edges detected using ad hoc clique potentials.

Figure 5.9 shows the results for the HARRISCUP image. The crease edge near the bottom of the cup (top of the figure) is detected correctly from the estimated clique potentials. With ad hoc potentials, however, several spurious edges are detected near the crease edge.

Figure 5.10 shows the results for the COLUMN1/GRNBLK3 image. All physical edges in the scene are detected in both cases. However, the crease edge on the head of the COLUMN1 object is wrongly detected as two parallel edges when the ad hoc clique potentials are used, but is detected correctly when the estimated clique potentials are used.

Figure 5.11 shows the results for the AGPART2 image. In this case, both estimated and ad hoc clique potentials result in some errors in the edge detection. With estimated clique potentials, no spurious parallel edges are detected, but the crease edge corresponding to the outer ring of the object is not connected. On the other hand, with ad hoc clique potentials, the edges are all connected, but some spurious parallel edges are detected.

Figure 5.12 shows the results for the COLUMN2/ELL image. Here again, the MRF-based edge detection algorithm makes different mistakes with the two sets of clique potentials. With estimated clique potentials, the crease edge on the ELL is broken at several places. On the other hand, with the ad hoc parameters, the crease edge near the bottom of COLUMN2 is not detected very cleanly.

Figure 5.13 shows the results for the BIGWYE4 image. The crease edge near the intersection of the two cylinders is not detected completely with estimated clique potentials. With ad hoc parameters, this crease edge is detected properly.

Figure 5.14 shows the results for the ADAPTER/CURVBLOCK image. The spurious edges detected on the sides of the ADAPTER are due to the "holes" (pixels where range data is not available) in the image. These edges can be eliminated by appropriate preprocessing, such as median filtering of the input range image. All other jump and crease edges in the scene are detected by the algorithm with both estimated and ad hoc clique potentials.

Figure 5.15 shows the results for the BALL2/PROPANE image. All the significant jump and crease edges are detected using both sets of clique potentials. With the estimated clique potentials, however, the algorithm detects a spurious parallel edge close to the left side of the ball. Notice that both objects were not included in the database of objects used to estimate the MRF model parameters.

Figure 5.16 shows the results for the BLOCK4/HALFSPH image. The HALFSPH object was not used for estimating the MRF parameters. All jump and crease edges on the BLOCK4 object are detected correctly in both cases. On the HALFSPH object, while the jump edges are detected accurately in both cases, only the estimated clique potentials result in correct detection of the crease edge.

Figure 5.17 shows the results for the CAP/BOX2INCH image. Notice again that the BOX2INCH object was not used for estimating the MRF parameters. The two sets of clique potentials result in good edge detection for the CAP object, but fail to connect two different crease edges on the edges of the BOX2INCH object.

Figure 5.18 shows the results for the COL2/BALL image.
As in the case of the ADAPTER/CURVBLOCK image, the spurious edges detected here are due to the holes in the image. The edge detection results with estimated and ad hoc clique potentials are comparable for this image.

It is seen that nearly all the jump and crease edges have been detected by the algorithm with both estimated and ad hoc clique potentials. It is also seen that edges detected using the estimated normalized clique potentials typically contain fewer spurious edges than those detected with ad hoc clique potentials. Table 5.2 summarizes the author's visual assessment of the edge detection results in Figures 5.8-5.18. The table indicates that the edge detection performance with estimated parameters is as good as the performance with ad hoc clique potentials. Also, from the edge detection results on images containing objects not used for estimating the MRF model parameters, it is clear that not all objects need to be used for estimating the parameters. The MRF model defined using the second-order neighborhood is capable of incorporating edge continuity and smoothness information only, and these constraints are very well represented by a few "typical" 3D objects.

Table 5.2. Visual comparison of edge detection results on the 11 test scenes (GRNBLK3, HARRISCUP, COLUMN1/GRNBLK3, AGPART2, COLUMN2/ELL, BIGWYE4, ADAPTER/CURVBLOCK, BALL2/PROPANE, BLOCK4/HALFSPH, CAP/BOX2INCH, COL2/BALL). The assessments in column 2 are as follows. ">": estimated clique potentials result in better overall edge detection than ad hoc clique potentials. "≈": edge detection results with the two sets of clique potentials are nearly the same. "<": ad hoc clique potentials result in better overall edge detection than estimated clique potentials.

5.5 Edge Detection in Intensity Images

The parameter estimation technique introduced in Section 5.1 is applicable to a variety of MRF models. This section illustrates this point by using the parameter estimation scheme to detect edges in intensity images. We assume here that edge detection is performed as a processing step in a 2D object recognition system and that 2D models of the objects expected in the scene are available.

5.5.1 Edge Detection Approach

The edge detection approach proposed for intensity images is very similar to the approach used for range images. Here we are interested only in step edges in the intensity images. The problem of detection of step edges in intensity images is essentially the same as the problem of detecting jump edges in range images. Therefore, the statistic that we used for jump edges in range images can be used to detect step edges in intensity images as well. The number of pixels used in the computation of 'step edgeness' is the same as the number of pixels used for computing the 'jump edgeness' in Chapter 4. Specifically, we use N = 4 in Equation (4.1). The algorithm for intensity edge detection is essentially the same as in Section 4.3, except that the likelihood ratio \(L_E\) now involves only a single term.

5.5.2 Estimation of Parameters from 2D Object Models

The parameters of the MRF model that capture the relative frequency of edge configurations in intensity images can also be determined from the 2D models of the objects expected in the scene. We use a boundary representation of the 2D objects in the model database, because the location of edges in the model is important for estimation of the clique parameters. This representation has been used by Ayache and Faugeras [2] in their 2D object recognition system, called HYPER. In this representation scheme, the object boundaries are approximated by polygons. Each model description consists of a set of line segments from the approximating polygon with the following attributes: \((x, y, \alpha, l)\). Here x and y are the coordinates of the midpoint of the line segment, \(\alpha\) is the angle of the line segment with respect to the positive x-axis, and l is the length of the segment.

The parameter estimation scheme involves synthesizing different edge maps from the polygonal models of the 2D objects in the database. The edge map of an object corresponding to a given viewpoint can be synthesized by performing a two-dimensional rotation of the model.

Figure 5.19. Edge map synthesized using the 2D model of an industrial part.
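A sketch of this synthesis step is given below. It is only an illustration of the idea under stated assumptions: the model is taken as a list of HYPER-style segments \((x, y, \alpha, l)\), the rotation is about the image center, and the rasterization is a naive point-sampling of each rotated segment; none of the names correspond to the actual implementation.

```python
import numpy as np

def synthesize_edge_map(segments, angle, size=128):
    """Rasterize the boundary of a polygonal 2D model at a given rotation.

    segments: list of (x, y, alpha, length) tuples, one per polygon side.
    angle: rotation in radians, simulating a random viewpoint.
    """
    edge_map = np.zeros((size, size), dtype=np.uint8)
    c, s = np.cos(angle), np.sin(angle)
    ctr = size / 2.0
    for (x, y, alpha, length) in segments:
        # Endpoints of the segment before rotation.
        dx, dy = 0.5 * length * np.cos(alpha), 0.5 * length * np.sin(alpha)
        pts = np.array([[x - dx, y - dy], [x + dx, y + dy]])
        # Rotate both endpoints about the image center.
        pts = (pts - ctr) @ np.array([[c, -s], [s, c]]).T + ctr
        # Mark pixels along the rotated segment.
        n = int(np.hypot(*(pts[1] - pts[0]))) + 1
        for t in np.linspace(0.0, 1.0, n):
            px, py = (1 - t) * pts[0] + t * pts[1]
            if 0 <= int(px) < size and 0 <= int(py) < size:
                edge_map[int(py), int(px)] = 1
    return edge_map
```

Generating, say, 100 views of a model then amounts to calling this function with angles drawn uniformly from \([0, 2\pi)\).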
In this represen- tation scheme, the object boundaries are approximated by polygons. Each model description consists of a set of line segments from the approximating polygon with the following attributes: (as, y, a, I). Here a: and y are the coordinates of the midpoint of the line segment, a is the angle of the line segment with respect to positive x-axis, and I is the length of the segment. The parameter estimation scheme involves synthesizing different edge maps from the polygonal models of the 2D objects in the database. The edge map of an ob- 109 Figure 5.19. Edge map synthesized using 2D model of an industrial part. ject corresponding to a given vieWpoint can be synthesized by performing a two- dimensional rotation of the models. Figure 5.19 shows one such edge map from a random vieWpoint generated from the model of a “T” in our database. We empha- size that the advantage of using object models to generate samples of edge maps instead of obtaining edge maps corresponding to real images is that, a large number of vieWpoints can be simulated by changing only the orientation of the model. 5.5.3 Experimental Results The experiments were performed using a database of 10 2D objects shown in F ig- ure 5.20. The number of edge maps used in the estimation of parameters should be sufficiently large so that nearly all neighborhood edge configurations appear in this sample set. For each object in the database, 100 different edge images were gener- ated corresponding to 100 different rotations of the object model. The 8 canonical clique potential values estimated using the proposed method are given in Figure 5.21. Comparing these parameter values with the values given in Figure 5.6, it is seen that some of the clique potential values are quite different in the two cases. This reflects the nature of edge configurations in the range images of 3D objects and in- tensity images of 2D objects. We used the Gibbs sampler [38] to generate samples 110 Figure 5.20. 2D objects in the database. -2o.a7 ~2-14 : Edge site with label EDGE Figure 5.21. Canonical clique potentials estimated from 2D models using the least squares method. 111 (edge maps) from the MRF models of the line process using three sets of parameters. Figures 5.22(a), (b), and (c) show edge maps generated after 100 iterations of the Gibbs sampler using the ad hoe clique potentials (Figure 4.4), the clique potentials estimated using 3D models (Figure 5.6), and the clique potentials estimated using 2D models (Figure 5.21), respectively. A careful observation of these three figures reveals that the Gibbs prior distributions represented by the estimated parameters (Figures 5.22(b) and (c)) have fewer edges than the Gibbs prior represented by the clique potentials specified in an ad hoe fashion (Figure 5.22(a)). This implies that the ad hoe clique potentials encourage formation of more edges than are found in real images. Further, parameters estimated using 2D models result in more abrupt endings of edges or boundaries than with 3D models. This behavior can be explained because the edge maps generated from 2D polygonal models are only an approxima- tion to smooth object boundaries as can be seen from the gaps in the boundaries in Figure 5.19. Figure 5.23 shows the results of edge detection on a 256 x 256 image of some industrial parts and Figure 5.24 shows the results of edge detection on a 128 x 128 image (obtained using an Odetics sensor) of a scene in a manufacturing plant (this image was provided to us by Odetics Corporation). 
Figure 5.22. Sampling of the MRF line process using the Gibbs sampler: edges generated using (a) ad hoc parameters, (b) parameters estimated from the 3D CAD models, (c) parameters estimated from 2D object models.

Figure 5.23. Step edge detection results for the INDUSTRIAL PARTS image: (a) the input image; (b) edges detected using clique parameters estimated from 2D object models; (c) edges detected using clique parameters specified ad hoc; (d) edges detected using the Canny edge detector (with hysteresis threshold values 0.9 and 0.5 and width of the Gaussian σ = 1 pixel).

Figure 5.24. Step edge detection results for the SCENE image: (a) the input image; (b) edges detected using clique parameters estimated from 2D object models; (c) edges detected using parameters specified ad hoc; (d) edges detected using the Canny edge detector (with hysteresis threshold values 0.9 and 0.5 and width of the Gaussian σ = 1 pixel).

5.6 Conclusions

Estimation of the parameters of an MRF contextual model has been a formidable problem. We have presented a new off-line scheme for estimating the parameters of MRF (GRF) prior models when they are used to find edges in intensity or range images. The idea of an equivalent representation of an MRF using canonical clique potentials is used to reduce the number of unknown clique potential parameters to be estimated. Experimental results presented on several range and intensity images indicate that this parameter estimation method results in edge detection performance as good as that obtained from parameters specified by a trial and error method. The significance of the result is that it provides a systematic method for specification of MRF parameters instead of an ad hoc method. The edge detection results also indicate that it is sufficient to use only "typical" objects to estimate the line process parameters.

CHAPTER 6

Edge Detection and Labeling by Fusion of Intensity and Range Images

Fusion of information from different sensors is essential for robustness in several applications of computer vision [21] (e.g., navigation, inspection, target recognition).
Fusion of range (depth) and intensity images can provide information about the physical origin of edges, a useful intrinsic property of the scene. Range images provide evidence for jump edges (depth discontinuities) and crease edges (surface normal discontinuities). An intensity image, in addition, provides evidence for discontinuities in albedo (the reflective property of a surface). Therefore, combining the two sources of information can help locate most of the important discontinuities in the image and classify them according to their physical origin.

Fusion approaches differ in the formalism used to combine information from different sources. The approaches reported in the fusion literature fall into three categories [44]: (i) the Bayesian approach, (ii) the Dempster-Shafer approach, and (iii) heuristic methods. The Bayesian approach uses Bayesian statistics to incorporate sensor uncertainty and a priori information [77]. When the variables in the system are spatially arranged, such as in a vision system, a Markov random field models the a priori information. The Dempster-Shafer approach [53] incorporates uncertainty using the Dempster-Shafer theory of evidence. In this theory, probabilities are assigned to subsets of a frame of discernment, as opposed to Bayesian statistics, where probabilities are assigned to individual events. While the literature is inconclusive on the relative benefits of the Bayesian and Dempster-Shafer theories [15], it is known that the Bayesian approach has a significant computational advantage over the Dempster-Shafer approach [4]. While the Bayesian and Dempster-Shafer fusion approaches employ a principled treatment of uncertainty, heuristic approaches use mechanisms such as decision rules [51, 18], histogram processing [43], and pixel-by-pixel logical AND operation [39] to fuse information from different sensors. Fusion algorithms that use the heuristic approach lack robustness because of the ad hoc rules built into the system. The Bayesian approach with an MRF prior is preferable to the other two approaches because (i) it incorporates sensor uncertainty, and (ii) it is computationally more efficient than the Dempster-Shafer theory.

Fusion approaches also differ in the level of the information processing hierarchy at which the integration of information takes place (see Figure 1.3). Fusion at higher levels of the hierarchy is preferable when the images are not registered accurately [1, 69]. On the other hand, fusion at lower levels helps in the detection of features that are useful at higher levels. Therefore, we believe that low-level fusion approaches are preferable when the images are registered accurately.

In this chapter, we develop a scheme for detection and labeling of edges from a pair of registered intensity and range images in the Bayesian framework. We use a coupled Markov random field to (i) model the a priori knowledge about surface smoothness and discontinuity locations, (ii) enforce correlations between different sources of information, and (iii) enforce constraints on edge structure in the solution. Following Gamble et al. [35], we label edge sites according to the physical origin of the image discontinuities by choosing a label from the set \(\Lambda\) given by

\[ \Lambda = \{\text{JUMP}, \text{CREASE}, \text{INTENSITY}, \text{NOEDGE}\}. \]

Figure 6.1 shows the relationship between different edge types and information sources.
We represent albedo and shadow discontinuities by a single label (INTENSITY), because it is not possible to discriminate between these two types of discontinuities using only one pair of range and intensity images.

Our approach to fusion is different from related work on low-level fusion reported in the computer vision literature. Chou [19] uses the intensity image as partial evidence of an edge in the depth data. The physical relationship between jump and crease edges and the edges in the intensity image is not utilized. Gamble et al. [35] propose a multiple lattice solution where an edge is detected in each image. Their system operates in a loop where physical labels determined from these individual lattices feed back information to integrate the different sources. Their architecture is incompletely specified and no examples are given. The system by Wright [100] uses limited integration similar to Chou's method, where the only constraint is that edges in different images should occur at the same location. Our approach makes use of the physical relationship between different edge types to achieve the desired integration.

6.1 Edge Labeling as Bayesian Estimation

In this section, we derive expressions for the a posteriori probability distribution of edge and pixel labels, and the corresponding energy function for the fusion problem. These expressions are derived analogous to the treatment in Section 2.2.

For the fusion problem, each site of the pixel lattice S has two measurement variables and two label variables corresponding to the two input (range and intensity) images. Since we are interested in line labeling, we use a coupled multivariate MRF model, as shown in Figure 2.7. Note that there is no direct measurement at the edge sites and, therefore, the line labels have to be determined based on neighboring pixel labels. Let the measurements (observations) at each pixel site i be denoted by the vector \(Y_i = (Y_i^R, Y_i^I)\), where \(Y_i^R\) and \(Y_i^I\) represent the range and intensity measurements, respectively. Similarly, there are two labels (the restored range and intensity values) for each pixel site i, denoted as \(X_i = (X_i^R, X_i^I)\). We denote the label at a line site j by a scalar \(l_j\). The range set of each \(l_j\) is \(\Lambda = \{\text{JUMP}, \text{CREASE}, \text{INTENSITY}, \text{NOEDGE}\}\).

Figure 6.1. Relationship between different edge types and information sources.
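For concreteness, the coupled process just defined can be held in three arrays: a pair of continuous labels per pixel site and one discrete label per edge site, with the edge sites interleaved between adjacent pixels. The layout below is one possible arrangement for illustration only, not the thesis implementation; all names are ours.

```python
import numpy as np

LABELS = ("JUMP", "CREASE", "INTENSITY", "NOEDGE")

class CoupledLattice:
    """Pixel and line processes for an H x W registered image pair."""

    def __init__(self, y_range, y_intensity):
        h, w = y_range.shape
        # Pixel sites: X_i = (X_i^R, X_i^I), initialized to the data.
        self.x = np.stack([y_range, y_intensity], axis=-1).astype(float)
        # Edge site between vertically adjacent pixels (horizontal edge).
        self.l_h = np.full((h - 1, w), LABELS.index("NOEDGE"))
        # Edge site between horizontally adjacent pixels (vertical edge).
        self.l_v = np.full((h, w - 1), LABELS.index("NOEDGE"))
```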
The size of the neighborhood desired for the coupled MRF process (X, L) depends on the order of the surface continuity enforced. Since we need to recover crease edges (discontinuities in the first derivative), the neighborhood used should be large enough to involve cliques that could be used to compute second derivatives of pixel values in a local neighborhood. The neighborhood system that we use for this problem is given in Figure 6.2.

Figure 6.2. Pixel and edge site neighborhoods: (a) neighborhood of a pixel, (b) neighborhood of a horizontal edge site, and (c) neighborhood of a vertical edge site.

The following three assumptions on the degradation process are necessary for theoretical tractability and to make the computation of the a posteriori probability distribution feasible.

Assumption 1: The intensity and the range observations are conditionally independent:

\[ P(\mathbf{y} \mid \mathbf{x}) = P(\mathbf{y}^R \mid \mathbf{x}^R)\, P(\mathbf{y}^I \mid \mathbf{x}^I). \tag{6.1} \]

This assumption is reasonable as long as the noise processes corrupting the intensity and range images are independent.

Assumption 2: The measurements (both range and intensity) at each site are conditionally independent:

\[ P(\mathbf{y}^R \mid \mathbf{x}^R) = \prod_{i} P(y_i^R \mid x_i^R), \tag{6.2} \]

\[ P(\mathbf{y}^I \mid \mathbf{x}^I) = \prod_{i} P(y_i^I \mid x_i^I). \tag{6.3} \]

Assumption 3: The measurement noise in both range and intensity images is i.i.d. Gaussian with mean zero. This leads to

\[ P(y_i^R \mid x_i^R) = \frac{1}{\sqrt{2\pi\sigma_R^2}}\; e^{-\frac{(y_i^R - x_i^R)^2}{2\sigma_R^2}}, \tag{6.4} \]

\[ P(y_i^I \mid x_i^I) = \frac{1}{\sqrt{2\pi\sigma_I^2}}\; e^{-\frac{(y_i^I - x_i^I)^2}{2\sigma_I^2}}. \tag{6.5} \]

Here, \(\sigma_R^2\) and \(\sigma_I^2\) are the variances of the noise in the range and intensity images, respectively. Under these assumptions, the a posteriori probability distribution of the site labels is given by [80]:

\[ P((\mathbf{x}, \mathbf{l}) \mid \mathbf{y}) = \frac{1}{Z_Y}\; e^{-U((\mathbf{x}, \mathbf{l}) \mid \mathbf{y})}, \tag{6.6} \]

where

\[ U((\mathbf{x}, \mathbf{l}) \mid \mathbf{y}) = \sum_{i \in S} \left[ \frac{(y_i^R - x_i^R)^2}{2\sigma_R^2} + \frac{(y_i^I - x_i^I)^2}{2\sigma_I^2} \right] + \sum_{c \in C} V_c(\mathbf{x}, \mathbf{l}). \tag{6.7} \]

We denote the local conditional energy at a site t with respect to a configuration (x, l) by \(U_t(g)\); it is the a posteriori energy of the configuration obtained by modifying the configuration (x, l) such that the label at site t is changed to g. Note that the symbol g denotes both pixel and edge labels, depending on whether t is a pixel site or an edge site. So g is a scalar when t is an edge site, and g is a vector of length two when \(t \in S\).

The potential functions \(V_c\) in Equation (6.7) control the joint probability structure based on the clique potential values. Perhaps the most important step in modeling the contextual information using a Markov random field is the specification of the clique potentials or penalties which determine the corresponding Gibbs distribution. We are interested here in enforcing zeroth-order (zeroth derivative) and first-order (first derivative) continuity in uniform regions of surfaces in the range image and zeroth-order continuity in the intensity image. Additionally, we would like to enforce structural constraints on the local configurations of the edges by specifying penalties for edge configurations that are not frequently observed. The pixel cliques used and the corresponding potential functions are given in Figure 6.3. The parameters \(a_1\), \(a_2\), and \(a_3\) control the amount of smoothing performed on the pixel values.

The potential values assigned to the edge cliques using trial and error experimentation are given in Figure 6.4. The singleton edge clique is assigned four different potential (penalty) values corresponding to the four possible edge labels. This is essential, since if all the edge labels were penalized equally, then jump edge labels would always be preferred. Notice that edge cliques of size two encourage continuity among labels of the same type by reducing the energy of such configurations. The integration of range and intensity values is performed by a combination of the pixel cliques given in Figure 6.3 and the singleton edge cliques.

Figure 6.3. Pixel cliques and the corresponding potential functions. For the two-pixel cliques (a) and (b), with pixel labels \(f_1, f_2\) and the edge label \(l\) between them, the potential is

\[ a_1 (f_1^I - f_2^I)^2 (1 - \delta(l, \text{JUMP}))(1 - \delta(l, \text{CREASE}))(1 - \delta(l, \text{INTENSITY})) + a_2 (f_1^R - f_2^R)^2 (1 - \delta(l, \text{JUMP})); \]

for the three-pixel cliques (c) and (d), with pixel labels \(f_1, f_2, f_3\) and intervening edge labels \(l_1, l_2\), the potential is

\[ a_3 (f_1^R - 2 f_2^R + f_3^R)^2 (1 - \delta(l_1, \text{JUMP}))(1 - \delta(l_1, \text{CREASE}))(1 - \delta(l_2, \text{JUMP}))(1 - \delta(l_2, \text{CREASE})), \]

where \(\delta(i, j) = 1\) if \(i = j\), and 0 otherwise.

Figure 6.4. Ad hoc edge clique potential function values.
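The interaction between the pixel and line processes in Figure 6.3 reduces to a few lines once the labels are encoded as integers. The sketch below shows the data term of Eq. (6.7) and the two-pixel clique potential; the three-pixel (second-difference) clique is omitted for brevity, and the encoding is an assumption of this illustration rather than the thesis code.

```python
import numpy as np

JUMP, CREASE, INTENSITY, NOEDGE = range(4)

def pair_clique_potential(x1, x2, l, a1, a2):
    """Two-pixel clique of Figure 6.3; x1, x2 are (range, intensity) pairs.

    Any edge label switches off the intensity smoothness penalty, but only
    a JUMP switches off the range penalty, so an INTENSITY edge still
    leaves the range surface constrained to be continuous.
    """
    intensity_term = a1 * (x1[1] - x2[1]) ** 2 * (l == NOEDGE)
    range_term = a2 * (x1[0] - x2[0]) ** 2 * (l != JUMP)
    return intensity_term + range_term

def data_term(y, x, sigma_r, sigma_i):
    """Noise (data) term of the a posteriori energy of Eq. (6.7)."""
    return ((y[..., 0] - x[..., 0]) ** 2).sum() / (2.0 * sigma_r ** 2) + \
           ((y[..., 1] - x[..., 1]) ** 2).sum() / (2.0 * sigma_i ** 2)
```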
The model parameters \(\beta_I\), \(\beta_C\), \(\beta_J\), \(a_1\), \(a_2\), and \(a_3\) can be expressed in terms of 6 parameters which are easier to specify (because these parameters do not depend on \(\sigma_R\) and \(\sigma_I\)): h, \(k_C\), \(k_J\), \(\lambda_I\), \(\lambda_C\), and \(\lambda_J\), by a suitable modification of the corresponding expressions used by Blake and Zisserman [10] for the weak membrane and weak plate models. Here h is the sensitivity for detecting jump edges, and \(\lambda_k\) is the scale (in number of pixels) for edge type k, which measures the amount of spatial information desired to counteract noise. The various model parameters are then computed as follows:

\[ \beta_I = \frac{h^2 \lambda_I}{4\sigma_I^2}, \qquad \beta_C = k_C\, \beta_I, \qquad \beta_J = k_J\, \beta_I, \]

\[ a_1 = \frac{\lambda_I^2}{2\sigma_I^2}, \qquad a_2 = \frac{\lambda_J^2}{2\sigma_R^2}, \qquad a_3 = \frac{\lambda_C^4}{2\sigma_R^2}. \]

The fusion problem can now be stated as follows: given the observation vector y (with range and intensity components), estimate the label vector (x, l) such that U((x, l) | y) is minimized.

6.2 TPM and HCF Algorithms for Fusion

The SA, TPM, and ICM algorithms for estimation of a posteriori labels described in Section 2.3 can all be extended to work for coupled MRF models. The Thresholded Posterior Mean (TPM) algorithm developed by Marroquin et al. [74] has been used by Gamble et al. [35] for their multisensor fusion problem. In this chapter, we use the TPM algorithm only to compare the convergence rates with and without fusion. The TPM algorithm is stated in Figure 6.5. While the TPM algorithm is computationally less expensive than the simulated annealing (SA) algorithm, it still takes a prohibitively large number of iterations (> 200).

TPM Algorithm for Fusion

1. Initialize x = y. Initialize l by randomly choosing the label at each edge site t, \(l_t \in \{\text{JUMP}, \text{CREASE}, \text{INTENSITY}, \text{NOEDGE}\}\).

2. (a) For each image \(a \in \{R, I\}\), where R stands for the range image and I stands for the intensity image: for each pixel site t, update \(x_t^a\) by sampling from the conditional distribution

\[ P(x_t^a \mid \mathbf{x}^a_{S \setminus \{t\}}, \mathbf{l}, y_t^a) = e^{-U(x_t^a \mid \mathbf{x}^a_{S \setminus \{t\}},\, \mathbf{l},\, y_t^a)} / Z_t^a; \]

\(x_t^a\) can be sampled from a Gaussian distribution [62].

(b) For each line site t, update \(l_t\) by sampling from the conditional distribution

\[ P(l_t \mid \mathbf{l}_{E \setminus \{t\}}, \mathbf{x}) = e^{-U(l_t \mid \mathbf{l}_{E \setminus \{t\}},\, \mathbf{x})} / Z_t. \]

3. Repeat (2) n times, saving realizations \((\mathbf{x}, \mathbf{l})^{(k+1)}\) through \((\mathbf{x}, \mathbf{l})^{(n)}\).

4. For each line site t, choose the label g that maximizes the estimated marginal posterior probability of g at site t (the relative frequency of g among the saved realizations).

5. For each pixel site t, choose labels \(g_1\) and \(g_2\) such that \(g_1\) is the mean of the saved values of \(x_t^R\) and \(g_2\) is the mean of the saved values of \(x_t^I\).

Figure 6.5. TPM algorithm for fusion.

Therefore, for our experiments we chose to use Chou's HCF algorithm [19] for coupled MRF optimization. This algorithm works for continuous pixel variables as well. Therefore, it is suitable for optimization of the fusion energy function, which contains continuous-valued pixel variables. The algorithm uses a NULL label to represent the initial labeling at a site, similar to the discrete-label case given in Figure 2.3. The HCF algorithm for fusion is summarized in Figure 6.6.

Notice that the stability of a site is non-negative only when it is in a minimal energy state with respect to its neighbors. Therefore, the HCF algorithm is guaranteed to reach a local minimum of the energy function.

It has been noted by Blake and Zisserman [10] that energy functions involving second-order terms (an energy term consisting of second derivatives) converge extremely slowly. Because of this problem, previous approaches have avoided using the second-order energy terms for surface reconstruction. We believe that the task of finding minimal energy configurations in the fusion problem is made easier because of the cues available from the intensity image. Since surface orientation discontinuities almost always result in large intensity discontinuities, and because of the relative ease with which intensity discontinuities can be found, we believe that by fusing the intensity and range images, crease edges can be detected with much less computational effort than if only range images were used.
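The control structure of the HCF algorithm summarized in Figure 6.6 is a loop over a stability ordering. The skeleton below is an illustrative reduction, not the thesis implementation: the coupled model's local energies and candidate label sets are assumed to be supplied by the caller, and the pixel-site commitment penalty \(\alpha\) is folded away.

```python
def hcf(sites, labels, local_energy, NULL=None):
    """Skeleton of the Highest Confidence First commit loop (Figure 6.6).

    labels[t]: candidate labels at site t; local_energy(t, g, state):
    local conditional a posteriori energy U_t(g) under `state`.
    """
    state = {t: NULL for t in sites}

    def stability(t):
        e = sorted(local_energy(t, g, state) for g in labels[t])
        if state[t] is NULL:
            return -(e[1] - e[0])    # uncommitted sites: always <= 0
        # Committed sites: <= 0, and 0 exactly at a local energy minimum.
        return e[0] - local_energy(t, state[t], state)

    while True:
        t = min(sites, key=stability)
        if stability(t) >= 0:        # every site is locally minimal
            return state
        # Commit the least stable site to its minimum-energy label.
        state[t] = min(labels[t], key=lambda g: local_energy(t, g, state))
```

Selecting the globally least stable site first is what distinguishes HCF from a raster-order ICM sweep: sites with strong local evidence commit early and then constrain their more ambiguous neighbors.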
6.3 Parameter Estimation of Coupled MRFs

The MRF model used for the fusion problem is a coupled MRF model consisting of both pixel and edge sites. To our knowledge, there have been no attempts to estimate the parameters of a coupled MRF. In the following, we describe how we can extend our parameter estimation scheme for the line process (Chapter 5) to estimate the parameters of a coupled MRF. We consider a less general coupled MRF model than the model described in Section 6.1, for simplicity of presentation and ease of testing the parameter estimation scheme. Specifically, we consider the case when only range measurements are present. Further, edge cliques of size greater than one are not considered. Therefore, the only parameters considered in the simplified model are \(a_2\), \(a_3\), \(\beta_C\), and \(\beta_J\). Also, since intensity measurements are not present, there are only three possible edge labels: {JUMP, CREASE, NOEDGE}. In the interest of compactness of the equations to be presented, we represent the JUMP, CREASE, and NOEDGE labels by J, C, and N, respectively.

HCF Algorithm for Fusion

1. Initialize all edge and pixel sites to NULL \(\notin \Lambda\).

2. Compute a stability measure \(G_t(\mathbf{x}, \mathbf{l})\) for each site t as follows [19].

• If site t is a pixel site, then

\[ G_t(\mathbf{x}, \mathbf{l}) = \begin{cases} \Delta U_t(\mathbf{x}_{min}, \mathbf{x}_{min} + \alpha) & \text{if } \mathbf{x}_t = \text{NULL} \\ \Delta U_t(\mathbf{x}_{min}, \mathbf{x}_t) & \text{otherwise.} \end{cases} \]

Notice that \(\mathbf{x}_{min}\), \(\mathbf{x}_t\), and \(\alpha\) are all vectors of length two, with a range label component and an intensity label component. Here \(\mathbf{x}_{min} = \arg\min_{(r,i)} U_t((r, i))\), where r and i are the range and intensity labels, respectively, and \(\alpha\) determines the penalty for committing to a label. The purpose of \(\alpha\) is to discourage pixel sites from committing to any label until they have enough contextual information from the labels of their neighbors. We set \(\alpha = (\sigma_R/2, \sigma_I/2)\) for all the experiments.

• If site t is an edge site, then

\[ G_t(\mathbf{x}, \mathbf{l}) = \begin{cases} -\min_{l \in \Lambda,\, l \neq l_{min}} \Delta U_t(l, l_{min}) & \text{if } l_t = \text{NULL} \\ \Delta U_t(l_{min}, l_t) & \text{otherwise,} \end{cases} \]

where \(l_{min} = \arg\min_{l} U_t(l)\). The function \(\Delta U_t(A, B)\) represents the energy difference \(U_t(A) - U_t(B)\), where A and B are possible labels at site t.

3. Repeat
(a) Select the site with minimum stability.
(b) Change its label to the one which corresponds to the lowest energy.
(c) Recompute the stability for the site and its neighbors.
until stability ≥ 0 for all sites.

Figure 6.6. HCF algorithm for fusion.

Figure 6.7. Neighborhood of a vertical edge site in the simplified MRF model.

Consider the neighborhood of an edge site for this simplified MRF model, shown in Figure 6.7. Let a configuration of the variables in Figure 6.7 (from left to right) be represented by \((x_1, l_1, x_2, l, x_3, l_2, x_4)\). Since the clique potentials in Figure 6.3 depend on the first and second differences in pixel values rather than on the actual pixel values, we represent the configuration of the neighbors of the edge site of interest by the 5-tuple \((D_l, l_1, d, l_2, D_r)\), where

\[ D_l = |x_1 - 2x_2 + x_3|, \qquad d = |x_2 - x_3|, \qquad D_r = |x_2 - 2x_3 + x_4|. \]

The equations relating the conditional probabilities and the parameters can be derived analogous to Section 5.1:

\[ U(l = J, (D_l, l_1, d, l_2, D_r)) - U(l = N, (D_l, l_1, d, l_2, D_r)) = \log \frac{P(l = N \mid (D_l, l_1, d, l_2, D_r))}{P(l = J \mid (D_l, l_1, d, l_2, D_r))}, \]
13.)) = 1°g P(I = no), 11 «112 D )) 130 P(l = N|(D1, ll, (1,12, D,)) an: c,(Dz,zl,d.12.D.)) — W = N ,(Du'ndv 12.1).» = 10g Pa = CKDI I. (112 D )) Using only configurations in which 11 = 12 = N, and multiplying both sides of the two equations above by -1, we have P(l = J|(D,,N,d,N,D,.) P(l = N|(D,,N,d,N,D,.) r) ) (a2d2+as(Df+Df))—(3J) = log (68) P(l = C|(D,,N,d,N,D P(l = N|(D,,N,d,N,D,. ) ) (a3(D?+DE))—(Bc)= = log ; (6.9) The values D1, d and D, are continuous, and therefore, the number of configu- rations of type (D1, N,d, N, D.) is innumerable. For each such configuration, Equa- tions (6.8) and (6.9) give two linear equations relating the parameters and the con- ditional probabilities. Since there are four parameters, (a2, 03, fig, and 3]), we need at least two different neighborhood configurations to estimate the parameters. By using more than two neighborhood configurations, a least squares estimation scheme similar to that in Section 5.1 can be used to estimate the parameters. The parameter estimation scheme was tested using synthetic range images ob- tained from the CAD models of 10 3D objects used in Chapter 5. The estimated values of the parameters were found to be extremely sensitive to the choice of neigh- borhood configurations. This dependence of the parameters on the choice of neighborhood configurations can be understood by studying Equation (6.9). This equation can be rewritten in the form y 2 ma: + c as follows. (1 = N|(D1,N,d,N,D,) (1 = C|(D;,N,d,N,D,) log i i = (-—(13)(D,2 + D?) + 50 (6.10) Equation (6.10) is the equation of a straight line with slope = —a3 and ordinate intercept :2 flc. Figure 6.8 shows a plot of the estimated values of the left hand log P(I:N[(DI,N»vavD'D P(lzCI(Di.N,d»N,Drll 131 0.0 0.2 0.4 0-5 (D,2 + D?) Figure 6.8. Plot of Equation (6.10). 132 side (LHS) of Equation (6.10) against the corresponding values of (D,2 + D3). It is clear from the figure that the estimated values of the LHS do not follow the expected straight line relationship. Because of the large deviation of the plot from a linear form, the estimated parameters also exhibit extreme sensitivity to the choice of the neighborhood configurations. It is not clear why the estimated values of the LHS of Equation (6.10) deviate so much from the expected linear relationship. One probable reason is the inadequate accuracy of the CAD model for representing surfaces. The CAD models used represent curved surfaces using planar approximations, and therefore, the synthesized range image may not be accurate enough to estimate the second-order differences. A study of performance of the estimation scheme using a more accurate representation of surfaces is necessary to get an understanding of the reason for the failure of this parameter estimation method. 6.4 Experimental Results The labeling scheme presented in Section 6.1 was tested on both synthetic and real images. Synthetic data were used to study the effect of signal to noise ratio on the detectability of crease edges, and to compare the convergence rates with and Without fusion. The TPM algorithm was used for synthetic images, because of the ability to specify the number of iterations for the algorithm. Synthetic data were obtained from the CAD models of 3D objects. The real images were obtained by registering range data from the Technical Arts Scanner with the intensity image sensed using a separate CCD camera. The two images were registered using a camera calibration algorithm [68]. 
The parameter estimation scheme was tested using synthetic range images obtained from the CAD models of the 10 3D objects used in Chapter 5. The estimated values of the parameters were found to be extremely sensitive to the choice of neighborhood configurations.

This dependence of the parameters on the choice of neighborhood configurations can be understood by studying Equation (6.9). This equation can be rewritten in the form y = mx + c as follows:

    log [ P(l = N | (D_l, N, d, N, D_r)) / P(l = C | (D_l, N, d, N, D_r)) ]
        = (−α_3)(D_l² + D_r²) + β_C.   (6.10)

Equation (6.10) is the equation of a straight line with slope −α_3 and ordinate intercept β_C.

Figure 6.8. Plot of Equation (6.10): estimated values of log [ P(l = N | (D_l, N, d, N, D_r)) / P(l = C | (D_l, N, d, N, D_r)) ] against (D_l² + D_r²).

Figure 6.8 shows a plot of the estimated values of the left-hand side (LHS) of Equation (6.10) against the corresponding values of (D_l² + D_r²). It is clear from the figure that the estimated values of the LHS do not follow the expected straight-line relationship. Because of the large deviation of the plot from a linear form, the estimated parameters also exhibit extreme sensitivity to the choice of the neighborhood configurations.

It is not clear why the estimated values of the LHS of Equation (6.10) deviate so much from the expected linear relationship. One probable reason is the inadequate accuracy of the CAD models in representing surfaces. The CAD models used represent curved surfaces using planar approximations, and therefore, the synthesized range image may not be accurate enough to estimate the second-order differences. A study of the performance of the estimation scheme using a more accurate representation of surfaces is necessary to understand the reason for the failure of this parameter estimation method.

6.4 Experimental Results

The labeling scheme presented in Section 6.1 was tested on both synthetic and real images. Synthetic data were used to study the effect of the signal-to-noise ratio on the detectability of crease edges, and to compare the convergence rates with and without fusion. The TPM algorithm was used for the synthetic images because of the ability to specify the number of iterations for the algorithm. Synthetic data were obtained from the CAD models of 3D objects. The real images were obtained by registering range data from the Technical Arts scanner with the intensity image sensed using a separate CCD camera. The two images were registered using a camera calibration algorithm [68]. The parameters of the coupled MRF process were specified by a trial and error method, since our parameter estimation scheme did not provide reliable estimates.

Figure 6.9. Edge labeling results using the TPM algorithm for the synthetic BLOCK4 image: (a) range image (σ_R = 0.003 inches); (b) intensity image (σ_I = 10 gray levels); (c) edge labeling after 100 iterations using only the range image; (d) edge labeling after 100 iterations using fusion; (e) edge labeling after 1000 iterations using only the range image; (f) edge labeling after 1000 iterations using fusion. Color codes for edge labels are: red — JUMP edge, violet — CREASE edge, blue — INTENSITY edge.

Figure 6.10. Edge labeling results using the TPM algorithm for the synthetic BLOCK4 image: (a) range image (σ_R = 0.03 inches); (b) intensity image (σ_I = 10 gray levels); (c) edge labeling after 100 iterations using only the range image; (d) edge labeling after 100 iterations using fusion; (e) edge labeling after 1000 iterations using only the range image; (f) edge labeling after 1000 iterations using fusion. Color codes for edge labels are: red — JUMP edge, violet — CREASE edge, blue — INTENSITY edge.

Figures 6.9(a) and (b) show a 128 x 128 synthetic image pair for the BLOCK4 object. The intensity image was synthesized such that there is a square-shaped albedo edge (INTENSITY) on one of the surfaces. The range and intensity images are corrupted by independent additive Gaussian noise of standard deviations σ_R = 0.003 inches and σ_I = 10 gray levels. We compared the output of the TPM algorithm with and without fusion, after n = 100 and n = 1000 iterations. The values of the parameters used are: h = 0.007, λ_J = 0.03, λ_C = 1.3, λ_I = 4.0, k_C = 1.3, k_I = 2.2. Figure 6.9(c) shows the output of the TPM algorithm after 100 iterations, using only the range image (no fusion). Figure 6.9(d) shows the corresponding output with fusion. It is seen that using both range and intensity images, not only does the quality of crease edge detection improve, but the square-shaped mark in the intensity image is detected as well. Figures 6.9(e) and (f) show the improvement in the edge labeling results after 1000 iterations. A comparison of these two edge maps shows that when only the range image is used, the algorithm fails to converge to the correct solution even after 1000 iterations. When the range and intensity images are fused, the TPM algorithm results in a near perfect labeling after 1000 iterations.

The above experiment was repeated by increasing the level of noise in the range image to σ_R = 0.03 inches. This experiment compares the sensitivity to noise with and without fusion. The values of the parameters used for this experiment are: h = 0.07, λ_J = 0.2, λ_C = 1.3, λ_I = 4.0, k_C = 1.2, k_I = 2.2. The edge labeling results shown in Figures 6.10(c) and (e) indicate that the TPM algorithm without fusion fails to detect all the crease edges even after 1000 iterations. With fusion, however, the algorithm detects all the jump, crease, and intensity edges accurately after just 100 iterations (Figure 6.10(d)).

The run time of the TPM algorithm is 811.2 seconds for 100 iterations, and 8033.4 seconds for 1000 iterations, for a 128 x 128 image. This computational speed is too slow for most applications. Therefore, for the rest of the experiments reported in this chapter, the HCF algorithm was used because of its computational efficiency.
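For reference, the corruption step used to produce the synthetic pairs above amounts to adding independent Gaussian noise at the stated levels; a minimal sketch, with placeholder clean images standing in for the CAD-synthesized data, is:

```python
import numpy as np

rng = np.random.default_rng(0)
range_clean = np.zeros((128, 128))            # placeholder CAD-synthesized range image
intensity_clean = np.full((128, 128), 128.0)  # placeholder intensity image

sigma_R, sigma_I = 0.003, 10.0                # noise levels of Figure 6.9
range_noisy = range_clean + rng.normal(0.0, sigma_R, size=range_clean.shape)
intensity_noisy = np.clip(
    intensity_clean + rng.normal(0.0, sigma_I, size=intensity_clean.shape), 0, 255)
```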
Figure 6.11. Edge labeling results for the BLOCKS image: (a) range image; (b) intensity image; (c) detected jump edges; (d) detected crease edges; (e) detected intensity edges; (f) edges detected using the Canny edge detector (with hysteresis threshold values 0.9 and 0.5 and width of the Gaussian σ = 1 pixel).

Figure 6.12. Edge labeling results for the JIG image: (a) range image; (b) intensity image; (c) detected jump edges; (d) detected crease edges; (e) detected intensity edges; (f) edges detected using the Canny edge detector (with hysteresis threshold values 0.9 and 0.5 and width of the Gaussian σ = 1 pixel).

Figure 6.13. Edge labeling results for the FUNNEL image: (a) range image; (b) intensity image; (c) detected jump edges; (d) detected crease edges; (e) detected intensity edges; (f) edges detected using the Canny edge detector (with hysteresis threshold values 0.9 and 0.5 and width of the Gaussian σ = 1 pixel).

Figure 6.14. Edge labeling results for the BLOCK4/COLUMN1 image: (a) range image; (b) intensity image; (c) detected jump edges; (d) detected crease edges; (e) detected intensity edges; (f) edges detected using the Canny edge detector (with hysteresis threshold values 0.9 and 0.5 and width of the Gaussian σ = 1 pixel).

Figure 6.15. Edge labeling results for the AGPART image: (a) range image; (b) intensity image; (c) detected jump edges; (d) detected crease edges; (e) detected intensity edges; (f) edges detected using the Canny edge detector (with hysteresis threshold values 0.9 and 0.5 and width of the Gaussian σ = 1 pixel).

Figure 6.11 shows the results of the labeling algorithm on a real 256 x 256 image pair. The scene contains a rectangular wooden block with two albedo edges, resting on another wooden block with a triangular extension. Figure 6.11(a) shows the range image of the scene and Figure 6.11(b) shows the corresponding intensity image. The output of our edge labeling algorithm was subjected to a postprocessing stage to remove intensity edges in the vicinity of jump and crease edges; these spurious intensity edges are an artifact of the registration error between the range and intensity images. The output of the labeling algorithm after postprocessing is shown in Figures 6.11(c)-(e), where the figures individually show the JUMP, CREASE, and INTENSITY edges, respectively. The values of the MRF model parameters selected empirically are as follows: h = 0.10, λ_J = 0.2, λ_C = 1.0, λ_I = 3.0, k_C = 1.1, k_I = 1.5. The noise model parameters were estimated from a homogeneous area in the images; the estimated values are σ_R = 0.03 and σ_I = 3.0. It is seen that all the edges in the image have been detected and labeled quite accurately. In Figure 6.11(f) we show the edges detected by the Canny edge detector [16] on the range and intensity images. First, the Canny edge detector was applied separately to the range and intensity images to detect step edges in the two images. Then the two edge maps were overlaid on each other. Although the Canny edge detector parameters were not optimized to obtain the best edge map, this comparison shows that a simple application of an edge detector to the two images followed by a combination of the resulting edge maps does not yield good results, thus stressing the importance of consistently integrating the information in the range and intensity images.
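The thesis does not spell out the postprocessing rule beyond removing intensity edges near jump and crease edges; a plausible minimal version, with a hypothetical suppression radius, is sketched below.

```python
import numpy as np

def suppress_spurious_intensity_edges(intensity_edges, jump_edges,
                                      crease_edges, radius=2):
    # Remove INTENSITY marks within `radius` pixels of a JUMP or CREASE edge;
    # such marks are typically artifacts of the range/intensity registration
    # error.  Inputs are boolean edge maps of identical shape; the value of
    # `radius` is hypothetical.  np.roll wraps at the image border, which is
    # adequate for a sketch.
    strong = jump_edges | crease_edges
    dilated = np.zeros_like(strong)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            dilated |= np.roll(np.roll(strong, dy, axis=0), dx, axis=1)
    return intensity_edges & ~dilated
```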
Figure 6.12 shows the results on the 128 x 128 images of a calibration jig. The values of the parameters used for this image pair are h = 0.10, λ_J = 0.2, λ_C = 1.0, λ_I = 1.5, k_C = 1.1, k_I = 1.5, σ_R = 0.03, σ_I = 5.0 (the same as for the BLOCKS image except for the values of λ_I and σ_I). The output of the labeling algorithm after postprocessing is shown in Figures 6.12(c)-(e). It is seen that both the jump and the crease edges in the image have been labeled quite accurately. There are a few spurious intensity edges, which can be removed by further postprocessing steps such as connected component analysis.

Figure 6.13 shows the results on an image pair of a funnel, of size 240 x 240. The values of the parameters used for this image are h = 0.11, λ_J = 0.2, λ_C = 1.0, λ_I = 1.5, k_C = 1.02, k_I = 1.5, σ_R = 0.03, σ_I = 5.0. The output of the labeling algorithm after postprocessing is shown in Figures 6.13(c)-(e). Both the jump and the crease edges have been labeled accurately. It is also seen from Figure 6.13(f) that simple fusion methods fail to give sensible results.

Figure 6.14 shows the results on the images of the BLOCK4/COLUMN1 objects, of size 128 x 128. The values of the parameters used for this image pair are h = 0.04, λ_J = 0.1, λ_C = 0.9, λ_I = 1.4, k_C = 1.2, k_I = 1.5, σ_R = 0.03, σ_I = 3.0. The output of the labeling algorithm after postprocessing is shown in Figures 6.14(c)-(e). The crease edges in the scene are only partially detected. The intensity edges corresponding to the shadow on the COLUMN1 object and the albedo edge on the BLOCK4 object have not only been detected reasonably well but have also been labeled accurately. However, several spurious intensity edges have been detected close to jump and crease edges due to the registration error between the intensity and range images, indicating the sensitivity of the algorithm to registration errors.

Figure 6.15 shows the results on the 100 x 180 images of the AGPART object. The values of the parameters used for this image pair are h = 0.05, λ_J = 0.1, λ_C = 1.1, λ_I = 1.5, k_C = 1.4, k_I = 1.5, σ_R = 0.03, σ_I = 3.0. While the jump edges have been labeled accurately, the crease edges in the scene are only partially detected. This is evidently due to the fact that if the parameters are chosen such that there is less penalty for crease edge labels, then high-gradient points near the limbs also become crease edge candidates.

The current implementation of the HCF algorithm runs on a Sparc-2 workstation. The CPU time taken to find edges in a 128 x 128 image is about 100 seconds. In contrast, the Canny edge detector takes only about 6 seconds of CPU time. The computational speed of the HCF algorithm is slow for most applications. A run-time analysis of the HCF algorithm shows that the algorithm makes about 20 visits to each site on average. This large number of visits is explained by the propagation of spatial information required for accurate computation of the second-order differences. In defense of the HCF algorithm, it should be mentioned that all published energy minimization algorithms, whether deterministic or stochastic, are computationally expensive [10, 88]. One solution to this problem is to use massively parallel computers. We have implemented parallel versions of the TPM and HCF algorithms on a massively parallel computer. The details of these implementations are presented in Chapter 7.
6.5 Conclusions

This chapter has presented an energy minimization approach for the detection and labeling of edges by fusing registered intensity and range images. The chosen set of labels for the edges (JUMP, CREASE, INTENSITY, NOEDGE) classifies edges based on the physical properties of the surfaces in the scene. A Bayesian formulation is used to combine the range and intensity values at the locations of discontinuities, and the a priori knowledge about the labels is modeled by a Markov random field. Results of the algorithm are presented on one synthetic and several real image pairs. The experiments using the synthetic image demonstrated that convergence of the energy minimization procedure is considerably faster (fewer iterations) with fusion than without. On real images, the edges detected and labeled using the MRF algorithm (HCF) match the physical edges in the scene. However, different model parameter values had to be used for different images. Specification of proper parameter values is a significant drawback of the algorithm. Also, the algorithm is sensitive to registration error between the range and intensity images. Another limitation of this algorithm, which is shared by all energy minimization algorithms, is the high computational cost. A parallel computer implementation of this algorithm is necessary to achieve the level of computational performance required for many real-time vision tasks.

This study does not completely determine the usefulness of MRF models of context for the fusion problem, but it shows that when the model parameters are chosen appropriately, MRF models result in good edge labeling results, in the sense that the detected edges and their labels match the physical discontinuities. Further research on the estimation of the MRF parameters for a coupled process is necessary to determine whether or not MRF models are useful models of context for the fusion problem.

CHAPTER 7

Implementation of Energy Minimization Algorithms on the Connection Machine

A significant amount of the computational effort in early vision applications involves repeated local operations on individual pixels in an image. These operations can be expressed as an energy function with interaction terms that are local in nature. Regularization solutions to early vision problems involve the minimization of this multivariate energy function. The number of variables over which this minimization needs to be performed is usually of the order of the size of the image and is typically very large. For example, for a 256 x 256 image, the number of variables in the energy function for the fusion problem in Chapter 6 is 196,608. Therefore, the number of possible solutions is extremely large, and an exhaustive search of the solution space to find the optimal solution is impossible for all practical purposes. One way to reduce the required computation is to use suboptimal methods to obtain an approximate solution. Even then, the computational task is overwhelming for a serial implementation.

It is widely recognized that parallel computer implementation of energy minimization algorithms is essential before these techniques can be useful in a real-time vision system [38, 79, 35]. Fortunately, these algorithms are suitable for massively parallel computation because they involve only local computations. Therefore, each processing element needs to interact with only a small number of neighboring processors to carry out the computation. The computations at each processor are identical, so the algorithms are well suited for implementation on a SIMD (Single Instruction Multiple Data) parallel computer [55].
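The figure of 196,608 variables follows directly from the structure of the coupled model: each pixel contributes one pixel site and, up to boundary effects, one vertical and one horizontal line site, so that

    65,536 pixel sites + 2 x 65,536 line sites = 3 x 256² = 196,608 variables.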
This chapter describes a study of the computational performance of two parallel algorithms on the Connection Machine [48] for the minimization of the energy function (Equation (6.7)) for the fusion problem presented in Chapter 6. The two parallel fusion algorithms considered are: (1) the parallel Thresholded Posterior Mean (TPM) algorithm and (2) a parallel variant of the HCF algorithm. The parallel TPM is a stochastic optimization algorithm. The parallel HCF algorithm, on the other hand, is a deterministic algorithm and is expected to be faster than the parallel TPM algorithm. The computational performance of the parallel algorithms is characterized by two values: (i) the run time on the parallel computer and (ii) the speedup over a sequential computer implementation, which is given by

    Speedup = (Run time on the sequential computer) / (Run time on the parallel computer).   (7.1)

The remainder of the chapter is organized as follows. Section 7.1 provides a brief introduction to one of the most popular SIMD machines, the Connection Machine parallel computer. In Section 7.2, we describe the parallel TPM algorithm. The results of an implementation of the TPM algorithm on a 16,384 (16K) processor Connection Machine (CM-2) are also presented. Section 7.3 describes the parallel HCF algorithm and the results of its implementation on a CM-2. Section 7.4 presents a comparison of the labeling performance of the sequential and parallel versions of the TPM and HCF algorithms. Finally, in Section 7.5 we give some concluding remarks.

7.1 The Connection Machine

The Connection Machine is a massively parallel SIMD computer with up to 65,536 (64K) 1-bit processing elements [48]. The Connection Machine CM-1, which was the first implementation of the Connection Machine concept, had up to 512 bytes of local memory in each processing element (PE) and had a peak performance rating of 2000 MIPS with 64K processing elements. On the CM-2, the next generation machine, each PE has up to 8K bytes of local memory. Each PE in a Connection Machine can access the local memory of other PEs using a general interprocessor communication network. An optional Floating Point Accelerator (FPA) enhancement can speed up floating point operations by a factor of up to 20 [95]. A full configuration (64K) CM-2 with the FPA enhancement has a peak performance rating of 2500 MIPS and 5000 single-precision MFLOPS [95]. The experimental results reported in this chapter are for a 16K node CM-2 with the FPA enhancement. For this configuration, the CM-2 has a peak performance rating of about 1250 MFLOPS [95].

The programming style best suited to the Connection Machine is the "data parallel" style. That is, individual data items are allocated to individual processors using the notion of parallel variables. A two-dimensional image, which is typically represented as a 2D array on sequential machines, can be represented as a parallel variable on the Connection Machine. Each pixel in the image is stored in an associated PE; for efficient use of the Connection Machine, the bulk of the processing involving this pixel should be carried out in the PE in which it is stored. If the image size is greater than the number of PEs available, each physical PE operates on more than one pixel in what is known as the "virtual processing" mode [95].
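As a loose modern analogue of this data parallel style (NumPy standing in for C* parallel variables; on the CM-2 each array element would live in its own PE), a local operation is expressed once and applied to every pixel simultaneously:

```python
import numpy as np

img = np.random.default_rng(1).random((256, 256))  # the image as one "parallel variable"

# One SIMD-style step: every pixel is averaged with its four neighbors at
# once, instead of looping over pixels as a sequential program would.
north, south = np.roll(img, -1, axis=0), np.roll(img, 1, axis=0)
west,  east  = np.roll(img, -1, axis=1), np.roll(img, 1, axis=1)
smoothed = (img + north + south + east + west) / 5.0
```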
The Connection Machine supports modified versions of common programming languages, such as C*, *LISP, CM FORTRAN, and C/PARIS. Our implementation uses C*, a parallel version of the C language. This language was selected because of the ease of programming rather than for computational efficiency considerations. The C* programming language does not provide much low-level control over the capabilities of the CM-2. Therefore, even though the development of a program on a CM-2 is fairly straightforward, optimizing the code for computational performance is extremely tedious. Also, software maintenance of an optimized program is difficult, because optimized programs are typically extremely unreadable.

7.2 Parallel TPM Algorithm

The TPM algorithm for the estimation of image labels given in Section 6.1 is inherently parallel, since the computations performed are essentially local in nature. Specifically, the algorithm exhibits data parallelism [49] and is suitable for implementation on the Connection Machine. A fundamental requirement for an algorithm to be called a data parallel algorithm is that the computations on different data elements should be identical. Consider the computations at each pixel in the sequential TPM algorithm for fusion presented in Section 6.2, which include (i) fetching the values of the same number of neighboring variables (pixels, edge sites), (ii) constructing the local distribution of pixel labels based on these variables, and (iii) sampling a value from this distribution. These computations are identical at the machine instruction level and, therefore, qualify as data parallel.

At a cursory level, it appears that we can perform the computations at each pixel simultaneously. However, as pointed out by Murray et al. [79], this is incorrect, since this approach does not take into account the local dependency of the pixel labels on their neighbors (see Figure 6.2). Note that the computations in an SIMD machine are synchronous, and therefore, new pixel values are computed based on the old values of the neighbors. This means that the new pixel values may not be consistent with the current values of their neighbors (in a probabilistic sense). Murray et al. give the following example to illustrate the point: "Consider pixels arranged alternately white and black in a chessboard pattern. Under synchronous updating, the MRF potentials will force every black to white, and vice versa, and similarly at the next cycle, causing the pixel pattern to oscillate."

The local dependency involved can also be understood by viewing the computations as the evolution of a Markov chain whose number of states equals the number of possible labelings of the image. A necessary condition for the convergence of a Markov chain to its equilibrium state is that the Markov chain should be reversible [74]. This implies that if two pixels depend on each other, then they should not be updated simultaneously; otherwise, "the reversibility of the resulting chain will be destroyed, so it will no longer be possible to guarantee the convergence" [74].

From the foregoing discussion it is clear that on an MIMD parallel computer, where the individual processing elements operate asynchronously, one could still update all pixels in parallel. However, for synchronous SIMD machines, an alternative scheme needs to be used. A widely used approach is to divide the set of variables into subsets such that the variables within a subset can be updated simultaneously [79, 72]. Such a scheme is called a coding scheme and was originally used by Besag [5] for parameter estimation in Markov random fields. The coding scheme we have used for the fusion problem is given in Figure 7.1.
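A standard construction of such a coding scheme, sketched below for a 4-colorable dependency like that of Figure 7.2, tiles the grid with a 2 x 2 pattern. Whether Figure 7.1 uses exactly this tiling cannot be recovered from the text, but the principle is the same: sites in one coding set are never neighbors, so each set can be updated synchronously, and the sets are visited one after another.

```python
import numpy as np

H, W, K = 8, 8, 4
rows, cols = np.indices((H, W))
coding_set = 2 * (rows % 2) + (cols % 2)     # a 2 x 2 coloring, values 0..3

values = np.zeros((H, W))
for k in range(K):
    mask = (coding_set == k)
    # All sites in coding set k may be updated synchronously: under the
    # 2 x 2 coloring no two of them are neighbors in the dependency graph.
    values[mask] += 1.0                      # stand-in for the real update
```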
We have chosen to assign one vertical line variable and one horizontal line variable, along with one pixel variable, to each processing element. This arrangement of data elements is necessary to force the data elements to be homogeneous across all the processing elements. Notice that this arrangement does not upset the data parallel nature of the algorithm, since the computations at all elements belonging to a coding set would still be identical.

Figure 7.1. A coding scheme for parallel updating of variables. PEs belonging to different coding sets are shown in different shades.

Figure 7.2. Combined neighborhood dependency of pixel and line variables, centered on the processor of interest.

The speedup achievable over a single processing element, using this scheme, is N/K, where K is the Chromatic Number of the neighborhood connectivity graph associated with the energy function, and N is the number of processing elements. It is easy to see that for the neighborhood dependency given in Figure 7.2, the Chromatic Number is 4.

The parallel TPM algorithm for the fusion problem is stated in Figure 7.3.

Parallel TPM Algorithm for Edge Labeling with Fusion

1. Load the images Y^R and Y^I onto the PEs.

2. Initialize in parallel for all PEs:
   C_p = Coding set(p)
   x̂_p^R = y_p^R
   x̂_p^I = y_p^I
   l̂_p^V = a label chosen randomly from {JUMP, CREASE, INTENSITY, NOEDGE}
   l̂_p^H = a label chosen randomly from {JUMP, CREASE, INTENSITY, NOEDGE}
   where the subscript refers to the variables assigned to the processor p and the superscripts V and H refer to the vertical and horizontal line variables at that processor, respectively. The symbols x̂ and l̂ refer to estimated values.

3. For k = 1, ..., 4
   Do in parallel
     if (C_p = k), update x̂_p^R, x̂_p^I, l̂_p^V, l̂_p^H.

4. Repeat (3) n times, saving the realizations (x, l)^(k+1) through (x, l)^(n).

5. Do in parallel
   Choose labels g^I, g^R, g^V, and g^H such that
   (a) g^I is the mean of the saved x̂_p^I values and g^R is the mean of the saved x̂_p^R values;
   (b) g^V maximizes P_p(l^V | y);
   (c) g^H maximizes P_p(l^H | y).

Figure 7.3. Parallel TPM algorithm for edge labeling with fusion.

The values of n and k are chosen empirically. We found that k = 100 and n = 200 gave reasonable results. We present below the results obtained from an implementation of the parallel TPM algorithm on a 16K CM-2. Apart from an obvious interest in determining the actual run times and the speedup achievable for varying image sizes, the experiments were performed to test the following hypothesis.

o Straightforward vs. fine-tuned coding: The individual processing elements in a CM-2 are single-bit processors. That means that a 32-bit addition, for example, needs to be performed by 32 one-bit additions. Similarly, a 32-bit multiplication takes on the order of 32 x 32 steps on the Connection Machine. More important, the time taken for an operation depends on the number of bits in the data. This is in sharp contrast to traditional CPUs, where, for example, a 32-bit addition takes as much time as a 16-bit addition because the addition is carried out in parallel. Operations that usually take O(1) time on a sequential long-word machine take O(k) or O(k²) steps on the Connection Machine, where k is the number of bits in the data. Therefore, it is not surprising that a straightforward conversion from sequential code to parallel code often performs poorly on the Connection Machine. In other words, the order of the performance gain achieved by optimizing the implementation on a Connection Machine is comparable to the order of the performance gain achieved by changes at the algorithmic level in a sequential implementation.
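To put rough numbers on this bit-serial cost model: a k-bit addition costs on the order of k one-bit steps and a k-bit multiplication on the order of k² steps, so representing a quantity in 8 bits instead of 32 saves about a factor of

    32 / 8 = 4 on additions, and (32 / 8)² = 16 on multiplications.

This arithmetic is what makes the "smallest sized storage space" optimization mentioned below worthwhile.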
Table 7.1 shows the execution times for images of three different sizes, for both the sequential and the parallel versions of the TPM algorithm. The corresponding speedup over the sequential implementation is shown in Figure 7.4. The execution times reported in Table 7.1 do not include the time for image loading. The times reported are for 200 iterations of updating. The sequential version was timed on a Sun Sparcstation 2.

Table 7.1. Execution time of the parallel TPM algorithm in seconds.

    Program      |               Image Size
    Version      | 128 x 128 | 256 x 256 | 512 x 512
    -------------+-----------+-----------+----------
    Sequential   |   1613.1  |   6371.8  |  25568.1
    Naive        |    231.5  |    644.4  |   1823.7
    No arrays    |     87.4  |    233.5  |    642.4
    Newnorm      |     55.7  |    155.3  |    424.4

There are three parallel versions for which results are given in Table 7.1. The version labeled Naive is the straightforward implementation of the parallel program, in which the scalar variables in the sequential program were replaced by parallel variables. The version labeled No arrays replaces each element of a parallel array by an individual variable. This optimization was performed because the Connection Machine PE does not have the capability for indirect addressing, and therefore parallel array references are very inefficient. The second optimization was motivated by the observation that the Connection Machine library subroutine to generate random numbers in each PE, prand, is relatively slow (2 ms/call). The method of generating a standard normal variable used in the previous two versions requires 12 uniform random variables (12 calls to prand). In order to minimize the number of calls to prand, an alternative method of generating standard normal variables, which requires, on average, only one uniform random variable, is used (the additional computation required for this method usually costs more on a sequential machine, but is small relative to the time for prand on the Connection Machine). This version is labeled, for lack of a better label, as Newnorm in Table 7.1 and Figure 7.4. This version also includes other optimizations, such as using the smallest sized storage space for all the variables. As can be seen from the table, both optimizations result in a substantial reduction in computation time.

Figure 7.4. Speedup of the parallel TPM algorithm over the sequential algorithm (Sun Sparcstation 2) for different image sizes. (Plot of speedup versus image size for three versions: the fully optimized version, the version with arrays replaced by individual variables, and the naive implementation.)

Figure 7.4 indicates that the speedup achieved increases with increasing image size. This is because when there are more pixels than the number of physical PEs, the floating point accelerator unit, which operates in a pipelined mode, is always full, thus increasing its throughput. The value of the maximum speedup achieved is 60. This speedup is reasonable because the parallel TPM algorithm updates variables belonging to different coding sets sequentially.
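The two normal-variate generators contrasted above can be sketched as follows. The 12-uniform version is the classical central-limit approximation; one-uniform-per-variate behavior is obtained here by inversion of the normal CDF, although whether the thesis implementation used inversion or a different one-uniform method is not stated.

```python
import random
from statistics import NormalDist

def normal_clt(rng=random):
    # Sum of 12 U(0,1) variates has mean 6 and variance 1, so this costs
    # 12 uniform deviates (12 prand calls on the CM-2) per normal deviate.
    return sum(rng.random() for _ in range(12)) - 6.0

_STD_NORMAL = NormalDist()

def normal_by_inversion(rng=random):
    # Exactly one uniform deviate per normal deviate, at the price of an
    # inverse-CDF evaluation.  (rng.random() can in principle return 0.0,
    # which inv_cdf rejects; that corner case is ignored in this sketch.)
    return _STD_NORMAL.inv_cdf(rng.random())
```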
7.3 Parallel HCF Algorithm

We see that the parallel TPM algorithm takes approximately a minute of computing time on a CM-2 Connection Machine for a 128 x 128 image. This computational performance is quite slow for most vision applications. For this reason, we consider another parallel algorithm for optimizing the energy function (Equation (6.7)). This algorithm, called the parallel HCF algorithm, is a parallel variant of the sequential HCF algorithm presented in Section 6.1. The parallel HCF algorithm differs from the sequential HCF algorithm in the order in which the sites are visited. In the sequential HCF algorithm, only the site with the minimum stability value can be visited next. In the parallel HCF algorithm, all sites that have minimum stability compared to their neighbors can be updated simultaneously. Swain et al. [90] have proved the convergence of the parallel HCF algorithm. This algorithm is suitable for data parallel implementation, since many sites can be visited simultaneously. In order to force the computations at each data element to be identical, we group a pixel site with its right vertical and bottom horizontal line sites. The parallel HCF algorithm is stated in Figure 7.5. The coding scheme given in Figure 7.1 is also used for this algorithm, to prevent variables that are directly dependent on each other from being updated simultaneously. The purpose of the rank value of a site is to arbitrate when there are ties in the stability values of the neighbors.

Parallel HCF Algorithm for Edge Labeling with Fusion

1. Load the images Y^R and Y^I onto the PEs.

2. Initialize in parallel for all PEs:
   x̂_p^R = y_p^R
   x̂_p^I = y_p^I
   l^V = NULL
   l^H = NULL
   C_p = Coding set(p)
   R_p = C_p * 3
   R_p^V = C_p * 3 + 1
   R_p^H = C_p * 3 + 2
   where the subscript refers to the variables assigned to the processor p and the superscripts V and H refer to the vertical and horizontal line variables at that processor, respectively. R_p, R_p^V, and R_p^H are the rank values of the pixel site, the vertical edge site, and the horizontal edge site at processor p, respectively.

3. For i = 1, ..., niter
   Do in parallel
     Update the stability values G_p, G_p^V, G_p^H.
     For each site s ∈ {pixel at p, horizontal edge site at p, vertical edge site at p} do
       Make local copies of the neighbors of s.
       If (s has the minimum stability among its neighbors) OR (s has a minimal stability and the minimum rank among its neighbors), then
         update the site label at s.

Figure 7.5. Parallel HCF algorithm for edge labeling with fusion.

The updating step in the parallel HCF algorithm is deterministic, unlike in the parallel TPM algorithm. The parallel HCF algorithm converges in about 50 iterations (niter = 50). This is a significant saving in computation time compared to the number of iterations required for the TPM algorithm. For a 128 x 128 image, the parallel HCF algorithm requires 3.05 seconds for 50 iterations on a 16K CM-2. This computational speed, while not sufficient for video rate processing, is quite fast compared to sequential implementations.

Figure 7.6. Speedup of the parallel HCF implementation on CM-2 over the Sparcstation 2 implementation for different image sizes. (Plot of speedup versus image size for 128 x 128, 256 x 256, and 512 x 512 images.)

Figure 7.6 shows the speedup achieved using the CM-2 implementation over a Sun Sparcstation 2 implementation of the parallel HCF algorithm. Notice that the speedups for the parallel HCF algorithm are higher than those achieved for the parallel TPM algorithm. This is probably because the parallel HCF algorithm does not use the expensive random number generator calls that are required by the parallel TPM algorithm.
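The visiting rule of Figure 7.5 (update every site whose stability is a local minimum, with a fixed rank breaking ties) can be expressed in a data parallel way. The sketch below computes an "updatable" mask over a toy grid of stability values; the arrays are hypothetical, and wrap-around neighbors stand in for the real border handling.

```python
import numpy as np

rng = np.random.default_rng(2)
G = rng.normal(size=(8, 8))                  # current stability values
rank = np.arange(G.size).reshape(G.shape)    # fixed, distinct site ranks

def neighbor_views(a):
    # The four 4-neighbor shifts of an array (wrap-around for brevity).
    return [np.roll(a, s, axis=ax) for ax in (0, 1) for s in (1, -1)]

is_local_min = np.ones_like(G, dtype=bool)
for g, r in zip(neighbor_views(G), neighbor_views(rank)):
    # Minimum stability among neighbors, with rank deciding exact ties.
    is_local_min &= (G < g) | ((G == g) & (rank < r))

updatable = is_local_min & (G < 0)           # only unstable sites change label
```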
7.4 Edge Labeling Performance of the Parallel TPM and Parallel HCF Algorithms

The correctness of the implementation of the parallel algorithms presented in this chapter was verified by running these algorithms on the JIG image pair. It should be emphasized here that the two parallel algorithms are not identical to their sequential counterparts, because of the change in the order of updating. Figure 7.7 shows the edge labeling results on this image for the (i) sequential TPM, (ii) sequential HCF, (iii) parallel TPM, and (iv) parallel HCF algorithms. The sequential and the parallel TPM algorithms were run for 200 iterations each (n = 200, k = 100). The parallel HCF algorithm was run for 50 iterations (niter = 50).

Figure 7.7. Edge labeling results on the JIG image with: (a) the sequential TPM algorithm; (b) the sequential HCF algorithm; (c) the parallel TPM algorithm; (d) the parallel HCF algorithm. Color codes for edge labels are: red — JUMP edge, violet — CREASE edge, blue — INTENSITY edge.

It is seen from the figure that the edge labeling results of all four algorithms are comparable. The computation times used by the different algorithms to process this image are 1613 seconds, 100 seconds, 55.7 seconds, and 3.05 seconds, respectively, for the sequential TPM, sequential HCF, parallel TPM (on the CM-2), and parallel HCF (on the CM-2) algorithms. Notice that the parallel HCF algorithm consumes the least amount of computation time (on a parallel computer) among the four algorithms, with comparable edge labeling performance.

7.5 Conclusions

This chapter has studied the computational performance of energy function minimization algorithms for the fusion problem on the CM-2. It was found that these computations can be carried out significantly faster on the CM-2 than on the Sun Sparcstation 2, a state-of-the-art sequential workstation. It was also demonstrated that a straightforward conversion of sequential code to parallel code does not give a good speedup. Testing the implementation of a parallel version of the HCF algorithm showed that energy minimization for the sensor fusion problem can be computed in about 3 seconds on a CM-2 for a pair of 128 x 128 images. Although this processing time is significantly higher than the video rate (33 milliseconds), it is a significant step towards achieving that goal.

CHAPTER 8

Conclusions and Directions for Future Research

This chapter summarizes the results of the research reported in this thesis. Several directions for future research in the area of Markov random field contextual models are also outlined.

8.1 Conclusions

This thesis has demonstrated that Markov random fields are effective models of contextual information for edge detection in range images and for the fusion of range and intensity images.

Chapter 4 presented a new MRF model-based algorithm for the detection of jump and crease edges in range images. Experiments using several real range images indicate that the edges detected by this edge detection algorithm indeed correspond to the actual jump and crease edges in the input range images. However, like all other edge detection algorithms, the edges detected by this algorithm do not always form closed contours. To avoid this problem, a hybrid algorithm which combines the output of our range image edge detector with the output of a region-based segmentation scheme has been developed. This hybrid algorithm has been compared with a well-known range image segmentation scheme developed by Hoffman and Jain [50]. The results of the comparison indicate that the hybrid algorithm results in better surface segmentations (surface segmentations that have fewer undersegmentation or oversegmentation errors) than the Hoffman-Jain algorithm.
Since the location of the edges in the hybrid segmentation is determined by the MRF model-based edge detector, we interpret the superior performance of the hybrid algorithm as evidence for the usefulness of the MRF contextual model for edge detection in range images.

A key limitation of MRF models has been the difficulty of estimating the model parameters. Chapter 5 proposed a solution to this problem, which uses geometric CAD models of representative objects expected to be in the scene to derive ground truth information for the estimation of the MRF line process parameters. A comparison of the performance (of the edge detection algorithm presented in Chapter 4) using estimated and empirically specified MRF parameters indicates that the edge detection quality with estimated parameters is as good as the edge detection quality with empirically specified MRF parameters. This is important, because it enables us to use a systematic method for the specification of the MRF parameters instead of a trial and error method. The experiments also indicated that it is sufficient to use only "typical" objects to estimate the parameters of the MRF model.

Chapter 6 presented a new algorithm for the fusion of range and intensity images. Experimental results of the fusion algorithm on real fused images, with empirically specified parameters, indicate that the edges detected and labeled using this fusion algorithm match the physical edges in the scene. However, different model parameter values had to be specified for each image. This is a major shortcoming of this algorithm. The algorithm was also found to be sensitive to registration errors in the input fused image. Another limitation of this algorithm is its heavy computational requirements. Experimental results using parallel computer implementations of the fusion algorithm indicated that the computational time can be significantly reduced for this algorithm on a massively parallel computer. This study does not completely answer the question "are MRF models useful for modeling contextual information in the fusion problem?". However, it shows that when the model parameters are chosen appropriately, MRF models result in good edge labeling results (i.e., the detected edges and their labels match the physical discontinuities). Further study of the estimation of the model parameters for a coupled MRF process is necessary to determine the utility of MRF models for fusion.

8.2 Directions for Future Research

Some future research directions that extend the work in this thesis are listed below.

1. Large neighborhood for the line process. An interesting direction for research is the use of a larger neighborhood to define the line process. A larger neighborhood can capture the curvature information in edges, and so it may be possible to impose stronger smoothness constraints. Our parameter estimation scheme in Chapter 5 extends easily to larger neighborhoods. It is also possible to reduce the number of parameters to be estimated to a manageable extent using the canonical representation of the MRF prior distribution.

2. Multiresolution algorithms. Multiresolution optimization techniques can both increase the speed and improve the performance of the algorithms [94]. A multiresolution implementation of the fusion algorithm, for example, has the potential to overcome registration errors in the input images.

3. More accurate CAD models. The parameter estimation scheme given in Section 6.3 was tested using CAD models in which the curved surfaces were represented using planar approximations.
So, an obvious direction for research is to test the estimation scheme using object models that represent curved surfaces more accurately.

4. Real-time optimization. The algorithms currently available for the optimization of MRF energy functions are too slow for real-time applications. New, faster optimization algorithms are necessary to achieve this goal.

APPENDICES

APPENDIX A

List of Parameters

The edge detection and fusion algorithms developed in this thesis require several parameters to be specified. This appendix provides a list of the parameter values to make the work in this thesis reproducible. Tables A.1 - A.4 enumerate the parameters used for the experiments in Chapters 4 - 7, respectively.

Table A.1. The parameters of the edge detection and hybrid segmentation algorithms in Chapter 4.

    Parameters of the edge detection algorithm
      Ad hoc MRF parameters                            see Figure 4.4
      Parameters related to edge strength statistics   c_m = 1.05, c_0 = 0.05
      Window width for local averaging                 5 pixels
    Parameters of the hybrid segmentation algorithm
      Threshold t                                      0.60

Table A.2. The parameter values used for the experiments in Chapter 5.

    Parameter estimation from 3D CAD models
      ε in Equation (5.4)                 1/256
      Estimated MRF parameters            see Figure 5.6
      Size of the synthesized edge map    150 x 150
    Parameter estimation from 2D models
      ε in Equation (5.4)                 1/256
      Estimated MRF parameters            see Figure 5.21
      Size of the synthesized edge map    150 x 150
      Canny edge detector parameters
        σ                                 1 pixel
        mask size                         9 pixels
        low threshold                     0.5
        high threshold                    0.9

Table A.3. The parameters of the fusion algorithm in Chapter 6.

    Edge clique parameters: see Figure 6.4

    Other parameters:
    Image                            | σ_R   | σ_I | h     | λ_J  | λ_C | λ_I | k_C  | k_I
    ---------------------------------+-------+-----+-------+------+-----+-----+------+----
    Synthetic BLOCK4 (Figure 6.9)    | 0.003 | 10  | 0.007 | 0.03 | 1.3 | 4.0 | 1.3  | 2.2
    Synthetic BLOCK4 (Figure 6.10)   | 0.03  | 10  | 0.07  | 0.2  | 1.3 | 4.0 | 1.2  | 2.2
    BLOCKS image                     | 0.03  | 3   | 0.1   | 0.2  | 1.0 | 3.0 | 1.1  | 1.5
    JIG image                        | 0.03  | 5   | 0.1   | 0.2  | 1.0 | 1.5 | 1.1  | 1.5
    FUNNEL image                     | 0.03  | 5   | 0.11  | 0.2  | 1.0 | 1.5 | 1.02 | 1.5
    BLOCK4/COLUMN1 image             | 0.03  | 3   | 0.04  | 0.1  | 0.9 | 1.4 | 1.2  | 1.5
    AGPART image                     | 0.03  | 3   | 0.05  | 0.1  | 1.1 | 1.5 | 1.4  | 1.5

Table A.4. The parameters of the parallel TPM and HCF algorithms used in Chapter 7.

    Parameters of the parallel TPM algorithm
      Total number of iterations, n                  200
      Number of iterations to reach equilibrium, k   100
    Parameters of the parallel HCF algorithm
      Number of iterations, niter                    50

BIBLIOGRAPHY

[1] K. Andress and A. Kak, "Evidence Accumulation and Flow of Control in a Hierarchical Spatial Reasoning System," AI Magazine, vol. 9, pp. 75-94, 1988.

[2] N. Ayache and O. Faugeras, "HYPER: A New Approach for the Recognition and Positioning of Two-Dimensional Objects," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 8, pp. 44-54, 1986.

[3] D. Ballard and C. Brown, Computer Vision. Englewood Cliffs, NJ: Prentice-Hall, 1982.

[4] J. Barnett, "Computational Methods for a Mathematical Theory of Evidence," Proc. Seventh International Joint Conference on Artificial Intelligence, pp. 868-875, 1981.

[5] J. Besag, "Spatial Interaction and the Statistical Analysis of Lattice Systems," Journal of the Royal Statistical Society, Ser. B, vol. 36, pp. 192-236, 1974.

[6] J. Besag, "On the Statistical Analysis of Dirty Pictures," Journal of the Royal Statistical Society, Ser. B, vol. 48, pp. 259-302, 1986.

[7] P. Besl and R. Jain, "Three-Dimensional Object Recognition," Computing Surveys, vol. 17, pp. 75-145, 1985.

[8] P. Besl and R. Jain, "Segmentation Through Variable-Order Surface Fitting," IEEE Trans.
on Pattern Analysis and Machine Intelligence, vol. 10, pp. 167-192, 1988.

[9] G. Bilbro and W. Snyder, "Range Image Restoration Using Mean Field Annealing," Tech. rep. NETR 8819, Center for Communications and Signal Processing, North Carolina State University, 1988.

[10] A. Blake and A. Zisserman, Visual Reconstruction. Cambridge, MA: MIT Press, 1987.

[11] D. Blostein and N. Ahuja, "Shape from Texture: Integrating Texture-Element Extraction and Surface Estimation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 11, pp. 1233-1251, 1989.

[12] R. C. Bolles and P. Horaud, "3DPO: A Three-Dimensional Part Orientation System," International Journal of Robotics Research, vol. 5, pp. 3-26, 1986.

[13] C. Bouman and B. Liu, "Multiple Resolution Segmentation of Textured Images," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 13, pp. 99-113, 1991.

[14] P. Brodatz, Textures: A Photographic Album for Artists and Designers. New York: Dover, 1966.

[15] D. Buede, "Shafer-Dempster and Bayesian Reasoning: A Response to 'Shafer-Dempster Reasoning with Applications to Multisensor Target Identification Systems'," IEEE Trans. on Systems, Man and Cybernetics, vol. 18, pp. 1009-1011, 1989.

[16] J. Canny, "A Computational Approach to Edge Detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 8, pp. 679-698, 1986.

[17] C. Chen and R. Dubes, "Experiments in Fitting Discrete Markov Random Fields to Textures," in Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Diego, pp. 298-303, 1989.

[18] S. W. Chen and A. K. Jain, "Object Extraction from Laser Radar Imagery," Pattern Recognition, vol. 24, pp. 587-600, 1991.

[19] P. Chou and C. Brown, "The Theory and Practice of Bayesian Image Labeling," International Journal of Computer Vision, vol. 4, pp. 185-210, 1990.

[20] P. Chou, C. Brown, and R. Raman, "A Confidence-Based Approach to the Labeling Problem," in Proc. of IEEE Workshop on Computer Vision, Miami Beach, FL, pp. 51-56, 1987.

[21] J. Clark and A. Yuille, Data Fusion for Sensory Information Processing Systems. Boston, MA: Kluwer, 1990.

[22] F. Cohen and D. Cooper, "Simple Parallel Hierarchical and Relaxation Algorithms for Segmenting Non-Causal Markovian Random Fields," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 9, pp. 195-219, 1987.
Jain, “Random Field Models in Image Analysis,” Journal of Applied Statistics, vol. 16, pp. 131—164, 1989. R. C. Dubes, A. K. Jain, S. G. Nadabar, and C. C. Chen, “MRF Model-Based Algorithms for Image Segmentation,” in Proc. Tenth International Conference on Pattern Recognition, Atlantic City, pp. 808—814, 1990. T. Fan, G. Medioni and R. Nevatia, “Segmented Descriptions of 3—D Surfaces,” IEEE Journal of Robotics and Automation, vol. 3, pp. 527-538, 1987. P. Flynn and A. Jain, “On Reliable Curvature Estimation,” Proc. IEEE Com- puter Society Conference on Computer Vision and Pattern Recognition, San Diego, pp. 110—116, June 1989. P. Flynn and A. Jain, “BONSAI: 3D Object Recognition Using Constrained Search,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, pp. 1066-1075, 1991. P. J. Flynn, CAD-Based Computer Vision: Modeling and Recognition Strategies. PhD thesis, Department of Computer Science, Michigan State University, 1990. 7 'I-b“ 34-..-5 170 [35] E. B. Gamble, D. Geiger, T. Poggio, and D. Weinshall, “Integration of Vision Modules and Labeling of Surface Discontinuities,” IEEE Trans. on Systems, Man and Cybernetics, vol. 19, pp. 1576—1581, 1989. [36] D. Geiger and A. Girosi, “A Common Framework for Image Segmentation,” Proc. Tenth International Conference on Pattern Recognition, Atlantic City, pp. 502— 507, 1990. [37] D. Geman, S. Geman, C. Graffigne, and P. Dong, “Boundary Detection by Con- strained Optimization,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 12, pp. 609—628, 1990. [38] S. Geman and D. Geman, “Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images,” IEEE Transactions on Pattern Analysis and IVIachine Intelligence, vol. 6, pp. 721—741, 1984. [39] B. Gil, A. Mitiche and J. K. Aggarwal, “Experiments in Combining Intensity and Range Edge Maps,” Computer Graphics and Image Processing, vol. 21, pp. 395— 411,1983. [40] R. Gonzalez and P. Wintz, Digital Image Processing, Reading, MA: Addison- Wesley, 1987. [41] D. Griffeath, “Introduction to Random Fields,” in Denumerable Markov Chains, Kemeny, Knapp and Snell (eds.), New York: Springer-Verlag, 1976. [42] B. Horn and M. Brooks (eds.), Shape from Shading. Cambridge, MA: MIT Press, 1989. [43] J. Hackett and M. Shah, “Fusion of Intensity and Range for Segmentation,” Tech. rep. 14, University of Central Florida, 1988. [44] J. Hackett and M. Shah, “Multi-Sensor Fusion,” Tech. rep. 16, University of Central Florida, 1989. [45] J. Hadamard, Lectures on Cauchy’s Problem in Linear Partial Differential Equa- tions. New Haven, CT: Yale University Press, 1923. [46] R. Haralick, K. Shanmugam and l. Dinstein, “Textural features for image classi- fication,” IEEE Transactions on Systems, Man and Cybernetics, vol. 3, pp. 610- 621,1973. 171 [47] M. Hassner and J. Sklansky, “The Use of Markov Random Fields as Models of Texture,” Computer Graphics and Image Processing, vol. 12, pp. 357—370, 1980. [48] D. Ilillis, The Connection Illachine. Cambridge, MA: MIT Press, 1985. [49] W. Hillis and G. Steele, “Data Parallel Algorithms,” Communications of the ACM, vol. 29, pp. 1170—1183, 1986. R. Hoffman and A. Jain, “Segmentation and Classification of Range Images,” IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 9, pp. 608—620, 1987. G. 1111 and G. Stockman, “3—D Scene Analysis via Fusion of Light Striped Image and Intensity Image,” In Proc. of the IEEE Workshop on Spatial Reasoning and Multi-Sensor Fusion, St. Charles, IL, pp. 138—147, 1987. G. Hu and G. 
Stockman, "3-D Surface Solution Using Structured Light and Constraint Propagation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 11, pp. 390-402, 1989.

[53] T. Huntsburger and S. Jayaramamurthy, "A Framework for Multi-Sensor Fusion in the Presence of Uncertainty," in Proc. of the IEEE Workshop on Spatial Reasoning and Multi-Sensor Fusion, St. Charles, IL, pp. 345-350, 1987.

[54] R. Hummel and S. Zucker, "On the Foundations of Relaxation Labeling Processes," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 5, pp. 267-287, 1983.

[55] K. Hwang and F. Briggs, Computer Architecture and Parallel Processing. New York: McGraw-Hill, 1985.

[56] S. D. R. Corporation, I-DEAS User's Guide. Milford, Ohio.

[57] A. K. Jain and R. Hoffman, "Evidence-Based Recognition of 3-D Objects," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 10, pp. 783-802, 1988.

[58] A. K. Jain and S. G. Nadabar, "MRF Model-Based Segmentation of Range Images," in Proc. Third IEEE International Conference on Computer Vision, Osaka, pp. 667-671, 1990.

[59] R. Jain and A. Jain, Analysis and Interpretation of Range Images. New York: Springer-Verlag, 1990.

[60] A. K. Jain and S. G. Nadabar, "Markov Random Field Applications in Image Analysis," in Data Analysis in Astronomy IV, V. Di Gesu et al. (eds.), pp. 39-50, Plenum Press, New York, 1992.

[61] A. K. Jain and F. Farrokhnia, "Unsupervised Texture Segmentation Using Gabor Filters," Pattern Recognition, vol. 24, pp. 1167-1186, 1991.

[62] F. Jeng and J. Woods, "Simulated Annealing in Compound Gaussian Random Fields," IEEE Transactions on Information Theory, vol. 36, pp. 94-107, 1990.

[63] B. Julesz, "Textons, the Elements of Texture Perception, and Their Interactions," Nature, vol. 290, pp. 91-97, 1981.

[64] N. Karssemeijer, "A Relaxation Method for Image Segmentation Using a Spatially Dependent Stochastic Model," Pattern Recognition Letters, vol. 11, pp. 13-23, 1990.

[65] J. Keller, S. Chen, and R. Crownover, "Texture Description and Segmentation Through Fractal Geometry," Computer Vision, Graphics and Image Processing, vol. 45, pp. 150-166, 1989.

[66] R. Kindermann and J. Snell, Markov Random Fields and Their Applications. American Mathematical Society, volume I, 1980.

[67] S. Lakshmanan and H. Derin, "Simultaneous Parameter Estimation and Segmentation of Gibbs Random Fields Using Simulated Annealing," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 11, pp. 799-813, 1989.

[68] G. C. Lee and G. C. Stockman, "Obtaining Registered Range and Intensity Images Using the Technical Arts White Scanner," Tech. rep., PRIP Lab., Department of Computer Science, Michigan State University, 1991.

[69] G. C. Lee, Reconstruction of Line Drawing Graphs Using Fused Range and Intensity Imagery. PhD thesis, Department of Computer Science, Michigan State University, 1992.

[70] B. Manjunath, T. Simchony, and R. Chellappa, "Stochastic and Deterministic Networks for Texture Segmentation," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 38, pp. 1039-1049, 1990.

[71] K. Mardia, "Multi-Dimensional Multivariate Gaussian Markov Random Fields with Application to Image Processing," Journal of Multivariate Analysis, vol. 24, pp. 265-284, 1988.

[72] A. Margalit, "A Parallel Algorithm to Generate a Markov Random Field Image on a SIMD Hypercube Machine," Pattern Recognition Letters, vol. 9, pp. 263-278, 1989.

[73] D. Marr, Vision. San Francisco, CA: Freeman, 1982.

[74] J. Marroquin, S. Mitter, and T.
Poggio, "Probabilistic Solution of Ill-Posed Problems in Computational Vision," Journal of the American Statistical Association, vol. 82, pp. 76-89, 1987.

[75] A. Mitiche and J. Aggarwal, "Detection of Edges Using Range Information," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 5, pp. 174-178, 1983.

[76] J. Modestino and J. Zhang, "A Markov Random Field Model-Based Approach to Image Interpretation," in Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Diego, pp. 458-465, 1989.

[77] H. P. Moravec, "Sensor Fusion in Certainty Grids for Mobile Robots," AI Magazine, vol. 9, pp. 61-74, 1988.

[78] A. Mood, F. Graybill and D. Boes, Introduction to the Theory of Statistics. New York: McGraw-Hill, 1974.

[79] D. Murray, A. Kashko, and H. Buxton, "A Parallel Approach to the Picture Restoration Algorithm of Geman and Geman on an SIMD Machine," Image and Vision Computing, vol. 4, pp. 133-142, 1986.

[80] S. G. Nadabar and A. K. Jain, "Edge Detection and Labeling by Fusion of Intensity and Range Images," in Proc. of SPIE Conference on Applications of AI: Machine Vision and Robotics, Orlando, vol. 1708, pp. 108-119, 1992.

[81] S. G. Nadabar and A. K. Jain, "Parameter Estimation in MRF Line Process Models," in Proc. IEEE Conference on Computer Vision and Pattern Recognition, Urbana, IL, pp. 528-533, June 1992.

[82] N. Nandhakumar and J. K. Aggarwal, "Multisensor Integration - Experiments in Integrating Thermal and Visual Sensors," in Proc. First IEEE International Conference on Computer Vision, London, pp. 83-92, 1987.

[83] P. Perona and J. Malik, "Scale Space and Edge Detection Using Anisotropic Diffusion," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 12, pp. 629-639, 1990.

[84] T. Poggio, V. Torre, and C. Koch, "Computational Vision and Regularization Theory," Nature, vol. 317, pp. 314-319, 1985.

[85] N. S. Raja, Obtaining Generic Parts from Range Data Using a Multi-View Representation. PhD thesis, Department of Computer Science, Michigan State University, 1992.

[86] R. Sekuler and R. Blake, Perception. New York: McGraw-Hill, 1988.

[87] B. Silverman, C. Jennison, J. Stander, and T. Brown, "The Specification of Edge Penalties for Regular and Irregular Pixel Images," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 12, pp. 1017-1024, 1990.

[88] T. Simchony, R. Chellappa, and Z. Lichtenstein, "Pyramid Implementation of Optimal-Step Conjugate-Search Algorithms for Some Low-Level Vision Problems," IEEE Trans. Systems, Man, and Cybernetics, vol. 19, pp. 1408-278, 1989.

[89] M. Stephens, R. Blissett, D. Charnley, E. Sparks and J. Pike, "Outdoor Vehicle Navigation Using Passive 3D Vision," in Proc. IEEE Conference on Computer Vision and Pattern Recognition, San Diego, pp. 556-562, 1989.

[90] M. Swain, L. Wixson, and P. Chou, "Efficient Parallel Estimation for Markov Random Fields," in Uncertainty in Artificial Intelligence, M. Henrion, R. Shachter, L. Kanal, and J. Lemmer (eds.), vol. 5, pp. 407-419, Elsevier Publishers B.V., 1990.

[91] P. Switzer, "Some Spatial Statistics for the Interpretation of Satellite Data," Bulletin of the International Statistical Institute, vol. 50, pp. 962-972, 1983.

[92] R. Szeliski, "Regularization Uses Fractal Priors," in Sixth National Conference on Artificial Intelligence, Seattle, WA, pp. 749-754, 1987.

[93] Technical Arts Corporation, 100X 3-Dimensional Scanner: User's Manual and Application Programming Guide. Redmond, Washington.

[94] D.
Terzopoulos, "Regularization of Inverse Visual Problems Involving Discontinuities," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 8, pp. 413-424, 1986.

[95] Thinking Machines Corporation, "The Connection Machine System Overview, Model CM-2 Technical Summary," Tech. rep. HA87-4, 1987.

[96] A. Treisman, "Features and Objects," The Quarterly Journal of Experimental Psychology, vol. 40A, pp. 201-237, 1988.

[97] D. A. Trytten, The Construction of Labeled Line Drawings from Intensity Images. PhD thesis, Department of Computer Science, Michigan State University, 1992.

[98] R. Vaillant and O. Faugeras, "Using Occluding Contours for Recovering Shape Properties of Objects," in Proc. IEEE Workshop on Interpretation of 3D Scenes, Austin, pp. 26-32, 1989.

[99] A. Witkin, "Scale-Space Filtering," in Proc. Ninth International Joint Conference on Artificial Intelligence, Karlsruhe, Germany, pp. 1019-1022, 1983.

[100] W. Wright, "A Markov Random Field Approach to Data Fusion and Colour Segmentation," Image and Vision Computing, vol. 7, pp. 144-150, 1989.

[101] M. Levine, Vision in Man and Machine. New York: McGraw-Hill, 1985.

[102] N. Yokoya and M. Levine, "Range Image Segmentation Based on Differential Geometry: A Hybrid Approach," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 11, pp. 643-649, 1989.

[103] M. Zhang, R. Haralick, and J. Campbell, "Multispectral Image Context Classification Using Stochastic Relaxation," IEEE Trans. Systems, Man, and Cybernetics, vol. 20, pp. 128-140, 1990.