THE CONSTRUCTION OF LABELED LINE DRAWINGS FROM INTENSITY IMAGES

By

Deborah Anne Trytten

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Computer Science Department

1992

ABSTRACT

THE CONSTRUCTION OF LABELED LINE DRAWINGS FROM INTENSITY IMAGES

By Deborah Anne Trytten

The Marr paradigm for object recognition has been widely used in computational vision [72]. This paradigm emphasizes the data driven reconstruction of 3D shape from intensity images. There are a large number of paradigms that can perform this reconstruction in a limited sense, but no paradigm or group of paradigms has yet been shown to perform this reconstruction in a general setting. A shortcoming of the Marr paradigm, as demonstrated by Lowe [67], is the failure to include grouping processes that do not directly reconstruct shape. These grouping processes are collectively called perceptual organization. Lowe proposed the addition of four paths to the Marr paradigm to rectify this shortcoming. Two of Lowe's paths are used in this work. The first is the collection of tokens in the image plane into perceptual groupings. The second is 3D inference from these groupings.

The collection of tokens in the image plane into perceptual groupings is done using an integration framework implemented as a blackboard system. Four modules were used as knowledge sources: weak membrane edge detection, curvilinear grouping, proximity grouping, and curvilinear line labeling. An initial representation of the image data is built using the first three knowledge sources. This representation is analyzed using a modified curvilinear line labeling algorithm developed in this thesis that uses figure-ground separation to constrain legal line labelings more closely than the Malik algorithm [69]. This modified line labeling algorithm can diagnose problems with the initial representation. Errors in the representation can be fixed using a set of heuristics that were created to repair common mistakes. If no irreparable errors are found in the representation, then the modified line labeling algorithm produces a 3D interpretation of the data in the input image. The 3D interpretation is created without explicitly recovering 3D shape, and is therefore similar to the 3D inference processes proposed by Lowe.

Copyright © by Deborah Anne Trytten 1992

To my Parents
ACKNOWLEDGMENTS

This dissertation represents not an individual effort, but an effort on the part of a diverse group of people. If it weren't for the help and support of these people, both this work and my life would have been greatly diminished. I gratefully acknowledge the contributions of the following people.

My first acknowledgment must go to my thesis advisor, Dr. Mihran Tuceryan, who has had the wisdom of Solomon and the patience of Job. I became interested in computational vision when taking his course, CPS 822, in the winter of my first year as a Master's degree student. The day he invited me to work for him was a turning point in my academic career. He will always have my highest admiration, gratitude and respect.

I would also like to acknowledge the contributions of my committee members. My committee consisted of Dr. Anil Jain, Dr. George Stockman and Dr. Connie Page. Dr. Jain is the director of the Pattern Recognition and Image Processing group (PRIP) of Michigan State University that I was a member of for five years. If I were to thank Dr. Jain for each of the benefits I received as a member of the PRIP group, this thesis would have a second volume. His hard work, dedication, and intelligence have created an environment for research that approaches perfection. Dr. Stockman also made a sizable contribution to the success of this endeavor. He was the man who asked the tough questions. Although I may have been less appreciative at the time when these questions were asked, my thesis was dramatically improved in finding the answers and I owe him my thanks. The final member of my committee, Dr. Connie Page, was the ultimate proof that a woman can thrive in a scientific discipline within the academic world.

I would also like to acknowledge the contribution of three other faculty members: Dr. Richard Dubes and Dr. John Forsythe from the Department of Computer Science at Michigan State University, and Dr. Bernard Dubuisson, from the Departement de Genie Informatique of the Universite de Technologie de Compiegne, in Compiegne, France. Although he was not a member of my committee, Dr. Dubes made valuable contributions to both the content and presentation of my research. His continuous questioning of the evaluation of research has forced many graduate students, myself included, to look more critically at our work and the work of others. Dr. Forsythe made a contribution to this work at a very early stage. He single-handedly stopped me from leaving school with only my M.S. when a job opportunity came along in 1988. I have never regretted staying to finish my dissertation, and would not have had it not been for his timely encouragement. I worked with Dr. Dubuisson during a three month stay in France. This was another turning point in my career, when I first came to see myself as a researcher instead of as a student.
During my three months in France, he not only offered me an interesting project to work on, but also treated me as if I were a member of his family. I have only my thanks to repay the generosity of Dr. Dubuisson and his wonderful family.

My family, Richard Wesley Trytten, Marilyn McCall Trytten, Melissa Trytten Zakhary, and James Zakhary, have played a large role in the success of this endeavor. They have picked up my spirits when they were falling, and celebrated my victories. They were tolerant when work kept me from my familial responsibilities and available when I needed family to separate myself from work. Without their love and support this project would have been unthinkable. Although it is a cliche, my family has always given me both roots and wings, and lots of other neat stuff too. My only regret in finishing this thesis is that it has led me 1200 miles from the people I hold most dear. I would particularly like to thank my parents for their Herculean efforts to help me maintain my composure during the final frantic weeks of thesis preparation. During this period they made themselves available at every juncture to provide whatever help was needed without question or pause, whether this help consisted of listening to yet another tearful collect phone call from an exhausted graduate student, babysitting dogs, or driving to Lansing to do my household chores. The first three words written in this thesis were the dedication "To my parents". At the time, it seemed like a grand gesture. Now it seems insufficient.

I would also like to acknowledge the contribution of my late grandparents. My paternal grandfather, John M. Trytten, was the family academic and educator. My long standing interest in education was sparked by his love of his career. My maternal grandmother, Mary Wave McCall, had the foresight to look at a group of young grandchildren and see that college tuition would be needed two decades later. Although they did not live to see the completion of this dissertation, or even the start of it, their support and inspiration are still appreciated; they are still missed.

My life as a graduate student was greatly enriched by the Pattern Recognition and Image Processing Group (PRIP). There were many PRIP researchers who came before me who influenced my work. Chaur Chin Chen, Sei Wang Chen, Ywhyng Lee Hoffman, Joe Miller, Farshid Farrokhnia, and Patrick Flynn all offered me advice and guidance early in my career. Greg Lee, Narayan S. Raja, Satisha Nadabar, and I all took our comprehensive examinations together and thus formed the Gang of Four. We all started our dissertation work within the same year, and ended it within the same week. The last year of work would have been much harder and less rewarding without such wonderful friends and colleagues.

I would also like to acknowledge the next generation of PRIP researchers. They have listened patiently to hours of my talks and advice. They often asked interesting questions or suggested productive avenues for further pursuit.
I would like to especially acknowledge Sally Howden who has been my best friend through most of my time at Michigan State University. Although I'm destined to earn somebody's eternal wrath by committing a sin of omission, I will fearlessly list their names: Jinlong Chen, John Courtney, Chittra Dorai, Marie-Pierre Dubuisson, Hans Dulimarta, Qian Huang, Tim Newman, Philippe Ohanian, Philippe Ballard, Sushil Bhattacharjee, Steve Walsh, Sharathcha Pankanti, Jian-Chang Mao, and Yao Chen.

The PRIP system managers provided a wonderful working environment. I acknowledge the help of David Marks, Frank Northrup, John Lees and the many student managers who have been employed by the Department of Computer Science during my stay here. I would like to thank Honda Shing for creating the typesetting style that was used in this thesis.

I would also like to thank the following researchers who supplied software which was used in the implementation of the ideas within this dissertation. Pat Flynn provided a variety of X windows utilities and did much of the early work of making the PRIP group such a nice place to do research. Ari Gross provided the code for the SYMAN symmetry analyzer. Frederic Leymarie provided the code for the grassfire skeletonization. Bilge Gunsel wrote the original weak membrane edge detector using GNC. Greg Lee wrote the code for line labeling.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES

1 Introduction
  1.1 Background
    1.1.1 The Marr Paradigm for Object Recognition
    1.1.2 Perceptual Organization
    1.1.3 Line Labeling
  1.2 Problem Statement
  1.3 Organization of the Thesis

2 Integration: Methodologies and Modules
  2.1 Integration Frameworks
    2.1.1 Optimization
    2.1.2 Uniform Integration Frameworks
    2.1.3 Non-Uniform Integration Strategies
    2.1.4 Selecting an Integration Framework
  2.2 Selection of Modules for Integration
    2.2.1 Evaluating Proposed Modules
    2.2.2 Edge Detection
    2.2.3 Continuity
    2.2.4 Symmetry
  2.3 Summary of Choices

3 Obtaining an Initial Line Drawing
  3.1 The Representation of Piecewise Continuous Objects in a Blackboard
  3.2 The Continuity Module
  3.3 Creating the Initial Blackboard
    3.3.1 Finding Regions
    3.3.2 Creating the First Figure Hypothesis
  3.4 Examples of Processing
  3.5 Summary and Conclusions

4 The Revised Line Labeling Module
  4.1 Detection of Line Drawing Problems
  4.2 Common Problems with Line Drawings
  4.3 Summary and Conclusions

5 Updating Line Labeling
  5.1 Correcting Errors in Line Labeling
    5.1.1 Mislabeled Vertices
    5.1.2 Missing Arcs
    5.1.3 Adding Phantom Vertices
    5.1.4 Other Line Labeling Improvements
    5.1.5 Control of Strategies
    5.1.6 Experimental Results and Conclusions
  5.2 Other Useful Modules
    5.2.1 Symmetry
    5.2.2 Shape from Shading
  5.3 Computational Complexity and Timings

6 Conclusions and Future Work
  6.1 Contributions of the Thesis
  6.2 Future Work

A Parameters
B Test Bed of Images
BIBLIOGRAPHY

LIST OF TABLES

2.1 The ASCII version of the missing edge.
3.1 The mapping between modules and blackboard units.
3.2 A ramp edge from Figure 3.7.
3.3 Tangent and curvature properties of junctions.
3.4 Initial processing of the test bed of images.
4.1 The initial labels for Figure 4.4.
4.2 The number of legal labelings for outside arcs.
4.3 The diagnosis of problems in the test bed of images.
4.4 A comparison between the faults in the images from Table 3.4, and the problems that were diagnosed using the floating object heuristic.
5.1 The knowledge sources used to create the line drawing and possible errors made.
5.2 The repair knowledge sources used on the test bed of images.
5.3 A comparison between diagnosed problems and repaired problems in line labelings.
5.4 The result of the line labeling process. A "Yes" in the initial column means that a proper line labeling was found from the initial figure hypothesis. A "Yes" in the final column means that a correct line labeling was found after correction. An "A" in the final column means that the line labeling was correct except for a single error. A "B" means that a missing arc wasn't replaced. A "No" in a column means that no correct line labeling was found.
5.5 The computational complexity of the algorithms used in this thesis.
5.6 Computational complexity of repair knowledge sources.
5.7 The run times for the sample images from Appendix B in minutes.
A.1 The parameters for the continuity grouping algorithm of Section 3.2.
A.2 The parameters from Section 3.3.1.
A.3 The parameter values used in Section 3.3.2.
B.1 Characterization of the images.
B.2 Characterization of the junctions.
B.3 Characterization of the arcs.
B.4 Characterization of the surfaces.
LIST OF FIGURES

1.1 The Marr paradigm.
1.2 A 2.5 D sketch of a cube.
1.3 A simple generalized cylinder.
1.4 A two level hierarchical representation of a hand.
1.5 The Gestalt rules of organization.
1.6 The effect of background density on linear grouping.
1.7 Lowe's additions to Marr's object recognition scheme [67].
1.8 Mohan and Nevatia's non-uniform integrated system for perceptual organization from [75].
1.9 Perceptual organization from the top down.
1.10 A synthetic line drawing of a simple scene.
1.11 Two legal line labelings for a cylinder.
1.12 An accidental and non-accidental view of a cube.
1.13 An example of phantom vertices.
1.14 An overview of the POLL system described in this thesis. Repair knowledge sources marked with an asterisk have not been implemented.
1.15 Some objects that are legal in the piecewise smooth object domain.
1.16 An object that isn't piecewise smooth.
2.1 An intensity image with an invisible edge.
2.2 Position of Table 2.1 in the image.
2.3 The Hearsay-II blackboard system.
2.4 A tree and a telephone pole.
2.5 Lowe's bicycle.
2.6 A shortcoming of Canny edge detection.
2.7 (a) An intensity image. (b) The output of running GNC on this image.
2.8 Hough spaces do not match human perception.
2.9 Lowe's criterion for grouping proximate edge pixels.
2.10 Skew symmetry is not unique without terminations.
2.11 An example of heuristic symmetry.
2.12 Cooperation between SYMAN and the Grassfire Transform.
3.1 An overview of the initial processing.
3.2 The algorithm for grouping based on continuity.
3.3 A Voronoi neighborhood of point P.
3.4 A synthetic continuity example.
3.5 A continuity example.
3.6 Inputs for the initial processing.
3.7 The intensity values in the area of the image inside the square are shown in Table 3.2.
3.8 Thick edges produced by GNC.
3.9 A GNC segmentation failure.
3.10 The thinned edges.
3.11 The connected components.
3.12 A thinned image with noise.
3.13 Initial vertex and curve units for the synthetic image.
3.14 Initial vertex and curve units for the real image.
3.15 An example of perfect processing.
3.16 An image with two incomplete edges.
3.17 An image with mislabeled junctions.
3.18 An object as seen from a non-general viewpoint.
3.19 The numbered curves and vertices obtained from the input in Figure 3.18(a).
3.20 A legal line labeling for Figure 3.18 that is counterintuitive.
3.21 A curved object.
3.22 An image that was oversegmented.
4.1 An object that is not piecewise smooth.
4.2 Rotating an image changes convexity.
4.3 Four interpretations of a simple cube.
4.4 A cube with labeled junctions and curves.
4.5 An image where the floating object heuristic fails.
4.6 Malik's junction catalog. The "?" label indicates that this line could have any one of the six legal line labels.
4.7 Eight legal line labelings for an outside T junction.
4.8 The outside junction catalog.
4.9 The revised line labeling algorithm.
4.10 The junctions and curves found from processing Figure 2.1.
4.11 (a) The input image. (b) The curves and vertices.
4.12 (a) The input image. (b) The curve endpoints near vertex v44 are so close that they are grouped into a single vertex.
4.13 (a) The input image. (b) Vertices v24 and v28 should have been grouped into a single vertex. Vertices v3 and v7 also should have been grouped.
4.14 An L junction and a curvature-L junction that are similar.
4.15 Paired mistakes cannot be diagnosed in cylinders.
4.16 Curvature-L junctions and three tangent junctions are problematic.
4.17 (a) The input image. (b) The curves and vertices.
4.18 (a) The input image. (b) The result of processing. The modified line labeling paradigm failed at vertex v54.
4.19 (a) Shows a legal labeling for a detected corner. (b) Another legal labeling without a corner.
5.1 An image with a mislabeled vertex.
5.2 The line labels for the object in Figure 5.1.
5.3 (a) The input image. (b) The labeled arcs and junctions.
5.4 Three hypotheses for the missing edge.
5.5 The arc labels for the triangular prism.
5.6 (a) The input image. (b) The initial output.
5.7 Three hypotheses for the missing edge.
5.8 The correct line labels for Figure 5.7(c).
5.9 (a) The input image. (b) The lines and curves.
5.10 An additional phantom vertex.
5.11 The line labeling found from the object in Figure 5.9.
5.12 (a) The input image. (b) The labeled lines and curves.
5.13 Correcting improperly grouped vertices.
5.14 A skew rectangle.
5.15 Line labeling and symmetry can cooperate.
5.16 Occlusion and symmetry.
B.1 Images 1 to 6.
B.2 Images 7 to 12.
B.3 Images 13 to 18.
B.4 Images 19 to 22.

CHAPTER 1

Introduction

Object recognition is one of the most important and frequently researched topics in computational vision, due to its wide applicability to important computational vision applications. Applications of object recognition include a variety of robotic tasks such as bin picking, where single objects are selected from a jumble of objects, which is a common task for a robot to perform in manufacturing. If a robot has no visual system, the objects that it tries to grasp must be accurately placed since it would not be able to see and analyze deviations in position. A visual system can increase the usefulness of a robot, but only if the visual system is both flexible and robust. Flexibility allows a robotic vision system to adapt to changes in its environment, for example when a product line changes. Robustness means that small changes in the environment, such as spurious lighting, should make at most small changes in the performance of the visual system. Creating flexible and robust vision systems is a difficult task.

One approach to object recognition is to match a representation of an object that was derived from image data to 3D descriptions of objects stored in a model database. To perform object recognition in this fashion, some representation of the image data must be made. If the representation of the image data is 2D, either the 3D models will have to be projected to 2D for matching, or a direct 2D to 3D matching will have to be done as in [66]. Another alternative is to reconstruct 3D information from the 2D image and match this 3D information to the 3D object models. Reconstructing the 3D information from the image data is a good choice since the image can provide evidence and constraints for the reconstruction. A paradigm for performing 3D object recognition by reconstructing shape from intensity images, called the Marr paradigm, which used the human visual system as a model, was discussed in [72, 9]. This paradigm consists of three steps: finding simple features in the image plane (primal sketch), extracting the 3D shape of visible surfaces (2.5 dimensional sketch [72] or intrinsic image [9]), and using these surfaces to instantiate 3D primitives which are then matched to stored object models.

In an intensity image, the light received by the camera is a function of several factors such as the shape and reflective properties of the surfaces, the number, type, and relative direction of light sources, and the viewing direction.
These factors are highly interrelated, making the direct recovery of the 3D shape of the imaged objects difficult. There is a broad literature on techniques such as shape from shading [35, 38, 49, 87], shape from texture [15, 16, 22, 56], shape from motion [5], shape from stereo [14, 60], and many other similar processes, collectively referred to as "shape from X", that can be used to reconstruct shape. While these algorithms are gaining mathematical rigor [5], they still lack the robustness that is necessary to be useful in a general environment. If 3D shape cannot be reliably reconstructed from the 2D images, object recognition cannot proceed using the Marr paradigm.

The difficulty of reconstructing shape from 2D images has led computational vision researchers to look for other ways to obtain 3D information from 2D images. The human visual system offers some insight into this process as well [97]. The human visual system can organize and interpret images even when 3D shape cannot be reconstructed. To do this, the salient features of the image, called tokens, need to be identified, organized, and interpreted. The capability of the human visual system to group tokens together in a meaningful way, without any prior knowledge of the contents of the image, is called perceptual organization. Perceptual organization collects tokens in the image plane together into significant groupings. These groupings can be recursively grouped, or subdivided as the scene is further organized. Cooperating and competing groupings are explored until a globally good organization of the image is found. Groupings of tokens can be used to infer 3D properties that could not have been reconstructed on the basis of shape information alone. As an example, suppose that a pair of parallel lines were found in the image plane. Shape from X paradigms would have little chance of finding a reconstruction from this sparse data. If the lines can be grouped, however, a 3D inference can be made: the lines are parallel in 3D. The nature of this inference will be explained in Section 1.1.2.

Perceptual organization is a broad term for a variety of processes with a common theme, instead of being one specific process. To explore the general usefulness and importance of perceptual organization in the object recognition problem, these processes need to be combined. This combination could be done informally using ad hoc methods, or using a more structured approach. If informal methods are used, the resulting system may be nearly impossible to understand, analyze, and improve, particularly as it becomes large enough to be useful. An alternative is to impose structure on the combination of perceptual organization processes using an integration framework. An integration framework is a method for permitting multiple sources of information to work cooperatively and competitively on the solution of a single problem in a structured and orderly way. The integration framework that will be used in this work is the blackboard system [32].
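To make the blackboard idea concrete before it is developed in Chapter 2, the following sketch shows the opportunistic control loop that most blackboard architectures share. It is a minimal illustration only, not the implementation built in this thesis; all class and method names here are hypothetical.

    # A minimal blackboard control loop (an illustrative sketch; the
    # class and method names are hypothetical, not those used in POLL).

    class Blackboard:
        """Global database holding the input, partial solutions, and control data."""
        def __init__(self, image):
            self.hypotheses = {"image": image}

        def post(self, key, value):
            # Results are left on the blackboard for other knowledge sources.
            self.hypotheses[key] = value

    class KnowledgeSource:
        """One independent specialist, e.g. edge detection or curvilinear grouping."""
        def can_contribute(self, blackboard):
            raise NotImplementedError   # inspect the current blackboard configuration

        def contribute(self, blackboard):
            raise NotImplementedError   # perform the specialized processing

    def run(blackboard, knowledge_sources):
        # Opportunistic control: fire any source that sees a chance to improve
        # the solution, and stop when no source finds an opportunity.
        progress = True
        while progress:
            progress = False
            for ks in knowledge_sources:
                if ks.can_contribute(blackboard):
                    ks.contribute(blackboard)
                    progress = True

The knowledge sources of the POLL system, described in Section 1.2, follow this pattern, with a control knowledge source deciding which repair strategies to fire.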
This thesis will describe the development of the POLL system (Perceptual Organization for Line Labeling), to find labeled line drawings from single monochromatic intensity images using perceptual organization. A line drawing is a map of the depth and orientation discontinuities in the scene. A line drawing is said to be labeled if it has an interpretation of the 3D topology attached. Labeled line drawings are a terse, but useful description of the image data. The 2.5 D sketch, proposed to be used as an intermediate representation in 3D reconstruction for object recognition, is little more than a labeled line drawing with surface normal information added.

The goal of achieving complete, correct line drawings in all cases is far too ambitious for a single thesis. The polyhedral case has been analyzed in [105]. This thesis will develop a rich representation of the image data that can both be found from low level data and be used by a general purpose intermediate level process to analyze, interpret, and improve the representation. This representation will be stored in a global database called the blackboard. A useful feature of a blackboard database is that it gives an archive of information and interpretations that have been derived from the scene even if a correct labeled line drawing is not found. This incomplete information may be useful in some circumstances. A robotic end effector, for example, may only need a few surfaces to determine a contact point for grasping.

Section 1.1 will review the current literature in three areas that are important to understanding the purpose and scope of the current research: object recognition, perceptual organization, and line labeling. The statement of the problem solved in this thesis will follow in Section 1.2. The final section, Section 1.3, will give the organization of the thesis.

1.1 Background

Using perceptual organization and inference to obtain 3D information from single intensity images was proposed by Lowe [67]. Lowe added paths to the Marr paradigm for object recognition to create a more general and robust object recognition system. The POLL system described in this thesis will use two of these additional paths to obtain some 3D information from single intensity images without performing shape reconstruction. This 3D information consists of relationships between the imaged surfaces, for example whether a discontinuity in intensity represents a depth discontinuity in 3D or an orientation discontinuity. This thesis can be viewed as additional evidence, beyond that given by Lowe himself, that the paths added by Lowe to the Marr paradigm are useful. Section 1.1.1 will review the Marr paradigm.
Lowe's additions to the paradigm, and perceptual organization, will be discussed in Section 1.1.2. Lowe demonstrated the usefulness of 3D interpretation by describing rules that can take 2D groupings and produce 3D information. POLL obtains 3D information from 2D groupings in the image plane via labeling line drawings. Line labeling has a long history in computational vision, which will be reviewed in Section 1.1.3.

1.1.1 The Marr Paradigm for Object Recognition

Discerning the identity and location of known objects in the world on the basis of sensor data is called object recognition. Most computational object recognition systems fall into one of two schools of thought: the school that seeks an engineering approach to the problem, and the school that tries to model and understand human vision. Recent literature reviews which emphasize the engineering approach to model based object recognition can be found in [10, 28, 104].

Modeling the human visual system seems like a natural place for computational vision to start, since the human visual system is the inspiration for computational vision. The human visual system is a large and complex system that is only partially understood at even the earliest levels of processing. If human vision were well understood, modeling it would be easier. Marr proposed a theory for how the human visual system performs object recognition in [72]. A similar theory was proposed by Barrow and Tennenbaum [9]. This theory divides the processing into three levels which move from the image centered primal sketch, through the scene centered 2.5 D sketch, and finally to object centered volumetric primitives (generalized cylinders). All processing is bottom up, and proceeds without feedback from higher levels. A schematic of the process, taken from Lowe [67], is shown in Figure 1.1.

Figure 1.1. A paradigm for object recognition inspired by the human visual system [67].

The first intermediate representation in the Marr paradigm is the image centered primal sketch. It has features such as edges, blobs, discontinuities, virtual lines, groupings, curvilinear organizations and terminations that can be directly found from the intensity image. Groupings are unique among the features in the primal sketch in that they are difficult to find, particularly relative to the other elements in the representation. Although there is a vast literature on clustering (see [55] for a good bibliography), the general grouping problem is difficult. More recent approaches to this problem, in particular that by McCafferty [73], will be discussed in Section 1.1.2.

The second intermediate representation in this paradigm is a viewer centered representation called the 2.5 D sketch, or the intrinsic image in [9]. The features in this representation are the surface normals, depth discontinuities, orientation discontinuities,
and rough depth estimates for visible surfaces in the image. This representation implicitly has the scene segmented into surfaces, since the depth and orientation discontinuities are present. Viewer centered representations are 3D and have their coordinate axis located with respect to the camera coordinate system. Only surfaces that are visible from the camera viewpoint are represented, separating it from the higher level object centered 3D representation which is independent of viewpoint.

A simple 2.5 D sketch of a cube is shown in Figure 1.2. The short lines and circles represent the normal vectors to the surfaces of the cube. The dotted lines represent surface orientation discontinuities. The solid lines represent surface depth discontinuities. The approximate relative depth values that are a part of the 2.5 D sketch are not represented, but would be necessary for a true 2.5 D sketch.

Figure 1.2. A graphical representation of a 2.5 D sketch of a cube. The solid lines represent discontinuities in depth and the dotted lines represent discontinuities in surface orientation. A circle represents a vector sticking out of the page. Each vector has a rough depth associated with it that is not shown.

Many processes can be used to obtain the information necessary for the 2.5 D sketch. Shape from texture, shape from stereo, shape from shading, shape from contour, and shape from motion (collectively referred to as "shape from X") are all used by the human visual system and have been modeled for use in machine vision systems. How the outputs from these disparate processes are combined into a single consistent 2.5 D sketch, and the nature of the interactions between these processes, are open questions in both the understanding of human vision and in computational vision, although they are beginning to be addressed [5].

The 2.5 D sketch is the last intermediate representation before the 3D object centered representation. There are many possible 3D representations of shape, which have been reviewed in [10, 28]. Some other important representations of 3D shape are geons [11, 94], constructive solid geometry, superquadrics [86], and wings [27]. The latter three are used primarily with range data. The 3D volumetric primitive used in Marr's work [72] is the generalized cylinder (GC), also called the generalized cone. GCs represent volumes in a compact, generative form. The GC has an axis or spine, a 2D cross section which is consistent at every point along the spine, and an expansion function that determines the size that the cross section will have at each position on the axis. Figure 1.3 shows a typical GC, with a circular cross section, straight axis and nonlinear expansion function. GCs can be used as either a hierarchical or non-hierarchical description. In the non-hierarchical method, as in [81, 21], a single scale of description is used. In the hierarchical method, as suggested in [72], multiple scales are used. Figure 1.4 shows a human hand represented at two levels within a hierarchy.

Object recognition is done in the Marr paradigm by matching the hierarchical generalized cylinders instantiated from the 2.5 D sketch to stored object models. The Marr paradigm has been widely discussed, extended, and criticized, and remains the single most influential model of the human visual system.
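Before leaving the Marr paradigm, the generative nature of the GC representation can be illustrated with a short sketch: a surface point is produced by sweeping a scaled cross section along the spine. The fragment below is a simplified illustration rather than an implementation from the literature; it assumes a straight spine along the z axis and a circular cross section, as in Figure 1.3.

    import math

    def gc_surface_point(s, theta, length, radius, expansion):
        """Return a point on a generalized cylinder whose spine is the z axis.

        s         -- normalized position along the spine, in [0, 1]
        theta     -- angle around the circular cross section, in radians
        length    -- length of the spine
        radius    -- cross-section radius before scaling
        expansion -- the expansion function, mapping s to a scale factor
        """
        r = radius * expansion(s)        # cross section scaled at this spine position
        return (r * math.cos(theta),     # x
                r * math.sin(theta),     # y
                s * length)              # z: position along the straight spine

    # A nonlinear expansion function like the one suggested by Figure 1.3:
    flare = lambda s: 1.0 + s * s
    point = gc_surface_point(0.5, math.pi / 4, 10.0, 2.0, flare)

Sweeping s and theta over a grid of values traces out the whole surface; a hierarchical description simply attaches finer GCs to this one, as in Figure 1.4.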
The 2.5 D sketch, in particular, is a goal that is being approached by integration efforts like those of Aloimonos [3], and this thesis.

Figure 1.3. A simple generalized cylinder with a circular cross section, straight axis and nonlinear expansion function.

Figure 1.4. On the left is a low resolution representation of a human hand using two generalized cylinders. On the right the same hand is represented at higher resolution.

1.1.2 Perceptual Organization

The Marr paradigm has had enormous influence on the field of computational vision. But human vision does not consist of the reconstruction of 3D shape alone. Human vision can also organize a scene into meaningful groupings without any explicit shape information being present. An appreciation for this aspect of the human visual system dates back to the Gestalt school of psychology [114], which identified the first principles of what is now known as perceptual organization. These principles include proximity, similarity, good continuity, closure, symmetry, and figure-ground separation. Proximity groups tokens together based on their spatial location, as in Figure 1.5(a). Good continuity, as shown in Figure 1.5(d), collects the curves such that the curves on the left hand side are seen as two continuous intersecting lines, rather than two separate objects that are coincidentally touching, as on the right. Gestalt grouping was performed computationally by McCafferty in [73], as will be discussed in Section 2.1.2.

Definition 1.1 Perceptual organization: the basic capability of the human visual system to derive relevant groupings and structures from an image without prior knowledge of its contents [67].

In computational vision, the importance of perceptual organization was brought to the forefront by Witkin and Tenenbaum [115], who analyzed the importance of structure, meaning spatiotemporal coherence or regularity, in visual perception. They argued that when regular structure is observed, it is significant because it would have been unlikely to have arisen accidentally. For example, a pair of collinear line segments should be grouped into a single longer line segment since it would have been very unlikely that they would be aligned if they did not share a common causality. Lowe sharpened this idea in [67] when he proposed that the significance of a grouping of tokens is proportional to how unlikely the relationship between the tokens was to have arisen by accident, called non-accidentalness. In Figure 1.6, the same three collinear line segments appear on both the right and left hand sides. When the line segments are isolated, as on the right, the collinearity is perceived immediately. When they are in a more cluttered setting on the left, it requires attention to perceive the collinearity. The increased density of lines makes collinearity more accidental, and therefore less significant.

Lowe proposed auxiliary paths to be added to the Marr paradigm for object recognition, described in Section 1.1.1, as seen in Figure 1.7.
These additional paths give the visual system a way to use non-accidental properties to contribute to the 2.5 D sketch and to perform object recognition. The contribution to the construction of the 2.5 D sketch occurs through another application of non-accidentalness called 3D inference.

Figure 1.5. The Gestalt rules of organization. (a) Tokens are grouped on the basis of proximity. (b) Tokens are grouped into pairs of dark and light circles using similarity. (c) Curves are grouped into closed figures. (d) Continuity dictates that these curves be interpreted as two crossing lines, instead of two touching shapes. (e) Symmetry groups these curves. (f) Figure ground separation causes the abstract figure to be seen in front of the rectangular background.

Figure 1.6. In both the right and left figures, there are three collinear line segments. In the figure on the right, the line segments are seen in isolation. In the figure on the left, there are additional noise line segments. It requires attention to find the collinear segments in the left hand image.

3D inference uses the assumption that the camera was not in a position that led to any special relationships in the image plane. As an example, suppose that there are a pair of parallel lines in the image plane. These lines could be the projection of a pair of lines that are not parallel in space, but that would require positioning the camera in a very singular position. Since it is unlikely that the camera was held in such a position, it is likely that parallel lines in the image plane are parallel in 3D. This is closely related to the general viewpoint hypothesis to be discussed in Section 1.1.3.

There are numerous examples of special purpose systems that use perceptual organization principles. Tuceryan used perceptual organization to group dot patterns using a relaxation framework [112]. Rosin and West used perceptual grouping to detect and verify surfaces of revolution [100]. A curvilinear grouping algorithm based on perceptual organization was done in [111].

Two more general perceptual organization systems are Sarkar and Boyer's system [102], and Mohan and Nevatia's system [75]. These systems are similar in that they both group edge elements hierarchically into larger and more perceptually significant structures using non-accidentalness criteria. Sarkar and Boyer use a uniform framework for perceptual organization whereas Mohan and Nevatia use a non-uniform

Figure 1.7. Lowe's additions to Marr's object recognition scheme [67].
framework shown in Figure 1.8. An earlier version of the Mohan and Nevatia system, using lines instead of curves, in the domain of aerial imagery was shown in [74].

Figure 1.8. Mohan and Nevatia's non-uniform integrated system for perceptual organization, from [75].

The distinction between uniform frameworks and non-uniform frameworks is that uniform frameworks use the same processing at every level in the hierarchy of perceptually significant structures. A uniform framework, like that of Sarkar and Boyer, permits easier extension because the basic grouping processing does not change as the tokens that are grouped become larger and more perceptually significant. Uniform systems also have the advantage of being easier to implement and understand. Non-uniform frameworks can use different types of processing at each level of the hierarchy of perceptually significant structures. Non-uniform systems are usually more robust, since they can take advantage of special features and processing that is uniquely adapted to each level in the hierarchy of perceptually significant events. This distinction between uniform and non-uniform systems will be reinforced when integration frameworks are discussed in Chapter 2.

Sarkar and Boyer's strategy for perceptual organization uses a relational graph data structure. Tokens, namely pixels, lines, joined straight lines, anti-parallel lines (APARs), and joined anti-parallel lines (PAPARs), are grouped using the same strategy at all levels of the perceptual hierarchy, making the system uniform. Each token is represented by an attributed node in the graph. The attributes describe the properties of the token. For a straight line, the attributes are an identification number, the endpoints, the orientation, and the contrast across the edge. Compatible tokens are joined by an attributed arc, where the attributes in the linear case are proximity, angular compatibility, and continuity. Compatibility is determined by comparing a subset of token attributes to tolerances, using binning in the space of attributes. All selected attributes must be within tolerances for the tokens to be grouped.

All of the examples of perceptual organization discussed so far have been bottom-up, or data driven. Perceptual organization can also work top down, as in Figure 1.9. The figure contains a noisy image of a dalmatian dog. Without the high level information that a dalmatian is in the scene, it is nearly impossible to collect the spots together into meaningful groupings. Once the dog has been identified, the groupings of spots can be organized properly.

Figure 1.9. This image is hard to interpret without the top down information that it contains the picture of a dalmatian dog. Once the contents of the image are known, the perceptual groupings that identify the dog can be found [72].

Perceptual organization has the potential to produce flexible systems since it is not tied to any particular model of objects in the world. There are two important open questions in perceptual organization. The first question is which processes should be used to perform perceptual organization, and the second question is how the combination of these processes should be done. This thesis will speak to both questions.
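Before turning to line labeling, the compatibility test in Sarkar and Boyer's grouping, described above, can be made concrete. In the sketch below the attribute names and tolerance values are invented for illustration; only the structure of the test, a conjunction of per-attribute tolerance checks that must all pass before an attributed arc joins two tokens, follows their description.

    import math

    class LineToken:
        """A hypothetical straight-line token with the attributes named above."""
        def __init__(self, ident, p0, p1, contrast):
            self.ident = ident
            self.p0, self.p1 = p0, p1
            self.contrast = contrast
            self.orientation = math.atan2(p1[1] - p0[1], p1[0] - p0[0])

    def compatible(a, b, max_gap=5.0, max_angle=0.1):
        """Every selected attribute must fall within its tolerance before an
        attributed arc joining the two tokens is added to the graph."""
        gap = math.dist(a.p1, b.p0)                    # proximity of the endpoints
        dtheta = abs(a.orientation - b.orientation)    # angular compatibility
        return gap <= max_gap and dtheta <= max_angle  # conjunction of the tests

    a = LineToken(1, (0, 0), (10, 0), contrast=40)
    b = LineToken(2, (12, 0.5), (20, 0.6), contrast=38)
    print(compatible(a, b))   # True: the lines are close and nearly collinear

In their system the comparisons are organized by binning in the space of attributes; the explicit thresholds above play the same role in this simplified form.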
1.1.3 Line Labeling

In Lowe's object recognition paradigm, there is a path for 3D inference from perceptual groupings. Lowe's demonstrations of the use of this path are in the form of inference rules where particular groupings of 2D tokens give a 3D interpretation. There is another type of 3D inference that has been used in computational vision for many years: labeling a line drawing. A line drawing is a map of the orientation and depth discontinuities in a scene, which are two important aspects of the 2.5 D sketch. Figure 1.10 shows a line drawing of a simple scene containing a sphere, a cube and a triangular prism. Lines a, b, e, g, and h are depth discontinuities. Lines c and d are orientation discontinuities. Line f could be either an orientation discontinuity or a depth discontinuity, depending on whether the back face of the triangular prism is touching the front face of the cube.

Figure 1.10. A synthetic line drawing of a simple scene.

Line drawings are labeled with symbols that describe the local geometry. The assignment of labels is constrained by an exhaustive list of combinations of labels that can be assigned within a prescribed domain of objects, called a junction catalogue. There are three basic classes of edges (variously named by different authors): limb edges, jump edges, and orientation edges. A limb edge is a depth discontinuity that is formed when the line of sight is tangent to the imaged surface and the surface occludes itself with respect to the viewer. A limb edge is therefore not a physical edge on the object, where two distinct surfaces meet, but is an artifact of viewpoint. Limb edges are labeled with a doubled arrowhead (written here as <<- or ->>), with the surface closest to the viewer being on the right hand side of the arrow as it is followed from tail to head. Limb edges do not occur on polyhedral objects. A jump edge is an edge which represents a discontinuity in depth that is not a limb edge. Jump edges are labeled -> or <-, using the same directional convention as a limb edge. Orientation discontinuity edges are formed when two surfaces with different tangent planes meet. The imaged surfaces must be physically adjacent in 3D, and both visible in the image. At a convex edge, labeled +, the visible angle between the surfaces is greater than 180°; otherwise the edge is labeled concave, -.

Figure 1.11 shows two labelings for a right circular cylinder with a planar top face. The interpretation on the left is a cylinder floating in air (making the bottom edge a depth discontinuity). The interpretation shown on the right is a cylinder resting on a table (making the bottom edge an orientation discontinuity). In [79] the mathematical properties of line drawings were explored.

Figure 1.11. A cylinder and two line labelings. The distinction between these two labelings is only the bottom edge. The interpretation on the left is a cylinder floating in air. The interpretation on the right is a cylinder resting on a table.

Figure 1.12.
Any deviation from this viewpoint will cause another side of the cube to become visible and change the topology of the line drawing. (b) A cube from a non-accidental vieWpoint. There is an 6 ball of viewpoints around the current position where the topology of the line drawing remains consistent Junction labeling was originally done in the trihedral world [51, 29], and later extended to general polyhedra [68], shadow edges and surface markings were added in [113], and origami objects were used in [58]. To automatically label the edges of a polyhedral line drawing and obtain a physical interpretation, an exhaustive search of the junction catalog for the object domain of interest is done. In the polyhedral world, each edge is further constrained to have a unique label along its entire length. Line labeling usually requires that the imaged objects be viewed from a general viewpoint. An object is said to be viewed from a general viewpoint when that overall structure of the line drawing of the object is consistent over all viewpoints in a small sphere of viewpoints surrounding the current viewpoint, called an 6 ball of vieWpoints. The metric properties of the line drawing will vary over this 6 ball, but the general topology must remain the same. Figure 1.12(a) shows a non-general view of a cube. A general view of the same cube is shown in Figure 3.18(b). A junction catalog for piecewise smooth objects was derived by Malik in [69]. Piecewise smooth objects differ from polyhedra in that the surfaces are not restricted Figure 1.13. This ‘ inside upper curved edge. and back to edge and a part of to be planar. but at shadows and marki definition of this dc in the piecewise sm labels along their l1 Vertex. called a ph shows an object it edge, (shown by a lhe effects of pha established, There are a assumed to be per the curvilinear “.0 19 Figure 1.13. This horseshoe cylinder has two phantom vertices. Both are on the inside upper curved edge. Here the edge changes from being a jump edge to a convex edge, and back to a jump edge with no real vertices intervening. The occluded limb edge and a part of the bottom edge are shown with dotted lines. to be planar, but are only restricted to be C3. Origami objects, wire frame objects, shadows and markings are not permitted in Malik’s object domain. A more precise definition of this domain will be given in Section 1.2. Unlike the polyhedral domain, in the piecewise smooth object domain curved edges do not necessarily have identical labels along their length. This labeling change in the middle of a curve creates a false vertex, called a phantom vertex on each curved edge in the line drawing. Figure 1.13 shows an object with two phantom vertices. Phantom vertices occur when a limb edge, (shown by a dotted line on the right hand side), is occluded by the object itself. The effects of phantom vertices on the combinatorics of line labeling have not been established. There are a number of difficulties with the line labeling paradigm. The input is assumed to be perfect in that no edges are missing, and there are no spurious edges. In the curvilinear world, changes in tangent direction must be distinguished from changes in curvature, which is a difficult process. Even with richer range images, accurately finding curvature is a challenging problem [36]. The interpretations produced by line labeling are not guaranteed to be unique, although they are usually limited to a small number of labelings. Finding polyhedral line drawings from imperfect range data was done in [62]. 
In the polyhedral world, line labeling and some object completion was done on a group of 9 blocks of known size and shape in [34]. General polyhedral line labeling was successfully performed, with the help of some manual editing, on intensity images [105]. In the polyhedral world all edges must be straight lines, and all surfaces must be planar. This makes it possible to use linear equations to constrain the solution of the problem. A similar constraint could be used with a more complex parameterization, such as quadric surfaces. These surfaces are still not sufficiently general, as there are many objects, like an hourglass, that cannot be represented with quadric surfaces. The basic theorems that control the existence and uniqueness of solutions to systems of linear equations do not extend into more general parametric representations like quadric surfaces, so the probability of extending this approach to surfaces that are as general as the C³ surfaces is remote.

Line labeling for curved-surface bodies was done by Chakravarty in [25]. 3D object recognition using curved objects was done in [26]. Chakravarty detects and classifies lines and then extrapolates the junctions. To perform the matching, the perfect line labelings from object models are analyzed to determine which imperfect line labelings are likely to arise from the given object model. Chakravarty's approach can only produce a perfect line labeling after matching to the object model database.

1.2 Problem Statement

Improving the robustness and flexibility of computational vision has been approached from many different perspectives. Perceptual organization is promising because of its generality, which encourages flexibility. Perceptual organization is not a single process, but rather a group of processes that cooperate and compete to find a good interpretation of the imaged data. Controlling this cooperation and competition is an important research issue, called integration. Previous efforts in perceptual organization have used a non-uniform but fixed structure, as in Mohan and Nevatia, and the uniform structure of Sarkar and Boyer. Both of these systems are data driven, which does not allow information to flow from the top down to improve the perceptual organizations, unlike the system described in this thesis.

This thesis will describe an integrated system for perceptual organization and 3D interpretation that obtains labeled line drawings from single intensity images. The system uses two of the paths added to the Marr paradigm for object recognition by Lowe: perceptual organization and 3D inference.

This thesis will describe the construction of the POLL system (Perceptual Organization and Line Labeling) that can find labeled line drawings from single intensity images. The input into the system will be a single intensity image of an object, or objects, in an object domain which will be precisely described shortly. The output from the system will be a line drawing labeled with a 3D interpretation, and a database of intermediate representations and partial results that may be useful for further processing.
Finding labeled line drawings is a very difficult problem, and there will be some images which are not correctly processed. For these images, it is desirable to obtain intermediate information that could be used by other processing to work towards the goal.

The integration framework to be used is the blackboard system [32]. A blackboard system, which will be further explained in Section 2.1.3, uses a common database called the blackboard to hold input, partial solutions, and control data. These data objects are processed by independent knowledge sources, each of which specializes in one type of processing. These knowledge sources evaluate the current blackboard configuration, look for opportunities to contribute to improving the solutions, and perform their specialized processing. The results of this processing are left on the blackboard for other knowledge sources to opportunistically use.

Figure 1.14 shows the overall structure of the system. The knowledge sources are divided into four groups. The first group are the perceptual knowledge sources. These knowledge sources bootstrap the initial line drawing using perceptual organization. They are also available to do specialized work for other knowledge sources, and make other contributions to the blackboard database. The diagnostic and interpretive knowledge source is a modified line labeling paradigm that can examine a line drawing and locate inconsistencies. If there are no inconsistencies, a 3D interpretation of the line drawing will be found. The information produced by the diagnostic and interpretive knowledge source is available for the control knowledge source. This knowledge source selects which of the available repair strategies, if any, is suitable for fixing problems with the line labeling. The repair knowledge sources are called by the control knowledge source to fix the line labeling, and in turn call the perceptual knowledge sources to reevaluate previous results in light of the more global diagnostic information that is available. This system is integrated because the repair knowledge sources can return results to the perceptual knowledge sources for recomputation.

Figure 1.14. An overview of the POLL system described in this thesis. Repair knowledge sources marked with an asterisk have not been implemented.

The perceptual knowledge sources used in this system are weak membrane edge detection [13], an algorithm that performs edge grouping on the basis of continuity [111], a module that analyzes vertex proximity, and a module that can determine the type of line drawing junctions. Section 2.2 will explain the selection of these particular modules. A modified line labeling algorithm will be used as the diagnostic and interpretive knowledge source.

The domain of objects that will be used in this thesis is a subset of the piecewise smooth objects used in [69]. Using the notation of Malik, let S_i be a bounded C³ coordinate patch in R³, such that the boundary for S_i consists of a finite number of piecewise smooth simple closed curves. Let B_ε(P) be an ε-ball about point P. Let B_ε(P; M) ≡ B_ε(P) ∩ M be the ε-neighborhood of P relative to M, where M ⊂ R³.

Definition 1.2 A C³ piecewise smooth surface in R³ is a subset M ⊂ R³ for which
there exists a collection of surface elements S = (S_1, S_2, ..., S_n) with each of the S_i, i = 1...n, a subset of M. For every point P ∈ M, exactly one of the following three conditions is true [69]:

1. There exists an S_i ∈ S such that B_ε(P; M) = B_ε(P; S_i) for some ε > 0.

2. There exist two surface elements S_i and S_j in S such that for some ε > 0, B_ε(P; M) = B_ε(P; S_i) ∪ B_ε(P; S_j). The tangent planes to S_i and S_j are distinct at P.

3. There are m ≥ 3 surface elements S_{i_1}, S_{i_2}, ..., S_{i_m} such that for some ε > 0, B_ε(P; M) = B_ε(P; S_{i_1}) ∪ B_ε(P; S_{i_2}) ∪ ... ∪ B_ε(P; S_{i_m}). For any pair of these surface elements, the tangent planes are distinct at P.

An additional restriction is added in this thesis to help control the complexity of the object domain: the number of surfaces that can meet at a vertex in a domain object is limited to three. Some examples of objects that are in the domain are shown in Figure 1.15. Examples of objects that are not in this domain are wireframe objects, origami objects, and a single nappe of a right circular cone. Wireframe objects are not legal because there are no surface elements. Origami objects are not legal because some of the edge points are adjacent to only one surface instead of the required two surfaces. A single nappe right circular cone is not legal because it is neither smooth at the apex, so the apex cannot satisfy condition 1, nor is the apex adjacent to two distinct surfaces. A bent stove pipe, as in Figure 1.16, is also not within this domain since there is a point along the edge joining the two parts where the tangent planes of the two surfaces are equal.

Figure 1.15. Some objects that are legal in the piecewise smooth object domain.

Figure 1.16. An object that is not in the piecewise smooth object domain since the tangent planes to the surfaces are equal at some point along the edge.

There are a group of other assumptions that must be made for the line labeling algorithm to be applicable. The projection model for the image must be orthographic. Orthographic projection plus a linear scale factor, called weak perspective, was shown to be an approximation to perspective projection in [52]. The objects must be seen in a general viewpoint. Shadows, texture and markings are not allowed on the objects.
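The projection assumption is simple to state computationally. The following minimal Python sketch (the function name and test points are invented here for illustration; this is not code from [52]) implements weak perspective as orthographic projection followed by one global scale factor:

    import numpy as np

    # Weak perspective: drop depth, then apply a single scale for the
    # whole object, s = f / (average depth). Illustrative sketch only.
    def weak_perspective(points_3d, focal_length):
        pts = np.asarray(points_3d, dtype=float)
        scale = focal_length / pts[:, 2].mean()
        return scale * pts[:, :2]

    print(weak_perspective([[1.0, 2.0, 10.0], [2.0, 1.0, 11.0]], 5.0))

Because one scale replaces the per-point division by depth, the mapping is affine: straight edges stay straight and parallel edges stay parallel, matching the behavior the line labeling machinery expects from orthographic projection.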
1.3 Organization of the Thesis

The organization of the thesis is as follows. Chapter 2 will place this research in perspective in the context of the current vision work. A list of desirable properties for modules will be given and discussed. The choice of the integration framework and an evaluation of the modules available in the literature will be done. Chapter 3 will explain how the initial analysis of image information is performed with the perceptual knowledge sources. Chapter 4 will give the method proposed to discern images that have been properly analyzed by the initial processing from those that have not, using the diagnostic and interpretive knowledge source. Once inconsistencies in the initial representation of the image have been found, Chapter 5 shows how some of these defects can be corrected using the control and repair knowledge sources. The final chapter, Chapter 6, will list the contributions of the thesis and suggest related topics for further research.

CHAPTER 2

Integration: Methodologies and Modules

Visual perceptual organization is not a single cohesive process, but rather a diverse collection of processes working towards the goal of organizing images into meaningful groupings. Finding good groupings in images is an inherently difficult search problem. The number of possible subsets of image tokens is enormous, and there are myriad criteria for what makes a good grouping. For these perceptual processes to be able to cooperate and compete to find meaningful relationships within the image, the processes need to be integrated. Integrated processes can use each other's constraints and expertise to guide the search for good groupings.

An example of the need for integration is shown in Figure 2.1. This simple polyhedral scene cannot currently be labeled using curvilinear line labeling [69], or even the simpler polyhedral line labeling [113]; Sugihara's line labeling system would be able to process this image only if the missing edge were added by hand [105, pg. 178]. The imaged object is within both the domain of polyhedral objects and the domain of piecewise smooth objects, but the front corner edge was not found by the edge detector. It is unlikely that any edge detection scheme could recover this edge, since edge detection relies upon spatial variation in intensities and there is no spatial variation in intensity values across the front edge of the cube. This can be seen in Table 2.1, where the raw intensity values of the front portion of the rectangular prism are shown. Line labeling modules require perfect input and cannot recover from edge detection mistakes. And edge detection modules cannot give perfect edges. Therefore, even an unavoidable edge detection failure, like that in Figure 2.1, will result in a failure of line labeling. This serial failure of modules is one of the problems that integrated systems hope to ameliorate.

Figure 2.1. An intensity image of a simple square block. (a) Notice that there is almost no variation in intensity across the front corner of the block. This makes it difficult for edge detection to find this edge without finding other noise edges. (b) Canny edge detection was run on the image in (a). An edge parallel to the top and bottom of the block was found, but the front corner is missing. This extraneous edge occurs because the side of the block is curved.
Figure 2.2. The input image from Figure 2.1 with a highlighted rectangle showing where the intensity values in Table 2.1 occur in the image.

Table 2.1. The intensity values from the top front corner of the rectangular prism shown in Figure 2.1. The numbers in the 190s are from the top of the prism. The numbers in the 130s are the sides of the prism. There is no variation in intensity across the front edge of the prism.

124 121 117 113 125 154 178 189 192 193 195 194 196 197
124 122 117 114 114 132 163 183 191 193 194 195 195 196
125 122 120 117 113 119 143 170 186 191 193 194 195 195
128 125 122 119 116 113 121 151 179 192 195 193 193 189
130 127 125 123 120 115 116 136 166 183 185 180 170 160
132 129 128 126 121 118 115 123 142 157 154 143 132 126
134 132 131 129 125 121 119 116 121 124 120 114 114 116
134 133 132 129 126 122 120 119 119 117 117 120 122 123
135 135 133 131 128 126 124 125 126 126 126 125 127 127
134 135 134 132 129 129 129 126 128 128 127 127 128 129
138 137 135 134 133 133 132 131 132 132 131 130 130 130
139 139 137 136 135 134 134 132 132 132 132 132 132 131
139 139 137 134 135 136 136 136 135 134 135 134 134 135
141 140 138 137 137 138 139 138 139 136 136 136 136 136
141 141 140 137 138 140 140 138 138 138 138 138 141 141

Integrating the perceptual organization processes into a single, general computational vision system is an ambitious goal. If the processes are combined in an informal fashion, the resulting system may be difficult to understand and expand. Therefore, a more formal method for integrating processes, called an integration framework, is needed. Integration frameworks have been used within the shape from X paradigms, as is suggested in the Marr object recognition paradigm, where the 2.5 D sketch integrates processes such as stereopsis, shape from motion, shape from shading, color, texture, and edge detection, to help reconstruct 3D shape from input image(s). An integrated system for Gestalt grouping has also been created [73].

In order to integrate perceptual modules, there are two questions that must be addressed. The first is which integration framework will be used, and the second is which perceptual modules should be integrated in this framework. Section 2.1 will discuss the integration frameworks that are used in computational vision, and select the blackboard framework as the model for integration for perceptual organization to be used in this thesis. The perceptual modules to be integrated will be chosen in Section 2.2.

2.1 Integration Frameworks

Integration has become an important research topic in computational vision, as the quantity of recent literature on the subject attests [4, 31, 41, 76, 77, 62, 73, 3].
Just as the recent interest in perceptual organization can be interpreted as a measure of the dissatisfaction of the vision community with the weaknesses of 3D surface reconstruction techniques, interest in integration can be interpreted as a measure of the dissatisfaction of the vision community with the lack of robustness of the individual modules and paradigms that are currently available. The primary goal of integration is improving the robustness and flexibility of computational vision algorithms. It is important that systems that are created from a collection of modules be stronger than a serial combination of the individual modules. An integrated system that is only as robust as its weakest module is of little value.

Integration techniques can be divided into two categories according to the uniformity of treatment of the modules. Uniform integration schemes treat every module being integrated in an equivalent fashion. This is similar to Sarkar and Boyer's uniform perceptual organization system [102], where the same grouping mechanism was used at every level in a hierarchy of perceptual events. Uniformity has an attractive simplicity, and can be easy to control. These virtues are paid for by a loss of some power, since special attributes of individual modules cannot be incorporated without sacrificing the uniformity. Non-uniform integration strategies use methods that are unique to the specific modules being integrated. This can be advantageous since it allows unique attributes of the modules to be exploited, which may produce a more robust system.

A commonality between both uniform and non-uniform integration frameworks is that optimization is frequently used. There are a variety of techniques for performing optimization that will be discussed in Section 2.1.1. Uniform integration strategies will then be discussed in Section 2.1.2. Non-uniform integration strategies will be discussed in Section 2.1.3.

2.1.1 Optimization

Many integration strategies use some form of optimization. Optimization takes a heuristic criterion function that evaluates the goodness or desirability of a configuration and then searches the space of all possible configurations for the configuration that maximizes (or minimizes) this function. An example of a system that uses optimization is the contour tracing paradigm of [7, pg. 136]. This paradigm uses two criteria for the goodness of an edge: the strength of the edge (from a local edge detection method, such as the Sobel operator), and the smoothness of the edge. The best set of smooth edges, as defined by this heuristic criterion function, is found using dynamic programming. Dynamic programming is only rarely applicable in optimization problems, since it requires the criterion function to be of a specific, non-general form, but when it is applicable it is guaranteed to find the best solution.
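As a concrete illustration of this style of optimization, the following Python sketch (a simplification invented here, not the exact formulation of [7]) traces a single contour across an edge-strength image by dynamic programming, choosing one row per column so as to trade edge strength against smoothness:

    import numpy as np

    # Hypothetical contour tracing by dynamic programming: minimize
    # (negative edge strength) + smoothness_weight * (row jump)^2,
    # one row chosen per column.
    def trace_contour(edge_strength, smoothness_weight=1.0):
        rows, cols = edge_strength.shape
        cost = np.full((rows, cols), np.inf)
        back = np.zeros((rows, cols), dtype=int)
        cost[:, 0] = -edge_strength[:, 0]         # reward strong edges
        for c in range(1, cols):
            for r in range(rows):
                jumps = smoothness_weight * (np.arange(rows) - r) ** 2
                total = cost[:, c - 1] + jumps
                back[r, c] = int(np.argmin(total))
                cost[r, c] = total[back[r, c]] - edge_strength[r, c]
        path = [int(np.argmin(cost[:, -1]))]      # best row in last column
        for c in range(cols - 1, 0, -1):
            path.append(back[path[-1], c])        # follow the backpointers
        return path[::-1]

    strength = np.array([[0.1, 0.2, 0.1],
                         [0.9, 0.8, 0.9],
                         [0.1, 0.1, 0.2]])
    print(trace_contour(strength))                # [1, 1, 1]: the strong row

Because the best cost of each column depends only on the previous column, the optimum over all row sequences is found exactly; this decomposability is the specific, non-general form of criterion function that dynamic programming requires.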
A common optimization method in computational vision is energy minimization. Energy minimization is distinguished from other optimization methods by the presence of a physical model that represents the goodness of a configuration. Gibbs distributions and Markov Random Fields (MRF) define an energy function which is used for image reconstruction. The physical model used is a statistical mechanics system with an energy function that determines its Gibbs distribution. The equivalence between Gibbs distributions and MRFs is used to change the more easily understood Gibbs distribution into a more mathematically tractable MRF. Energy minimization is performed using the stochastic simulated annealing method to find the maximum a posteriori (MAP) estimate of the image contents given the degraded observations [42]. Optimization can also be done using energy minimization with Partial Differential Equations (PDE) and the Euler-Lagrange equations. An example of energy minimization using PDEs is the curvilinear grouping paradigm to be discussed in Section 3.2.

When energy minimization problems are being solved, it is often desirable that the solution found be smooth. Regularization theory is the rigorous approach to smoothness as a constraint. A mathematical problem is said to be well-posed when its solution exists, is unique, and depends continuously on the initial data. Any deviation from these conditions causes a problem to be ill-posed. Many of the classic vision problems are ill-posed, including surface reconstruction, shape from X, and edge detection. Regularization works by limiting the choice of solutions to a class of functions that are maximally smooth. There are two principal methods for regularization [89]. Assume that one is trying to solve the problem of finding z from the data y, and that the problem is ill-posed. Let A be a linear operator, and || · || represent a norm.

    Az = y

In the first method, one chooses the solution which satisfies:

    min ||Pz||   subject to   ||Az − y|| ≤ ε

Here P is a linear stabilizing functional. The solution z is found such that Az is within ε of the desired data y and z has maximal smoothness, by selecting the stabilized function of minimum norm. In the second method one minimizes:

    ||Az − y||² + λ||Pz||²

The parameter λ determines the relative importance of faithfulness to data and smoothness. Large λ values favor smoothness.

Although the classic regularization theory is mathematically rigorous, it doesn't model the physical world well in all respects. The stabilizer is applied in such problems to all image areas equally, including areas of discontinuity (this is avoided in shape from shading by assuming that the contour of the isolated object is known a priori). In [107], Terzopoulos offers an improvement on standard regularization which uses nonquadratic stabilizing functions with continuity control functions that are discontinuous. In this method, two binary variables are set to turn off regularization at points of depth or orientation discontinuity. This improves the usefulness of regularization by preventing it from smoothing over discontinuities.
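The second method is easy to state computationally. The sketch below (the names, the toy data, and the choice of a first-difference matrix for the stabilizing functional P are illustrative assumptions, not taken from [89]) minimizes ||Az − y||² + λ||Pz||² in closed form through the normal equations (AᵀA + λPᵀP)z = Aᵀy:

    import numpy as np

    # Illustrative regularization sketch: the minimizer of
    # ||Az - y||^2 + lam * ||Pz||^2 satisfies
    # (A^T A + lam * P^T P) z = A^T y.
    def regularize(A, P, y, lam):
        lhs = A.T @ A + lam * (P.T @ P)
        return np.linalg.solve(lhs, A.T @ y)

    A = np.eye(5)                              # identity "imaging" operator
    P = np.diff(np.eye(5), axis=0)             # first-difference stabilizer
    y = np.array([0.0, 1.0, 0.9, 5.0, 1.1])    # noisy data with one outlier
    print(regularize(A, P, y, lam=10.0))       # larger lam gives smoother z

Raising λ pulls the solution toward smoothness at the expense of fidelity to y, exactly the trade-off stated above; note that the quadratic stabilizer smooths across the jump in this toy data just as indiscriminately as it smooths the noise, which is the shortcoming that Terzopoulos's continuity control variables address.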
Discontinuities are salient in images, as can be seen by the inclusion of both depth and orientation discontinuities in the 2.5 D sketch of Marr, and as such should not be smoothed. Optimization minimizes a functional over the space of all possible solutions. This space is often prohibitively large, so a wide variety of techniques for finding interpre- tations that minimize complex functionals have been developed. If the functional is of the appropriate form, dynamic programming can be used to find the best solution. lithe functional i: also called hill clil descent may get tr plex functionals a: Deed lo llSCd. Th ods and stochastit is shown in Sectio fitting a weak me; cry minimization applicable. but can ninistic algorithm nitima instead, 5 good glOba] minim Optimization “ as well as (“.0 of t 21 2 Unlfo lute grallOD frame tb _ e nonuniform in I non-uniform lute .ln example of a ‘ 34 If the functional is convex over the space of all interpretations, then gradient descent (also called hill climbing) can be used. If the energy functional is not convex, gradient descent may get trapped in a local minima and never reach the global minimum. Com- plex functionals are rarely convex, so more sophisticated minimization methods often need to used. These methods can be divided into two classes, deterministic meth- ods and stochastic methods. An example of a deterministic minimization method is shown in Section 2.2.2, with the Graduated Non Convexity (GNC) algorithm for fitting a weak membrane to data. Simulated annealing is a common stochastic en- ergy minimization method [42]. Deterministic algorithms like GNC are not generally applicable, but can work well when they are applicable as in [12]. More general deter- ministic algorithms like gradient descent may not find global minima, but find local minima instead. Stochastic methods are often computationally expensive, but find good global minima. Optimization will be used in both uniform and non-uniform integration strategies, as well as two of the modules that are integrated. 2.1.2 Uniform Integration Frameworks Integration frameworks can be divided into two groups, the uniform frameworks and the non-uniform frameworks. Uniform integration frameworks are distinguished from non-uniform integration strategies by the similarity of treatment of the modules. An example of a uniform integration strategy used for sensor fusion is Moravec’s certainty grid from [76]. A certainty grid is a 2D lattice that represents the likelihood that a given square on the floor of a robotic workspace is empty. Multiple range sensor measurements are obtained from a diverse set of sensors on a robot. These sensor measurements are integrated using Bayesian reasoning to iteratively update the likelihood that each square is empty based on the latest sensor readings. The uniformity of this strategy comes from using the same Bayesian updating rule for all of the different t}‘ Gibbs distribt work. Gamble. C as texture. motio is similar to a lin tion discontinuit} edge] or shadow. module from colc of a line process. homogeneous reg MRF. Aloimonos pr Aloimonos SUgge: lives of the data disContinuous re! Conlinuititm. Sin: la“ the Continu them than that t the palafllttters . 1. MCCafieny I] l” in I73]. rat (Tet-lie a Single e “”0“ shown in th‘35e terms has Sir ‘ amt. The mi] 8 35 of the different types of sensors. Gibbs distributions, and MRFs can also be used as a uniform integration frame- work. Gamble, et. al. 
Gibbs distributions and MRFs can also be used as a uniform integration framework. Gamble et al. use edge detection in combination with other modules such as texture, motion, stereo and color [41] to create a good edge map. This edge map is similar to a line drawing, with each edge labeled as a mark, specularity, orientation discontinuity (convex or concave edge), occlusion discontinuity (jump or limb edge) or shadow. The MRF model first combines edge detection with another single module from color, texture, motion, and stereo. Each of these MRF models consists of a line process, which models the edges, and a continuous process, which models homogeneous regions. These four MRF modules are then combined into a single MRF.

Aloimonos proposed a uniform integration framework in [3]. In this framework, Aloimonos suggests that in addition to modeling the data and constraints, the derivatives of the data and constraints should also be modeled. He calls this framework discontinuous regularization. Fitting the derivatives has the effect of preserving discontinuities. Since the discontinuities in an image are in some sense more important than the continuities, this idea has merit. This result produces a more uniform treatment than that of Terzopoulos, but has many parameters. A learning algorithm for the parameters is provided.

McCafferty uses energy minimization to perform integrated perceptual organization in [73]. Taking his cue from the Gestalt school of psychology [114], he tries to create a single energy functional that captures some of the Gestalt rules of organization shown in Figure 1.5. This functional includes terms for proximity, variance of proximity, similarity, closure, continuity, and the variance of continuity. Each of these terms has an arbitrary weight which represents the importance of this constraint. The minimization of the energy functional is done by simulated annealing. He then shows that a single set of weights is inadequate for the task at hand and argues that these weights should be set differently for different perceptual tasks. In the examples given, which show some interesting and unique results for perceptual organization, the weights vary wildly. In one example they are all equal, in others they differ by as much as five orders of magnitude.

2.1.3 Non-Uniform Integration Strategies

Any integration strategy that does not exhibit the uniformity of the integration strategies discussed in Section 2.1.2 will be classified as non-uniform. This type of integration is widely used for integrating pairs of processes, such as shape from shading and line labeling [70], or shape from texture and motion in [3]. There is also a more general integration framework for non-uniform integration, called a blackboard system [32].

An example of non-uniform integration is shown in [84], where a shortcoming of the quadtree region splitting and merging algorithm is overcome by combining edge detection with region grouping. The method is laden with heuristics and parameters, and solves a problem that is unique to quadtree segmentation. This problem doesn't arise with other region segmentation paradigms, and therefore the heuristics are not generally useful.
In [1], focus, vergence and stereo were integrated. In [16] the detection of texture elements (texels) is combined with extracting the parameters of the planar surface. In this method, texels are selected from a pool of candidate texels using exhaustive search to find a subset of texels that provides a unique planar interpretation. Both of these methods show improved results over unintegrated methods.

Another example of non-uniform integration is Malik and Maydan's integrated shape from shading and line labeling in [70], where line labeling was used to create some of the boundary conditions that are necessary for shape from shading. The regularization scheme of shape from shading is modified to include an additional term at non-limb edges. Both line labeling and shape from shading have numerous strict assumptions that must be satisfied. For example, line labeling requires a perfect line drawing as input, and has the general viewpoint assumption, as discussed in Section 1.2. The necessary conditions for shape from shading include complete boundary information, normal directions at the boundaries, a known reflectance map, and lighting restrictions. This work goes against one of the general principles of integration in that its necessary conditions are the union of the necessary conditions of shape from shading and line labeling. In fact, the necessary conditions are so strong that only synthetic data are used in the examples.

In [3], three different non-uniform methods were presented: shape from shading and motion, shape from texture and motion, and shape and 3-D motion from contour and stereo. In these integrated modules, careful attention is paid to mathematical proofs of uniqueness, existence, and stability. These modules are part of a strategy for systematically examining combinations of modules in hopes of being able to generalize the insights into a uniform framework, like those discussed in Section 2.1.2. It should be noted that some of the most difficult subproblems in these areas are not solved. The correspondence problem, for example, involves matching pixels or features in pairs of images, either images in a time sequence for motion, or images taken from different vantage points for stereopsis. In both the shape and 3-D motion from contour and stereo, and the shape from shading and motion, it is assumed that the correspondence problem has been solved. This problem is difficult. In shape from texture and motion, the textured surface is assumed to be planar.

When the number of processes to be integrated becomes large, creating a non-uniform system is more difficult. Typically many people need to be working on separate parts of the system simultaneously. The modules need to communicate, but if they do so directly then propagating changes, such as a data structure change in a single process, may require alterations to many other processes. If this integration
is handled informally, chaos can result. A useful structure for integrating diverse processes which work on a common problem is the blackboard system [32]. A blackboard system has three fundamental parts: the global database of data and partial solutions called the blackboard, the knowledge sources that are independent modules that use the data currently on the blackboard to advance the solution of the problem, and a control structure that decides which knowledge sources will process what blackboard data at a given time. An overview of a typical blackboard system showing the interconnections of these modules is displayed in Figure 2.3 [33].

Figure 2.3. An overview of the Hearsay-II system for speech understanding.

The blackboard structure was originally designed for the speech understanding task. A wide variety of other applications of blackboard systems exist, including: ocean surveillance, where a group of hydrophones concealed under the water are used to detect, localize and ascertain the type of ocean vessel within its range [82]; the computer vision problem of object recognition in [31]; the analysis of complex aerial photographs in [78]; deriving the 3D conformations of proteins in solution in [48]; and the planning of military missions using an autonomous vehicle in [85].

The blackboard is a global database that contains input data, partial solutions, alternatives, final solutions, and control data. It is organized into levels. Each level contains hypotheses of specific data objects at a fixed granularity. These objects grow in generality as the level number decreases. For the speech understanding task, the levels used in [33] were parameter, segment, syllable, word, word sequence, phrase, and database interface. The control data includes relationships among the hypotheses, including relationships like OR, PART-OF, and AND. The credibility or certainty associated with each hypothesis is also part of the control data. Another type of control data is a set of scheduling parameters. These parameters are maintained in the focus of control database.

The knowledge sources are independent modules that take the current state of the blackboard and seek to improve it using their specialized knowledge. Knowledge sources consist of two parts: a precondition and an action. The precondition specifies the conditions under which the knowledge source is useful. If the preconditions are met for a particular blackboard object and knowledge source, the knowledge source is said to have been triggered. For a knowledge source to be triggered, it must refer to a specific blackboard object. The action is the processing that the knowledge source can perform to improve the status of the current solution. A knowledge source has a private data structure which cannot be accessed by any other knowledge source.
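These three parts are easy to see in miniature. The Python sketch below is an invented toy, not Hearsay-II and not the POLL implementation: the blackboard is a dictionary of leveled hypotheses, each knowledge source is a precondition/action pair, and the control structure is a deliberately naive scheduler:

    # Hypothetical blackboard skeleton: blackboard, knowledge sources,
    # and a control loop, with all names invented for illustration.
    class KnowledgeSource:
        def __init__(self, name, precondition, action):
            self.name = name
            self.precondition = precondition   # blackboard -> bool
            self.action = action               # blackboard -> None

    class Blackboard:
        def __init__(self):
            self.levels = {}                   # level name -> hypotheses

        def post(self, level, hypothesis):
            self.levels.setdefault(level, []).append(hypothesis)

    def control_loop(blackboard, sources, max_cycles=100):
        # Naive control: fire the first triggered source each cycle and
        # stop when nothing triggers or the cycle budget runs out.
        for _ in range(max_cycles):
            triggered = [ks for ks in sources if ks.precondition(blackboard)]
            if not triggered:
                return
            triggered[0].action(blackboard)

    bb = Blackboard()
    edges = KnowledgeSource(
        "edges",
        lambda b: "edges" not in b.levels,
        lambda b: b.post("edges", [(0, 0), (1, 1)]))
    group = KnowledgeSource(
        "grouping",
        lambda b: "edges" in b.levels and "curves" not in b.levels,
        lambda b: b.post("curves", b.levels["edges"][0]))
    control_loop(bb, [edges, group])
    print(bb.levels)   # both levels filled, each source fired once

A real scheduler replaces the first-triggered-wins rule with the queues, monitor, and focus of control database of Figure 2.3; what the structure protects is the independence of the knowledge sources, which communicate only through posted hypotheses.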
A group of related knowledge sources can be combined into a module that may have a private blackboard that can be shared only among these processes. This organization is more sophisticated than that shown in Figure 2.3 and engenders some special control problems.

There are several knowledge sources that must exist in any application. One is a knowledge source that determines if the processing should be terminated. This knowledge source should check both for successful completion and for the exhaustion of resources (both memory and CPU time). The strategy knowledge source is also important. This knowledge source examines the blackboard structure as a whole and helps to determine which of several predefined strategies is most beneficial given the current configuration. This knowledge source does not have information about other knowledge sources, since this would violate independence. Therefore, the strategy knowledge source doesn't perform scheduling. It can adjust the parameters that determine which levels of the blackboard should be emphasized and the parameters which select a portion of the input for more intensive processing. These parameters are stored in the focus of control database.

Blackboard systems have previously been used for image processing and computer vision. The UMass Schema system is a blackboard system for computational vision described in [31]. This system performs object recognition in a highly textured outdoor environment. The system blackboard consists of object hypotheses, conflicts and back pointers which store information about which objects have been hypothesized in any given part of the spatial plane. The object hypotheses are called schema instantiations. Schemas are high level modules designed from basic knowledge sources to control the search for a particular object. Within the house schema, there are schemas for detecting foliage, grass, roofs, walls, shutters, windows, telephone wires, sky, road line and road shoulders. The knowledge sources that are available fall into five groups: feature based classification, perceptual organization and grouping, geometric model matching, relations (spatial, geometric, hypothesis space) and low level processes. Within a schema there are internal hypotheses, strategies, and a local blackboard. The internal hypotheses consist of image events (tokens) and endorsements that summarize the positive and negative evidence for the hypothesis. The strategies include simple control programs, control over the invocation of knowledge sources, and control over the invocation of lower level schemas.

Figure 2.4. Both a tree and a telephone pole satisfy the generic description of a tree as a vertical axis with transverse branches.
Within every schema there is an object hypothesis maintenance strategy (OHM) which updates the global blackboard. This work differs from a general blackboard system as previously described in two ways. Instead of using the global blackboard exclusively, the schemas have their own local blackboards. Control is distributed to the schemas instead of being focused on one uniform strategy.

Although the knowledge sources in this implementation retain general knowledge, the schemas themselves are sharply focused. While this strategy is useful, it may not be sufficiently general. The analogue of the focused image understanding schemas in a speech understanding framework might be to have the system look for a specific set of sentences rather than exemplars of a grammar with a restricted dictionary. Undoubtedly the recognition will be better, but whether the loss in generality is justified by the increased performance is debatable. The other difficulty with this type of focusing is that objects not within the domain may be classified as belonging to the domain. An example is shown in Figure 2.4. If a tree is defined as a vertical axis with transverse branches attached, then both objects in Figure 2.4 are trees.

Another image understanding application that has used a blackboard is described in [78]. The domain is the interpretation of complex, low altitude aerial photographs. The goal is to segment the image into the following categories: crop field, bare soil, forest, grassland, road, river, car and house. The blackboard stores the current segmentation; basic properties of the regions (which are the smallest blackboard entities); objects (which are composed of groups of regions); spatial relationships between regions and objects (both spatial relationships and dependencies, and contradictions); and the global imaging parameters. Each object detection subsystem (knowledge source) returns symbolic evaluations of regions to the global blackboard. These symbolic evaluations are used by object recognition knowledge sources to recognize objects. Scheduling is performed by sequentially hypothesizing that each region is one of each of the eight basic types and reevaluating conflicts. The difference between this system and an ordinary blackboard system is in the control scheme. There are no waiting queues, scheduler, or blackboard monitor.

Blackboard systems have become an important framework for integrating knowledge into a system which permits subsystems to both compete and cooperate to find globally good solutions. Image processing is a natural domain for blackboard systems because of the large amount of data, the need for multiple processes to interpret the data, and the flexible control mechanism. Work is being done on specialized architectures for blackboards, concurrency and parallelism, and real time blackboard systems [54]. This bodes well for the future of blackboard systems in computer vision.

2.1.4 Selecting an Integration Framework

Blackboard systems have significant advantages over other integration frameworks for the research done in this thesis. Both discontinuous regularization and Markov random fields minimize complex energy functionals. As the number of processes integrated grows, the energy functionals may become highly non-convex, which may make energy minimization difficult or impossible. It is also unclear how a process like line labeling could be incorporated into these other frameworks, although [70] may provide a method, subject to earlier caveats.
Blackboard system modules can be altered without undue hardship since knowledge source independence is maintained. This is important since there may well be significant improvements in some of the modules in the future. This type of flexibility permits a system to evolve as the understanding of computational vision increases.

Another advantage of blackboard systems over the integration frameworks discussed above is that the blackboard has an independent control module that can strategically explore the space of solutions. In the other integration methods, no such control structure exists. Bringing these control issues to the forefront offers an opportunity to examine the interaction among the various modules. Methods without elaborate control structures cannot provide this type of introspection into their processing. Using a blackboard for an initial implementation of an integrated system can lead to an improved understanding of successful control structures and make creating a less flexible but more efficient system easier. These advantages indicate that a blackboard system is a reasonable choice for this integrated system.

2.2 Selection of Modules for Integration

Now that the blackboard framework has been selected, the choice of both general module types and specific paradigms must be justified. First the general module types will be selected. Although the list of potential modules is vast, the main modules that were considered for use in this thesis were shape from shading, texture, motion, stereo, edge detection, line labeling, symmetry, color, continuity, and shape from contour. For this thesis, the processing will use a single grayscale intensity image. This eliminates modules such as motion, stereo, and color. It should be noted that these modules could be included in an enhanced version of this system.

There are three other criteria for modules: relevance, compatibility, and minimality. Relevance means that a module must provide some part of the 2.5 Dimensional sketch or provide some information that can be used to get part of that sketch. Compatibility between modules is also important. Modules need to be able to use either raw input, or the imperfect results of other modules. Minimality is a practical consideration. It is desirable to use the smallest number of modules to accomplish this goal initially. Since the blackboard system allows for future expansion, other modules can be integrated later as the system evolves.

Within the candidate list there are two modules that unequivocally must be used. The first is edge detection and the second is line labeling. Edge detection is necessary since it takes the raw intensity data and provides the initial image tokens for analysis. Line labeling is unique since it provides the 3D interpretation of the line drawing. Therefore both of these modules are highly relevant. The continuity module (called co-curvilinearity in [75]) was selected on the basis of compatibility and relevance. Compatibility is established since this module uses as input the output of edge detection. Continuity is relevant because it provides a bridge between the raw edges and a line drawing. The choice of additional modules becomes more nebulous at this point. The final module selected was symmetry. Symmetry is relevant since it can use either the results of edge detection or continuity.
Symmetry can provide evaluations of the goodness of regions that will be helpful in discovering and correcting both edge detection and continuity problems. Shape from shading or contour could also have been used. In fact, either would be a good choice for the next module. Texture could also have been used, although texture segmentation is a very difficult problem even in artificial situations. The final choice of type of modules is: edge detection, continuity, and line labeling. The continuity algorithm, which will be discussed in Section 3.2, is a contribution of this thesis.

2.2.1 Evaluating Proposed Modules

With the general type of the modules fixed, there are many choices for the specific paradigms. The criteria discussed in Section 2.2 apply to the paradigms as well. The blackboard paradigm makes it possible to implement and test alternate paradigms within each module, but this is a time consuming procedure, so paradigm decisions should not be taken lightly. There are eight criteria for paradigm selection.

Input Availability: The input to the paradigm must be available from either the input image or another module.

Output Usefulness: The output of the paradigm must either be useful to another module or be part of the 2.5 Dimensional sketch.

Real Data: The paradigm must be able to work with data that is not synthetic.

Robustness: The paradigm should result in only small changes in output if the input is changed slightly.

Efficiency: The paradigm must be able to run in a reasonable length of time on current hardware.

Potentially Parallel: It is preferable that all paradigms be potentially massively parallel. Since work is being done on blackboard systems which use parallelism, this will make it possible to implement this system in parallel later.

No Detailed Object Models: The paradigm may use only general constraints (such as smoothness), and may not use specific object models.

Compatible with Object Domain: The paradigm must be able to work in the domain of objects discussed in Section 1.2.

For each module type a paradigm needs to be selected that meets as many of these criteria as possible. For many of the modules, there will be no paradigm that meets all of these criteria, therefore compromises must be made. These compromises will be discussed as they are encountered. Line labeling was discussed in Section 1.1.3. Section 2.2.2 will discuss the choice of edge detection algorithm. Section 2.2.3 will examine continuity, although the final choice of continuity module will not be made until Section 3.2. Symmetry will be discussed in Section 2.2.4.

2.2.2 Edge Detection

Boundaries of imaged objects are psychologically important, can help initiate the processing of other modules, and are useful for object recognition. The usefulness of boundary contours for other vision modules can be seen by examining the number of other modules that use boundary information to initiate processing. Line labeling, shape from shading, and symmetry detection all rely on object boundaries being reliably available. Lowe and Biederman show the usefulness of the boundary contours of an imaged object to the recognition problem. Lowe uses the boundary, in combination with 2D to 3D inference, to contribute to the 2.5 D sketch. He also uses the boundary to bootstrap object recognition, as the bicycle in Figure 2.5 shows. Biederman's geons are instantiated using boundary based information.
Many of the figures shown in this thesis contain only boundary information, and yet still provide ample information for interpretation.

Edge detection is one of the earliest problems addressed in the vision literature (see [7], [63], or any other introductory text for simple edge detection methods). Torre and Poggio discuss the optimality of edge detection in [109]. It was shown that several edge detection schemes are near optimal. This has an ominous conclusion: edge detection may not be able to do much better than what it is already capable of, as was shown in Figure 2.1 and Figure 2.2. There has been much discussion of performing edge detection at multiple scales (i.e. different parameter values) [72, 109, 88, 24]. If edge detection as it currently exists is in fact near optimal, then the extraction of complete boundary information from images must be attacked using other means.

Figure 2.5. This broken line drawing of a bicycle is very difficult to recognize in (a). With the addition of one segment, recognition is much easier in (b) since the front wheel of the bicycle can now be grouped into a complete boundary [67].

One of the most commonly used recent edge detectors is the Canny edge detector [24]. A recursive version of Canny's edge detector was reported in [30]. Although Canny edge detection has been widely used, it was not selected for this work because of two shortcomings. The first shortcoming with Canny edge detection appears at corners of regions. Vertices are very important to line labeling, and an edge detector that does not give good vertices is not helpful. A second shortcoming of Canny edge detection is in the thresholding scheme. The thresholds are set using a global histogram of the image. This method causes problems when images have very different numbers of real edges. For an example, see Figure 2.6. Since it is preferable for this thesis that the edge detection perform well with a single set of parameters, Canny edge detection was eliminated from consideration. It should be noted that if the parameters were set independently for each image, Canny edge detection would perform very well. This problem with edge detection was discussed in [83].

The method of edge detection that was selected for this work is fitting a weak membrane model using the Graduated Non-Convexity (GNC) method of Blake and Zisserman [13]. This method has two advantages over Canny edge detection. It has only two parameters instead of four, which greatly simplifies setting parameter values. Secondly, the proportion of edges in the image is not used in the computations. The images shown in Figure 2.6 can both be segmented with the same set of parameters. The weak membrane model is fit to the image data using an energy minimization technique, GNC.

Figure 2.6. Canny edge detection is run with the same parameters on two images that are similar except for the number of real edges. (a) An image with very few edges. (b) An image with many more edges. (c) The output of the Canny edge detector run on (a). (d) The output of the Canny edge detector run on (b). There is no single set of parameters that will give good edges on both of these images. The parameters used in both of these images were σ = 2, masksize = 17, lower threshold = .60, higher threshold = .75. Many other combinations of parameters were tested.
It has a one dimensional equivalent, called the weak string, that will be used for illustrative purposes. The energy functional in the weak string is made up of three terms. Let u_i be the weak string solution to the functional. Let x_i be the initial data. Let l_i be a binary function such that l_i = 1 corresponds to a break in the string. The energy functional E for the weak string is defined to be:

    E = Σ_i (u_i − x_i)² + λ Σ_i (1 − l_i)(u_i − u_{i+1})² + α Σ_i l_i        (2.1)

The first term penalizes the solution u_i for being far away from the data x_i. The second term penalizes u_i for being far from its neighbor. This is a finite element approximation of the first derivative. This second term contributes to the energy only when the string is unbroken. The final term is a penalty for breaks in the string. These three energy penalties together encourage the weak string to be simultaneously close to the data, smooth, and unbroken. The parameters are set to balance smoothness with breaks. If λ is large, the string will be heavily penalized for not being smooth, and will tend to break often. If α is large then the string will be broken less frequently, but will not be as smooth. The two dimensional counterpart of a break in the string is a tear in the weak membrane. The set of breaks in the weak membrane correspond to the edges of the detected regions.

The energy functional needs to be minimized both with respect to u_i and with respect to l_i. This problem is not convex. A problem that is convex has a single global energy minimum that can be reached by successively taking a step in a direction that decreases the energy. When a problem is not convex, it has a set of local energy minima that can be mistaken for the global minimum and confound energy minimization. In [13], it is proved that the energy functional is convex with respect to u_i if l_i is fixed. The energy functional is not convex with respect to l_i. To avoid this problem, GNC creates a functional that is similar to the original functional except that it is everywhere beneath the original functional, and is convex. This functional will be minimized with respect to l_i. GNC finds the function with minimal energy from this convex functional and uses this solution as the starting point for the second functional. This second functional is more similar to the original than the first functional, and lies between the original functional and the first approximation. The second functional is not convex, but is nearly so. GNC then finds the function with minimal energy from this nearly convex functional and uses this function as the starting point for a third functional, which will be a bit closer to the original functional and a bit less convex. This process is repeated, with the similar functional approaching the original functional and becoming increasingly less convex with each iteration. When the similar functional is suitably close to the original, the original functional is solved for u_i. This procedure has been shown to converge to the global minimum energy for the weak string, the weak membrane, the weak rod, and the weak plate models if the discontinuities are much further separated than the scale parameter λ. GNC was compared to the stochastic simulated annealing method on the weak string problem in [12], and was found to have been closer to the optimal solution and more computationally efficient. GNC is not applicable in many instances where simulated annealing would be.
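Equation 2.1 can be transcribed directly, which makes the role of each term easy to check. The sketch below (the data and parameter values are invented for illustration) evaluates the weak string energy for an unbroken and a broken fit to step-shaped data:

    import numpy as np

    # Direct transcription of Equation 2.1; l has one entry per interval
    # between neighboring samples, with l_i = 1 marking a break.
    def weak_string_energy(u, x, l, lam, alpha):
        data_term = np.sum((u - x) ** 2)                  # closeness to data
        smooth_term = lam * np.sum((1 - l) * (u[:-1] - u[1:]) ** 2)
        break_term = alpha * np.sum(l)                    # cost per break
        return data_term + smooth_term + break_term

    x = np.array([0.0, 0.1, 0.0, 5.0, 5.1, 5.0])  # a step in the data
    u = x.copy()                                  # fit the data exactly
    unbroken = np.zeros(5)
    broken = np.array([0, 0, 1, 0, 0])            # break at the step
    print(weak_string_energy(u, x, unbroken, lam=1.0, alpha=2.0))  # 25.04
    print(weak_string_energy(u, x, broken, lam=1.0, alpha=2.0))    # 2.04

With these values, breaking the string at the step costs far less than paying the smoothness penalty across it, so the minimum energy configuration places a break exactly where the data jump; this is how breaks in the weak membrane come to mark intensity edges.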
One advantage of selecting GNC for the edge detection module is that it tends to find thick but complete boundaries. An example can be seen in Figure 2.7. Another advantage is that GNC has only two parameters, both of which have a physical interpretation. These parameters can be changed into another pair of parameters which correspond to scale and contrast sensitivity. The contrast sensitivity represents the minimum difference in average grey level that will produce a break in the membrane. The parameters were set initially, after experimentation, and were not altered during the course of processing all of the images shown in this thesis.

Figure 2.7. (a) An intensity image. (b) The output of running GNC on this image.

A shortcoming of GNC is efficiency. Although it could be parallelized, its serial implementation is slow. On a 240 row by 320 column image in the test set of images, Canny edge detection took 10.69 seconds; GNC run on the same image (under similar system load) took 215.12 seconds. Since it was not the intention of this research to create a real time system, this compromise can be made.

2.2.3 Continuity

Edge detection alone cannot always provide perceptually meaningful lines and curves. The purpose of a continuity module is to take the edges that are detected and organize them into more perceptually meaningful structures, as was discussed in Section 1.1.2. This can be done by grouping neighboring image tokens, as perceptual organization would suggest. It could also be done by fitting a high level model to the data. If a high level model could be properly selected, it could produce globally good solutions instead of locally good solutions. This would be propitious, but no known model is sufficiently general to fit all types of data.

Much of the previous research in this area has been dedicated to detecting straight lines instead of curves [23, 105, 50]. The simplicity of straight lines makes them an attractive model, particularly since fitting techniques for straight lines are simple and well understood [93]. As long as the object domain is polyhedral, lines are adequate. Linear models are not adequate for piecewise smooth objects.

An example of global organization is the Generalized Hough Transform (GHT). The GHT [50, 6, 53] groups edge fragments into parameterized objects. Hough transforms were initially used for detecting straight lines in images. In this method, each edge casts a vote for every set of straight line parameters that are consistent with its position in the image plane. The set of parameters receiving the most votes is taken to represent the structure that is present in the image. In the linear case these parameters could be the slope and intercept (although a parameterization with better computational properties is usually used). In the circular case, there are three parameters: the radius, and the (x, y) position of the center.

Hough spaces become difficult to manage when the number of parameters is large, and they have several other well documented shortcomings [45], including a relatively high probability of recognizing an incorrect grouping in noisy images.
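As a concrete reference for the voting scheme just described, here is a minimal sketch of the line-detecting Hough transform. It assumes the standard (ρ, θ) normal parameterization, one example of the "parameterization with better computational properties" mentioned above, since it avoids unbounded slopes.

    import numpy as np

    def hough_lines(edge_points, img_diag, n_theta=180, n_rho=200):
        # Each edge point votes for every (rho, theta) line through it:
        # rho = x*cos(theta) + y*sin(theta).
        thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
        acc = np.zeros((n_rho, n_theta), dtype=int)
        for (x, y) in edge_points:
            rho = x * np.cos(thetas) + y * np.sin(thetas)       # one rho per theta
            bins = np.round((rho + img_diag) * (n_rho - 1) / (2 * img_diag)).astype(int)
            acc[bins, np.arange(n_theta)] += 1
        r, t = np.unravel_index(np.argmax(acc), acc.shape)      # most-voted cell
        rho_best = 2 * img_diag * r / (n_rho - 1) - img_diag
        return acc, (rho_best, thetas[t])

    pts = [(i, 2 * i + 5) for i in range(50)]                   # points on y = 2x + 5
    acc, (rho, theta) = hough_lines(pts, img_diag=150.0)
    print(rho, np.degrees(theta))                               # near rho = 2.24, theta = 153 deg

Note how the answer depends on accumulator resolution: coarse bins merge nearby structures, fine bins scatter the votes of a single structure, which is one source of the incorrect groupings in noisy images noted above.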
Biederman has demonstrated that the human visual system responds better to missing edges that can be recovered by perceptual organization than to those that cannot. Figure 2.8 shows three versions of a goblet. If a Hough space approach were used, the middle and right figures would be equally recoverable, since approximately the same percentage of boundary has been removed. The human visual system recovers the middle goblet more easily than the right, thereby demonstrating that the GHT is not what is used in human perception.

Figure 2.8. On the left, a complete goblet is shown. In the middle is a degraded goblet with contours deleted in regions where perceptual organization can recover the original. On the right is a degraded goblet with contours deleted in such a way that perceptual organization cannot recover the goblet. The left and middle versions can be easily recognized, but the right cannot.

A continuity scheme which uses local criteria with a global model is Lowe's implementation of perceptual organization. Lowe emphasized recursively grouping adjacent edge points into lines at multiple scales [67]. This grouping was done by examining triples of points and deciding whether they are related by proximity alone, or whether they should be grouped into a line. This decision was made on the basis of the angle θ shown in Figure 2.9. Lowe uses 2θ/π to represent the probability that the pixels are related only by proximity and not by direction. When θ is close to π/2, it is unlikely that these pixels represent a linear structure. This processing is shown to work very well in the linear case, and in a single example of the circular case. Grouping circular arcs was handled in [103]. Fitting proximate edge points to circles is of questionable value, since there are a limited number of circumstances where projection to circles is achieved. Spheres project to circles, and circles project to circles if the viewpoint is constrained (as in aerial photography, the application that was used for an example in [67]). The work has not been extended to ellipses or other more general models, and as such cannot be used in the piecewise smooth object domain.

Figure 2.9. The middle pixel is likely to be related to the two endpoint pixels only by proximity and not by direction, since the angle θ is relatively large.

2.2.4 Symmetry

Symmetry has been widely studied in psychology, starting with the Gestalt psychologists who recognized its importance in grouping image elements [114, 61], and continuing into more modern studies [8, 57]. Symmetry can be used in at least two different contexts. In the first context, symmetry can be used to recover shape information. In [108], 3D surface shape was reconstructed from the contour of an image using a regularized boundary value problem methodology, given a rough symmetry axis. In the second context, symmetry can be a cue that image entities might be related and should be hypothesized as grouped entities.

A question that has been widely studied is the exact definition of symmetry that should be used. Bilateral symmetry is the simplest form. The image of an object is bilaterally symmetric if a straight axis exists in the image plane such that all lines perpendicular to the axis intersect opposite sides of the figure at points that are equidistant from the intersection of the axis and the perpendicular line. Several methods for the detection of bilateral symmetry have been proposed [71]. In [80], the conditions for existence of bilateral symmetry in the image plane were investigated. It is shown that bilateral symmetry exists in the image plane only when orthographic projection is being used on objects that are local surfaces of revolution, under general viewpoint conditions.
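A minimal sketch of the bilateral symmetry definition above, testing a set of contour points against a given vertical axis. Real detectors such as those in [71] must also search over axis position and orientation; the fixed axis and the tolerance here are simplifying assumptions.

    import numpy as np

    def is_bilaterally_symmetric(points, axis_x, tol=1.0):
        # For each contour point, the mirror image across the vertical line
        # x = axis_x must also lie (approximately) on the contour.
        pts = np.asarray(points, dtype=float)
        mirrored = pts * [-1, 1] + [2 * axis_x, 0]
        for m in mirrored:
            if np.min(np.linalg.norm(pts - m, axis=1)) > tol:
                return False
        return True

    # An ellipse is bilaterally symmetric about x = 0; shearing it breaks this.
    t = np.linspace(0, 2 * np.pi, 400)
    ellipse = np.stack([3 * np.cos(t), np.sin(t)], axis=1)
    sheared = ellipse @ np.array([[1.0, 0.0], [0.8, 1.0]])
    print(is_bilaterally_symmetric(ellipse, 0.0))   # True
    print(is_bilaterally_symmetric(sheared, 0.0))   # False: it is only skew symmetric

The sheared ellipse in the example fails the bilateral test yet remains skew symmetric, which motivates the more general definition taken up next.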
Skew symmetry is more general. If projection is orthographic, the image of an object is skew symmetric if a straight axis exists in the image plane such that all lines that meet the axis at a fixed angle θ intersect opposite sides of the figure at points that are equidistant from the intersection of the axis and the angled line. Perspective projection requires a small modification to this definition. If an object is planar and bilaterally symmetric, its projection in the image plane is skew symmetric. Unlike bilateral symmetry, skew symmetry is ambiguous in that a unique angle θ cannot always be found. The left side of Figure 2.10 shows two different θ values that both satisfy the definition of skew symmetry. When the line figure is closed with ends, as in the right hand side of Figure 2.10, the ambiguity is eliminated. Progress has been made in the detection of skew symmetry [39].

Figure 2.10. There are an infinite number of cross sections for this skew symmetric pair of lines. Two possible cross sections are shown. Connecting the lines into a parallelogram gives a unique cross section. The dotted line is the symmetry axis, and triples of closely spaced lines graphically display possible θ values.

A computationally expensive set of heuristics for finding symmetry within an image was suggested in [96]. Several heuristic definitions of symmetry have been proposed [18, 17, 20] and reviewed and compared in [98, 90, 91]. These definitions are more general, in that they allow objects to be called symmetric that do not fit into the strict mathematical framework of either bilateral or skew symmetry. For example, the gourd shown in Figure 2.11 is neither bilaterally nor skew symmetric. The axis shown, however, does indicate some type of symmetry. Leyton investigated the relationship between curvature extrema and symmetry axes and found that each curvature extremum has a unique symmetry axis that terminates at that extremum [65].

Figure 2.11. This gourd is neither bilaterally nor skew symmetric, yet the dotted axis shown displays some type of heuristic symmetry.

More recent work was done by Leymarie and Levine [64], who use a grassfire transform to find the skeleton of regions. A skeleton is an informal symmetry axis. In this method, a distance transform is found in the interior of a single segmented region. A snake, which is like a weak string or weak rod without the discontinuity terms [59], is placed at the boundary and given a weight. The distance transform is used as a potential surface, where the distance from the boundary is thought of as a height. The snake is permitted to fall down the potential surface into the valleys, which are equidistant from both sides of the region, giving a skeleton. An advantage of skeletonization over strict symmetry definitions is that a heuristic axis can be found even if the region isn't mathematically symmetric. A disadvantage is that no hard and fast decision is made about whether the region is or is not symmetric.
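A minimal sketch of the distance-transform part of this idea, using scipy and scikit-image. The medial axis computation here stands in for the snake of [64]; both settle onto the ridge of the distance transform, i.e. the pixels equidistant from two or more boundary points.

    import numpy as np
    from scipy import ndimage
    from skimage.morphology import medial_axis

    # A rectangular region: its medial axis is an informal symmetry axis.
    mask = np.zeros((60, 100), dtype=bool)
    mask[20:40, 10:90] = True

    # The grassfire arrival time: distance from each interior pixel
    # to the nearest boundary point.
    dist = ndimage.distance_transform_edt(mask)

    # The skeleton follows the ridge of the distance transform.
    skeleton = medial_axis(mask)
    print(np.column_stack(np.nonzero(skeleton))[:5])   # a few skeleton pixels
    print(dist.max())                                   # half the region thickness

As noted above, such a skeleton exists for any region whatsoever, symmetric or not, which is exactly why it offers no accept/reject decision on its own.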
Gross and Boult [47], and Gross [46], developed a system called SYMAN, with two methods for detecting skew symmetries in real images. The first method is moment-based. Since any moment-based method depends on the region being complete, this method will not work in the face of occlusion. They also proposed a local method of finding symmetry. In this method, a necessary condition for a symmetric pair of points and their tangent directions is found. All pairs of points on the contour are then put in a bin on the basis of the slope of the line joining the points. Within each bin, the necessary condition is used to find the sharpest peak in the angle of skew. The bin with the sharpest peak in calculated angles of skew is taken to give the correct symmetry axis. SYMAN, unlike the grassfire transform, gives a hard and fast decision about whether a region is skew symmetric.

Figure 2.12. This region is the union of two skew symmetric regions. SYMAN alone cannot analyze the symmetry. The grassfire transform cannot make symmetry decisions. Together, however, they may be able to segment the region at the dotted line and verify that both subregions are symmetric.

Both the methods of Leymarie and Levine [64] and Gross and Boult [47, 46] will be discussed. These methods have complementary properties that can be used to good advantage in combination. As an example, SYMAN could be used to indicate that the region in Figure 2.12 is not skew symmetric, but SYMAN cannot tell that this region is the union of two skew symmetric regions. The grassfire transform will produce an axis that is broken at the intersection of these regions. If the region is separated at this break in the axis, SYMAN could verify that both subregions are skew symmetric. This cooperation will be explored in Chapter 5.

2.3 Summary of Choices

This chapter explains the group of choices that were made to establish the basis for the POLL system. There were three categories of choices. The first choice was the selection of an integration framework. The second set of choices was the selection of the general types of modules that should be used in the framework. The final set of choices was the selection of a particular paradigm for each of the previously selected modules.

The blackboard framework was selected as the integration framework because it provides modular independence and a flexible control structure. Modular independence is particularly important since research is still being done on all of the modules that were selected for this system. The availability of a public domain blackboard shell also factored into this decision.

Four modules were selected for integration, along with paradigms for each. The first module was edge detection; this module was necessary since the other modules depend on it. The weak membrane [13] will be used to detect edges. The second module selected was line labeling, which was necessary to get the labeled line drawing desired as output. Malik's curvilinear line labeling will be used [69], since it is compatible with the object domain described in Section 1.2. A continuity module will be used to bridge the gap between edge detection and line labeling. The paradigm selected was the author's curvilinear grouping algorithm [110], which will be discussed in Section 3.2. The final module selected for integration was a symmetry module. Since there are two types of symmetry paradigms available, both of which have advantages, two different paradigms were selected to be used in concert. The first paradigm was the grassfire transform of Leymarie and Levine. The second was SYMAN from Gross and Boult. The symmetry knowledge sources were not implemented in software, but uses for these knowledge sources will be discussed in Section 5.2.1.
CHAPTER 3

Obtaining an Initial Line Drawing

Line labeling is a method of extracting 3D information based solely on the legal configurations of junctions that arise in the physical world. A labeled line drawing is a very terse description of image data. This has both benefits and shortcomings. All that is needed to represent an image as a line drawing is an attributed list of junctions and an attributed list of arcs. The attributes of a junction are the junction type (L, curvature-L, phantom, arrow, Y, three tangent, T, or terminal) and the names and ordering of the arcs at the junction. The arcs' only attribute is a line label (jump, limb, convex, concave). While this simplicity helps to limit the computational complexity, it makes it impossible to recover from mistakes made in the creation of the initial representation. Even within the simpler polyhedral object domain, Sugihara had to hand edit simple images in order to be able to analyze line drawings [105]. The piecewise continuous object domain is much more complicated and subtle, as will be demonstrated in Section 3.3. If recovery from early mistakes cannot be made, the system will not be robust.

An overview of the early processing is shown in Figure 3.1. This processing is used to analyze the input image and create a reasonable approximation of the structure of the image for further analysis. Weak membrane edge detection is run on the input image. The detected edges are grouped using a continuity module, and have vertices assigned at their endpoints. These vertices are collected using spatial proximity to group junctions for line labeling. The junctions are then classed and typed. This gives sufficient information for the first pass of the line labeling algorithm that will be discussed in Chapter 4.

Figure 3.1. An overview of the initial processing.

Since it is expected that none of the processes utilized in this thesis will produce perfect results, a richer description of the image data must be built than what would be necessary for a line drawing. This description will be discussed in Section 3.1. In Section 3.2, an algorithm for grouping curves using proximity and continuity will be discussed. Section 3.3 will discuss the processing that is used to create the initial representation of the imaged scene, including creating the first line labeling hypothesis for the image. The final section, Section 3.4, will show examples of this processing being used on both real and synthetic images.

3.1 The Representation of Piecewise Continuous Objects in a Blackboard

In order to perform line labeling on the data that is obtained by processing the input image, a blackboard database will be created. This database will contain a variety of data, including the initial intensity image and the regions, curves, vertices, and objects that are output by the knowledge sources.
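The figure hypotheses ultimately assembled from this database are just the attributed junction and arc lists described at the opening of this chapter. A minimal sketch of how terse that target representation is (the field names are hypothetical, not those used in the actual system):

    from dataclasses import dataclass, field

    JUNCTION_TYPES = {"L", "curvature-L", "phantom", "arrow", "Y",
                      "three-tangent", "T", "terminal"}
    LINE_LABELS = {"jump", "limb", "convex", "concave"}

    @dataclass
    class Arc:
        name: str
        label: str = "unlabeled"       # eventually one of LINE_LABELS

    @dataclass
    class Junction:
        name: str
        jtype: str                     # one of JUNCTION_TYPES
        arcs: list = field(default_factory=list)   # incident arc names, in order

    # A complete (toy) line drawing: nothing but two attributed lists.
    drawing = {
        "junctions": [Junction("v1", "arrow", ["c1", "c2", "c3"])],
        "arcs": [Arc("c1", "convex"), Arc("c2", "jump"), Arc("c3", "jump")],
    }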
To discuss the organization of the blackboard system being used in this research, some vocabulary is needed. The terms used here are those used in GBB (Generic Blackboard), the blackboard shell used in this thesis, and are given in [40]. In [19], similar ideas were discussed but with different terminology. The terms to be used in this thesis are taken from the former. The equivalences to ISR [19], a blackboard shell that was designed specifically for vision tasks, will be given once in parentheses.

The term "blackboard" used by GBB will be replaced with "GBB blackboard" so as not to confuse the generic blackboard ideas discussed in Section 2.1.3 with the specific implementation of those ideas in GBB. A GBB blackboard (hierarchy of frames) is a hierarchical tree structure composed of GBB blackboards and spaces. Spaces (frames) are used to contain and index units. Units (tokens) are the basic blackboard objects. When a unit is created it is stored on a space; such a unit is said to have been instantiated. Spaces are stored, in turn, on a GBB blackboard. Indexing of units on spaces is optional in GBB, but in image processing it can improve efficiency.

Units hold the hypotheses for entities that are found in the image. For the curvilinear line labeling paradigm, for example, there are units for the arcs and the junctions. Each unit can store three types of information. Slots (features) hold the basic data, such as the name and label of a junction. Indexes (the spatial indexing grid from the example in [19]) determine how the unit will be stored on the space. GBB permits a wide range of indexing possibilities, including Frenet box ranges, which are useful for units that are not restricted to a single point. The final type of information is a link (handle). Links relate a given unit to other units. Different types of units may be related using links.

The units used in this blackboard system are as follows.

Frame: the initial input data before analysis.
Figure: a group of regions, curves and vertices.
Region: a two dimensional array of spatially connected pixels.
Curve: a one dimensional array of spatially connected pixels.
Vertex: a point at the ends of a curve.

All of the units could have been stored on a single space, since they are all indexed in the same coordinate system. This would have kept the database overhead to a minimum, but would have made retrieval of a particular unit type less time efficient. By placing each type of unit on a separate space, retrievals can be done more quickly. The price for the retrieval efficiency is some additional overhead for the separate spaces. Since it is the units themselves that take the majority of the storage, and not the database overhead, separating the spaces does not decrease the storage efficiency. The spaces are all children of a single GBB blackboard.

Each of the proposed modules uses a subset of these units for its processing. Table 3.1 summarizes the mapping between modules and blackboard objects.

Table 3.1. The mapping between modules and blackboard units.

    Module         Unit
    Continuity     curves
    Symmetry       curves
    Proximity      curves, vertices
    Line labeling  curves, vertices

The slots for the units reflect the needs of the modules using that unit. Some slots store information that is used by only one module. As an example, the frame unit stores the filename of the input data, and all units have some variables stored to index into the appropriate space. In GBB, information hiding cannot be done, as it could be in a system such as Schema [31], so all slots are accessible to all modules, even though they may not be used by all modules. Most slots store information that is used by multiple modules. All units have a belief slot which gives a rough interpretation of the importance of the unit at the given point in time. If a hypothesis has belief zero, it is not used in further processing, although it remains on the space unless it is explicitly removed.
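A rough sketch of what such a unit looks like, written as a Python class for concreteness. GBB itself is a Lisp system, and the slot and link names below are only illustrative, drawn from the link discussion that follows.

    from dataclasses import dataclass, field

    @dataclass
    class CurveUnit:
        # Slots: the basic data.
        name: str
        pixels: list                   # spatially connected pixel coordinates
        belief: int = 1                # 0: ignored by later processing, but retrievable

        # Links: bidirectional relations to other units.
        neighboring_curves: list = field(default_factory=list)   # adjacency
        bounds_region: list = field(default_factory=list)        # region/boundary
        competing_hypotheses: list = field(default_factory=list)
        supporting_hypotheses: list = field(default_factory=list)

    def link_competitors(a, b):
        # All links in this system are kept bidirectional.
        a.competing_hypotheses.append(b)
        b.competing_hypotheses.append(a)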
Although removing a hypothesis with belief 0 would save storage space and speed retrievals, it would also make it impossible to recover from a mistake.

The links between units can be divided into four categories. All links are bidirectional, although this is not dictated by GBB but rather by the needs of this application. Adjacency links describe spatial adjacency relationships. An example of an adjacency link is the neighboring-curves link in the curve unit, which joins two curve units that are spatially adjacent. Adjacency links can also join different unit types. An example of this is the link between regions and their boundary curves: the region side of the link is called bounded-by-curve, and the curve side is called bounds-region. A second type of link also exists between regions and curves; this link is used for curves that are in the interior of regions. Units of the same type are also linked as supporting-hypotheses and competing-hypotheses. The final class of links is supplementary hypotheses. An example of a supplementary hypothesis is the axis of symmetry of a region, which is stored as a curve.

While Figure 3.1 describes the ordering of the processing, it does not adequately describe the blackboard implementation of the processing. Since the knowledge sources are independent, they do not call each other directly. The initial processing is controlled by the blackboard itself. As an example, the creation of a new region triggers the curvilinear grouping module to create the edges. This is not done by the module that created the new region, but as a consequence of placing a new region on the blackboard. Similarly, the curvilinear grouping module does not directly call the vertex module. The vertex module is automatically called by the blackboard when the curve unit is instantiated and placed on the blackboard. This type of processing greatly enhances the consistency of the blackboard.

3.2 The Continuity Module

Curvilinear grouping was recognized as perceptually important by the Gestalt psychologists [114]. While grouping points into straight lines is frequently done in computational vision [23, 67, 102], curvilinear grouping is less common. In the piecewise smooth object domain used in this thesis, it is necessary to analyze both lines and curves, since the domain objects have both linear and curved edges.

Trytten and Tuceryan [110] designed an algorithm for grouping curvilinear edges. This method takes edges detected using the Canny edge detector [24] and segments them into smooth edgels. Edgels are a one dimensional representation of adjacent edge points as a function of a parameter similar to arc length. A generalized Voronoi neighborhood structure is then found for the set of edgels in the image, and neighboring edgels are tested for continuity. Starting with the longest (and hence most perceptually significant) edgels, hypotheses are created that neighboring edgels should be combined. These hypotheses are evaluated for smoothness, and the best hypothesis is grouped, if any smooth hypothesis is found. The full algorithm is shown in Figure 3.2.

The Voronoi neighborhood is defined as follows. Let H(P, Q) be the half plane containing P that is formed by the perpendicular bisector of the line joining points P and Q. The Voronoi neighborhood of a point P is defined to be the intersection over all Q of H(P, Q). This is shown graphically in Figure 3.3. The Delaunay graph is the dual of the Voronoi tessellation.
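A minimal sketch of computing these neighborhood relations with scipy. The thesis builds a generalized Voronoi diagram of edgels rather than of isolated points, so this point-based version only approximates that structure.

    import numpy as np
    from scipy.spatial import Delaunay

    def voronoi_neighbors(points):
        # Two points are Voronoi neighbors exactly when they share a
        # Delaunay edge (the Delaunay graph is the dual of the tessellation).
        tri = Delaunay(points)
        neighbors = {i: set() for i in range(len(points))}
        for simplex in tri.simplices:
            for a in simplex:
                for b in simplex:
                    if a != b:
                        neighbors[a].add(b)
        return neighbors

    pts = np.random.rand(30, 2)
    nbrs = voronoi_neighbors(pts)
    print(sum(len(s) for s in nbrs.values()) / len(pts))   # small average degree

The small, bounded average neighbor count printed at the end is the property that later makes the grouping algorithm O(n) in the average case.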
In [2], Voronoi neighborhoods are shown to be more perceptually significant than other similar graphs. Implementation details are given in [92].

The smoothness criterion used is similar to the weak string discussed in Section 2.2.2. Since the weak string can only detect breaks, i.e. spatial non-adjacencies, it cannot find corners. A second model, called the weak rod, can be used to detect corners. It uses a second derivative term for regularization instead of the first derivative regularization term used in the weak string. Let u_i be the desired function, let x_i be the input data, and let k_i be a binary function that determines whether the rod is bent at location i. The energy functional E for the weak rod is shown in Equation 3.1.

    E = Σ_i [ (u_i - x_i)^2 + μ(1 - k_i)(u_{i+1} - 2u_i + u_{i-1})^2 + β k_i ]    (3.1)

The weak rod energy functional shown in Equation 3.1 was modified for use in this work. A first derivative regularization term was added to improve the smoothness of the solution. It should be noted that having a bend in the rod where k_i = 1 will not eliminate the first derivative term from the energy functional, but only the second derivative term. The mixed string/weak rod energy functional is shown in Equation 3.2.

    E = Σ_i [ (u_i - x_i)^2 + λ(u_i - u_{i+1})^2 + μ(1 - k_i)(u_{i+1} - 2u_i + u_{i-1})^2 + β k_i ]    (3.2)

    Algorithm 1
    Input: Gray level image I
      Perform edge detection on I
      I_edge <- result of preprocessing the edge image
      Compute the adjacency graph of the edgels in I_edge
      E <- edgels of I_edge sorted in descending order of length
      G <- {}, the set of grouped edgels
      foreach edgel e in E, in descending order of length
        S <- {}
        N(e) <- set of neighbors of e
        foreach e' in N(e)
          de <- Delaunay edge between e and e'
          Construct a grouping hypothesis H consisting of the edgels (e, de, e')
          U <- fit an energy minimizing curve to H, finding curvature
               discontinuities simultaneously
          S <- S ∪ {U}
        end
        Select from S the hypothesis that has no discontinuities and has the
        minimum energy.
        if the result is not empty
          remove the original constituent edgels and replace them with the
          grouped edgel
        else
          remove edgel e from E and put it into G
        end
      end
    Output: G, the set of grouped edgels.

Figure 3.2. The algorithm for grouping based on continuity.

Figure 3.3. A Voronoi neighborhood of point P. The dividing lines of the half planes are dashed.

This energy formulation is different from the mixed string and rod shown in [13], where a second binary function was used to permit both bends and breaks to be represented. The energy formulation with the second binary indicator function was shown in [13] to have performance inferior to both the weak string and the weak rod.

A disadvantage of the energy functional in Equation 3.2 is that it is not convex. Unlike the weak string and weak rod, the GNC method cannot circumvent the non-convexity. There are other methods for minimizing non-convex functions. Simulated annealing, a computationally expensive stochastic technique, could have been used. Another technique that is commonly used is gradient descent repeated from multiple starting positions [55]; this can also be computationally expensive and has no guarantee of reaching a global minimum. The technique used to minimize the energy functional in this thesis is multi-grid relaxation, as proposed in [106].
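Before turning to the relaxation itself, here is a minimal sketch of evaluating Equation 3.2 on discrete data. The data x is treated as a 1-D signal for simplicity; in the continuity module it would be a coordinate function of arc length.

    import numpy as np

    def mixed_string_rod_energy(u, x, k, lam, mu, beta):
        # Equation 3.2: data term, first-derivative (string) term,
        # second-derivative (rod) term gated off at bends, bend penalty.
        data = np.sum((u - x) ** 2)
        string = lam * np.sum((u[:-1] - u[1:]) ** 2)
        d2 = u[2:] - 2 * u[1:-1] + u[:-2]
        rod = mu * np.sum((1 - k[1:-1]) * d2 ** 2)
        bends = beta * np.sum(k)
        return data + string + rod + bends

    # A corner at index 50: the energy is lower when a bend is declared there.
    x = np.concatenate([np.arange(50.0), 50.0 - np.arange(50.0)])
    no_bend = np.zeros(100)
    bend = np.zeros(100); bend[50] = 1
    print(mixed_string_rod_energy(x, x, no_bend, lam=0.0, mu=10.0, beta=5.0))  # 40.0
    print(mixed_string_rod_energy(x, x, bend, lam=0.0, mu=10.0, beta=5.0))     # 5.0

With λ set to zero to isolate the effect, the second derivative at the apex costs more than the bend penalty β, so declaring the bend wins, which is exactly how corners are detected.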
Multi-grid relaxation superimposes a pyramid of grids at different resolutions on a finite element problem. Solutions found at a given resolution are injected into both higher and lower resolutions (if they exist). At lower resolutions, information propagates faster since there are fewer grid elements; this encourages global smoothness and efficient processing. At higher resolutions, more detail can be fit and local smoothness encouraged. This method offers no guarantees about finding global minima of non-convex functions. The parameters for the Canny edge detection, the mixed rod-string model, and multi-grid relaxation are shown in Appendix A.

Figure 3.4. (a) A synthetic input image. (b) The result of running the continuity algorithm without edge detection on the image shown in (a). The grouping results are displayed using color. Grouped curves are connected and share the same color.

Figure 3.4 is a synthetic image that is similar to a figure given in [69]. Canny edge detection was not used on this image. Many gaps are filled in using the Delaunay edges. Most of the corners are found, with the exception of corners with large obtuse angles. The continuity algorithm makes no distinction between straight and curved edges. This uniformity of treatment is useful in images where both curved and straight edges are present. Another example of the processing is shown in Figure 3.5. Both the large and the small sphere were correctly grouped, as were the linear segments. Obtuse corners were not found.

Figure 3.5. (a) An input intensity image. (b) The result of running the continuity algorithm with edge detection on the image shown in (a). Grouping results are displayed as in Figure 3.4(b).

In [75], the same problem addressed in [110] was also solved. The initial proximate edge segments were broken into smooth curves at curvature extrema. These smooth curves are grouped recursively into larger curves. Curves to be grouped are selected from a fixed window around the endpoints. The threshold for joining depends on the difference in tangent direction, proximity, and curve length. This method does not have the flexibility in neighborhood relationships that the perceptually significant Voronoi diagram provides. It also examines more pairs of curves for grouping, since all pairs of curves within a subwindow of the image must be examined. This makes the algorithm O(n^2), where n is the number of edges. The algorithm in [110] is O(n) in the average case, since the average number of Voronoi neighbors is bounded above by a fixed constant. The curvilinear grouping module of Trytten and Tuceryan will be used in this thesis.
In Section 3.3.2, the processing which starts with the region units and creates a competing set of curve and ver- tex units is described. Once the curve and vertex units have been established, the junctions need to have the number of incident arcs in the initial figure hypothesis identified, and type (i.e. three tangent, curvature-L) determined. 3.3.1 Finding Regions The purpose of the initial processing is to create a reasonable representation of the region, curve and vertex units from the frame unit. All of the processing described in this section is of a local nature. Local processing is advantageous at the early stages because it is potentially massively parallel, although this virtue is not exploited in this implementation. The disadvantage of local processing is that information in a small neighborhood is not always reliable in a more global context. Since it is known that the information may not be globally reliable, alternate hypotheses are kept. This makes it possible for later processing to backtrack and reverse decisions that were good locally, but are not good in a more global setting. The processing in this section will be demonstrated using two images. The first image is a synthetic line drawing created with an interactive drawing utility. This (a) (b) Figure 3.6. The input images. (a) is a synthetic image generated using an interactive drawing utility. (b) is a real image. image is as close to a perfect image as quantization allows. The second image is a real image that was captured in the PRIP laboratory. Figure 3.6 shows these images. The first step in the processing of the real image is to find the edges using region based edge detection. As was discussed in Section 2.2.2, the GNC algorithm [13] will be used for this purpose. Although GNC claims to produce thin edges, it didn’t in practice. The reason for this failure was that many of the edges in the processed images were ramp edges instead of step edges. This was caused by two factors. The first factor is discretization error. Since the division between regions does not always coincide with the division between pixels, some pixels will contain a mixture of two regions. This produces a narrow ramp edge instead of a crisp step edge. The second reason for GNC to produce thick edges is that the bath tub toys that were imaged don’t have flat sides. Even the sides of the toys that appear to be planar are actually slightly curved. This can be seen in the left most face of the object in Figure 3.7. This curving means that the edges are not step edges, which GNC expects, but are ramp edges. Figure 3.7 shows an input image with a ramp edge marked. The data 72 Figure 3.7. The intensity values in the area of the image inside the square are shown in Table 3.2 from inside this rectangle is shown in Table 3.2. GNC responds to the ramp edges as repeated step edges and produces thick edges. The edges found by GNC when run on the image in Figure 3.6(b) are shown in Figure 3.8. Parameters for GNC are shown in Appendix A. Although this image was correctly segmented, some images are not. As an example, the image shown in Figure 2.1 is not properly segmented by GNC. This failure is expected since there is very little evidence of any edge in the original image. GNC was not run on the synthetic images. In order to create one dimensional curves from the thick edges of GNC, thinning was done. Although there are many thinning operations available, a two step pro- cess has proven to be effective. The first thinning algorithm used is from [44]. 
In order to create one dimensional curves from the thick edges of GNC, thinning was done. Although there are many thinning operations available, a two step process has proven to be effective. The first thinning algorithm used is from [44]. The advantage of this algorithm is that it thins the edges uniformly in width. Unfortunately, it doesn't completely thin edges that are neither vertical nor horizontal, and it shortens the length of curves. The second thinning algorithm used is from [101]. If this algorithm is used in isolation, it leaves some undesirable hairs connected to the edges. However, it does a better job of thinning diagonal edges than the first algorithm, and doesn't shorten curves. The thinning algorithms were also run on the synthetic image, even though it was already nearly thin, because near the vertices there were some two pixel wide edges. The serial combination of these two thinning algorithms produces thin, nearly hairless edges. Neither thinning algorithm has any parameters.

Table 3.2. A ramp edge from Figure 3.7.

    127 128 131 131 135 136 138 139 143 121  68  14   4   2   3
    127 129 132 134 135 137 141 145 140 104  39   5   4   4   3
    126 129 132 134 138 139 142 139 129  81  21   2   4   3   2
    128 132 133 136 138 141 144 142 113  52   8   4   4   4   4
    131 131 135 136 140 139 139 131  90  28   5   5   3   2   3
    133 133 135 137 137 141 141 124  64  10   3   5   4   4   4
    129 130 133 135 138 142 139 111  42   4   4   4   4   4   3
    132 134 136 138 139 143 129  77  21   2   3   3   3   4   4
    133 137 138 139 142 144 118  56   8   3   5   5   4   3   4
    132 134 138 140 142 136 104  36   5   3   3   4   3   3   3
    131 137 139 141 144 136  86  24   3   5   4   3   3   3   3
    133 137 141 143 144 115  58   8   4   5   4   3   3   3   3
    132 137 141 143 138 103  32   4   4   4   4   4   4   4   4
    132 134 139 139 136  81  20   3   4   3   3   4   4   4   3
    134 134 136 141 121  64   8   3   5   3   3   3   4   4   5

Figure 3.8. This image is the result of running GNC on the image in Figure 3.6(b).

Figure 3.9. The image shown in Figure 2.1 after processing with GNC. Notice that the front edge of the cube is missing, as was expected. The same parameters were used for GNC in all real images shown in this thesis.

The result of running the thinning algorithms on the images in Figure 3.6(a) and Figure 3.8 is shown in Figure 3.10. The thinned edges are the borders of the region units. To get the region units, a connected components algorithm was run [99], using an eight connected neighborhood relationship. Figure 3.11 shows the connected components found from Figure 3.10. Different grey values correspond to distinct regions. Small regions are given a unit with belief 0, and then included in the largest neighboring region. A region is labeled as small if it has fewer pixels than the threshold shown in Appendix A. Any unit with belief zero is not included in further processing, although it is still available on the blackboard for retrieval should it be necessary later.

Figure 3.10. The thinned edges.

Figure 3.11. The regions found by the connected components algorithm.
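A minimal sketch of the region-finding step using scipy. The small-region rule is only reported here rather than applied, and the threshold is a placeholder for the value in Appendix A.

    import numpy as np
    from scipy import ndimage

    def find_regions(edge_mask, min_size=20):
        # Pixels off the thinned edges form regions; label them with
        # 8-connected components, the connectivity used in this thesis.
        eight = np.ones((3, 3), dtype=int)
        labels, n = ndimage.label(~edge_mask, structure=eight)
        # Regions below the size threshold would get belief 0 and be
        # absorbed into their largest neighbor.
        sizes = ndimage.sum(np.ones_like(labels), labels, index=np.arange(1, n + 1))
        small = [i + 1 for i, s in enumerate(sizes) if s < min_size]
        return labels, small

    edges = np.zeros((50, 50), dtype=bool)
    edges[:, 25] = True                      # one vertical edge -> two regions
    labels, small = find_regions(edges)
    print(labels.max(), small)               # 2 regions, none small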
3.3.2 Creating the First Figure Hypothesis

The first figure hypothesis is an initial guess at the proper line drawing that is found using only local processing on the image. The regions that are detected by GNC will be used to create this initial hypothesis. At this point in the processing, the input image has been segmented into region units. From these region units, the competing curve units and vertex units will be found. A coarse initial rating of these hypotheses will give an initial set of arcs and junctions that will form the basis of the first figure hypothesis.

Both the junctions and curves are chosen from a set of competing junctions and curves. Curves that are derived from the same edges in the data are established as competing curves. Only one of a set of competing curves can be in any figure unit. Junctions that are spatially proximate also compete. Each curve and junction has a belief value which gives its importance relative to the other competing hypotheses. The belief values range from one to ten, but the only significance of a belief value is relative to the other competing units of similar type.

The Curves and Vertices

The boundaries of the region units are used to find the initial curve units. First, the complete boundary of each region is given a single curve unit. This curve unit is initially given a belief value of one, meaning that it should be included in processing but isn't considered to be a favored hypothesis at this point. This boundary is then broken into pieces at any point with more than three edge neighbors (triple points). Each piece becomes a distinct competing curve unit with a belief value of two, signifying that these curve hypotheses are more perceptually significant than the initial ones and should take precedence in processing. If no competing hypotheses are found, the initial curve hypothesis has its belief raised to two, meaning that it has gained perceptual significance. The curvilinearity algorithm described in Section 3.2 is then run on the triple point curve units, or on the original boundary curve unit if it was unbroken by vertices. If corners are found by the continuity algorithm, competing curve units are created for each segment, with a belief value of three. If no corners were found, the previous curve hypothesis has its belief value raised to three. This leaves a unique hypothesis that is thought to be best at each of the points on the boundary of the region. Since the continuity algorithm is consistently finding better hypotheses than what triple points alone could provide, these hypotheses are given the highest confidence of the three initial curve generations. At the end of each of these curve units, a vertex unit is instantiated by the blackboard.

There are edge points found by GNC that do not fall on the boundary of a region. Some of these points are caused by hairs left on the edges by GNC that have been thinned. Some of the other edges in the interior of a region are partially detected edges. To collect these edges into reasonable structures, the edges that are not on the boundary of each region are grouped using the continuity algorithm of Section 2.2.3. By grouping only within single detected regions, undesirable groupings between distinct regions are eliminated. These groupings are given a belief value of three if they are more than a few pixels long, with the threshold value shown in Appendix A, and a belief value of zero otherwise.

At this point, a set of competing curve units has been created. Each curve unit will have either no endpoints (as with the boundary of an isolated sphere) or two endpoints. Some of the curve units will be only a few pixels long, particularly those near the corners of the regions. Competing vertex units have not yet been created. Since the continuity algorithm typically leaves several bends near a vertex, particularly when working with real data, these small curves and vertices need to be combined. The initial method for doing this is simple. When a vertex is created, it tries to join with any other vertex in a local neighborhood to create a stronger vertex. The size of this local neighborhood is determined by a threshold shown in Appendix A. If a vertex is combined, the old vertex is given belief zero.
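A minimal sketch of that vertex-combination rule: greedy merging of endpoint vertices within a fixed radius. The radius stands in for the Appendix A threshold, and recentering the combined vertex at the mean of its members is an assumption made for illustration.

    import numpy as np

    def combine_vertices(vertices, radius=5.0):
        # vertices: (x, y) curve endpoints.  Each new vertex joins an
        # existing combined vertex if one lies within `radius`; the old
        # hypothesis would be kept on the blackboard with belief 0.
        combined = []                                      # [position, members]
        for v in map(np.asarray, vertices):
            for entry in combined:
                if np.linalg.norm(entry[0] - v) <= radius:
                    entry[1].append(v)
                    entry[0] = np.mean(entry[1], axis=0)   # recenter the stronger vertex
                    break
            else:
                combined.append([v, [v]])
        return [tuple(c[0]) for c in combined]

    print(combine_vertices([(10, 10), (12, 11), (40, 40)]))   # two combined vertices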
An artifact of this vertex combination is that there are short curves with both endpoints at the same combined vertex. These curve units are given belief zero. This cleans away many short, unimportant curves.

Figure 3.12. An image with approximately ten small groups of edge points that are not on the boundary of regions. The group at the far right corner of the left triangular prism is perceptually significant, and is grouped by the continuity algorithm of Section 2.2.3.

This processing results in an initial set of vertices and curves that will form the first figure hypothesis. The result for the synthetic image is shown in Figure 3.13. The result for the real image is shown in Figure 3.14. The labeling scheme is as follows. Curves are labeled with the letter "c" and an identification number. Vertices are labeled with the letter "v" and an identification number. The label for a curve is placed to the lower left of the middle of the curve. The label for a vertex is placed at the vertex. Around the edge of the image is a frame that completes any junctions that reach the edge of the image. This frame will be shown in all future images, and is labeled a priori as a jump edge.

Figure 3.13. The initial curve and vertex units found from the synthetic image originally shown in Figure 3.6(a).

Figure 3.14. The initial curve and vertex units found from the real image originally shown in Figure 3.6(b).

At this point the blackboard contains regions, a group of competing curve units, and a group of competing vertex units. Within the groups of both curve units and vertex units, the competing units have been labeled with belief values to determine which appears to be best at this point in time. The blackboard also contains the beginning of the first figure unit: the collection of all non-competing curve units of highest belief value and the collection of all non-competing vertex units of the highest belief value. For the figure unit to be complete, the vertices must be classified and typed. These operations are discussed next.

Identifying Junctions

First, vertex units are classified by the number of incident arcs. This classification yields four categories: junctions with one incoming arc, junctions with two incoming arcs, junctions with three incoming arcs, and junctions with four or more incoming arcs. The domain described in Section 1.2 excludes vertices with more than three arcs, so junctions in the fourth category are not within the object domain of this thesis. It should be noted that the domain of piecewise smooth surfaces discussed in [69] does not have this restriction.

A much more challenging problem than classifying the junctions is typing them. A summary of the attributes that determine junction type is given in Table 3.3. The table does not distinguish between Arrow and Y junctions. This distinction is made by examining the angles between the incoming arcs. If one of the angles is greater than 180 degrees, the junction is an Arrow. If all three angles measure less than 180 degrees, the junction is a Y.
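A minimal sketch of the Arrow/Y test just described, given the image-plane tangent directions of the three incident arcs at the junction:

    import numpy as np

    def arrow_or_y(directions_deg):
        # directions_deg: outgoing tangent directions of the three incident
        # arcs, in degrees.  Sort them and measure the three consecutive
        # angular gaps; any gap over 180 degrees makes the junction an Arrow.
        d = np.sort(np.mod(directions_deg, 360.0))
        gaps = np.diff(np.append(d, d[0] + 360.0))
        return "Arrow" if np.any(gaps > 180.0) else "Y"

    print(arrow_or_y([0, 90, 120]))    # Arrow: the gap from 120 back around is 240
    print(arrow_or_y([0, 120, 240]))   # Y: all three gaps are 120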
In order to type the junctions, each of the curve units must have a tangent and a curvature value at both endpoints. Finding curvature is a notoriously difficult problem [36], even in richer range images. Therefore the curvature values that are calculated should be treated with suspicion. Estimating tangent direction is somewhat easier. Since later processing can help to correct mistakes, it is not essential that the curvatures and tangents be perfect. The continuity module has been run on all of the curve units. This improves the estimation of curvature and tangent direction, since the curve being fit has already been smoothed, has had outliers removed, and has been separated at discontinuities. If this had not been done, the curvature and tangent directions could be even more unreliable.

Table 3.3. This table summarizes the tangent and curvature properties of junctions. Arrow and Y junctions (*) are distinguished on another basis. The symbol given will be used to represent this type of junction in later figures.

    Class        Type           Symbol   Tangents            Curvatures
    One arc      Terminal       TM       None                None
    Two arcs     Curvature-L    CL       Equal               Unequal
                 L              L        Unequal             Doesn't matter
                 Phantom        P        Equal               Equal
    Three arcs   Arrow (*)      A        Unequal             Doesn't matter
                 Y (*)          Y        Unequal             Doesn't matter
                 T              T        Exactly two equal   Same two equal
                 Three tangent  3T       All equal           Two equal

The simplest way to find curvature is to fit a circle to a set of points and find the radius and the center. The multiplicative inverse of the radius is the curvature. A normal to the fit circle at the endpoint can be found from the vector starting at the center of the circle and ending at the endpoint of the arc; the tangent direction is perpendicular to the normal. The fitting is done using a linear least squares routine found in [93]. Circular arcs were fit in a neighborhood of the endpoint, with the size of the neighborhood determined by a parameter given in Appendix A.

Although a straight line can be thought of as a circle of infinite radius, this is numerically troublesome. To avoid this problem, straight lines are also fit in a neighborhood of the endpoint, whose size is determined by a parameter given in Appendix A. The curvature for a straight line is zero, and the tangent direction is calculated from the slope. The squared error between the circular fit and the linear fit, calculated in units of pixels squared, is then compared, and the model with the better fit is accepted.

The curvature values calculated in this fashion are rough estimates. Therefore comparisons like those suggested in Table 3.3 must be revised. Even with synthetic data, it is unlikely that two curvatures or tangents would be exactly equal. To compare tangent directions, a threshold was used on the difference. This threshold is given in Appendix A. It was set arbitrarily, but is not critical to the performance of the system, since later processing can change this threshold to correct mistakes.

The curvature values were found to be very unreliable, particularly in the real images that were used. There are several reasons for this. The objects used in the images in this thesis have circular arcs, which project to ellipses. Fitting a circle to an ellipse, particularly at the ends of the major axis where the curvature is changing rapidly, is difficult. Another problem is that the continuity algorithm may not break the upper and lower halves of an ellipse exactly at the corner. This means that the curvature will be calculated starting at different positions, which will certainly lead to different curvatures. To get around this problem, curvatures were placed into only three classes, which are described in Appendix A. Curvatures are considered to be equivalent if they fall into the same class. It would have been possible to fit an ellipse to the curve data instead of a circle, but this is computationally more expensive and mathematically cumbersome.
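A minimal sketch of the circle fit in its algebraic least squares form, which linearizes x^2 + y^2 + Dx + Ey + F = 0; this is one common choice, assumed here as a stand-in for the routine in [93].

    import numpy as np

    def fit_circle(xs, ys):
        # Solve [x y 1][D E F]^T = -(x^2 + y^2) in the least squares sense.
        A = np.column_stack([xs, ys, np.ones_like(xs)])
        b = -(xs ** 2 + ys ** 2)
        (D, E, F), *_ = np.linalg.lstsq(A, b, rcond=None)
        cx, cy = -D / 2.0, -E / 2.0
        r = np.sqrt(cx ** 2 + cy ** 2 - F)
        return (cx, cy), r

    def endpoint_curvature_and_tangent(xs, ys):
        (cx, cy), r = fit_circle(xs, ys)
        normal = np.array([xs[-1] - cx, ys[-1] - cy])   # center -> endpoint
        tangent = np.array([-normal[1], normal[0]])     # perpendicular to the normal
        return 1.0 / r, tangent / np.linalg.norm(tangent)

    t = np.linspace(0, np.pi / 3, 30)
    xs, ys = 20 * np.cos(t) + 5, 20 * np.sin(t) - 3     # arc of a radius-20 circle
    print(endpoint_curvature_and_tangent(xs, ys))       # curvature near 1/20 = 0.05

Run on a short arc of an ellipse instead of a circle, the same routine returns noticeably different curvatures depending on where the arc starts, which is the unreliability the three curvature classes are meant to absorb.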
With the curvature and tangent information, and suitable definitions of equivalence among tangents and curvatures, the classified vertices in the initial figure unit can be typed. This information completes the initial figure unit, that is, the first hypothesis for the arcs and junctions in the imaged scene. Section 3.4 will show examples of images and their first figure units, along with a discussion of some of the problems encountered.

3.4 Examples of Processing

Although errors in the initial processing can be repaired later, it is important that the first figure unit be an accurate representation of most of the important events in the image plane. Perfect line drawings should not be expected. In a test bed of twenty-two real images used to refine the techniques discussed in this chapter, only one image had a perfect first figure unit.

Six real images and their initial processing results will be shown. All of these images were captured in the PRIP Laboratory on a Macintosh IIfx. No modifications were made and no hand editing was done on the images. All images use the same parameter values and thresholds, which are given in Appendix A. No hand editing of any first figure unit has been done. To display the type of the vertices, the vertex type is used to label the vertices; the labels were shown in Table 3.3. The curve names are included as before.

The first image is the only real image processed which had a perfect initial figure unit. This image and the first figure unit are shown in Figure 3.15.

Figure 3.15. (a) The input image. (b) The first figure unit found by processing the image shown in (a).

Many of the real images had missing arcs due to undersegmentation by GNC. The images in this category fall into two groups: those with completely missing edges, as in Figure 3.9, and those with partially missing edges. Figure 3.16 shows an image where the GNC algorithm has failed to segment two regions, although some edge points were found.

Figure 3.16. (a) The input image. (b) The thinned edges from GNC. (c) The first figure unit.

Another problem with the initial figure units is data that should be interpreted as a single junction being interpreted as a pair of nearby junctions. An example of an image with a junction labeling problem is Figure 3.17. In this image, curves c4, c21 and c13 should combine into a single arrow junction. Instead, c4, c21 and c15 form a T junction, and c15 and c13 form an L junction. This occurs when the curvilinearity algorithm puts breaks near triple points, but not so near as to permit the vertices to be combined early in the processing. As an artifact of thinning the GNC edges, the vertices are not sharp. The curvilinearity algorithm is putting breaks at appropriate places from a local perspective.
It is actually the triple points that are misplaced, and not the continuity breaks.

Figure 3.17. (a) The input image. (b) The first figure unit. There are two pairs of mislabeled junctions. In the upper left corner there is a T junction and a nearby L junction; this junction should have been an arrow. In the lower left corner, at the front of the triangular prism, is a T junction and a nearby L junction; this junction should also have been an arrow junction.

Accidental alignments can cause problems with the junction labels as well. Figure 3.18 shows a rectangular prism that is close to a non-general viewpoint. Whether this viewpoint is truly non-general is a question that is much harder to answer in practice than in theory. The L junctions are clearly correct, but the T junctions are suspect. Since it cannot be determined whether the viewpoint is accidental or not, it is impossible to evaluate the junction labels. If the viewpoint is accidental, the junction labels are correct. Line labeling will produce an interpretation that is counterintuitive; this interpretation is shown in Figure 3.20. Physically, it can be interpreted as looking at an object through a rectangular hole. If this prism is in a non-accidental viewpoint, then the junction labels are incorrect: the T junction between c14, c20 and c3 should be an arrow junction, the T junction between c9, c20 and c7 should be a Y junction, and a side of the cube is missing.

Figure 3.18. The rectangular prism shown in (a) is seen from a non-general viewpoint, since neither the right nor the left side can be seen. (b) The first figure hypothesis from (a). Notice that the upper edges of both sides have T junctions. Figure 3.19 shows the junctions labeled with numbers.

Figure 3.19. The numbered curves and vertices obtained from the input in Figure 3.18(a).

Figure 3.20. A legal line labeling for Figure 3.18 that is counterintuitive.

The same processing that is used for polyhedral scenes can also be used for curved objects. Figure 3.21 shows the grey scale image of a ceramic taco holder and its initial processing result.

Figure 3.21. (a) An image of a ceramic taco holder. (b) The first figure hypothesis from (a).

A final image shows a bad processing result. Figure 3.22 shows an image where oversegmentation has occurred on the right hand side. The right hand side has been segmented into six curves: c43, c85, c36, c29, c117 and c119. There are four phantom junctions and one T junction found. The T junction is between curves c28, c79, and c117. This T junction could be an accidental alignment, but the arrow junction between curves c52, c3 and c66 on the left hand side of the image hints that this T junction is mislabeled.

Figure 3.22. (a) The input image. (b) The segmentation by GNC. (c) The first figure unit. GNC segments this image into 42 separate regions, most of them small regions on the right hand side.
Thirty one of these small regions were deleted, since their size was below threshold. The other small regions are larger than the threshold, and were not deleted. This leads to many false triple points, and therefore false vertices.

3.5 Summary and Conclusions

To test this part of the system, the algorithms were run on the test bed of images shown in Appendix B. The results are summarized in Table 3.4. Each mistake is attributed to the module that caused the initial error. As an example, an arrow junction might be missing an arc, causing it to be seen as an L junction. This error would be classified as a missing edge and not as a junction type failure, since the junction classification is correct given the incorrect input. This system avoids double counting of errors.

There are eight different mistakes that can be made by these four modules. Two errors can occur in the weak membrane edge detection: edges can be missing or broken, and extraneous edges can be found. The vertex proximity module can make only two types of mistakes: junctions that should have been grouped can be separated, and junctions that should have been separated can be grouped. The continuity module can make only two errors as well: either an extra break can be added, or a corner can be missed. The junction typing algorithm also has two types of errors. An error can be caused by a mistake in tangents; mistaking a three tangent junction for an arrow junction is an example of this error. Mistakes in curvature can also cause errors; an example is mistaking a curvature-L junction for a phantom junction.

In this set of images there are a total of 245 visible junctions and 323 visible arcs, as shown in Table B.2 and Table B.3. There were a total of 113 errors found in the initial processing, so the majority of the visible junctions and arcs were properly processed. There was only one image with a perfect initial figure hypothesis, Image 8. There was also only one image with a very bad initial figure hypothesis, Image 7. The difficulties in Image 7 arise on the right hand side of the background block. The rightmost surface of this block is close to an accidental alignment and is therefore segmented into a large number of small surfaces that this algorithm has difficulty analyzing.

Table 3.4. The results of processing the test bed of images shown in Appendix B are shown below. ME stands for missing edges. EE is extraneous edges. GJ is grouped junctions. BJ is broken junctions. MB is missed breaks. EB is extraneous breaks. CM is curvature mistakes. TM is tangent mistakes. MP is missing phantom junctions. These nine failures summarize all of the problems found in the test bed of images. The notation "a" was used for Image 7, which had many extraneous edges that were difficult to count.

    Image   ME  EE  GJ  BJ  MB  EB  CM  TM  MP
    1        0   0   0   0   0   2   0   1   1
    2        1   0   0   0   0   0   0   0   0
    3        0   0   1   0   0   0   1   2   0
    4        4   0   2   0   1   1   0   1   0
    5        0   0   0   2   0   0   0   0   0
    6        0   1   0   0   0   0   0   2   0
    7        2   a   0   1   0   3   0   0   0
    8        0   0   0   0   0   0   0   0   0
    9        0   0   0   0   0   1   0   1   0
    10       0   0   0   1   2   1   0   2   0
    11       1   0   0   0   0   1   0   0   0
    12       1   0   0   0   1   0   0   0   0
    13       0   1   0   0   1   3   1   0   0
    14       0   0   0   1   0   2   0   2   0
    15       3   0   0   1   2   0   0   1   0
    16       3   0   0   0   0   1   1   1   0
    17       2   0   1   1   2   0   0   2   0
    18       3   0   0   1   1   1   0   0   1
    19       0   2   0   1   1   3   1   0   0
    20       0   2   3   0   0   3   2   1   0
    21       0   1   0   3   2   0   0   0   0
    22       0   7   0   1   0   4   1   0   0
    Total   20  14   7  13  13  26   7  16   2

Appendix A shows the parameters that were used to calculate the initial figure hypotheses.
Table 3.4 shows that the choice of parameters has balanced the errors. As an example, the vertex proximity threshold was set so that approximately the same number of junctions were incorrectly grouped (7) as were broken (12). The same can be said for the continuity algorithm and the missing (13) and extraneous breaks (13). The weak membrane edge detection is also creating roughly the same number of extraneous edges (14) as missing edges (20).

The missing phantom junctions are not errors in the initial figure hypothesis. In Malik's line labeling paradigm [69], each curved arc was given a phantom junction. These phantom junctions were not included in the initial figure hypothesis, but will be added later, as will be discussed in Section 4.2.

One particular problem noticed in this processing was that all three tangent junctions were mistaken for arrow junctions. Almost all of the curvature-L junctions were mistaken for L junctions. This happens because finding the curvature near junctions is difficult. Better methods for distinguishing three tangent and arrow junctions and curvature-L and L junctions will be proposed in Section 5.2.2.

The initial figure hypothesis is giving a good initial analysis of the significant events in the image. This information will be analyzed in Chapter 4 to discover errors, which will be corrected in Chapter 5.

CHAPTER 4

The Revised Line Labeling Module

Line labeling has long interested vision researchers because it provides a way to use local information about curves and vertices to build a more global interpretation of image data. The practical difficulty in using line labeling, and especially in curvilinear line labeling, is that the local information about curves and vertices is often imperfect. Since the rigorous mathematical framework built in the proof of the junction catalog assumes that a perfect line drawing is available as input, the algorithm is doomed to failure in all but the most contrived cases. This is particularly true since the algorithm was not designed to provide diagnostic information about the nature of the failure.

In this chapter, a revised line labeling algorithm will be presented that can provide some diagnostic information about line labeling failures. Section 4.1 will discuss the method for finding errors in line drawings. Section 4.2 will classify the errors that can occur in line drawings. Examples of line drawings obtained from real images using the method of Chapter 3 will be shown, along with the diagnostic information provided by this new line labeling module.

4.1 Detection of Line Drawing Problems

Section 3.4 showed examples of the types of problems that were commonly seen in the generated line drawings. The paradigms that have been used to produce these line drawings cannot produce perfect output from real images. Therefore it is important to be able to analyze a line drawing and evaluate its strengths and weaknesses. This section will show a modification to the standard line labeling scheme of [69] which provides better diagnostic information about the weaknesses of the current line drawings. This diagnostic information will be entered into the blackboard database and later be used by other modules to fix some of the problems in the line drawing.

In the piecewise smooth object domain, the junction catalog is guaranteed to give one or more correct labelings for each object in the object domain. An object that is not within this domain may still be legally, and perhaps even correctly, labeled, as shown in Figure 4.1.
To restate this in a way that is more relevant to this discussion: there is no guarantee that if a mistake is made in the initial line drawing, line labeling will fail to interpret the image. Therefore, line labeling cannot be the only arbiter of correctness in line drawings. This chapter will demonstrate how line labeling can be instrumental in fixing incorrect line drawings.

Figure 4.1. The cone shown in (a) is not in the domain of piecewise smooth objects. The apex of the cone is an edge point with only one adjacent surface, which is not allowed. When the apex of the cone is away from the camera, an image of the cone would look like (b). The curvilinear junction catalog would label this image as either a limb or jump edge. The jump edge interpretation is correct, but this is only a coincidence.

The visual system has a collection of heuristics that it appears to use when input is ambiguous. As an example, the interpretation of a shaded sphere changes when the image is turned upside down. If the lighter side of the sphere is at the top, the sphere will appear convex. If the lighter side is at the bottom, the sphere will appear concave [95]. The heuristic in use is that, unless there is reason to believe otherwise, light comes from above. This is a reasonable and useful heuristic that removes a source of ambiguity from the shape from shading problem. It can lead to incorrect conclusions, particularly in situations such as Figure 4.2(a), which are the visual equivalent of garden path sentences.¹ These occasional mistakes are a small price to pay for unique interpretations of images under ordinary circumstances.

¹A garden path sentence is one that is intentionally misleading. An example is "The granite rocks with the quake". Having the words granite and rocks together leads the listener to misinterpret the verb as a noun.

Figure 4.2. The round shaded object in (a) is seen as convex because the visual system assumes that lighting is from above in the absence of other evidence. (b) is the same image as (a) rotated by 180 degrees. The round shaded object now appears to be concave. The image is of a crater on the lunar surface.

One of the difficulties with using line labeling as a diagnostic tool is that there are often multiple interpretations of a single object. Figure 4.3 shows four legal line labelings for a cube. Each of the interpretations is physically possible, but the human visual system seems to pick one as its favored interpretation. The preferred interpretation is that the cube is resting on its base, and not adjacent to either the left hand wall (if such a wall exists) or the background. This is a reasonable interpretation since the human visual system, unlike computer vision systems, has a sense of verticality. The human visual system is trained in an environment where gravity reigns. Objects are usually resting on surfaces or other objects, just like they are usually lighted from above. A heuristic like this, which I will call the gravitational heuristic (GH), may be useful in computational vision as well.

A difficulty with using the gravitational heuristic in computer vision is that most camera systems do not have a sense of verticality. They do not know when they are aligned with the gravitational field and when they are not. Even if the mechanical challenges of including this functionality could be solved, the integration into a vision system would entail solving a number of open research questions that are far afield from this thesis.
Rather than open another set of research issues, it may be preferable to look for a similar heuristic that doesn't require an understanding of gravity. Of the interpretations for the cube shown in Figure 4.3, interpretation (d) is unique in that gravity appears to have been suspended. In the other three interpretations, the cube is supported in some way, whether it be by gluing it to the wall or setting it on a table. This suggests that a floating interpretation may be more useful for a computational vision system, since this interpretation doesn't require any naive physics.

Figure 4.3. This simple line drawing of a cube has four different convex interpretations from line labeling. (a) is the cube glued to the side wall. (b) is the cube glued to the back wall. (c) is the cube sitting on a table. (d) is the cube floating in air. The human visual system prefers interpretation (c), but interpretation (d) may be more useful for computational vision.

In order to more precisely describe this interpretation, some definitions are needed. Suppose that the image has been segmented into regions. Further, suppose that these regions have been correctly labeled as either background or foreground. The objects of interest should be in the foreground. The labeling of the background region is a method for performing figure-ground separation, which is one of the Gestalt principles shown in Figure 1.5. It would be preferable to have the system select the background automatically. Although this assumption may appear to be overly restrictive, it can be met by the processing shown in [75].

Definition 4.1 Outside Arc: An image curve is an outside arc if it is adjacent to a background region over its full length.

Definition 4.2 Inside Arc: An image curve is an inside arc if it is not adjacent to the background region over its full length.

Definition 4.3 Outside Junction: A junction is an outside junction if at least one of its incident arcs is an outside arc.

Definition 4.4 Inside Junction: A junction is an inside junction if none of its incident arcs are outside arcs.

Definition 4.5 Floating Object Heuristic: All outside arcs in a line labeling should be labeled as either limb or jump edges with the direction being such that the foreground region will be interpreted as being in front of the background region. The labels on inside arcs are not restricted.

In Figure 4.4, curves c15, c14, c11, c9, c5 and c3 are outside arcs. Curves c21, c22 and c24 are inside arcs. Junctions v25, v13, v28, v29, v26 and v27 are outside junctions; junction v22 is an inside junction. The floating object heuristic would give the labels shown in Table 4.1 as the initial labels for the curves.

Figure 4.4. A cube with the curve and junction numbers identified.

Table 4.1. The curves in Figure 4.4 and their initial line labels. The vertices are ordered in the direction column. The line label should be put between the ordered vertices to determine the direction. For c3, for example, the labeling should be read as v27 → v25 or v27 →→ v25.

Curve  Direction   Label
c3     v27 → v25   →, →→
c5     v26 → v27   →, →→
c9     v29 → v26   →, →→
c11    v28 → v29   →, →→
c14    v13 → v28   →, →→
c15    v25 → v13   →, →→
c21    v22 → v25   →, →→, ←, ←←, +, −
c22    v28 → v22   →, →→, ←, ←←, +, −
c24    v26 → v22   →, →→, ←, ←←, +, −
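Definitions 4.1 through 4.5 amount to a region-adjacency test followed by a restriction of the label sets. The following is a minimal sketch in Python, assuming a hypothetical Arc record that knows which regions it borders; the names and the ASCII arrow spellings are illustrative, not the data structures used in this thesis.

from dataclasses import dataclass

OUTSIDE_LABELS = ["->", "->->"]                       # occluding: figure in front
ALL_LABELS = ["->", "->->", "<-", "<-<-", "+", "-"]   # the six legal line labels

@dataclass
class Arc:
    name: str
    junctions: tuple       # names of the two endpoint junctions
    regions: frozenset     # ids of the regions bordered over the full length

def is_outside_arc(arc, background):
    # Definition 4.1: adjacent to the background region over its full length
    return background in arc.regions

def outside_junctions(arcs, background):
    # Definitions 4.3 and 4.4: a junction is outside exactly when at least
    # one of its incident arcs is an outside arc
    out = set()
    for arc in arcs:
        if is_outside_arc(arc, background):
            out.update(arc.junctions)
    return out

def initial_labels(arcs, background):
    # Definition 4.5, the floating object heuristic: outside arcs may only be
    # limb or jump edges directed so the figure occludes the background;
    # inside arcs keep all six labels
    return {arc.name: list(OUTSIDE_LABELS if is_outside_arc(arc, background)
                           else ALL_LABELS) for arc in arcs}

For the cube of Figure 4.4, an arc such as c3, which borders the background, would start with only the two occluding labels, while an inside arc such as c21 would keep all six, matching Table 4.1.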
Figure 4.5. An image where the floating object heuristic fails. One interpretation for this image is that the front corner of a cube is being seen through a round peep-hole. Another interpretation is that the bottom inside corner of an empty box is being seen through a peep-hole. There are other interpretations as well. If the sphere and the Y junction are labeled as the foreground, as they would be by most systems, including [75], then there will be no legal labeling of this object using the floating object heuristic.

There are examples where the floating object heuristic won't work. One of these is shown as a line drawing in Figure 4.5. This image could be interpreted as the front corner of a cube being seen through a round peep-hole. The confusion between foreground and background in this image causes the floating object heuristic to fail. In fact, the terms foreground and background are somewhat ambiguous in this example. While the cube may be the object of interest, it is located behind the peep-hole. Images where the foreground and background are ambiguous, which are probably rare in most applications, will cause the floating object heuristic to fail.

Line labeling with the floating object heuristic is much less combinatorially explosive than ordinary line labeling. There are two contributing factors. The first factor is that there are fewer interpretations of outside junctions. A casual inspection of Malik's original junction catalog, shown in Figure 4.6, shows that most of the legal interpretations for inside junctions will no longer be legal interpretations for outside junctions. A second factor that improves the combinatorics of line labeling is that phantom junctions are not necessary with this heuristic, since a necessary phantom junction can be identified by the line labeling process.

Figure 4.6. Malik's junction catalog. The "?" label indicates that this line could have any one of the six legal line labels.

Theorem 4.1 If an image can be segmented into foreground and background regions, outside junctions that have been labeled by the floating object heuristic will have at most two interpretations, except for T junctions which may have as many as four interpretations.

Proof: This theorem is proved by systematically examining the junction catalog in Figure 4.6. All references to junction labelings will be from left to right, in the appropriate category.

Terminal junctions have at most two interpretations. However, it should be noted that a terminal junction cannot be an outside junction since the foreground and background are not separated at a terminal junction. Outside L junctions can only be labeled with the first and second labelings. Only one of these labelings will be applicable, depending on the location of the adjacent background region. Outside curvature-L junctions can be labeled only with the second and fourth labelings. There are only two labelings for three tangent junctions; only one of these will be applicable, depending on the location of the adjacent background region. Outside arrow junctions can only be labeled with the first labeling. Outside Y junctions can be labeled with the third, fourth and fifth labelings, but only one of these labelings will be applicable, depending on the location of the background region. T junctions can have the eight labelings shown in Figure 4.7. Four of these labelings will be applicable, depending on the location of the background region. The outside junction catalog is shown in Figure 4.8. The background region for each legal junction label is marked.
The reduction in the number of legal labelings for outside arcs is shown in Table 4.2. This completes the proof. □

Figure 4.7. Eight legal line labelings for an outside T junction. When the background region is known, there are only four possible label choices.

Figure 4.8. The outside junction catalog. The region labeled bkg is the background region.

Table 4.2. The number of legal labelings for outside arcs.

Table 4.2 shows a vast improvement in combinatorics for outside junctions with the floating object heuristic. Five different outside junctions have a unique labeling with the floating object heuristic. These unique labelings of arcs on a local basis will improve the combinatorics of labeling inside arcs as well. This combinatoric improvement can also be exploited to improve the line labeling paradigm. Outside arcs and outside junctions are providing the strongest constraints. When this information is used first in the line labeling algorithm, the legal labelings on the inside arcs are also restricted. Algorithm 2 is the modified line labeling algorithm which takes advantage of this combinatoric reduction.

Algorithm 2
Input: A set of arcs A
       A set of junctions J

A° ← the set of outside arcs in A
Aⁱ ← the set of inside arcs in A
A' ← A° ∘ Aⁱ
change ← true
while change = true
    change ← false
    foreach arc a ∈ A' in order
        L ← the set of labels that are still legal for a
        foreach label l ∈ L
            J ← the junctions adjacent to a
            J° ← the set of outside junctions in J
            Jⁱ ← the set of inside junctions in J
            J' ← J° ∘ Jⁱ (in order)
            foreach junction j ∈ J' in order
                if label l is not a legal label for arc a at junction j
                    change ← true
                    remove l from L
                    if L is empty
                        return j as the failed vertex
                    end
                end
            end
        end
    end
end

Figure 4.9. The revised line labeling algorithm. The notation ∘ is used to indicate that the sets of junctions and arcs are concatenated, while maintaining the order.

The principal advantage of Algorithm 2 over the three step algorithm proposed by Malik in [69] is that diagnostic information is provided for the blackboard in the form of the failed vertex. Since the number of legal line labelings is often reduced to a single set by the improved combinatorics, this diagnostic information can pinpoint the area of the line drawing that is incorrect.

Section 4.2 will show images processed with this system and the diagnostic information returned by this algorithm. It should be noted that after a legal labeling is found using the floating object heuristic, the full set of legal line labelings can still be obtained using Malik's algorithm.
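The control flow of Algorithm 2 can also be expressed compactly as constraint propagation. The Python below is an illustrative sketch only: it assumes a hypothetical predicate legal_at(j, arc, label) encoding the junction catalogs of Figures 4.6 and 4.8, arc objects carrying their two endpoint junctions (each with an outside flag), and label sets initialized by the floating object heuristic.

def revised_line_labeling(outside_arcs, inside_arcs, labels, legal_at):
    # outside arcs are visited first: they carry the strongest constraints
    arcs = outside_arcs + inside_arcs
    changed = True
    while changed:
        changed = False
        for arc in arcs:
            for label in list(labels[arc.name]):
                # outside junctions are checked before inside ones
                ordered = ([j for j in arc.junctions if j.outside] +
                           [j for j in arc.junctions if not j.outside])
                for j in ordered:
                    if not legal_at(j, arc, label):
                        labels[arc.name].remove(label)
                        changed = True
                        if not labels[arc.name]:
                            return j      # diagnostic: the failed vertex
                        break
    return None                           # a consistent labeling remains

Returning the junction at which the last legal label died is what turns the labeling algorithm into a diagnostic tool; a plain success/failure result would not localize the problem.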
4.2 Common Problems with Line Drawings

When line drawings are obtained from real images, there are a number of problems that can occur. In this section, line labeling problems that result from the processing described in Chapter 3 will be shown. These problem line drawings will be analyzed using the floating object heuristic, and the resulting diagnostic information will be discussed.

Since the line labeling paradigm has no knowledge of the inner workings of the other modules, it is not able to fix any of the problems that it detects without violating the independence of knowledge sources. The suggestions for fixing the problems will come from control knowledge sources to be discussed in Chapter 5, and the other modules.

In the example images that were used in testing this system there were four common problems: missing or incomplete edges, mislabeled junctions, missing junctions, and extraneous junctions. There are other problems that can occur in line drawings, such as extraneous edges, but the vast majority of the incorrect line drawings that were used to test this system fell into these four categories. This is a result of a conscious decision to select parameters for the weak membrane edge detection that undersegment the image, i.e. that divide the image into too few regions.

Incomplete and missing edges occur when the edge detection algorithm does not completely separate two regions. An example of an unavoidable missing edge was shown in Figure 2.1. When this image is processed, the curves and junctions found are shown in Figure 4.10. When the floating object heuristic is used to process this image, vertex v25 is correctly flagged as a problem vertex.

Figure 4.10. The junctions and curves found from processing Figure 2.1.

Mislabeled junctions occur in line drawings because it is difficult to find the type of a junction using only local information. Confusion occurs between junction types of the same class, for example three tangent, arrow, Y, and T vertices. These mistakes are inevitable, since in the mathematically rigorous curvilinear line labeling paradigm the only distinction between a T junction and a Y junction is that T junctions have two arcs with perfectly aligned tangent directions. Since this perfect alignment rarely happens in practice due to positional noise, a threshold was used to determine how close to perfect alignment two arcs must be to be considered a T junction. When this is done, a Y junction or an arrow junction that falls within this thresholded range will be improperly typed as a T junction. Figure 4.11 shows a triangular prism with a junction v25 that is incorrectly labeled as a T junction. This junction was flagged as being faulty by the revised line labeling algorithm.

Figure 4.11. (a) The input image. (b) The curves and vertices.

Another example of a mislabeled junction is shown in Figure 4.12. In this image, a pair of junctions are so close together that the vertex proximity knowledge source has grouped them into one junction. This problem is easy to diagnose, since there are more than three incident arcs at vertex v44. The object domain was restricted to junctions with at most three incident arcs. The modified line labeling module correctly flags v44 as a problem vertex.

Figure 4.12. (a) The input image. (b) The curve endpoints near vertex v44 are so close that they are grouped into a single vertex.

A more challenging problem to diagnose is shown in Figure 4.13. In this image, two vertices that should have been grouped into a single vertex are too widely separated for the distance threshold in the vertex proximity knowledge source to permit them to be combined into a single vertex. This problem is the opposite of that shown in Figure 4.12. This demonstrates the fragility of thresholding. The modified junction labeling paradigm detects vertex v24 as the problem vertex.

Figure 4.13. (a) The input image. (b) Vertices v24 and v28 should have been grouped into a single vertex. Vertices v3 and v7 also should have been grouped.

The most difficult junctions to properly analyze are the curvature-L junctions and three tangent junctions. In a local neighborhood, they are very similar to arrow junctions and L junctions respectively. Figure 4.14(b) shows a curvature-L junction that looks very much like the L junction in Figure 4.14(a), particularly in a small neighborhood of the junction.

Figure 4.14. (a) The lower right hand side of this object is an L junction. (b) The lower right hand side of this object is a curvature-L junction. In a local neighborhood of the junction there is very little difference.
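The thresholded tangent test for junctions with three incident arcs can be stated compactly. The sketch below is illustrative; the function name and the default threshold are assumptions, not the values actually used in this thesis (those are given in Appendix A).

def type_three_arc_junction(tangents_deg, t_threshold_deg=10.0):
    """Return 'T', 'arrow' or 'Y' from the outgoing tangent directions
    (in degrees) of the three arcs incident to a junction."""
    # T test: two arcs whose tangents are (nearly) opposite form the
    # straight stem of the T; with the threshold set to zero, as in the
    # repair strategy of Section 5.1.1, this branch never fires
    for i in range(3):
        for k in range(i + 1, 3):
            d = abs(tangents_deg[i] - tangents_deg[k]) % 360.0
            d = min(d, 360.0 - d)
            if abs(d - 180.0) <= t_threshold_deg:
                return "T"
    # arrow versus Y: an arrow has one angular gap between neighboring
    # arcs that exceeds 180 degrees, while a Y does not
    a = sorted(t % 360.0 for t in tangents_deg)
    gaps = (a[1] - a[0], a[2] - a[1], 360.0 - (a[2] - a[0]))
    return "arrow" if max(gaps) > 180.0 else "Y"

A Y or arrow junction whose stem pair falls inside the threshold band will be typed as a T, which is exactly the failure mode discussed above.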
To further complicate matters, junction labeling is vulnerable to paired mistakes in junction types. Figure 4.15 shows three legal labelings for a cylinder. Only the interpretation in Figure 4.15(a) corresponds to the usual interpretation.

Figure 4.15. The cylinder shown in (a) has the junctions correctly typed and the correct line labeling interpretation. The cylinder in (b) has a pair of mistyped junctions. The line labeling shown is legal. The cylinder in (c) has two pairs of mistyped junctions. This line labeling is also legal.

Missing junctions are another problem in the input line drawings. Missing junctions occur when the continuity algorithm does not detect a corner. The only types of junctions that can be missed are L and curvature-L junctions. Junctions with three incident arcs cannot be missed because they occur at a position where three surfaces are joined, and all such points are forced to be vertices. Junctions with three incident arcs may be mislabeled, or misplaced, but they will not be missing. Figure 4.16 shows an image where a pair of curvature-L junctions were missed on curve c2. The modified line labeling algorithm does not detect this problem. It does detect the problem with vertex v3, which was incorrectly typed as a T junction. Vertex v4 was labeled as a P.

Most of the extraneous junctions in these images occur because the weak membrane is not fitting the edge data well, or because the continuity algorithm has found a false corner. An example of a combination of these failures is shown in Figure 4.17. In this image the weak membrane has segmented the image into 38 regions. Most of these regions are along the right hand edge. These small regions could be filled with some morphological operation, like successively expanding and then contracting the edges, but this would distort the junctions. Since the junctions are critical, this sacrifice cannot be made. These extraneous regions create extraneous curves, which create extraneous vertices. Vertices v118, v116, and v117 were all created this way. The modified line labeling algorithm identifies junction v115 as the problem junction, since it is of type M. There are a number of other problems in this image, including c3 and c114, which are noise edgels. Both of these noise edgels have a terminal junction. They will fail when labeled as well.

Figure 4.18 shows an image with a phantom junction on curve c39.
In the original line labeling algorithm, all curved arcs must have an extra phantom junction added, as was discussed in Section 1.1.3. Up to this point, no phantom junctions have been added. Some junctions have been identified as phantoms since their tangents and curvatures are not sufficiently different to be classified as either curvature-L junctions or L junctions, but these junctions are not necessarily placed properly to be good phantom junctions. There has been no distinction made between curved arcs and linear arcs up to this point. In fact, the only place where this distinction is made in the original line labeling paradigm is in the placement of phantom vertices on curved arcs, and in the junction types in the catalog. The modified line labeling algorithm failed at vertex v54, which is adjacent to the arc that needed a phantom junction added. As an informal experiment on the robustness of the detection of the missing phantom vertex, the ordering of the outside arcs and inside arcs was reversed. The labeling failure then occurred at v6, which is the other end of the arc that needed the phantom vertex.

Figure 4.16. A right circular cylinder where two curvature-L junctions were not detected. Neither of the three tangent junctions was properly detected. Vertex v3 is flagged as being in error.

Figure 4.17. (a) The input image. (b) The curves and vertices.

Figure 4.18. (a) The input image. (b) The result of processing. The modified line labeling paradigm failed at vertex v54.

4.3 Summary and Conclusions

The test bed of images in Appendix B was used to test the ability of the modified line labeling algorithm to diagnose line drawing problems. Table 3.4 summarizes the errors that were made by the initial processing in these images. Table 4.4 shows a comparison between the faults that exist in the images and the ones that were diagnosed by the algorithm. Multiple diagnoses for a single image were attained by manually correcting problems as they were discovered to permit further problems to be diagnosed. The correction process is the topic of Chapter 5.

Table 4.4 summarizes how well the problems found in the initial figure hypothesis have been detected by the modified line labeling paradigm using the floating object heuristic. Grouped junctions are easily detected since having more than three arcs at a junction is not permitted within the object domain under study in this thesis. Curvature and tangent mistakes in classification of the junctions are also found well. The exception to this occurs when a pair of mistakes hide a problem. As an example, if a three tangent junction is mistakenly identified as an arrow and an adjacent curvature-L junction is also mistaken for an L junction, then the mistake cannot be diagnosed. In this case, the limb edge will be found to be a jump edge. While this is unfortunate, it should be noted that both limb edges and jump edges represent discontinuities of the scene depth, and it is this similarity that permits confusion.
Table 4.3. A listing of the problems in the test bed of images from Appendix B showing which problems are diagnosed. ME stands for missing edges. EE is extraneous edges. GJ are grouped junctions. BJ are broken junctions. MB are missed breaks. EB are extraneous breaks. CM are curvature mistakes. TM are tangent mistakes. MP are missing phantom junctions. Notation a means that the problem diagnosis was found at a vertex adjacent to the vertex where the problem actually existed. In Image 7, the notation b was used to indicate that one of the problems was correctly diagnosed, but the image was too cluttered to be fixed using the methods of this thesis.

Table 4.4. A comparison between the faults in the images from Table 3.4, and the problems that were diagnosed using the floating object heuristic.

Source     ME  EE  GJ  BJ  MB  EB  CM  TM  MP
Initial    20  14   7  13  13  26   7  16   2
Diagnosed  12  11   7   8   0   3   7  11   2

Figure 4.19. (a) A legal labeling for a detected corner. (b) Another legal labeling without a corner.

The mistakes that are the most difficult to diagnose are the missing breaks and extraneous breaks in continuity. The difficulty with diagnosis is that both labelings shown in Figure 4.19 are legal. This causes problems with the diagnosis of both missing and extraneous breaks. Although the problem in these cases cannot be diagnosed, it should be noted that the line labeling that is obtained is correct. In fact, the correctness of the line labeling without regard to the placement of these breaks is what causes the poor diagnosis result.

The localization of the diagnosis of problems in the figure hypothesis is important. In all of the cases reported in Table 4.3 except three, the problem was diagnosed at the correct junction. In the remaining three cases, the problem was diagnosed at an adjacent junction.

The modified line labeling algorithm has been shown to be useful for identifying problems in the initial line labeling hypothesis. Not all problems can be identified, but many problems can be. In all cases in the test bed of images, the junction that was diagnosed as being the problem either was the problem vertex, or was adjacent to the problem vertex. As would be expected, there were no cases found where a correct line labeling was diagnosed as being faulty.

CHAPTER 5

Updating Line Labeling

Previous chapters have shown how an initial line drawing can be found from a single intensity image, and how problems in this line drawing can be diagnosed. This chapter will focus on the repair knowledge sources that were created to use the diagnostic information coming from the modified line labeling algorithm to fix some of the problems in the line labeling, and the controller knowledge source that governs which repair knowledge source is chosen.

This is the point at which the independence of the knowledge sources starts to pay dividends. While it would be possible for the line labeling module to try to correct problems that are the result of other modules, the complexity of this structure would be high. The line labeling module would have to know about all of the data structures from all of the other modules, and be able to modify its data structures to call them.
As an example, the continuity algorithm represents curves differently than the blackboard. If the line labeling module were to call the continuity module directly, either the line labeling module would have to understand the data structure and routines from the continuity module, or the continuity module would have to understand the data structure and routines from the line labeling module. This would leave one of the two modules vulnerable to changes in the other module. When this is multiplied by all the pairs of modules, the complexity would be unmanageable.

A blackboard system solves this problem by having each module translate to and from the standard blackboard objects. Control knowledge sources are designed to use the standard blackboard objects alone. Control knowledge sources have no knowledge of the detailed inner workings of the other modules. They only know what blackboard objects are legal input to what modules, and the control parameters (such as the parameters for edge detection) that each knowledge source has made available to the blackboard. This division of labor makes it possible to make changes within a single module without propagating them into other modules. The control knowledge sources do need to know how to call individual modules, but they do not need to know anything about the internal data structure of the modules. This structure is ideal for experimentation with controlling the modules, since the modules themselves do not have to be altered.

In order to fix line drawings, two types of knowledge sources are needed. The first type are repair knowledge sources. These knowledge sources know how to fix a particular problem in a line labeling. For example, there are repair knowledge sources that can insert phantom vertices into a line drawing, and hypothesize where edges that are missing could be placed. These knowledge sources are instantiated by the controller knowledge source, which has a strategy for selecting which of the repair knowledge sources should be selected next.

This chapter is divided into three sections. In Section 5.1 the repair knowledge sources that can be used to repair line drawing problems will be investigated, along with the control knowledge source. Section 5.2 will discuss ideas for other perceptual knowledge sources. Section 5.3 will give the computational complexity of all of the algorithms discussed in this thesis, as well as the hardware specifications and run times for the example images given in Appendix B.

Table 5.1. The knowledge sources used to create the line drawing and possible errors made.

Module            Error
Edge Detection    False edges
                  Missing edges
Continuity        False corners
                  Missing corners
Vertex Proximity  Too much grouping
                  Not enough grouping
Junction Typing   Wrong type
                  Wrong class
                  Phantom vertex missing

5.1 Correcting Errors in Line Labeling

A variety of problems with the initial line labelings were found by the modified line labeling module. These problems now need to be identified and corrected. There were four modules used to create the line drawing: edge detection, curvilinear grouping, vertex proximity grouping, and junction classing and typing. Each of these modules can, and does, make errors. These errors are cataloged in Table 5.1. The errors are not mutually exclusive; for example, a missing edge will cause two junctions to be of the wrong class.
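Before turning to the individual repair knowledge sources, the narrow module interface implied by the blackboard organization above can be sketched. The following is a hypothetical rendering of that interface, not the implemented system: each knowledge source publishes its control parameters and translates between its private structures and the standard blackboard objects.

class KnowledgeSource:
    # control parameters published to the blackboard, e.g. an edge
    # detector's smoothness weight or a tangent threshold (illustrative)
    parameters = {}

    def accepts(self, blackboard):
        """Report whether the blackboard holds legal input objects
        for this module."""
        raise NotImplementedError

    def run(self, blackboard):
        """Translate standard blackboard objects into the module's private
        representation, run the module, and translate the results back.
        No other module ever sees this module's internal structures."""
        raise NotImplementedError

Under an arrangement like this, adding or revising a module means writing one new translator; the controller and the other modules are untouched, which is the independence the chapter relies on.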
These errors will be fixed with a set of knowledge sources called the repair knowledge sources. Each repair knowledge source is an expert at fixing one type of problem that can occur in an image. The repair knowledge sources will be activated by the controller knowledge source, which decides which of the repair knowledge sources is most likely to fix the current problem vertex and instantiates the knowledge source. The success or failure of the knowledge source will be reported back to the controller, which can then either accept the solution, make more modifications, or stop the processing.

Sections 5.1.1 through 5.1.4 will describe some repair knowledge sources. Section 5.1.5 will describe the control knowledge source.

5.1.1 Mislabeled Vertices

Since the junctions in the line drawing have their types selected using thresholds for similarity on both the tangents and curvatures, it is not surprising that mistakes occur. An example of a typical mistake is shown in Figure 5.1. There are two failures in this image. The first occurs at vertex v11. The continuity algorithm has broken what perceptually is one curve between v29 and v28 into two pieces. The tangent and curvature approximations at v11 have caused this junction to be labeled as a phantom junction. This problem does not cause a line labeling failure, since the phantom junction has a labeling that matches the correct interpretation. This problem will be solved in Section 5.1.4.

The more serious problem is that vertex v25 has been labeled as a T junction. This is a reasonable initial guess for the local junction labeling module to make, but it is not consistent with the other information in the image. The modified line labeling paradigm fails at vertex v25. When a T junction is suspected of causing the line labeling problem, as in this case, one way to reexamine it is to adjust the tangent threshold that was used to label the vertex. In fact, since it is known that the T labeling is not correct, it is best to set this threshold to zero. If this setting were made initially, T junctions would almost always be mislabeled. After this adjustment is made, only vertex v25 is relabeled. The new junction label is an arrow, which is correct. The modified line labeling scheme succeeds on this new image, and the correct labeling is found, as shown in Figure 5.2. Although this line labeling is correct, it will not be perfect until v11 has been removed.

Figure 5.1. (a) The input image. (b) The labeled curves and vertices.

Figure 5.2. The line labels for the object in Figure 5.1.

This processing shows a possible place where compilation, which can be an important result of a blackboard system, could occur. In both of these examples, a T junction was found with the two arcs with the same tangent directions being outside arcs. This configuration is never legal with the floating object hypothesis, as careful inspection of the junction catalog in Figure 4.6 shows. This information can now be taken and incorporated into the algorithm that decides the types of junctions. While this could have been discovered by other methods, this insight came from experimentation with the blackboard system.

5.1.2 Missing Arcs

Since the modified line labeling algorithm can detect missing arcs, a procedure for adding arcs is needed. The problem is approached by hypothesizing where a missing edge could go and then picking the best of these hypotheses.
This approach will be useful in cases like that shown in Figure 2.1, where edge detection of any sort is unlikely to find an appropriate edge.

Figure 5.3 shows an example of a triangular prism. Vertex v22 is labeled as problematic by the modified line labeling algorithm. Assume that the control knowledge source has decided to hypothesize a missing arc, as will be discussed in Section 5.1.5. A vertex needs to be found as the second endpoint for the missing arc. This vertex must be in a region that is adjacent to the problem vertex. The second type of vertex to be considered is a vertex with two incoming arcs. Adding a third arc to this type of vertex would permit an interpretation that is legal in the current object domain. Phantom vertices will not be used for the second vertex since the phantom vertices that are presently in the line drawing are there as the result of failures in the continuity algorithm and have no perceptual significance. Vertices which already have three incoming arcs could also be used for the second endpoints. This was not done since the object domain specifically excludes objects with more than three faces adjacent to a vertex.

Figure 5.3. (a) The input image. (b) The labeled arcs and junctions.

Once the set of second vertex candidates is found, a curve is hypothesized between the first vertex and each of the second vertex candidates. These curves are added to the initial object hypothesis individually, and each new hypothesis has its junctions typed and labeled. If the labeling does not fail at either of the vertices that were modified, then the new object is put on a list of candidate objects. The best object hypothesis will be chosen from these candidates in Section 5.2.1. For the triangular prism in Figure 5.3, three alternate objects were proposed. These are shown in Figure 5.4.
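This hypothesize-and-test loop can be sketched as follows. The helpers (regions_adjacent, add_curve, type_and_label_junctions, revised_line_labeling) are hypothetical stand-ins for the corresponding blackboard operations, so the sketch is illustrative rather than the thesis implementation.

def hypothesize_missing_arcs(drawing, problem_vertex):
    candidates = []
    for v in drawing.vertices:
        # second endpoint: a two-arc, non-phantom vertex in a region
        # adjacent to the problem vertex; phantom vertices have no
        # perceptual significance, and three-arc vertices are excluded
        # by the object domain
        if v.is_phantom or len(v.arcs) != 2:
            continue
        if not drawing.regions_adjacent(problem_vertex, v):
            continue
        hyp = drawing.copy()
        hyp.add_curve(problem_vertex.name, v.name)   # the hypothesized arc
        hyp.type_and_label_junctions()
        failed = hyp.revised_line_labeling()         # Algorithm 2 diagnosis
        # keep the hypothesis only if neither modified vertex fails
        if failed not in (problem_vertex.name, v.name):
            candidates.append(hyp)
    return candidates   # the best candidate is selected later (Section 5.2.1)

Testing each hypothesis with the revised labeling algorithm, rather than accepting the first legal one, is what allows the symmetry criterion of Section 5.2.1 to arbitrate among several survivors.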
Figure 5.4. (a) Vertex v22 is joined with vertex v27. (b) Vertex v22 is joined with vertex v6. (c) Vertex v22 is joined with vertex v26. Interpretation (b) is accepted as the best.

If the restriction on the number of faces adjacent to a vertex were to be removed, vertices with three incoming arcs could be given a fourth arc. In this case, it would be best to add a fourth incoming arc to a vertex only if there were no other legal interpretations. This could be regarded as a simplicity criterion, similar to Malik's requirement that a junction with more than four incoming arcs be interpreted in such a way that the minimum number of hidden surfaces is used in the interpretation.

When the hypothesis in Figure 5.4(a) is labeled, junction v22 is labeled as a T junction. It is within only a few degrees of the acceptable T threshold. Even if it had missed the threshold, it would have been labeled as an arrow junction. Neither an arrow nor a T junction is a legal interpretation for this vertex. Figure 5.4(c) also has v22 labeled as a T junction, and the hypothesis is rejected. The interpretation that is accepted is in Figure 5.4(b). This interpretation labels v22 as a Y junction, and gives the final labels as shown in Figure 5.5.

Figure 5.5. The arc labels for the triangular prism.

Another example of this problem is shown in Figure 5.6. This image has two problems. The bottom left corner of the block, on curve c5, was missed by the continuity algorithm. This mistake is similar to the problem shown with curvature-L junctions in Figure 4.16, except that this junction should have been an L. This method cannot recover from this mistake, since the labeling is legal regardless of whether a vertex is found on c5 or not. The missing front edge of the block can be recovered. As before, there will be three alternate object hypotheses. If the continuity algorithm had been able to discern the corner on curve c5, there would have been four hypotheses. These are shown in Figure 5.7. The hypothesis in Figure 5.7(a) produces an arrow junction at vertex v38, which cannot be labeled. The hypothesis in Figure 5.7(b) produces a T junction at vertex v38, which also cannot be labeled. The final hypothesis in Figure 5.7(c) produces a Y junction that is correctly labeled. The legal labeling is shown in Figure 5.8. In Section 5.2.1 a heuristic will be discussed that can distinguish between interpretations even if the center vertex is labeled as a Y junction, instead of being labeled as a T.

Figure 5.6. (a) The input image. (b) The initial output.

Figure 5.7. Three hypotheses for the missing edge. (a) v38 is joined to v33. (b) v38 is joined to v32. (c) v38 is joined to v31. Interpretation (c) is accepted as correct.

Figure 5.8. The correct line labels for Figure 5.7(c).

5.1.3 Adding Phantom Vertices

There are two fundamental differences between polyhedral line labeling and curvilinear line labeling. The first is that polyhedral line labeling can be aided by linear numerical analysis, as shown by Sugihara in [105]. Curvilinear line labeling cannot rely on similar help because curved arcs cannot be represented adequately with linear equations.
If the distance between the line and the arc was more than a few pixels, as defined in Appendix A, the arc was labeled as curved. The threshold is kept small 139 so that if an error occurred it would be more likely to produce an extra phantom vertex than neglect a necessary one. All curved arcs adjacent to the problem vertex have junctions added initially since there is no way to diagnose which of the curved arcs needs the phantom vertex. This should still add far fewer phantom vertices to the line drawing than the original strategy proposed by Malik where a phantom vertex was added initially on every curved arc in the image. When Figure 5.9(a) is processed, vertex v54 is flagged as the point of failure. The phantom vertex insertion knowledge source then inserts phantom vertices along any arcs incident to v54 that are curved. At vertex, v54 arcs C47 and C38 were both found to be straight, so no phantom vertices were added on these arcs. A phantom vertex was added in the center of C39. The junctions were typed again, and the line labeling was successful Figure 5.10 shows the object after the phantom vertex has been labeled. The line labels found are shown in Figure 5.11. - The placement of v55 in the center of the arc is suboptimal. The phantom vertex in the image is probably closer to v6 than v54. By using shape from shading, shape from contour or another module that can provide normal directions for the surfaces, it might be possible to provide better placement of the phantom vertex. In Malik and Maydan [70], shape from shading is used to place the phantom vertices properly. A similar idea could improve the placement of the phantom vertex in this image. There are two other small errors in this image that the modified line labeling algorithm cannot detect. The first is that v20 is labeled as an L junction. The labeling of curves c2? and c23 is correct. The second flaw is that v38 is labeled as a phantom vertex. These are small mistakes that do not prevent the proper interpretation of the image. 140 an pfi's r” x“- x’ “a. // a~.\ f/ 3“"- .- a lf'” I“ ," ”fl—5x / M 14 ,r” ,x’ “ vi / t' x” IE I) grid—“r3 J“ a. t“. '1. 5“- " .62 is. “K. ~. . \ L ~_\ n“- .\ .\_ A? a, .45x ,6: ' K WM 1" . ' 43. ‘x .9 an (b) Figure 5.9. (a) The input image. (b) The lines and curves. 141 no ’m.5 I '5..- I '3‘... I -‘- ff’ \ ‘5. r‘la ‘H' .4' I!“ J-b— .l I f f...” 3““ ’uda -- d {W " x" -a’f #M d a .4”; Jr" “0 "a'flp “3 ' + "x 1. 5s“ '3 an em... "K. x a“! N. ‘5. s, Ks... _//7. ~~._ a... ,4» ‘ fi'x‘. I? I" ‘ I X. JV“ -— fit— ‘43 61 Figure 5.10. Phantom vertex v55 has been added. Curve C39 was broken into c63 and CM. The placement of the phantom vertex is at the center of the curve, which may not be perceptually correct. 142 v a 4‘ «M's. ”r H- 3"~_\ 4"! ‘~ a.“ f“ I .. + 4’ ' m . " I!“ 337 ,L .-' ,J“ «a x mm“ an § “-~ u '3’ ““3: “Ru 5 ‘w. I ”H. «a \x + A’ I “~ . (am A: ‘ at.“ ‘9 F.“ 4‘. as ch as Figure 5.11. The line labeling found from the object in Figure 5.9. 143 5.1.4 Other Line Labeling Improvements There are other improvements that can be made to the line labeling that have not been implemented yet. The first improvement is to remove unnecessary phantom junctions from the line labeling. Unnecessary phantom junctions occur in images when the continuity algorithm makes an extraneous break in an edge. This happens most frequently when the edge is noisy. 
When these unnecessary breaks have their tangents and curvatures calculated and are labeled, they are frequently labeled as phantom junctions signifying that they have no significant difference in either tangent or curvature. Removing these arcs is more than a cosmetic improvement, as can be seen in Figure 5.12. In this figure, vertex v37 has been labeled as a curvature-L. This has happened because the adjacent pair of curves c6, and C36 have different approxi- mated curvatures, but the same tangents. When phantom junctions v36 and v38 are removed, the tangents and curvatures will be recalculated for the longer arcs. It is likely that a relabeling of this junction at that point will produce more accurate curvature values, and permit this vertex to be labeled as a phantom vertex, which can then be removed. The method for doing this removal is as follows. The two curves that are adjacent to the phantom junction are joined by a line between the endpoints. This arc will often have zero length, but can have nonzero length when several breaks occurred within a small area in the image. Then the curves will be returned to the continuity algorithm with an additional constraint: no breaks are to be made. The continuity algorithm then will use the energy functional E in Equation 5.1 to prevent breaks from being made. The original curve data is represented as 22;, u.- is the curve of minimum energy. This function is identical to Equation 3.2, except that breaks are no longer allowed. The energy minimization procedure is the same as before, including the use 144 C1. JFA‘A _.H' \. 1" I" x" \. x.‘ \ T‘ b "1. ~. ‘ “a. \ . H‘- .‘ ~ \ "‘--. L" dB at." ‘-.,, -., a m" \x. \\ fl, ‘ x. \. N. G— \‘l. J\.\ ‘5 \‘x ‘hx, 1. "‘ a \» fr” ‘~_ .441 / W , , .- "°- g ,.~"£I: \ , at '1. (b) Figure 5.12. (a) The input image. (b) The labeled lines and curves 145 of multi-grid relaxation. Since this energy functional is convex, multi-grid relaxation was not strictly necessary. E = 2(1‘.‘ - an)2 + A(u.- - um)2 + Mum - 2a.- + um)2 (5.1) This will force the continuity module to smooth the curve, which will produce more reliable curvature values and tangent directions. When the continuity algorithm has finished smoothing the curve, the new curve will replace the old curves in a new figure hypothesis, which will be relabeled to verify the solution. In Chapter 4 it was noted that the vertex proximity module made some errors in grouping vertices. When these errors are diagnosed they can be fixed. Examples of problems with the vertex proximity module were shown in Figures 4.13, where a pair of proximate vertices were not grouped, and Figure 4.12 where a pair of vertices that should have been disjoint were grouped. Both of these cases of mistaken junc- tion identity can be separated from other instances, like the mislabeled junctions of Section 5.1.1, by noticing that there is a curve adjacent to the vertices that is very short in proportion to the other curves. To group the proximate vertices, this curve needs to be removed. To separate the incorrectly grouped vertices, the endpoints of this curve need to be separated. In this later case, if there is no separation between the vertices, an accidental alignment has occurred. Line labeling offers no guarantees when accidental alignments are involved. Figure 5.13 shows an exaggerated example of these processes. 5.1.5 Control of Strategies The repair knowledge sources are each capable of using their specialized knowledge to correct exactly one type of problem in an image. 
5.1.5 Control of Strategies

The repair knowledge sources are each capable of using their specialized knowledge to correct exactly one type of problem in an image. Selecting which repair knowledge source should be used first is the job of the controller knowledge source. The strategy implemented here is to use a fixed ordering of the repair knowledge sources. A more elaborate strategy will be necessary as the number of repair knowledge sources is expanded.

The general strategy for fixing these problems is to address only one problem junction at a time. If there are several problems in the image, they will be solved sequentially. A problem is identified as being fixed by a repair knowledge source when the vertex where the failure occurs moves to a different location. A list of vertices where corrections have already occurred should be kept to prevent endless looping around the image. If a vertex is flagged as being a failure a second time, then the processing should either stop or backtrack to the first correction of the vertex and try a different correction. This is a search process and, like all search processes, it could be computationally expensive. The more closely the repair knowledge sources can be matched with the diagnosed vertex problems, the less risk of combinatoric explosion.

The following repair knowledge sources are available:

1. Re-type junctions
2. Replace missing arcs
3. Insert a phantom vertex
4. Break a vertex into two vertices
5. Join two close vertices
6. Remove a phantom junction

There are two criteria for selecting the ordering of repair knowledge sources. Repair knowledge sources that can be uniquely identified, and are unlikely to be falsely triggered, should be used first. Repair knowledge sources that are risky, meaning that they could produce a correct line labeling without correcting the real mistake(s) in the image, should be tried last.

An example of a repair knowledge source that can be uniquely identified is the repair knowledge source that breaks vertices apart. It can be uniquely identified by the presence of a junction with more than three incident arcs. Similarly, the repair knowledge source that groups proximate vertices together is well identified by the presence of a curve incident to one of the junctions that is much shorter than the other incident curves. These knowledge sources should be tried first.

Risk is another consideration when ordering the repair knowledge sources. A repair knowledge source is risky if it can make alterations that may produce a legal line labeling without fixing the underlying problem in the image. In some sense, all of the repair knowledge sources are risky, but two of them are particularly so: the repair knowledge source that inserts phantom vertices and the repair knowledge source that replaces missing arcs. These strategies should be tried as a last resort.
The risk in this knowledge source can be limited by checking the arc that gets inserted against the raw image data, to look for a subtle edge that the edge detection algorithm may have missed, or using symmetry constraints as will be discussed in Section 5.2.1. The repair knowledge source the re-types junctions is neither unique nor risky. It should therefore be tried after the repair knowledge sources that break apart a vertex and joins close vertices and before the repair knowledge source that inserts phantom junctions and missing edges. The only remaining repair knowledge source is the phantom vertex remover. Since it will not be triggered falsely, it has no risk and can be run at anytime. In the interest of improving the line drawing, it should probably be run first. This gives the ordering used for triggering repair knowledge sources in the con- troller. 1. Remove a phantom vertex 2. Break a vertex into two vertices 3. Join two close vertices 4. Re-type junctions 5. Insert a phantom vertex 6. Replace missing arcs As the number of implemented repair knowledge sources grows, more subtle strate- gies can be developed for selecting among the choices. If a test set of images with 149 ground truth is available, an analysis could be done of the probability of each repair knowledge source fixing a problem for each junction type. This type of experimentao tion could lead to a better understanding of the frequency of each type of error and more sophisticated strategies for calling repair knowledge sources. 5.1.6 Experimental Results And Conclusions Since the full set of proposed repair knowledge sources has not been implemented, testing of the repair knowledge sources was done by manually editing the blackboard data structure. The author was careful to not fix problems using high level knowledge such as the desired final line labeling and apply the repairs mechanically. A comparison can be made between the problems that have been diagnosed in Chapter 4 and the problems that can be repaired using the repair knowledge sources in this chapter. This comparison is shown in Table 5.3. This table shows that most of the diagnosed problems can be fixed using the repair knowledge sources. The extraneous edge problem that was not fixed was in image 7 where a near accidental alignment lead to an excessively cluttered area of the image. The overall performance of this system in detecting correct line labelings will be discussed in Chapter 6. The final measure of the success of this system is the number of images in the test bed of images in Appendix B where correct line labelings are found. Of the twenty-two images in the test bed given in Appendix B, one image could not be labeled, one image had a convex edge mistaken for a limb, and five images had limb edges mistaken for jump edges, and one image had a missing are that wasn’t replaced. Therefore fourteen of the twenty-two images were processed completely correctly. Of the eight failures, only one produced no line labeling. In the other seven failures, only one or two arcs were mislabeled. Integration and diagnosis have dramatically improved the initial figure hypotheses, as can be seen in Table 5.4. 150 Table 5.2. A listing of the problems in the test bed of images from Appendix B showing which problems are repaired. ME stands for missing edges. EE is extraneous edges. GJ are grouped junctions. BJ are broken junctions. MB are missed breaks. EB are extraneous breaks. CM are curvature mistakes. TM are tangent mistakes. MP are missing phantom junctions. 
5.1.6 Experimental Results and Conclusions

Since the full set of proposed repair knowledge sources has not been implemented, testing of the repair knowledge sources was done by manually editing the blackboard data structure. The author was careful not to fix problems using high level knowledge, such as the desired final line labeling, and to apply the repairs mechanically.

A comparison can be made between the problems that were diagnosed in Chapter 4 and the problems that can be repaired using the repair knowledge sources in this chapter. This comparison is shown in Table 5.3. The table shows that most of the diagnosed problems can be fixed using the repair knowledge sources. The extraneous edge problem that was not fixed was in image 7, where a near accidental alignment led to an excessively cluttered area of the image. The overall performance of this system in detecting correct line labelings will be discussed in Chapter 6.

The final measure of the success of this system is the number of images in the test bed of Appendix B where correct line labelings are found. Of the twenty-two images in the test bed, one image could not be labeled, one image had a convex edge mistaken for a limb, five images had limb edges mistaken for jump edges, and one image had a missing arc that wasn't replaced. Therefore fourteen of the twenty-two images were processed completely correctly. Of the eight failures, only one produced no line labeling. In the other seven failures, only one or two arcs were mislabeled. Integration and diagnosis have dramatically improved the initial figure hypotheses, as can be seen in Table 5.4.

Table 5.2. A listing of the problems in the test bed of images from Appendix B showing which problems are repaired. ME stands for missing edges. EE is extraneous edges. GJ are grouped junctions. BJ are broken junctions. MB are missed breaks. EB are extraneous breaks. CM are curvature mistakes. TM are tangent mistakes. MP are missing phantom junctions. A superscript a indicates a correction that was made manually and may be beyond the scope of the repair knowledge sources discussed in this chapter.

Table 5.3. A comparison between diagnosed problems and repaired problems in line labelings.

    Source     ME  EE  GJ  BJ  MB  EB  CM  TM  MP
    Diagnosis  12  11   7   8   0   3   7  12   2
    Repair     12  11   7   8   0   2   6  10   2

Table 5.4. The result of the line labeling process. A "Yes" in the initial column means that a proper line labeling was found from the initial figure hypothesis. A "Yes" in the final column means that a correct line labeling was found after correction. An "A" in the final column means that the line labeling was correct except for a single error. A "B" means that a missing arc wasn't replaced. A "No" in a column means that no correct line labeling was found.

    Image  Initial  Final
    1      No       Yes
    2      No       Yes
    3      No       Yes
    4      No       Yes
    5      No       Yes
    6      No       Yes
    7      No       No
    8      Yes      Yes
    9      No       Yes
    10     No       A
    11     No       Yes
    12     No       Yes
    13     No       Yes
    14     No
    15     No
    16     No       Yes
    17     No       A
    18     No       Yes
    19     No       Yes
    20     No       A
    21     No       A
    22     No       A

5.2 Other Useful Modules

To keep the implementation of the integration ideas manageable, some interesting and useful modules were omitted from the implementation. Two of these modules could be particularly useful in future expansions of the system: symmetry and shape from shading. Symmetry is a higher level module that can help in the diagnosis of line labeling problems; it can also be used to improve the line labeling. Shape from shading can provide the normal directions of surfaces, which is necessary for a more complete 2.5 dimensional sketch, and it can be integrated with the line labeling algorithm to improve the robustness of line labeling.

5.2.1 Symmetry

The only high level process that has been used in this system so far is line labeling. Including a symmetry module in this work would provide another avenue for top down analysis and control. The symmetry algorithm under consideration is Gross and Boult's SYMAN system [47, 46], which can offer a binary evaluation of skew symmetry in an image. Such a symmetry module can provide useful information to the blackboard system.

An example illustrating where SYMAN could be useful is choosing between the possible legal labelings found when filling in missing edges, as in Section 5.1.2. If the tangent threshold were set differently, Figure 5.6 could have two legal labelings instead of one. Both of these labelings would have a Y vertex in the center of the block, which is legal, and an arrow vertex at one of the corners. Line labeling cannot distinguish between these two hypotheses, one of which is clearly preferable to the other. Symmetry can make this distinction. In the hypothesis of Figure 5.7(c), an asymmetric region was divided into two symmetric regions. The hypothesis of Figure 5.7(b) divides an asymmetric region into two asymmetric regions. Since symmetry is preferred over asymmetry, the hypothesis of Figure 5.7(c) is better than that of Figure 5.7(b). A sketch of the kind of binary skewed-symmetry test that could support such a preference is given below.
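The following fragment is a minimal sketch of a binary skewed-symmetry test and is not the SYMAN algorithm; it assumes corresponding point pairs have already been proposed, at least two of them, and that the first pair joins two distinct points. It relies on a standard property of skewed symmetry: the segments joining corresponding point pairs are mutually parallel, and their midpoints lie on a common straight axis, here taken through the first two midpoints. The tolerance is a hypothetical placeholder.

    ;; A minimal sketch of a binary skewed-symmetry test, not SYMAN.
    ;; PAIRS is a list of (x1 y1 x2 y2) corresponding point pairs.
    (defun skew-symmetric-p (pairs &key (tol 1e-3))
      (flet ((dx (p) (- (third p) (first p)))
             (dy (p) (- (fourth p) (second p)))
             (mx (p) (/ (+ (first p) (third p)) 2))
             (my (p) (/ (+ (second p) (fourth p)) 2)))
        (let* ((p0 (first pairs))
               (p1 (second pairs))
               (ax (- (mx p1) (mx p0)))   ; direction of the candidate axis
               (ay (- (my p1) (my p0))))
          (loop for p in (rest pairs)
                ;; both cross products vanish when the segment is parallel
                ;; to the first one and its midpoint lies on the axis
                always (and (< (abs (- (* (dx p0) (dy p))
                                       (* (dy p0) (dx p))))
                               tol)
                            (< (abs (- (* ax (- (my p) (my p0)))
                                       (* ay (- (mx p) (mx p0)))))
                               tol))))))

A preference like the one in the text could then rank labeling hypotheses by how many of the regions they create pass such a test.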
Although the same idea could be applied to Figure 5.3, it would not be needed there. Even if the tangent threshold were low enough to force the center vertex to change its T label, the junction label found would be an arrow. An arrow junction is not legal at this position, as a look at the junction catalog in Figure 4.6 will verify. Symmetry could still be used to select the hypothesis.

Figure 5.6 shows another place where symmetry could be useful. In this image, a corner was not detected along curve c5. Once the missing edge has been reinstated, as shown in Figure 5.7(c) and discussed above, this missing vertex could be detected using the symmetry of the region bounded by v29, v38, and v31. This problem occurs frequently in images. Figure 5.14 shows a skew symmetric rectangle. The vertices at v2 and v4 are much sharper than those at v1 and v3. Since the continuity algorithm essentially uses a curvature threshold to break corners, vertices v1 and v3 are more likely to be missed.

Figure 5.14. A skewed rectangle. Vertices v1 and v3 are more likely to be missed by the continuity algorithm than vertices v2 and v4. If v2 and v4 are found, v1 and v3 can be found using symmetry information.

Line labeling can also provide useful information to the symmetry module. In the case of occlusion, as in Figure 5.15, line labeling can inform the symmetry module that curve c1 is an occluding curve that is part of another region and should not be used in the determination of whether the region is symmetric or not. There are cases where this cannot be helpful, as in Figure 5.16. In this instance, the occlusion has blocked most of the symmetric pairs in the curve from being seen. It is very unlikely that SYMAN would agree that the region bounded by c1, c2, c3, c4, and c5 is symmetric.

Figure 5.15. If curve c1 is labeled as a limb edge, it can be separated from c2, c3, and c4 when symmetry is being analyzed.

Figure 5.16. The object on the left has been occluded by the cube. It is unlikely that the region bounded by c1, c2, c3, c4, and c5 will be flagged as symmetric.

5.2.2 Shape from Shading

Shape from shading uses the detected contours in the image, a model of the illumination, and the image intensities to reconstruct the surface normals. The integration of shape from shading with line labeling was done in [70]. In that work, shading information is combined with a complete, correct, labeled line drawing to give surface normals. The algorithm was tested only on synthetic images.

Shape from shading could be used in at least two different ways to enhance this system. First, it can improve the performance of this system on three tangent and curvature-L junctions, one place where the performance of the system is lacking. Since limb edges have a signature that is different from jump edges, as was shown in [70], they may be identified before junction typing begins. If this can be done, many of the problems with curvature-L junctions and three tangent junctions will be eliminated. A vertex with three incident edges, one of which is a limb edge, must be labeled as a three tangent junction. A vertex with two incident edges, one of which is a limb edge, must be labeled as a curvature-L junction. These two rules are sketched below.
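The two rules just stated amount to a small constraint on junction typing. The fragment below is a minimal sketch of that constraint and is not part of POLL; the limb-edge-p argument is a hypothetical predicate that a shape from shading module would supply.

    ;; A minimal sketch of limb-edge information constraining junction
    ;; typing.  INCIDENT-ARCS is the list of arcs meeting at a vertex;
    ;; LIMB-EDGE-P is a hypothetical shape-from-shading predicate.
    ;; Returns the forced junction type, or NIL when shading provides
    ;; no constraint.
    (defun shading-junction-constraint (incident-arcs limb-edge-p)
      (let ((n (length incident-arcs)))
        (cond ((notany limb-edge-p incident-arcs) nil)
              ((= n 3) :three-tangent)
              ((= n 2) :curvature-l)
              (t nil))))

Either forced type could be posted to the blackboard before the junction catalog is consulted, since limb edges would be identified before junction typing begins.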
Another advantage of the early labeling of limb edges would be that the detection of curvature-L junctions could be improved. Figure 4.16 shows the image of a right circular cylinder discussed earlier. Neither of the curvature-L junctions on curve c2 was detected. Shape from shading might be able to discern that the ends of this curve are limb edges while the central portion is a jump edge. This information could be used directly to break the curve, or it could be sent to the continuity algorithm. The continuity algorithm could be asked to return the single most likely breakpoint by first computing a solution without breaks, as was discussed in Section 5.1.4, and then searching that solution for the point with the highest energy in a small area around the place where shape from shading says that the limb edge changes to a jump edge. Another approach is to have the continuity algorithm drop the threshold for breaks so that break points become more likely.

A more mundane use of shape from shading would be to provide the normal directions that are necessary for a more complete 2.5 dimensional sketch. This could be done as in [70], after the line labeling has been successfully performed.

5.3 Computational Complexity and Timings

The computational complexity of the algorithms discussed in this thesis is summarized in Table 5.5. Table 5.6 gives the computational complexity of the repair knowledge sources. All computational complexities are worst case.

Table 5.5. The computational complexity of the algorithms discussed in this thesis. V is the number of vertices. A is the number of arcs. N is the number of pixels in the image. M is the number of edge points found by the weak membrane edge detector.

Table 5.6. The computational complexity of the repair knowledge sources. AA is the number of vertices in the adjacent regions which have two incident arcs. N is the number of pixels in the combined curves.

The computational complexity of the modified line labeling algorithm is exponential in theory, although the algorithm is much more efficient when a background region is present. As Table 4.8 shows, when a background region is given there are unique labelings for most outside junctions. These unique outside labelings propagate to unique labelings for inside junctions in many cases. This improves the average performance of the modified line labeling algorithm, but does not change the worst case complexity given in Table 5.5. The propagation is sketched below.
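The following fragment is a minimal sketch of the kind of Waltz-style constraint propagation that makes the background region so helpful; it is not the modified line labeling algorithm itself, and the neighbors and consistent-p arguments are hypothetical stand-ins. Junctions with a single candidate labeling prune the candidate sets of adjacent junctions, and the pruning repeats until a fixed point is reached.

    ;; A minimal sketch of Waltz-style label propagation over a junction
    ;; graph, not the modified line labeling algorithm.  CANDIDATES maps
    ;; each junction to its list of candidate labelings; NEIGHBORS maps a
    ;; junction to the adjacent junctions; CONSISTENT-P is a hypothetical
    ;; test that two labelings agree on the arc the junctions share.
    (defun propagate-labelings (candidates neighbors consistent-p)
      (loop with changed = t
            while changed
            do (setf changed nil)
               (loop for j being the hash-keys of candidates
                     ;; keep only the labelings of J with which some
                     ;; labeling of every neighbor is consistent
                     for kept = (remove-if-not
                                 (lambda (lab)
                                   (every (lambda (k)
                                            (some (lambda (lab2)
                                                    (funcall consistent-p
                                                             j lab k lab2))
                                                  (gethash k candidates)))
                                          (funcall neighbors j)))
                                 (gethash j candidates))
                     unless (= (length kept)
                               (length (gethash j candidates)))
                       do (setf (gethash j candidates) kept
                                changed t)))
      candidates)

When the background region gives most outside junctions a single candidate, each pass of this pruning tends to collapse the candidate sets of inside junctions as well, which is the behavior observed in Table 4.8.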
The algorithms discussed in this thesis were implemented in Lisp, C, and Fortran on a Sparcserver 690MP. The majority of the implementation was done in Allegro Common Lisp version 3.1.4, using the GBB blackboard system shell version 1.2. The Lisp was compiled before timings were done. C was interfaced to Lisp to improve the execution speed of the program. Most array operations were implemented in C, as was the energy minimization in the continuity algorithm. Fortran was used only for the program that performs the Voronoi tessellation.

The run times of the edge detection, thinning, and connected components algorithms are all independent of the contents of the image, so run times are given for one image (image 2 in the test set in Appendix B). The weak membrane edge detection ran in 214 seconds (3.5 minutes). The thinning took 0.7 seconds. The connected components algorithm took 10.4 seconds. Table 5.7 shows the run time for the blackboard system after weak membrane edge detection, thinning, and connected components have been performed. The run times of these algorithms were found using the time command in Lisp. All times exclude garbage collection.

Table 5.7. The run times for the sample images from Appendix B in minutes.

    Image  Time
    1       6.3
    2       3.2
    3       5.7
    4      12.8
    5       3.9
    6       3
    7      15
    8       3
    9       2.6
    10      1.7
    11      2.8
    12      2.9
    13      3.9
    14      4.9
    15      7.9
    16      9.5
    17      7.2
    18      6.5
    19     10
    20     14.6
    21      4.6
    22      7.2

CHAPTER 6

Conclusions and Future Work

This thesis described the POLL system for obtaining labeled line drawings from single intensity images using an integrated blackboard system. The processing in the POLL system is consistent with Lowe's additions to the Marr object recognition paradigm. The modules that were integrated in POLL are weak membrane edge detection, a continuity grouping algorithm, and line labeling. Other modules suggested for an enhanced system include shape from shading and symmetry detection.

The current system is able to generate an initial line drawing from an intensity image in the domain of piecewise smooth objects. This initial line drawing is obtained by integrating edge detection, a continuity grouping algorithm, and junction grouping.

The initial line drawings are then analyzed using a modified line labeling algorithm that gives diagnostic information to the global blackboard. The diagnostic information gives a location in the image where an inconsistency in the line labeling occurred. The modified line labeling algorithm can find errors such as missing edges, improperly typed vertices, and missing phantom junctions. In almost all cases, the error in the initial line labeling was diagnosed at the proper junction. In a small number of cases, the error was at an adjacent junction. In no case was the actual error more than one junction away from the diagnosed location.

A collection of repair strategies was created that can modify the initial line labeling in search of a more consistent solution. Repair strategies for missing edges, improperly typed vertices, and missing phantom junctions were implemented. Repair strategies for improving the grouping of junctions and for removing extraneous junctions were discussed.

Perfect line drawings remain an elusive goal in computational vision. This system represents another step towards reaching that goal. The system is unique in that it uses piecewise smooth objects, instead of the more restrictive planar or quadric surface models that are often used, particularly in range imagery as in [37, 62]. This enhances the generality of the system, but also greatly increases the difficulty of the problem. Some images of objects with non-quadric surfaces were used to test the system. The system also works without reference to a set of object models. This distinguishes the system from efforts like the Schema blackboard system [31], where object models were used extensively. Another difference between this work and the Schema system is the scale of the project: the Schema system was developed by a research group over a period of years, unlike the system in this thesis, which was developed by a single researcher.

The POLL system correctly processes sixteen of the twenty-two images presented in the test bed of images in Appendix B. Of the remaining six, five have limb edges mistaken for jump edges and one was not processed well. A suggestion for improving the performance of the POLL system on limb edges was discussed in Section 5.2.2.

Mohan and Nevatia have an impressive system for perceptual organization [75], as described in Section 1.1.2. The Mohan and Nevatia system performs perceptual organization in a more complex environment than the POLL system described in this thesis.
The objects imaged have surface markings and specularities, which POLL does not permit, since Malik's curvilinear line labeling scheme cannot interpret surface markings. Unlike POLL, the Mohan and Nevatia system does not provide a line labeling or any other 3D information from the perceptual organizations. An interesting result of the Mohan and Nevatia system is that individual objects are segmented, even in cluttered environments. This segmentation would be particularly useful in the present system, since it would permit the floating object heuristic to be used on every object in the scene individually. This would correspond to each object floating in space while occluding the others.

6.1 Contributions of the Thesis

The contributions of this thesis fall into three categories. In the first category are the early decisions regarding the selection of the framework and modules, and the combination of modules that creates the original interpretation of the image data. The second category consists of contributions to the modules that were integrated. The final category is a set of strategies used to analyze the diagnostic information returned by the line labeling algorithm to improve the line drawing.

- Establishment of the framework for integrated perceptual organization
  - Selection of the blackboard system framework
  - Combination of initial modules to provide the first figure hypothesis
- Contributions to modules
  - Development of the continuity module
  - Contributions to the line labeling module
    - Definition of the floating object heuristic
    - Modification of the line labeling algorithm to diagnose line labeling problems
- Integration methods for repairing diagnosed junction problems
  - Method for altering junction types that are close to threshold
  - Insertion of necessary phantom vertices into line drawings
  - Strategy for placing missing edges in a line labeling

6.2 Future Work

This system was designed to be expanded into a larger and more robust system as additional modules are integrated. The potential benefits of incorporating shape from shading and symmetry detection were discussed at length in Section 5.2. Of these two modules, the shape from shading module holds the most promise for rapid improvement of the system, since it would improve the handling of the three tangent and curvature-L junctions that are not well handled in the present system.

It would also be interesting to integrate a second type of edge detection to ameliorate some of the difficulties with the weak membrane edge detection, in particular its lack of smoothness of edges. Canny edge detection produces very smooth edges and is more compatible with the continuity algorithm. Corner and vertex detection can be performed on a gray scale image, as done for trihedral vertices in [43]. This could improve the system by providing vertex and corner information from a second, independent source.

There is also some interesting work to be done in the control of the knowledge sources that fix line labeling problems. A small variety of strategies for improving line labelings was demonstrated in this work. As the system expands, the number of strategies for improving the line labeling is likely to expand as well, which will require a better method for choosing which strategy to try on a given line labeling.

There is some interesting work going on in the parallel implementation of blackboard systems [54].
Vision is an ideal domain for such work because of the high computational demands of visual problems and their potential for massive parallelism.

APPENDICES

APPENDIX A

Parameters

Many of the modules and algorithms used in this thesis have parameters. These parameters were set empirically, after experimentation with the test bed of images shown in Appendix B. In order to make this work reproducible, all of the parameters used are given in this appendix. The parameters appear in the order of first discussion.

Section 3.2 describes the continuity grouping algorithm of Trytten and Tuceryan. The parameters for this algorithm fall into three categories: parameters for the energy formulation of Equation 3.2, parameters for Canny edge detection, and the multi-grid relaxation parameters. These parameters and their physical interpretations are summarized in Table A.1.

Table A.1. The parameters for the continuity grouping algorithm of Section 3.2.

In Section 3.3.1, edge detection was performed using the weak membrane edge model and the GNC algorithm for energy minimization. The parameters for the edge detection are shown in Table A.2. Curve hypotheses were created for the boundaries of regions that were sufficiently large; the specific numerical definition of large is given in Table A.2. Edgels that were too short were removed from consideration. The parameter for this is also given in Table A.2.

Table A.2. The parameters from Section 3.3.1.

Section 3.3.2 also used a variety of parameters; their values are given in Table A.3.

Table A.3. The parameter values used in Section 3.3.2.

The circular arc fitting parameters were used to determine how many pixels near the endpoint of a curve should be used for approximating the curvature and tangent direction. The curvature values were found to be unreliable, and were assigned to three classes: straight, curved, and very curved. These symbolic curvatures were used in all comparisons. This classification is sketched below.
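The fragment below is a minimal sketch of such a symbolic curvature classification; the threshold values are hypothetical placeholders, not the values of Table A.3.

    ;; A minimal sketch of assigning unreliable numeric curvatures to the
    ;; three symbolic classes used in all comparisons.  The thresholds
    ;; LOW and HIGH are hypothetical placeholders.
    (defun symbolic-curvature (kappa &key (low 0.01) (high 0.1))
      (cond ((< (abs kappa) low) :straight)
            ((< (abs kappa) high) :curved)
            (t :very-curved)))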
In Section 5.1.3, a parameter for the number of pixels that separate a line from an arc is discussed. The value of this parameter is three pixels.

APPENDIX B

Test Bed of Images

The POLL system was tested on twenty-two intensity images captured using a Macintosh II-fx in the PRIP Laboratory of Michigan State University. These images are labeled and shown in Figure B.1, Figure B.2, Figure B.3, and Figure B.4. The variety and complexity of these images is characterized by examining the number of objects in each imaged scene, as well as the number and type of the surfaces, junctions, and arcs that are visible. Table B.1 gives the raw data on the number of objects, junctions, arcs, and surfaces that are visible in each image. Table B.2 shows the number of each type of junction visible in the images. Table B.3 shows the number of arcs of each interpretation that are visible in the images. The surface types are shown in Table B.4. Although planar surfaces are also quadric, only surfaces that are quadric and not planar are listed as quadric. The surfaces that are labeled as planar in Table B.4 are often not absolutely planar: most of the side surfaces of the bath tub toys used in Images 1 to 18 have shrunk and are curved. These surfaces were still labeled as planar in Table B.4.

Figure B.1. (a) Image 1. (b) Image 2. (c) Image 3. (d) Image 4. (e) Image 5. (f) Image 6.

Figure B.2. (a) Image 7. (b) Image 8. (c) Image 9. (d) Image 10. (e) Image 11. (f) Image 12.

Figure B.3. (a) Image 13. (b) Image 14. (c) Image 15. (d) Image 16. (e) Image 17. (f) Image 18.

Figure B.4. (a) Image 19. (b) Image 20. (c) Image 21. (d) Image 22.

Table B.1. A characterization of the complexity of the images used in this thesis.

Table B.2. A characterization of the visible junctions in the images used in this thesis. TM stands for terminal junctions. P are phantom junctions. CL are curvature-L junctions. L are L junctions. 3T are three tangent junctions. A are arrow junctions. Y are Y junctions. T are T junctions.

Table B.3. A characterization of the visible arcs in the images used in this thesis. Cvx stands for convex edges. Ccv is concave edges.

Table B.4. A characterization of the visible surfaces in the images used in this thesis.

BIBLIOGRAPHY

[1] A. L. Abbott and N. Ahuja. Surface reconstruction by dynamic integration of focus, camera vergence, and stereo. In Proceedings of the Second International Conference on Computer Vision, pages 532-543, 1988.

[2] Narendra Ahuja. Dot pattern processing using Voronoi neighborhoods. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-4(3):336-343, May 1982.

[3] J. Aloimonos. Unification and integration of visual modules: An extension of the Marr paradigm. In Proceedings of the DARPA Image Understanding Workshop, pages 507-551, 1989.

[4] J. Aloimonos and A. Basu. Combining information in low-level vision. In Proceedings of the DARPA Image Understanding Workshop, pages 862-906, 1988.

[5] J. Aloimonos and D. Shulman. Integration of Visual Modules. Academic Press, San Diego, California, 1989.

[6] D. H. Ballard. Generalizing the Hough transform to detect arbitrary shapes. Pattern Recognition, 13:111-122, 1981.

[7] D. H. Ballard and C. M. Brown. Computer Vision. Prentice Hall, New Jersey, 1982.

[8] H. B. Barlow and B. C. Reeves. The versatility and absolute efficiency of detecting mirror symmetry in random dot displays. Vision Research, 19:783-793, 1979.

[9] H. G. Barrow and J. M. Tenenbaum. Interpreting line drawings as three-dimensional surfaces. Artificial Intelligence, 17:47-75, August 1981.

[10] P. J. Besl and R. C. Jain. Three-dimensional object recognition. Computing Surveys, 17(1):75-145, March 1985.
[11] I. Biederman. Recognition-by-components: A theory of human image understanding. Psychological Review, 94(2):115-147, 1987.

[12] A. Blake. Comparison of efficiency of deterministic and stochastic algorithms for visual reconstruction. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(1):2-12, January 1989.

[13] A. Blake and A. Zisserman. Visual Reconstruction. The MIT Press, Cambridge, Massachusetts, 1987.

[14] A. Blake, A. Zisserman, and G. Knowles. Surface descriptions from stereo and shading. In B. K. P. Horn and M. J. Brooks, editors, Shape from Shading, chapter 2, pages 29-52. Massachusetts Institute of Technology, Cambridge, Massachusetts, 1989.

[15] D. Blostein. Recovering the Orientation of Textured Surfaces in Natural Scenes. Ph.D. thesis, University of Illinois, 1987.

[16] D. Blostein and N. Ahuja. Shape from texture: Integrating texture-element extraction and surface estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(12):1233-1251, December 1989.

[17] H. Blum. A transformation for extracting new descriptors of shape. In W. Wathen-Dunn, editor, Models for the Perception of Speech and Visual Form, pages 362-380. MIT Press, Cambridge, Massachusetts, 1967.

[18] M. Brady and H. Asada. Smoothed local symmetries and their implementation. International Journal of Robotics Research, 3(3), 1984.

[19] J. Brolio, B. A. Draper, J. R. Beveridge, and Allen R. Hanson. ISR: a database for symbolic processing in computer vision. Computer, pages 22-30, December 1989.

[20] R. A. Brooks. Symbolic reasoning among 3-D models and 2-D images. Artificial Intelligence, 17:285-348, 1981.

[21] R. A. Brooks. Model-based 3-D interpretations of 2-D images. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-5:140-150, 1983.

[22] L. G. Brown and H. Shvaytser. Surface orientation from projective foreshortening of isotropic texture autocorrelation. In Proceedings of IEEE Computer Vision and Pattern Recognition Society, pages 510-515, Ann Arbor, Michigan, 1988.

[23] J. B. Burns, A. R. Hanson, and E. M. Riseman. Extracting straight lines. In M. A. Fischler and O. Firschein, editors, Readings in Computer Vision: Issues, Problems, Principles, and Paradigms, pages 180-183. Morgan Kaufmann, 1987.

[24] J. Canny. A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-8(6), November 1986.

[25] I. Chakravarty. A generalized line and junction labeling scheme with applications to scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-1(2):202-205, April 1979.

[26] I. Chakravarty. The Use of Characteristic Views as a Basis for Recognition of Three-Dimensional Objects. Ph.D. thesis, Rensselaer Polytechnic Institute, Troy, New York, 1982.

[27] S. W. Chen. 3D Representation and Recognition Using Object Wings. Ph.D. thesis, Michigan State University, East Lansing, Michigan, 1989.

[28] R. T. Chin and C. R. Dyer. Model-based recognition in robot vision. Computing Surveys, 18(1):67-108, March 1986.

[29] M. B. Clowes. On seeing things. Artificial Intelligence, 2:79-116, 1971.

[30] R. Deriche. Using Canny's criteria to derive a recursively implemented optimal edge detector. International Journal of Computer Vision, 1:167-187, 1987.

[31] B. A. Draper, R. T. Collins, J. Brolio, A. R. Hanson, and E. M. Riseman. Issues in the development of a blackboard-based schema system for image understanding. In R. Engelmore and T. Morgan, editors, Blackboard Systems, chapter 8, pages 189-218. Addison Wesley, 1988.
[32] R. Engelmore and T. Morgan. Blackboard Systems. Addison Wesley, New York, 1988.

[33] L. D. Erman, F. Hayes-Roth, V. R. Lesser, and D. R. Reddy. The Hearsay-II speech-understanding system: Integrating knowledge to resolve uncertainty. In R. Engelmore and T. Morgan, editors, Blackboard Systems, chapter 3, pages 31-86. Addison Wesley, 1988.

[34] G. Falk. Interpretation of imperfect line data as a three dimensional scene. Artificial Intelligence, 3:101-144, 1972.

[35] F. P. Ferrie and M. D. Levine. Where and why local shading works. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(2):198-206, February 1989.

[36] P. J. Flynn and A. K. Jain. On reliable curvature estimation. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 110-116, San Diego, California, June 1989.

[37] Patrick J. Flynn. CAD-Based Computer Vision: Modeling and Recognition Strategies. Ph.D. thesis, Department of Computer Science, Michigan State University, 1990.

[38] R. T. Frankot and R. Chellappa. A method for enforcing integrability in shape from shading algorithms. In B. K. P. Horn and M. J. Brooks, editors, Shape from Shading, chapter 5, pages 89-122. The MIT Press, 1989.

[39] S. A. Friedberg. Finding axes of skewed symmetry. Computer Vision, Graphics, and Image Processing, 34:138-155, 1986.

[40] K. Q. Gallagher, D. D. Corkill, and P. M. Johnson. GBB reference manual. Technical Report 88-66, University of Massachusetts at Amherst, Amherst, Massachusetts, July 1988.

[41] E. B. Gamble, D. Geiger, T. Poggio, and D. Weinshall. Integration of vision modules and labeling of surface discontinuities. IEEE Transactions on Systems, Man, and Cybernetics, 19(6):1576-1581, November/December 1989.

[42] S. Geman and D. Geman. Stochastic relaxation, Gibbs distributions, and Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-6(6):721-741, November 1984.

[43] G. Giraudon and R. Deriche. On corner and vertex detection. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 650-655, Maui, Hawaii, 1991.

[44] R. C. Gonzalez and P. Wintz. Digital Image Processing. Addison-Wesley Publishing Company, Reading, Massachusetts, 2nd edition, 1987.

[45] W. E. L. Grimson and D. P. Huttenlocher. On the sensitivity of the Hough transform for object recognition. In Proceedings of the Second International Conference on Computer Vision, pages 700-706, Miami, Florida, 1988.

[46] A. D. Gross. Shape from a symmetric universe. Technical Report CUCS-065-90, Columbia University, New York City, New York, 1990.

[47] A. D. Gross and T. E. Boult. Analyzing skewed symmetries. Technical Report CUCS-064-90, Columbia University, New York City, New York, 1990.

[48] B. Hayes-Roth, B. Buchanan, O. Lichtarge, M. Hewett, R. Altman, J. Brinkley, C. Cornelius, B. Duncan, and O. Jardetzky. PROTEAN: Deriving protein structure from constraints. In R. Engelmore and T. Morgan, editors, Blackboard Systems, pages 417-431. Addison Wesley, 1988.

[49] B. K. P. Horn. Obtaining shape from shading information. In Shape from Shading, pages 123-173. The MIT Press, Cambridge, Massachusetts, 1989.

[50] P. V. C. Hough. Method and means for recognizing complex patterns. U.S. Patent no. 3069654, 1962.

[51] D. A. Huffman. Impossible objects as nonsense sentences. In B. Meltzer and D. M. Michie, editors, Machine Intelligence, volume 6, pages 295-323. Edinburgh University Press, 1971.
[52] D. P. Huttenlocher and S. Ullman. Object recognition using alignment. In Proceedings of the First International Conference on Computer Vision, pages 102-111. IEEE, June 1987.

[53] J. Illingworth and J. Kittler. A survey of the Hough transform. Computer Vision, Graphics, and Image Processing, 44:87-116, 1988.

[54] V. Jagannathan, R. Dodhiawala, and L. S. Baum. Blackboard Architectures and Applications. Harcourt Brace Jovanovich, Boston, 1989.

[55] A. K. Jain and R. C. Dubes. Algorithms for Clustering Data. Prentice Hall, Englewood Cliffs, New Jersey, 1988.

[56] Y. C. Jan and R. T. Chin. Shape from texture using the Wigner distribution. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 515-523, Ann Arbor, Michigan, June 1988.

[57] B. Jenkins. Redundancy in the perception of bilateral symmetry in dot textures. Perception and Psychophysics, 32(2):171-177, 1982.

[58] T. Kanade. A theory of origami world. Artificial Intelligence, 13:279-311, 1980.

[59] M. Kass, A. Witkin, and D. Terzopoulos. Snakes: Active contour models. International Journal of Computer Vision, pages 321-331, 1988.

[60] B. Kim and P. Burger. Calculation of surface position and orientation using the photometric stereo method. In Proceedings of IEEE Computer Vision and Pattern Recognition Society, pages 492-498, Ann Arbor, Michigan, 1988.

[61] K. Koffka. Principles of Gestalt Psychology. Harcourt Brace, New York, 1935.

[62] Greg C. Lee. Reconstruction of Line Drawing Graphs for Objects with Quadric Surfaces using Wing Features. Ph.D. thesis, Michigan State University, East Lansing, Michigan, 1992.

[63] M. D. Levine. Vision in Man and Machine. Computer Engineering Series. McGraw Hill, New York, 1985.

[64] F. Leymarie and M. D. Levine. Simulating the grassfire transform using an active contour model. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(1):56-76, January 1992.

[65] M. Leyton. A process-grammar for shape. Artificial Intelligence, 34(2):213-247, March 1988.

[66] D. G. Lowe. Three-dimensional object recognition from single two-dimensional images. Artificial Intelligence, 31:355-395, 1987.

[67] David G. Lowe. Perceptual Organization and Visual Recognition. Kluwer Academic Publishers, Boston, 1985.

[68] A. K. Mackworth. Interpreting pictures of polyhedral scenes. Artificial Intelligence, 4:121-137, 1973.

[69] J. Malik. Interpreting line drawings of curved objects. International Journal of Computer Vision, 1:73-103, 1987.

[70] J. Malik and D. Maydan. Recovering three-dimensional shape from a single image of curved objects. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(6):555-566, June 1989.

[71] G. Marola. On the detection of the axes of symmetry of symmetric and almost symmetric planar images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(1):104-108, January 1989.

[72] D. Marr. Vision. W. H. Freeman and Co., San Francisco, 1982.

[73] J. D. McCafferty. Human and Machine Vision: Computing Perceptual Organization. Ellis Horwood Series in Digital and Signal Processing. Ellis Horwood, New York, 1990.

[74] R. Mohan and R. Nevatia. Using perceptual organization to extract 3D structures. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-11:1121-1139, November 1989.

[75] R. Mohan and R. Nevatia. Perceptual organization for scene segmentation and description. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(6):616-635, June 1992.
[76] H. Moravec. Sensor fusion in certainty grids for mobile robots. AI Magazine, pages 61-74, Summer 1988.

[77] Sateesha G. Nadabar. Markov Random Field Contextual Models in Computer Vision. Ph.D. thesis, Michigan State University, East Lansing, Michigan, 1992.

[78] M. Nagao, T. Matsuyama, and H. Mori. Structural analysis of complex aerial photographs. In R. Engelmore and T. Morgan, editors, Blackboard Systems, chapter 9, pages 189-218. Addison Wesley, 1988.

[79] V. S. Nalwa. Line-drawing interpretation: A mathematical framework. International Journal of Computer Vision, 2:103-124, 1988.

[80] V. S. Nalwa. Line-drawing interpretation: Bilateral symmetry. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(10):1117-1120, October 1989.

[81] R. Nevatia and T. O. Binford. Description and recognition of curved objects. Artificial Intelligence Journal, 8:77-98, 1977.

[82] H. P. Nii, E. A. Feigenbaum, J. J. Anton, and A. J. Rockmore. Signal-to-symbol transformation: HASP/SIAP case study. In R. Engelmore and T. Morgan, editors, Blackboard Systems, chapter 6, pages 135-157. Addison Wesley, 1988.

[83] T. Pavlidis. Why progress in machine vision is so slow. Pattern Recognition Letters, 13:221-225, 1992.

[84] T. Pavlidis and C. Liow. Integrating region growing and edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-12, March 1990.

[85] G. Pearson. Mission planning within the framework of the blackboard model. In R. Engelmore and T. Morgan, editors, Blackboard Systems, chapter 21, pages 433-442. Addison Wesley, New York, 1988.

[86] A. P. Pentland. Perceptual organization and the representation of natural form. Artificial Intelligence, 28:293-331, 1986.

[87] A. P. Pentland. Local shading analysis. In B. K. P. Horn and M. J. Brooks, editors, Shape from Shading, chapter 15, pages 443-488. Massachusetts Institute of Technology, Cambridge, Massachusetts, 1989.

[88] P. Perona and J. Malik. Scale space and edge detection using anisotropic diffusion. In Proceedings of the IEEE Computer Society Workshop on Computer Vision, pages 16-22, Miami, Florida, 1987.

[89] T. Poggio, V. Torre, and C. Koch. Computational vision and regularization theory. Nature, 317:638-643, 1985.

[90] J. Ponce. Ribbons, symmetries, and skewed symmetries. In Proceedings of the DARPA Image Understanding Workshop, pages 1074-1079, Cambridge, Massachusetts, 1988.

[91] J. Ponce. On characterizing ribbons and finding skewed symmetries. Computer Vision, Graphics, and Image Processing, 52:328-340, 1990.

[92] F. P. Preparata and M. I. Shamos. Computational Geometry: An Introduction. Springer-Verlag, 1985.

[93] W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling. Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press, 1988.

[94] Narayan S. Raja. Obtaining Generic Parts from Range Data Using a Multi-view Representation. Ph.D. thesis, Michigan State University, East Lansing, Michigan, 1992.

[95] V. S. Ramachandran. Perceiving shape from shading. Scientific American, pages 76-83, August 1988.

[96] K. Rao. Shape Description from Sparse and Imperfect Data. Ph.D. thesis, University of Southern California, Los Angeles, California, December 1988.

[97] Irvin Rock. The Logic of Perception. The MIT Press, Cambridge, Massachusetts, 1982.

[98] A. Rosenfeld. Axial representations of shape. Computer Vision, Graphics, and Image Processing, 33:156-173, 1986.

[99] A. Rosenfeld and A. C. Kak. Digital Picture Processing. Academic Press, New York, 1976.
[100] P. L. Rosin and G. A. W. West. Detection and verification of surfaces of revolution by perceptual grouping. Pattern Recognition Letters, 13:453-461, June 1992.

[101] T. Sakai, M. Nagao, and H. Matsushima. Extraction of invariant picture substructures by computer. Computer Graphics and Image Processing, 1:81-96, 1972.

[102] S. Sarkar and K. L. Boyer. A highly efficient computational structure for perceptual organization. Technical Report SAMPL-90-06, The Ohio State University, November 1990.

[103] E. Saund. Labeling of curvilinear structure across scales by token grouping. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 257-263, Champaign, Illinois, 1992.

[104] P. Suetens, P. Fua, and A. J. Hanson. Computational strategies for object recognition. ACM Computing Surveys, 24(1):5-61, March 1992.

[105] K. Sugihara. Machine Interpretation of Line Drawings. The MIT Press Series in Artificial Intelligence. The MIT Press, Cambridge, Massachusetts, 1986.

[106] D. Terzopoulos. Image analysis using multigrid relaxation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8(2):129-139, March 1986.

[107] D. Terzopoulos. Regularization of inverse visual problems involving discontinuities. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8(4):413-424, July 1986.

[108] D. Terzopoulos, A. Witkin, and M. Kass. Symmetry-seeking models and 3D object recognition. International Journal of Computer Vision, 1:211-221, 1987.

[109] V. Torre and T. Poggio. On edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-8(2), March 1986.

[110] D. A. Trytten and M. Tuceryan. Segmentation and grouping of object boundaries using energy minimization. Technical Report CPS-90-07, Michigan State University, East Lansing, Michigan, December 1990.

[111] D. A. Trytten and M. Tuceryan. Segmentation and grouping of object boundaries using energy minimization. In IEEE Conference on Computer Vision and Pattern Recognition, pages 730-731, Maui, Hawaii, June 1991.

[112] M. Tuceryan. Extraction of Perceptual Structure in Dot Patterns. Ph.D. thesis, University of Illinois, Urbana-Champaign, Illinois, 1986.

[113] D. Waltz. Understanding line drawings of scenes with shadows. In P. H. Winston, editor, Psychology of Computer Vision, pages 19-91. McGraw-Hill, New York, NY, 1975.

[114] Max Wertheimer. Laws of organization in perceptual forms. In W. D. Ellis, editor, A Source Book of Gestalt Psychology. Harcourt Brace, New York, 1939.

[115] A. P. Witkin and J. M. Tenenbaum. On the role of structure in vision. In J. Beck, B. Hope, and A. Rosenfeld, editors, Human and Machine Vision, pages 481-543. Academic Press, New York, 1983.