3-D REPRESENTATION AND RECOGNITION USING OBJECT WINGS

By

Sei-Wang Samuel Chen

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Computer Science

1989

ABSTRACT

3-D REPRESENTATION AND RECOGNITION USING OBJECT WINGS

By Sei-Wang Samuel Chen

A set of 2 1/2-D primitives, called object wings, is introduced for the representation and recognition of general 3-D rigid objects. The basic philosophy of wing formation is to integrate the available contextual shape information of contour and surface into object features. We had several goals in mind when proposing this set of new features. The new features should be able to (1) handle general objects, (2) overcome imperfect feature extraction, (3) index models in a database, and (4) compute object pose. In terms of this set of primitives, a theory of object representation, called wing representation theory, is proposed.

There are 34 simple wings used as prototypes. A simple wing is defined as a triple consisting of a pair of surface patches separated by a contour segment. Simple wings can be grouped into composite wings through non-accidental relationships such as proximity, similarity, connectivity, collinearity, curvilinearity, cotermination, parallelism and symmetry. Both simple and composite wings, together with their spatial structures, can be used to construct internal models of real-world objects. Spatial structures include unary and binary geometric properties of object wings.

In terms of object wings, an associated computational framework of wing representation theory is introduced in which techniques of geometric modeling, view generation and wing representation are developed. In geometric modeling, an object is specified by a set of triangles whose union geometrically approximates the object shape. An object view can then be generated by projecting the geometric model onto a plane perpendicular to a pre-defined viewing direction. A sphere whose surface forms a 2-D manifold of viewing directions (viewpoints) is referred to as the viewsphere. Object views generated according to the viewpoints of the viewsphere are clustered into so-called aspects.
The object views within an aspect possess the same topological structure, which is defined in terms of the wing sets of the views. An object is therefore represented by a set of aspects, which are in turn sets of object views. Each object view is characterized by wing features and their spatial structures.

In order to test the robustness of wing representations, the properties of uniqueness, stability, compatibility, terseness and locality are investigated. The uniqueness property requires that the representation of an object be distinguishable from the representations of other objects. The stability property demands that small variations in the input have little influence on the final results. As for the requirement of compatibility, the object representations in a database should be comparable to the representations derived from real images. Object representations should also be terse enough to make processing efficient. Finally, locality enables the representation to cope with partial occlusion.

A recognition procedure based on wing representation is introduced. This procedure is composed of four steps to achieve the purposes of indexing, consistent labeling, parameter estimation and decision. Wing features detected in an image are first used to index models in the database. A model test process is then invoked to verify candidate models. This process uses interpretation tree search to determine the correspondences between sensed and model features. Afterwards, a view transformation algorithm based on the technique of curve matching is performed to estimate the parameters of object poses. Object recognition is finally accomplished by clustering the set of possible poses.

Experimental results support the claim that wing representations have the properties of uniqueness, stability, compatibility and locality, but not terseness. Further study of tactics to solve the terseness problem will be needed. Experiments on recognition based on wing representation also reveal that wing features possess enough information to be of use both for indexing models and for determining instance poses. In future research, we hope to extend wing representation theory and its associated techniques through composite wings to deal with more complex problems such as high-level object representation and object segmentation in real images.

To the memory of my parents for providing me with the best education, to my lovely wife Szu-Li for her support and encouragement, and to my children Zouw-Hu and Zouw-Zen for their wonderful love.

ACKNOWLEDGEMENTS

I wish to thank my major professor, George Stockman, for his help and guidance, both in thesis preparation and in his encouragement of my research efforts. Special thanks should also go to Professors Richard C. Dubes and Anil K. Jain, who provided excellent instruction in pattern recognition, image processing and computer vision. I also thank Professors Robert Falco and C.Y. Wang, who served on my Ph.D. committee and provided me with valuable comments on which this thesis builds. I sincerely appreciate the discussions I had with Dr. Mihran Tuceryan on several problems in perceptual grouping and human vision. I would like to give my special thanks to Deborah Trytten, Patrick Flynn and Joe Miller for their kind enthusiasm about my writing. Acknowledgement is due also to the other members of the Pattern Recognition and Image Processing Laboratory for diverse assistance: Dr. Nelima Shrikhande, Dr. Gongzhu Hu, Dr. C.C.
Chen, Farshid Farrokhnia, Sally Howden, Jezching Ton, Ming-Xia Man, Tim Newman, Sateesha Nadabar and G. Lee. My thanks also go to the system managers Dongyul Ra and David Marks for providing me with reliable research environments. Acknowledgement should also be made of the financial support I received during my long stay at MSU from NSF grants IRI-8600371 and CDA-8806599, and from the Department of Computer Science.

Table of Contents

Chapter 1 Introduction
1.1. Human Visual Processing
1.2. Machine Vision Problems
1.3. Overview of Problem Area
1.4. Related Research on Object Modeling
1.4.1. Psychological Evidence
1.4.2. Theory of Parts
1.4.3. Lump of Clay and Fractal Model
1.4.4. Recognition-By-Components
1.4.5. Conclusion on Part Representation
1.5. Related Background on Model-Based Recognition
1.6. Organization of the Thesis

Chapter 2 Representation Theory: 2 1/2-D Object Wings
2.1. What are Wings
2.1.1. Complete Set
2.1.2. Proper Sets
2.2. Physical Meaning
2.3. Simple Wings
2.3.1. Extraction Rule
2.3.2. Structural Constraints on Wings
2.4. Composite Wings
2.5. Illustrating the Power of Object Wings in Recognition
2.6. Reconstructing Line Drawings from Wing Representations
2.6.1. Reconstruction Algorithm
2.6.2. Examples
2.6.3. Comments
2.7. Summary

Chapter 3 Wing Representation: Computational Framework
3.1. What is the Goal of Wing Representation Theory
3.2. How can Wing Representation Theory be Implemented
3.2.1. Geometric Modeling
3.2.2. Augmenting Geometric Models
3.2.3. View Generation
3.2.4. View Organization
3.3. Summary

Chapter 4 Implementation of Wing Representation Theory
4.1. Model Building
4.1.1. Sensing System
4.1.2. Registration
4.1.2.1. Triangulation
4.1.2.2. Transformation and Inconsistency Reduction
4.1.2.3. Redundancy Removal
4.1.2.4. Gap Connection
4.1.2.5. Hole Filling
4.1.3. Discussion of Model Building
4.1.3.1. Limitations of the Sensing System
4.1.3.2. Calibration Error Analysis
4.1.3.3. Transformation Error Analysis
4.1.3.4. Difficulties and Suggestions
4.2. Augmenting Geometric Models
4.3. Object Representation
4.3.1. View Representation
4.3.2. Views and Aspects
4.3.3. Preliminary Discussion of Wing Representation
4.4. Summary

Chapter 5 Testing the Wing Representation
5.1. Uniqueness
5.1.1. Criteria of Uniqueness
5.1.2. Distinct Wing Set, Appearance Frequency and Distinguished Wing Set
5.2. Stability
5.2.1. View Resolution
5.2.2. Map Resolution
5.2.3. View Variation
5.2.4. Convolution Masks
5.3. Compatibility of Wing Features
5.4. Wing Detection from Real Images
5.4.1. Hoffman-Jain Technique
5.4.2. Combined Signs of Directional Derivatives
5.4.2.1. Computation of Directional Derivatives
5.4.2.2. Synthetic Images
5.4.2.3. Real Images
5.5. Summary

Chapter 6 Recognition Based on Wing Representation
6.1. Object Recognition Techniques
6.1.1. Feature Correspondence
6.1.2. Parameter Estimation
6.1.3. Decision
6.2. The Definition of Problem and the Procedure of Model Test
6.3. Indexing into Models and Views
6.4. Interpretation Tree Search
6.4.1. Pruning Algorithm
6.4.2. Estimating the Expected Number of Surviving Paths
6.4.3. Time Complexity
6.5. View Transformation
6.5.1. Curve Matching
6.5.2. Experimental Results
6.6. Clustering Procedure
6.6.1. Radius of Cluster
6.6.2. Experimental Results and Discussion
6.7. Summary

Chapter 7 Summary, Conclusions and Future Extensions
7.1. Summary and Conclusions
7.2. Contributions
7.3. Future Work

Appendix A Computational Methods
Appendix B Construction of Viewspheres
Appendix C Proof of Theorem 4.1
Appendix D Criteria of Uniqueness
Appendix E Interpretation Tree Search
Appendix F Curve Matching
Appendix G Reconstructing Line Drawings from Wing Representations
References

List of Tables

Table 2.1 Set of Wing Types for Practical Implementation
Table 5.1 Wings and Appearance Frequencies (320 Views, 96x96)
Table 5.2 Wings and Appearance Frequencies (240 Views, 128x128)
Table 5.3 Wings and Appearance Frequencies (320 Views, 128x128)
Table 5.4 Numbers of Distinct Wing Primitives
Table 5.5 Numbers of Aspects Using Different Map Resolutions
Table 5.6 Lists of Distinct Wing Primitives of Views in Figure 5.3
Table 5.7 Lists of Distinct Wing Primitives of Views in Figure 5.4
Table 5.8 Lists of Distinct Wing Primitives of Views in Figure 5.5
Table 6.1 The Number of Paths at Levels using Edge Evidence
Table 6.2 The Number of Paths at Levels using Face Evidence
Table 6.3 The Number of Paths at Levels using Mixed Evidence
Table 6.4 The Probabilities of Combined Binary Constraints
Table 6.5 Matching Results Using Data Set 1
Table 6.6 Matching Results Using Data Set 2
Table 6.7 Matching Results Using Data Set 3
Table 6.8 Matching Results Using Data Set 4
Table 6.9 Matching Results Using Data Set 5
Table 6.10 Thresholds and Cluster Radiuses for Data Set 1
Table 6.11 Thresholds and Cluster Radiuses for Data Set 2
Table 6.12 Thresholds and Cluster Radiuses for Data Set 3
Table 6.13 Thresholds and Cluster Radiuses for Data Set 4
Table 6.14 Thresholds and Cluster Radiuses for Data Set 5
Table 6.15 Match Errors using Different Rotation Radiuses
Table 6.16 Sizes of Cluster Corresponding to Results of Table 6.4
Table 6.17 Match Errors using Different Translation Radiuses
Table 6.18 Sizes of Cluster Corresponding to Results of Table 6.6
List of Figures

Figure 1.1 Examples of Jumbles of Objects
Figure 1.2 Diagram of A Presumed Recognition System
Figure 2.1 Diagrammatical Definitions of the Koenderink Theorem
Figure 2.2 A Hill Appearing Just in Front of a Valley
Figure 2.3 Synthetic Wing Primitives of a Cup and a Can
Figure 2.4 A Bowl, a Half Grapefruit, and a Clam
Figure 2.5 An Example Demonstrates the Reconstruction Algorithm
Figure 2.6 Reconstruction of an Accidental View of a Polyhedron
Figure 2.7 An Object with a Hole and Multi-Face Vertices
Figure 3.1 The Overall Wing Representation Theory
Figure 3.2 Examples of Geometric Models
Figure 3.3 An Example of the Stripe Network
Figure 3.4 The Intermediate Results of the Triangulation Process
Figure 3.5 An Inconsistency Phenomenon Between Adjacent Networks
Figure 3.6 The Procedure of Redundancy Removal
Figure 3.7 The Results of Gap Connection
Figure 3.8 A Unit Viewsphere and its Viewpoints
Figure 3.9 All Generic Aspects of a Bowl
Figure 3.10 The Octants of a 3-D Space
Figure 4.1 The Schematic of the Modeling Procedure
Figure 4.2 An Active Structured Light Sensing System
Figure 4.3 Stripe Images of the Cobra Head with Different Poses
Figure 4.4 Networks Derived from Stripe Images
Figure 4.5 The Diagram of the Registration Process
Figure 4.6 The Triangulated Networks
Figure 4.7 After Transformation and Inconsistency Reduction
Figure 4.8 The Constructed Geometric Model of the Cobra Head
Figure 4.9 Error e Pixels yield Radial Error r'
Figure 4.10 Points can be Anywhere in the Intersection
Figure 4.11 A Range Image and Its Corresponding Wing Map
Figure 4.12 The Procedure Diagram of Wing Detection
Figure 4.13 The Structural Properties of Wing Features
Figure 4.14(a) 48 out of 320 Views of the Cobra Head
Figure 4.14(b) 48 out of 320 Views of the Coffee Cup
Figure 4.15 Range, Segmented and Wing Images of Model Views
Figure 5.1 Distinct Wing Sets versus Image Resolution
Figure 5.2 Stability of Wing Representation
Figure 5.3 17 Views of the Cobra Head
Figure 5.4 17 Views of the Coffee Cup
Figure 5.5 16 Views of the Cobra Head
Figure 5.6 Scale Spaces
Figure 5.7 Registration of Zero Crossings
Figure 5.8 Real Images Taken Using the ERIM Sensor
Figure 5.9 Segmentation Results Using Hoffman & Jain's Technique
Figure 5.10 Jump Edges of Images
Figure 5.11 The Results of the First Segmentation Method
Figure 5.12 The Signs of Directional Derivative
Figure 5.13 Two Intersected Half-Planes
Figure 5.14 Perturbation Test Using the Block Model
Figure 5.15 Rotation Test Using the Block Model
Figure 5.16 Convolution Size Test Using the Block Model
Figure 5.17 Noise Test Using the Block Model
Figure 5.18 Perturbation Test Using the Cobra Head Model
Figure 5.19 Rotation Test Using the Cobra Head Model
Figure 5.20 Convolution Size Test Using the Cobra Head Model
Figure 5.21 Noise Test Using the Cobra Head Model
Figure 5.22 Sign Maps of f_x of the Cobra Head
Figure 5.23 Segmentation Results of Real Images
Figure 5.24 Results of Real Images Using Different Filter Scales
Figure 5.25 Results of Real Images Using Different Filter Scales
Figure 5.26 Sign Maps of the First Derivatives and Mean Curvature
Figure 6.1 An Example of Consistency Labeling Problem
Figure 6.2 A Model View and a Scene View
Figure 6.3 A Consistent Wing Pair
Figure 6.4 An Abstract 2-D Feature Space of Transformation
Figure 6.5 Situations Due to Inadequate Radiuses of Cluster
Figure 6.6 Definitions of Adequate Radius, Error Curve and Trough
Figure 6.7 Troughs of Error Curves Tend Toward Larger Radius
Figure 6.8 A Cluster Suddenly Increases at Some Radius of Cluster
Chapter 1
Introduction

The recognition problem which humans solve effortlessly involves a number of difficulties for machine vision systems. According to vision psychologists [She84, Coo84], humans recognize objects because they have seen "similar" ones before. Only objects "familiar" to humans can be recognized. Otherwise, humans "learn" the objects through a learning process that forms so-called mental models. A machine vision system that copies even a small portion of human vision in object representation and recognition involves tremendous effort. In this thesis, we do not attempt to complete an entire recognition system but focus on the development of theory and procedures which attack several critical problems commonly encountered by vision researchers.

In this chapter, the recognition problem to be solved is first defined. Primary tasks are then proposed. Afterwards, object modeling and recognition are reviewed. Finally, the organization of this thesis is presented.

1.1 Human Visual Processing

Human visual perception, a natural process, involves a series of complex processes which to date are not clearly understood [Gib52, Sek85]. These processes create a "mind's eye" [Wol86] and proceed far beyond what is present on the retina. Research in anatomy, neurophysiology and psychophysics has shown that the visual input goes through a number of associated nerve components. Visual perception arises from the cooperation of these components, which use information previously collected as the basis for inferences about the state of the world. Since what our eyes receive is just a collection of incomplete information, scene understanding entails a great deal of visual intelligence in which sensory data are analyzed, fused and interpreted. Therefore, what we see is a creation of our intelligent mind.

There are three fundamental steps in a visual process: reception, extraction and inference [Wol86]. The first step is done by receptors in the retina that receive external stimuli. Then, the input is forwarded to the visual cortex of the nervous system, where preeminent features such as edges, surfaces, orientation, motion, color and texture are extracted. These features may further be processed to form higher-level features by which an intermediate representation of the scene is established. It is this intermediate representation, together with the internal mental models, that visual intelligence uses to infer the state of the world.

Aside from these mental representations, it is generally believed that the visual system possesses additional capabilities for computations on visual elements. Such computations were called "unconscious inferences" in the 19th century by the German scientist Hermann von Helmholtz. Although psychologists, psychophysicists and mathematicians have provided substantial means to study inference rules from various fields such as illusion [Kan76, Gil80], hallucinations [Sie77], mental models [Coo84], the human brain, the visual machinery of animals [Hor77] and differential geometry [Whi55, Koe84], a comprehensive investigation into the rules is nevertheless a nontrivial task.

1.2. Machine Vision Problems

A large assortment of machine vision problems can be classified in terms of several aspects along which one can constrain the problem. Some prominent aspects include:

a) Applications - inspection, automation, manipulation or navigation.
b) Goals - identification, localization or both.
c) Techniques - engineering, biological or both.
d) Environments - indoor or outdoor.
e) Status - moving or static objects.
f) Configurations - single object, multiple objects, macroscopic or microscopic.
g) Objects - natural, manufactured, rigid, nonrigid, regular or sculpted.
h) Sensing systems - monocular, stereo, active or passive.
i) Input data - 2-D or 3-D.

Each group can be divided into finer subclasses; for instance, a stereo sensing system could employ photometric or binocular techniques. In general, to design a system which needs to deal with a complex problem or several different problems simultaneously, it is better first to design a number of system components such that each handles a small problem. The complete system is then created by integrating the components in a proper way. This strategy is commonly referred to as the principle of modularization. It can also be found in the biological world, for instance in the compound eye of insects [Hor77], where each eye has its own channel that performs a particular visual function and scene interpretation is carried out by fusing the information coming from the individual channels. The spatial-frequency multi-channel organization of the human visual system [Mar82] is, in some sense, analogous to this.

1.3 Overview of Problem Area

In this thesis, we limit ourselves to a restricted 3-D recognition problem. This problem can be stated as follows. Given a range image of a scene which can be a jumble of either distinct or similar rigid objects, specific objects in the scene are to be recognized based on the given single image and a set of object models.

Examples of scenes are shown in Figure 1.1. They are intensity images for illustrative purposes. Experimental data will be range images of such kinds of scenes. In the left picture of the figure, the scene is composed of a coffee cup, a wooden block and a clay model of a cobra head. In the jumble on the right the coffee cup has been replaced by a toy tank. The working objects are rigid and opaque but can be arbitrarily shaped.

Range images are a kind of intrinsic image which contains depth information from scenes. They can be obtained directly using range finders or derived from techniques of stereopsis or other shape-from approaches. Range values are sensory data with rich shape information from which a set of shape descriptors such as contours, surface normals, gradients and curvatures are readily computed. Although range images provide such explicit shape information, the recognition task is still formidable. In this thesis, range images are provided by an active structured light sensing system (available in the Pattern Recognition and Image Processing Laboratory, PRIP, at Michigan State University) and a laser range finder (at the Environmental Research Institute of Michigan, ERIM). The goal of this research is to deal with both the identification and localization problems.

Figure 1.1. Examples of jumbles of objects.

It is believed that there is a learning process in the human visual system to tackle unfamiliar objects. If a machine vision system is to simulate the behavior of the human visual system, then two questions arise: how to implement the learning process and how to store object models. Aside from these high-level problems, a further issue is that the shapes depicted in images are usually only portions of objects because of occlusion and image formation. How can an incomplete appearance of an object be recognized by a machine vision system? These problems have long troubled computer vision researchers.
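Before those problems are taken up, a minimal sketch may help fix ideas about the shape descriptors mentioned above. The following Python fragment (not part of the original system; the height-field layout and unit grid spacing are assumptions) estimates unit surface normals from a range image by finite differences.

    import numpy as np

    def range_normals(z, dx=1.0, dy=1.0):
        """Estimate unit surface normals from a range (depth) image z[y, x].

        The range data are treated as a height field z = f(x, y); the normal
        direction is (-dz/dx, -dz/dy, 1), normalized.  Grid spacings are assumed.
        """
        gy, gx = np.gradient(z.astype(float), dy, dx)   # central differences
        n = np.dstack((-gx, -gy, np.ones_like(z, dtype=float)))
        n /= np.linalg.norm(n, axis=2, keepdims=True)   # unit length
        return n

    # Example: a synthetic spherical cap stands in for sensed range data.
    yy, xx = np.mgrid[-32:32, -32:32]
    depth = np.sqrt(np.maximum(50.0**2 - xx**2 - yy**2, 0.0))
    normals = range_normals(depth)
    print(normals[32, 32])   # roughly (0, 0, 1) at the top of the cap

Curvature-based descriptors follow in the same spirit from second differences, which is why their noise sensitivity recurs as a theme in later chapters.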
For the time being, we do not attempt to complete an entire recognition system for dealing with all of these problems, but focus on the development of theory and procedures which attack several critical problems commonly encountered by a machine recognition system:

(1) Introduction to wing primitives and investigation of their properties.
(2) Development of a 3-D object representation theory using the proposed wing primitives.
(3) Development of a computational framework for implementation of wing representation theory.
(4) Development of a recognition procedure based on the wing representations of objects.

Figure 1.2 depicts a proposed recognition system and some major tasks addressed by this research.

Figure 1.2. Diagram of a presumed recognition system: a sensing system and segmentation produce a scene representation, which is matched under a control system against model representations built by geometric modeling and stored in a database, yielding the recognition result.

1.4 Related Research on Object Modeling

Object modeling and model-based recognition have become increasingly important since Roberts [Rob65] used object models to identify a set of blocks. A wide variety of representation schemes have since been proposed to provide configurations for building databases of objects. Boundary-based, volume-based and parameter-based approaches are commonly adopted. Most schemes favor objects with regular shapes such as planar and quadric surfaces and cylinders. Few people have paid serious attention to the problem of building models for arbitrarily shaped objects. Possible reasons are that the tasks for modeling such general objects seem implausibly complex, and that researchers had not been provided with strong evidence from either psychology or biology to confirm the notion of the model-based recognition paradigm.

1.4.1 Psychological Evidence

Psychologists [She71, She84, Coo75, Coo84] have designed a series of experiments using single-stimulus and paired-stimulus techniques to quantitatively demonstrate several interesting aspects of mental processes. In their experiments, computer-generated three-dimensional "armlike" blocks are depicted as perspective line drawings. Through the study of mental operations on those line drawings, they came to the conclusion that there are mental models, learned from visual experience, stored in our brain. Object recognition in the human visual system is then performed by matching mental models to real-world objects. This immediately gives rise to the question of how such dynamic matching operations proceed in our brain. In order to answer this question, they demonstrated that matchings could be done by mental transformations of models that are analogous to their physical counterparts. Evidence of greater importance to computer vision researchers is that the subjects' mental images represented the three-dimensional structure of the objects portrayed and not simply the two-dimensional features of the drawings. This might give us clues to how models are configured.

From the experiments executed by Shepard, Metzler and Cooper, the reasoning that representation models play an important role in the tasks of object recognition was confirmed. Their persuasive results also showed that object models should possess spatial relationships among part components. Unfortunately, they did not give answers to what object parts are, how an object can be decomposed into parts, and what relational evidence between parts can be used to construct a model.
A feasible answer to the first question may be found in the classical work of Gibson [Gib50]: the elementary impressions of the human's visual world are those of surface and edge. This principle of surface-edge evidence looks so simple and straightforward that it has dominated the thinking of researchers within the computer vision community.

1.4.2 Theory of Parts

Hoffman & Richards [Hof84] introduced a minimum rule to decompose objects into components, based on a uniformity of nature (the regularity of transversality), to partition object surfaces into so-called natural parts. This regularity says that when two arbitrarily shaped surfaces are made to interpenetrate, they always meet in a contour of concave discontinuity of their tangent planes. Therefore, the partition rule can be simply stated: object surfaces can be divided into parts at loci of negative minima of each principal curvature along its associated family of lines of curvature.

However, this rule suffers from several weaknesses. First, only surfaces complying with the assumptions of differential geometry can be segmented using this rule. Secondly, contours subject to the minimum rule may not be closed. Partitioning based on such contours will inevitably produce ambiguous choices. Finally, curvature computations relying on second derivatives are noise-sensitive and fragile.

Aside from the weaknesses of the minimum partition rule, Hoffman and Richards have not presented methods for describing object parts once they have been extracted. A solution to this problem was suggested by Pentland [Pen87] and is described below.

1.4.3 Lump of Clay and Fractal Model

Pentland proposed a computational theory and 56 descriptive parts characterized by a parameterized family of shapes, called superquadrics, which are described by the following equation:

    X(η, ω) = ( cos^ε1(η) cos^ε2(ω),  cos^ε1(η) sin^ε2(ω),  sin^ε1(η) )

where η and ω are latitude and longitude with respect to the coordinate system of each individual part, and ε1 and ε2 are parameters which control the surface's shape. This family of functions can describe cubes, cylinders, spheres, diamonds, pyramidal shapes and intermediate shapes. Size factor aside, each primitive can be sufficiently represented by the two parameters ε1 and ε2. Afterward, these basic primitives can be deformed by bending, stretching, twisting or tapering to form more complex prototypes. High-level objects are then represented by a set of prototypes, a collection of deformations, and a sequence of Boolean operations. Clearly, this process of object formation is identical to the process of constructive solid modeling [Req80].

Superquadrics are still constrained by the assumptions of smoothness (differential continuity) and isotropy (symmetrical shape). A fractal model [Pen87] was proposed to relieve this restriction. The model is governed by a topological dimension T and a fractal dimension D, which satisfy the relation D = T + r, where r is the constant ratio of the number of features of one size to the number of features of the next larger size. The topological dimension T controls the general outline of an object, whereas the fractal dimension D controls the roughness of object surfaces. Therefore, if D = T and r = 0, then shapes are smooth; while if D > T and r = 1, then shapes are rough.
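To make the superquadric family concrete, the following sketch samples points on such a surface for given ε1 and ε2. It is only an illustration of the equation above, not Pentland's code; the signed-power convention used to keep fractional exponents real-valued is an assumption.

    import numpy as np

    def spow(base, expo):
        # Signed power: keeps cos/sin raised to fractional exponents real,
        # the usual convention when plotting superquadrics (an assumption here).
        return np.sign(base) * np.abs(base) ** expo

    def superquadric(eps1, eps2, n_lat=32, n_lon=64):
        """Sample a unit superquadric X(eta, omega) on a latitude/longitude grid."""
        eta = np.linspace(-np.pi / 2, np.pi / 2, n_lat)    # latitude
        omega = np.linspace(-np.pi, np.pi, n_lon)          # longitude
        eta, omega = np.meshgrid(eta, omega, indexing="ij")
        x = spow(np.cos(eta), eps1) * spow(np.cos(omega), eps2)
        y = spow(np.cos(eta), eps1) * spow(np.sin(omega), eps2)
        z = spow(np.sin(eta), eps1)
        return np.stack((x, y, z), axis=-1)

    # eps1 = eps2 = 1 gives a sphere; exponents near 0 give box-like shapes,
    # exponents near 2 give diamond-like shapes.
    sphere = superquadric(1.0, 1.0)
    box = superquadric(0.1, 0.1)
    print(sphere.shape, np.allclose(np.linalg.norm(sphere, axis=-1), 1.0, atol=1e-6))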
Although Pentland has proposed efficient models to describe a wide range of part primitives, most of his work focused on model building rather than on methods for recognizing objects using models characterized by his proposed part category. A possible reason that he did not propose a recognition procedure is that the recovery of part primitives (or part parameters) from image data is difficult and the computational method is still unreliable.

1.4.4 Recognition-By-Components

Biederman [Bie87] proposed a theory of Recognition-By-Components (RBC) from a psychological viewpoint. He presented a set of 36 volumetric primitives, called geometrical ions (geons), which is a sub-collection of generalized cylinders [Bin71]. His experiments on the theory of RBC demonstrated that humans can often recognize objects based on a small set of geons (three will be enough for a moderately complicated object) and their spatial relationships. Although the conclusion is appealing, the extraction of geons from images is still governed by Hoffman and Richards' minimum rule. Inferring volumetric components from 2-D information easily suffers from viewpoint variation, which means that different primitives might be instantiated for the same component from different object views.

1.4.5 Conclusion on Part Representation

Certainly, the representation problem must eventually be confronted by researchers in model-based recognition. This problem is dependent on the domain of application. In many circumstances, a single representation scheme is not enough to represent all working pieces. A multiple-scheme representation may provide flexible choices but will increase the complexity of a system. Furthermore, sophisticated objects may demand more powerful representation schemes.

The part models enjoy several advantages and seem to offer a paradigm with potential for the near future. They can handle assorted configurations of articulated components without substantial adjustments. A finite vocabulary of primitives can also provide tremendous power in dealing with an overwhelming class of objects by combining primitives. The database of a recognition system using part models is usually easier to refine when new objects are encountered that require the addition of primitives, and is therefore flexible. Moreover, these systems can handle objects even when they are transformed, degraded or occluded.

Unfortunately, so far the proposed part models and their associated theories are too idealized to be implemented in a machine vision system. Thus, we fall short of our ambition to directly extract 3-D part components from image data; we instead look for features of object wings, which are 2 1/2-D primitives (see Section 2.3), and attempt to develop a representation theory, called wing representation theory, in terms of this set of primitives.

1.5 Related Background on Model-Based Recognition

Few recognition systems based on part models have been developed. Most existing model-based recognition systems use vertices, holes, edges, arcs and surfaces as features. Several methods adopted in our research are closely related to previous work and are described below.

Goad [Goa83] introduced a system which could automatically construct a fast special-purpose program, through path search, for recognizing and locating objects from a pile.
Although consistent labeling is done by exhaustive search, the precomputation of parameter ranges efficiently narrows down the search space and provides potential constraints for determination of the viewpoint. The concepts of automatic programming and the locus image representation of objects are the most notable ideas in his research. Later, these notions were adopted by Ikeuchi and Kanade [Ike88] and extended by introducing techniques for automatically compiling both object and sensor into a recognition process, which uses a multiple-aspect scheme to represent objects.

Lowe's work [Low87] pioneered the idea of perceptual organization for low-level feature groupings and championed the technique of 3-D from 2-D recognition. Viewpoints are computed through Newton's numerical method, which recursively modifies a vector of view parameters starting from a predicted viewpoint. This method frees 3-D recognition from the need to recover depth information. Similar concepts can be found in other work [Fau86, Tho87, Hut87, Lin88, Sho88], where the positions of objects are computed by aligning model features with detected image features.

Jain and Hoffman [Jai88, Hof87] developed an evidence-oriented technique for identifying 3-D rigid objects with arbitrary shapes. A rule-based database was constructed using evidence conditions based on notable features to represent objects. In their approach, techniques of pattern recognition were used to determine similarities between model and scene representations. They also developed a learning process for automatically extracting evidence conditions from different views of objects. Their recent work has considered pose determination but not multiple-object scenes.

Grimson and Lozano-Pérez [Gri87] used an interpretation tree search to solve the feature correspondence problem. They adopted a set of constraints on surface patches, in terms of their distances and angles, to prune the tree search. Pose determination is performed by computing a 3-D transformation having up to six degrees of freedom relative to the sensor.

Other model-based vision systems that have influenced our research include ACRONYM [Bro84], 3DPO [Bol83] and the works of Schwartz & Sharir [Sch87], Lamdan & Wolfson [Lam88], Basri & Ullman [Bas88], and Shoham & Ullman [Sho88]. Schwartz and Sharir developed an elegant 2-D curve matching technique. This method transforms matching problems into optimization problems. It uses a complex mapping technique to transform rotations of Euclidean vectors into multiplications by e^(iθ) terms in the complex plane. We extend their method (in Chapter 6) by including a scale factor when computing transformations. Lamdan & Wolfson introduced the geometric hashing technique, using coordinates in multiple reference frames as keys to access information about critical points. Through this data structure, both object class and frame basis can be efficiently accessed. Object pose is then determined based on the given frame basis. Basri & Ullman presented a method based on point curvatures to predict the silhouette of a new object pose from the silhouette of a given object pose. This method allows objects to be represented by a small number of object views. A new appearance of an object can be estimated from the representative views. One advantage of this method may be its terseness in object representation, because only a few object views are involved.
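The complex-number device mentioned above can be illustrated with a short sketch. This is a generic similarity-transform fit under ideal, known correspondences, not the Schwartz-Sharir algorithm nor the extension developed in Chapter 6: rotation and scale of a planar curve become a single complex multiplication, which is what makes least-squares alignment of two curves convenient.

    import numpy as np

    # Represent a planar curve as an array of complex numbers x + iy.
    t = np.linspace(0.0, 2.0 * np.pi, 100)
    model = np.cos(t) + 1j * (0.5 * np.sin(t))           # an ellipse

    # Rotation by theta combined with scale s is multiplication by s*e^{i*theta};
    # adding a complex constant translates the whole curve.
    theta, scale, shift = 0.7, 1.3, 2.0 + 1.0j
    scene = scale * np.exp(1j * theta) * model + shift

    # With known point correspondences, the least-squares similarity transform
    # has a closed form (standard derivation, assumed here).
    z = model - model.mean()
    w = scene - scene.mean()
    a = np.vdot(z, w) / np.vdot(z, z)      # a = s * e^{i*theta}
    b = scene.mean() - a * model.mean()    # recovered translation
    print(abs(a), np.angle(a), b)          # ~1.3, ~0.7, ~(2+1j)

In this noise-free example the scale, rotation angle and translation are recovered exactly; with noisy correspondences the same formula gives the least-squares estimate.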
Shoham & Ullman introduced a 3-D from 2-D recognition technique which allows the object pose to be determined without recovery of 3-D information.

1.6 Organization of the Thesis

The remainder of this thesis is devoted to a theory called wing representation theory. Starting with this theory, the associated computational frameworks for representation and recognition are designed. Wing representation theory is characterized by a set of new primitives referred to as object wings. Definitions and properties of these wings are investigated in Chapter 2. A computational framework of wing representation forms the major content of Chapter 3, in which techniques for representation are addressed. Then we turn in Chapter 4 to an implementation of the representation theory. It includes three stages. The first stage is to construct the geometric models of objects. The second stage is to augment the models by adding surface characteristics to the geometric models. In the final stage, objects are represented from multiple viewpoints in terms of wing primitives.

In order to examine the adequacy of wing representation, the properties of uniqueness, stability, compatibility, terseness and locality are tested. This is done by analyzing the influences of convolution masks and of view and map resolutions on object representations. Techniques of feature extraction from real images are studied to examine the consistency between the wings detected in images and the model wings in the database. All these studies are collected in Chapter 5.

In Chapter 6, a recognition procedure based on wing representation is designed and tested. Recognition must achieve two goals: identification and localization of objects. The recognition process involves four stages: indexing, consistent labeling, parameter estimation and decision. Associated techniques for implementing these stages include feature comparison, interpretation tree search, curve matching and clustering.

Principal contributions, implications of this research, and future work are summarized and discussed in Chapter 7.

Chapter 2
Representation Theory: 2 1/2-D Object Wings

Recently, part models have attracted many vision researchers' attention for the representation of 3-D real-world objects. Among those models, natural components [Hof84, Bie87] have received much appeal because they reflect the physical structure of objects at a high level of abstraction. Such abstraction is generally believed to be enough for object recognition. Meanwhile, associated computational theories of part models have also proliferated [Bar84, Hof84, Tver84, Bra85, Pen87, Bie87, Baj87].

Unfortunately, most models have suffered from a number of difficulties in implementation. The most bewildering concern may be the reliability of extracting 3-D components from 2-D image data, which can easily lead to computational instabilities and problems of uniqueness. These difficulties have hampered the application of part models to representation and recognition problems even though they possess a number of elegant properties.

In this chapter, a set of primitives, called object wings, is defined to serve as a feature set for 3-D representation and recognition. Wings are object features which contain shape information of contour and surface. Major considerations for the construction of wings are to be able to handle general objects, overcome imperfect feature extraction, index models and compute object poses. Thirty-four simple wings are proposed for practical use. They are defined as triples consisting of a pair of surface patches separated by a contour segment. Composite wings with higher-level configurations are then formed by grouping simple wings that comply with certain non-accidental relationships such as curvilinearity, symmetry and parallelism.

Although many researchers have already worked on features of contour and surface, most have considered them separately. Object wings are, in fact, compound features of contour and surface. In general, the properties of a compound feature will be different from the properties of the individual features which form it. In this chapter, the definitions and properties of object wings are studied.

2.1. What are Wings?

Suppose one can use some underlying segmentation techniques to extract contours and regions from an input intrinsic image. Let C denote the set of detected contours and R the set of regions detected from an image:

    C = {c1, c2, ..., cm1},  R = {r1, r2, ..., rm2}.

Definition 2.1: A region ri is said to be adjacent to a contour cj if there exists at least one pair of pixels p in ri and q in cj which are eight-connected. In other words, q is a boundary pixel of ri. A binary relation, called alpha-adjacency, between a contour cj and a region ri, denoted by alpha, is defined as follows: cj alpha ri iff cj and ri are adjacent.

Let Ca represent an N-tuple of contour attributes and T be a set of permissible instances for the tuple. Each instance tj of T is, therefore, an N-dimensional vector in which each component corresponds to an attribute. The properties of a contour can thus be specified by such a vector, where

    T = {t1, t2, ..., tn1}.

In a similar vein, let Ra denote an M-tuple of region attributes and S be a set of permissible instances for that tuple. Each instance si of S is an M-dimensional vector of values associated with region attributes defining the properties of a region. Since wing primitives will be defined along detected contours, the adjacent regions of a contour may not be available in some cases. Thus the set S needs to include an additional vector, called the null instance, denoted by s0, to represent an undefined region. Let

    S = {s0, s1, ..., sn2}.

Definition 2.2: A wing primitive w is configured as a contour with its adjacent regions, and is characterized by a triple in the product set S x T x S.

2.1.1 Complete Set

Let W = { (si, tj, sk) | i, k = 0, 1, ..., n2; j = 1, ..., n1 } denote the complete set of wing primitives. There are n1(n2+1)^2 distinct members in W. From analyzing the process of image formation, it is not difficult to discover that several of these combinations are impossible.

Proposition 2.1: The complete set W is a superset of the valid wing primitives under general image formation processes. Equivalently, not all combinations of contours and surfaces can have corresponding physical wings.

Proof: To prove this, recall Koenderink's theorem [Koe84] about solid shape from the occluding contour. Referring to Figure 2.1, consider a point P on the transverse curve. Let K denote the Gaussian curvature, k_r the radial curvature, and k_t the transverse curvature, all defined at the point P in 3-D space. Let k_app be the apparent curvature at the projection point P' of P on the image plane, and let d be the distance from the viewer to P. Then Koenderink's theorem simply states that

    k_r k_t = K   and   k_app = k_r d = K d / k_t.     (2.1)

Equation (2.1) indicates that k_app has the same sign as k_r, since both k_t and d are positive at a visible rim point. If k_r is positive, then the parts of the radial curve with negative curvature will be self-occluded; in other words, the radial curvature cannot change sign when tracing along the apparent curve. For example, if there is a valley (concavity, k_r < 0) in a radial curve, there cannot be a hill (convexity, k_r > 0) just in front of it. Otherwise the valley would not be seen because of the occluding hill. A similar analysis can be
IfK, is positive, then the part of the radial curve with negative curvature K, will be self-occluded, or the corresponding radial curvatures can not change signs when trac- ing along the apparent curve. For example, if there is a valley (concavity, K, < O) in a radial curve, there cannot be a hill (convexity, K, > 0) just in front of it. Otherwise the valley would not be seen because of the occluding hill. A similar analysis can be 16 projection plane viewer p \ apparent curve curve Figure 2.1. Diagrammatical definitions of the Koenderink theorem. performed for the case of K, > 0. Figure 2.2 clarifies this. VICWCI' 111.11 4 /%y Figtne 2.2. If there is a hill appearing just in front of a valley, the valley will be invisible to the viewer. From the above specific example, it is known that the region adjacent to a concave contour generated by an object limb cannot be a convex surface. We conclude that not all the combinations of contours and surfaces are possible in a projection plane. I As a matter of fact, Koenderink’s theorem can provide six more rules by which some additional members of W can be removed. To collect the complete set of rules is not impossible but also not easy. Potential rules may come from difl‘erential geometry [Bra85], physics [Cae82] and mathematics [Na188]. The proof of Proposition 2.1 indi- cates a way to find a valid set of wing primitives. For a triple in W, if there exists a corresponding physical wing, the triple will be called a legal triple. The resulting collec- tion of legal triples forms the legal set of wing primitives. 17 2.1.2 Proper Sets In many situations, it may be unnecessary to use the whole legal set for practical implementation. For instance, if the direction of contours is not important to a task, the set of wing primitives can be reduced by ignoring the directional attributes of contours. Let us call such a process a collapsing process and the resulting set of primitives the col- lapsed set. In contrast to the collapsing processes, there are augmenting processes by which one can extend the wing catalogue. Let us call the resulting set the augmented set. Both collapsed and augmented sets are proper sets of complete set W. One of the major purposes for collapsing wing primitives is to simplify the catalo- gue. However, when the shapes of objects become sophisticated, more detailed wing primitives will be needed. By collapsing and augmenting processes, one can create as many "scal " catalogues as needed. Then based on the set of scaled catalogues, a coarse-to-fine recognition system can be developed. For some cases, such hierarchical strategies can improve performance and reliability of a recognition system. 2.2. Physical Meaning As mentioned earlier, a region was defined by a set of attributes. In general, any available information of geometry, topology and statistics can form feasible attributes for regions. However, the availability of feature properties typically depends on the input data, for example range or intensity data. It is usual to include only the necessary con- su'aints for applications. For this reason, in this section attributes of contour and region types are considered. To further simplify the discussion, restricted sets of feature types are employed. Define the set of contour types as {<<, >>, <, >, +, -l. The first two symbols represent limbs. The next two symbols denote occluding convex edges, and the last two elements represent convex crease and concave crease edges respectively. 
Here we follow the notation used in the analysis of line drawings [Mal87]. For simplicity, the set of contour types can be collapsed into {!, <->, +, -} by ignoring the orientations of limbs and occluding convex edges. This set then represents silhouettes, jump edges (self-occluding only), convex creases and concave creases. Next, define the set of surface types as {n, p, +, -, s}, representing nil, planar, convex, concave and saddle surfaces respectively. The nil type is included so that each wing primitive can be represented as a triple even when an object wing has an undefined adjacent region. Note that there is no nil type for contours, because wing primitives have to be defined along known contours.

Table 2.1. The set of wing types for practical implementation. Surface types: n: nil; p: planar; +: convex; -: concave; s: saddle. Edge types: !: silhouette; <->: jump edge; +: convex crease; -: concave crease.

Based on this definition of low-level feature types (four contour types and five surface types), in principle 5^2 x 4 = 100 distinct wing types can be defined. As shown earlier in Proposition 2.1, not all of these wing types have corresponding physical wings; in other words, not all of them are legal. A set of wing primitives for practical implementation is shown in Table 2.1, in which 34 distinct wing types have been retained from the complete set of 100 by examining the legality of each wing type. No formal proof of this type filtering is provided in this thesis.

Let us classify this set of 34 wing types into four subgroups according to the contours along which they are defined: silhouette wings, jump wings, convex-crease wings and concave-crease wings. This results in four distinct silhouette wings and ten wings for each of the other subgroups. Examples of synthetic wing primitives defined on the line drawings of a coffee cup and a coke can are shown in Figure 2.3, where the small black circles denote the boundaries of wing primitives.

Figure 2.3. Synthetic wing primitives defined on the line drawings of objects: (a) a coffee cup, (b) a coke can.

The wing types listed in Table 2.1 can easily be extended by suitable augmenting processes. We present three extension principles for future use with objects of more complicated shape. First, wing types can be extended by introducing finer classifications of contour types, surface types, or both. For example, a convex surface can be further classified into peak and ridge surfaces, a concave surface into pit and valley surfaces, and a saddle into the subtypes of minimal surface, saddle ridge and saddle valley [Bes86]. Secondly, wing types can be defined along the projections of other kinds of spatial curves, such as lines of curvature, geodesic curves, asymptotic lines, parabolic lines or critical lines, as long as they can be detected reliably. Finally, wing types can be further detailed by including the orientations and curvilinearities of contours. For example, "+(-" and "+)-" can be categorized as different wing types, where "+" and "-" represent convex and concave surface types respectively, and "(" and ")" denote contours with different directions of curvature.
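The type filtering can be made concrete with a short sketch. The following Python fragment (illustrative only, not code from the thesis) enumerates the 4 x 5 x 5 = 100 raw combinations and reproduces the 34-type count of Table 2.1 under assumptions read from the text: a silhouette wing has exactly one nil region, the other wing types have none, and the two regions of a wing are treated as unordered.

    from itertools import product

    EDGES = ['!', '<->', '+', '-']           # silhouette, jump, convex crease, concave crease
    SURFACES = ['n', 'p', '+', '-', 's']     # nil, planar, convex, concave, saddle

    def practical_wing_types():
        kept = set()
        for left, edge, right in product(SURFACES, EDGES, SURFACES):
            if left == 'n' and right == 'n':
                continue                     # at least one region must be defined
            if edge == '!' and 'n' not in (left, right):
                continue                     # a silhouette borders the background on one side
            if edge != '!' and 'n' in (left, right):
                continue                     # jump/crease wings separate two visible regions
            kept.add((tuple(sorted((left, right))), edge))   # regions treated as unordered
        return kept

    print(len(practical_wing_types()))       # prints 34

Under these assumptions the four subgroups come out as 4 silhouette wings plus 10 wings each for the jump, convex-crease and concave-crease subgroups, matching the counts quoted above.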
2.3 Simple Wings

To compensate for the difficulties of feature extraction, rather than applying restoration procedures to the contour maps, such as cleaning (noise), filling (breaks), recovering (missing segments) and grouping (structure), we simply include the contextual surface information with each detected contour, namely the surface type of the regions adjacent to the contour. A simple wing can now be described as follows: a simple wing is composed of a pair of 3-D surface patches (one but not both of which can be nil) separated by a 2-D contour segment. It therefore contains both 2-D and 3-D shape information and is aptly called a 2 1/2-D object primitive.

Later on, we will present a rule to extract object wings from images. Here let us first examine some notable properties of wing primitives. Object wings are entirely characterized by local properties. These properties enable recognition even when objects are transformed, occluded or degraded. It is the contour information of wings that makes the parameter estimation of object poses feasible in the final stage of recognition; this will be discussed further in Chapter 6.

Regions adjacent to a contour in a projection plane do not necessarily adjoin each other on the object surface. For instance, one of the regions adjacent to a detected jump edge will not adjoin the corresponding contour in 3-D space. Therefore, through an appropriate rotation a jump wing can change into a convex-crease wing, and vice versa. Thus object wings are not viewpoint invariant. This property is called type variation and can be used to delimit the aspects [Ike88] of an object in a multiple-view representation.

2.3.1 Extraction Rule

Suppose that a reasonable collection of contours and surfaces has been obtained using some segmentation techniques. There are two stages in labeling wings along contours: dense labeling and sparse labeling. Dense labeling assigns a wing type to each contour pixel, whereas sparse labeling assigns a wing type to an entire contour segment. In our experiments, dense labeling is performed prior to sparse labeling. The boundaries of wing primitives are then located by the following rule:

Extraction Rule: Wing boundaries are located on a contour at locations of change in the contour type, in the adjoining surface types, or in both.

The implementation of this rule can be described as follows. First, dense labeling is performed on the detected contour pixels. Then connected contour pixels with the same label are grouped to form a wing primitive. Note that this rule depends only on qualitative types; it should therefore be more reliable than the minimum rule [Hof84], which is based on quantitative curvature values.
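As a concrete illustration of the extraction rule, the following sketch (assuming a simple data layout in which each contour pixel already carries its dense label; this is not the thesis implementation) groups consecutive, identically labeled contour pixels into wing primitives.

    def extract_wings(labels):
        """labels[i] is the dense label (contour_type, left_surface, right_surface)
        of the i-th pixel along a contour chain."""
        wings = []                            # each wing: [label, list of pixel indices]
        for i, lab in enumerate(labels):
            if wings and wings[-1][0] == lab:
                wings[-1][1].append(i)        # same label: extend the current wing
            else:
                wings.append([lab, [i]])      # label change: a wing boundary is located here
        return wings

A change in any component of the label, contour type or either adjoining surface type, starts a new wing, which is exactly the boundary condition stated in the extraction rule.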
2.3.2 Structural Constraints on Wings

Investigating the structural properties of wing primitives not only provides a further understanding of wings, but also indicates how a set of unrelated primitives can be organized into more meaningful configurations. There are two kinds of wing constraints: unary constraints and binary constraints. Unary constraints describe properties of individual wings, whereas binary constraints specify relations between two wings. Constraints involving more than two wings are possible but beyond the scope of this research.

Because a multiple-view representational scheme is used, constraints are only required to be invariant within a viewpoint; that is, the structural properties of wings need only hold within the 2-D space of a view rather than being invariant in 3-D space. For this reason, one has wide flexibility in choosing wing constraints. A set of experimental characteristics for defining unary constraints on individual wings is given below. Note that, for explicitness, "curve" and "contour" will be used to denote lines in 3-D and 2-D space respectively.

1) 3-D wing length: the length measured along the corresponding 3-D curve of a 2-D wing contour.

2) 3-D depth range: the difference between the maximum and minimum range values along the corresponding 3-D curve of a 2-D wing contour.

3) 2-D angle span: move the tangent vector field of a 2-D wing contour to a predefined coordinate system; the angle span of the wing is the angular sector swept by the tangent vector field.

4) 3-D maximum and minimum turning angles: a turning angle is the angle swept by the tangent vectors at two adjacent points on the corresponding 3-D curve of the 2-D wing contour.

5) 2-D maximum and minimum turning angles: similar to Item 4, but defined directly along the 2-D wing contour.

6) 3-D maximum and minimum tangent magnitudes: measured along the corresponding 3-D curve of a 2-D wing contour.

7) 3-D maximum and minimum curvatures: measured along the corresponding 3-D curve of a 2-D wing contour.

8) 2-D maximum and minimum curvatures: similar to Item 7, but measured directly along the 2-D wing contour.

The parameters of the wings are computed in terms of their geometric properties. Before computation, a circular operator is applied to resample a set of points along each wing contour, so that the distance interval between any two successive points is the same and equal to the radius of the circular operator. A piecewise cubic B-spline r(t) can then be used to obtain the 2-D parameters of curvature, tangent magnitude and tangent vector at every sampled point. A point r(t) of a cubic spline curve is defined as

    r(t) = \sum_i P_i N_{i,k}(t),

where the P_i are the control points, N_{i,k}(t) denotes the blending function, and k is the order of the curve. The blending functions, also called basis splines, can be constructed via the recursion

    N_{i,k}(t) = \frac{(t - x_i) N_{i,k-1}(t)}{x_{i+k-1} - x_i} + \frac{(x_{i+k} - t) N_{i+1,k-1}(t)}{x_{i+k} - x_{i+1}},

    N_{i,1}(t) = 1 if x_i <= t <= x_{i+1}, and 0 otherwise;   N_{i,k}(t) = 0 if x_i = x_{i+k},

where the x_i are knots from a predefined knot vector. By defining a uniform knot vector, a point on the piecewise cubic B-spline can be written as

    r(t) = \begin{bmatrix} x(t) \\ y(t) \end{bmatrix}
         = [P_0 \; P_1 \; P_2 \; P_3] \, \frac{1}{6}
           \begin{bmatrix} -1 & 3 & -3 & 1 \\ 3 & -6 & 0 & 4 \\ -3 & 3 & 3 & 1 \\ 1 & 0 & 0 & 0 \end{bmatrix}
           \begin{bmatrix} t^3 \\ t^2 \\ t \\ 1 \end{bmatrix},    (2.2)

where t is the parameter of the spline defined on the knot vector and the P_i are the control points. From this equation we can compute the first and second derivatives of the vector function r(t), so that the tangent vector T and the curvature k are

    T = \frac{r'(t)}{|r'(t)|},   k = \frac{|r'(t) \times r''(t)|}{|r'(t)|^3}.    (2.3)

For the 3-D case, the vector function r(t) of Equation (2.2) includes one more component z(t); the control point matrix gains a corresponding row, while the constant matrix remains unchanged.

A set of relational structures for defining binary constraints between two wings includes:

1) 3-D maximum and minimum distances: the maximum and minimum distances between the corresponding 3-D curves of two 2-D wing contours.

2) Ratio of 2-D minimum to maximum distances: the ratio of the minimum to the maximum distance between two 2-D wing contours.
3) 3-D maximum and minimum distances between the convex hulls of centers of gravity: the center of gravity (first moment) of a 3-D curve segment is defined as the average of the coordinates of the curve points; the 3-D convex hull [Pre85] of centers of gravity of a wing is the smallest convex volume containing the centers of gravity of all possible 3-D curve segments of the 2-D wing contour.

4) Ratio of 2-D minimum to maximum distances between the convex hulls of centers of gravity: similar to Item 3, except that the convex hulls are defined in 2-D space and the ratio of the distances is used instead of the distances themselves.

5) Parallelism: the separation and overlap between the angle spans of two 2-D wing contours, computed using

    MAX = max{ SP^1_max - SP^2_min , SP^2_max - SP^1_min },
    MIN = min{ SP^1_max - SP^2_min , SP^2_max - SP^1_min },

where SP^i_max and SP^i_min are the maximum and minimum tangent angles of the contour points of wing i with respect to a predefined coordinate system. A parallelism measurement of a pair of sensed wings should therefore lie within the range of its corresponding model pair.

Note that the proposed sets of wing constraints may contain redundancy; for example, in the set of unary constraints, Item 4 is in some sense equivalent to Item 7. An approach to studying the power of the constraints will be presented in Chapter 6. In practice, a subset of the aforementioned constraints will be adequate. One should also provide tolerances on the parameters and use a penalty strategy to guard against computational instability. A small numerical sketch of the spline-based computations of Equations (2.2) and (2.3) is given below.
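The sketch below (assuming NumPy and the uniform cubic B-spline matrix of Equation (2.2); it is not the thesis code) evaluates Equation (2.3) for one spline segment, giving the tangent vector and curvature used by the unary constraints.

    import numpy as np

    A = (1.0 / 6.0) * np.array([[-1,  3, -3, 1],
                                [ 3, -6,  0, 4],
                                [-3,  3,  3, 1],
                                [ 1,  0,  0, 0]], dtype=float)

    def tangent_and_curvature(P, t):
        """P: d x 4 matrix whose columns are control points P0..P3 (d = 2 or 3);
        t in [0, 1] is the parameter within the spline segment."""
        r1 = P @ A @ np.array([3 * t**2, 2 * t, 1.0, 0.0])   # first derivative r'(t)
        r2 = P @ A @ np.array([6 * t, 2.0, 0.0, 0.0])        # second derivative r''(t)
        T = r1 / np.linalg.norm(r1)                          # unit tangent
        k = np.linalg.norm(np.cross(r1, r2)) / np.linalg.norm(r1) ** 3   # curvature
        return T, k

Sampling t over each segment of the piecewise spline and taking extrema of k and |r'(t)| yields the maximum and minimum curvatures and tangent magnitudes listed among the unary constraints.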
2.4 Composite Wings

A fundamental objective in forming a composite wing is to find simple wings that come from a single physical object. Typically, composite wings possess more potential utility for object recognition than simple wings. Our major purpose is to capture the salient component structure of objects while keeping the probability of false grouping very low. To this end, criteria for composite wing grouping must be defined. Feasible criteria include:

1) proximity - neighborhood, spatial relationships;
2) similarity - characteristic properties;
3) collinearity and curvilinearity - orientation continuity;
4) connectivity - reachability, through contours or surfaces;
5) cotermination - junctions;
6) symmetry; and
7) parallelism.

Clearly, the above grouping criteria are similar to those of perceptual organization [Zuc85, Vis85, Low85, Low87, Tuc86]. A prominent difference between this research and the previous work lies in the tokens employed: dot patterns [Zuc79, Vis85, Tuc86] or line segments [Low85, Low87] were often used, whereas wings serving as tokens have a different dimensionality from dots and lines. The search for grouping criteria suitable for wing primitives will form an extension of the current wing representation theory. Details of the definitions and functionals of the aforementioned criteria can be found in the works of Zucker [Zuc85], Lowe [Low85], Tuceryan [Tuc86], Biederman [Bie87] and Malik [Mal87]. A composite wing is composed of a small set of simple wings which are common to a single physical object and are subject to some non-accidental relationship.

The above criteria alone do not provide sufficient conditions for ensuring that the simple wings of a composite wing belong to a single object, because there is often competition among the criteria [Zuc85, Tuc86]. To alleviate this competition, rules for combining wings in terms of physical principles, heuristics and domain assumptions need to be constructed. Once this has been achieved, category and structure relationships for composite wings can be introduced. Wing representation theory should go further and address the formation of object models in terms of composite wings and their structures. This is also important for object separation (high-level segmentation) when dealing with real images containing multiple objects.

2.5 Illustrating the Power of Wings in Recognition

Figure 2.4(a), (b) and (c) show edge maps labeled using Malik's extension of the Huffman-Clowes label set [Mal87] for a bowl, a half grapefruit and a clam. Note that the junctions at A, B, A' and B' are not included in Malik's junction catalogue because the bowl and the clam are not in his object set. It is clear that the 3-D shapes of the objects in the figure can be precisely interpreted through the line labelings. However, since the labeling procedure always starts with a set of junctions and reaches a consistent labeling through hierarchical determination (filtering, backtracking, or derivation of reciprocal figures in gradient space), the connectivity between and within lines is critical for information propagation.

Figure 2.4. Line drawings for a bowl (a), a half grapefruit (b) and a clam (c) using Malik's labels; (d), (e) and (f) show wing features labeled along degraded contour segments for the objects in (a), (b) and (c) respectively.

Figure 2.4(d), (e) and (f) show incomplete edge maps for the same set of objects. To deal with such cases, techniques of Gestalt-based grouping or perceptual organization may provide a solution. However, without appealing to supporting reasoning, it is necessary to rely on a premise of non-accidentalness, which greatly reduces the generality of those techniques. To compensate, we include contextual surface information for the incomplete contours, which results in object wings. As shown in the figure, even under degraded conditions the objects can easily be distinguished on the basis of the labeled wings.

We claim that many object wings can be reliably sensed from image data, whereas perfect curves and surfaces cannot be. Were a range image available, the object wings would be readily obtainable. In addition, as indicated in Figure 2.4(d), (e) and (f), surface data can even provide links between contours. This not only removes labeling ambiguity but can also form high-level composite features, which are critical cues for indexing into a database. The junctions in the figure have the same 2-D shape as the "three tangent" junction of Malik [Mal87]. While there are 6^3 x 3^2 contour/surface label combinations according to our simple labeling scheme, only a handful of them are realizable as real composite wings. Although we have not sought a formal theory of the structure of the objects we wish to recognize, we do know that, as in classical line drawing analysis, correct detection of a junction places strong constraints on the object surfaces creating it. Such constraints, supported by the capability to sense surfaces, are of great benefit to segmentation and recognition procedures.

2.6 Reconstructing Line Drawings from Wing Representations

In the last section, we demonstrated the power of object wings in identifying objects. In this section, we consider the adequacy of the wing representation for polyhedral scenes. To this end, we demonstrate that if a wing representation has been derived from the image of a polyhedral scene, the spatial structure of the scene can be recovered from the wing representation.
In other words, given the wing representation of a scene containing polyhedral objects, we reconstruct all visible parts of the objects from the given wing representation: the coordinates of all visible vertices, the equations of all visible faces, and the completely labeled line drawing of the scene.

Here our purpose is not to interpret the 3-D meaning of a 2-D line drawing, but rather to construct the 2-D line drawing and its 3-D interpretation from appropriate fragments of the 3-D scene. It is natural to start with the surface equations before reconstructing line drawings, because the surface information available in object wings is sufficient to compute the equations of the polyhedral object surfaces which are part of the wings. In the following subsections, we study a theoretical approach to interpreting polyhedral scenes based on given wing representations. The details of this study, such as the assumptions, the definition of the object domain, non-accidental scenes, general viewpoints, the reconstruction rules and several terms used below, are given in Appendix G.

2.6.1 Reconstruction Algorithm

An algorithm characterized by the set of reconstruction rules is presented here. The input data are the range image of a polyhedral scene and its associated wing representation. The output is a labeled line drawing, the equations of the visible faces and the coordinates of the visible vertices. The basic idea of the algorithm is first to recover the line drawings of the individual visible object faces; the entire scene present in the image is then reconstructed by integrating the recovered faces. A sketch of the first two steps appears after the listing.

(1) Compute the equations of the planes containing object faces, using the wing segments in the given wing representation.
(2) Cluster wing segments into wing groups according to their associated plane equations (Rule 1).
(3) For each wing group:
    (3.1) generate a line for each wing in the wing group;
    (3.2) compute the intersections among the generated lines;
    (3.3) remove line segments whose endpoints are located on the picture frame (Rule 2);
    (3.4) preserve line segments which overlap a wing (Rule 3);
    (3.5) remove any line segment unless points on exactly one side of the segment lie on the given plane (Rule 4);
    (3.6) delete intersections which are not shared by exactly two noncollinear line segments (Rule 5);
    (3.7) record the 3-D coordinates corresponding to the remaining intersections by intersecting the line equations;
    (3.8) label line segments with the wing types which appear on them (Rule 6).
(4) Integrate the reconstructed line drawings of the individual object faces in order to discover T junctions.
(5) If a line segment in the integrated line drawing possesses a compound type, separate the components of the compound type and assign them to distinct parts of the edge according to the discovered T junctions and the original wing map (Rule 7).
(6) Output the resulting line drawing with edge labels, the plane equations corresponding to object faces, and the coordinates of intersections corresponding to 3-D vertices and 2-D T junctions.
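The following sketch (assuming NumPy and a simple list-of-points data layout; not the thesis implementation) illustrates steps (1) and (2): the 3-D points of each wing segment are fitted with a plane n.x = d, and wings whose planes agree within a tolerance are clustered into one wing group.

    import numpy as np

    def fit_plane(points):
        """points: an N x 3 array of 3-D points on a wing segment."""
        centroid = points.mean(axis=0)
        _, _, vt = np.linalg.svd(points - centroid)
        n = vt[-1]                              # direction of least variance
        return n, float(n @ centroid)           # plane n.x = d

    def group_wings(wing_points, ang_tol=0.05, dist_tol=2.0):
        groups = []                             # each group: [n, d, list of wing indices]
        for idx, pts in enumerate(wing_points):
            n, d = fit_plane(np.asarray(pts, dtype=float))
            for g in groups:
                s = 1.0 if n @ g[0] >= 0 else -1.0          # allow oppositely signed normals
                if np.allclose(s * n, g[0], atol=ang_tol) and abs(s * d - g[1]) < dist_tol:
                    g[2].append(idx)
                    break
            else:
                groups.append([n, d, [idx]])
        return groups

The tolerances ang_tol and dist_tol are illustrative; in practice they would be tied to the noise level of the range sensor.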
2.6.3 Examples

Let us look at the example in Figure 2.5(a). Grouping wing segments based on the computed plane equations, we obtain five wing groups: g1 = {1, 2, 3, 4, 5, 6, 7, 8}, g2 = {2, 3, 14, 15, 16, 17}, g3 = {4, 12, 13, 14}, g4 = {6, 7, 10, 11} and g5 = {8, 9, 10}. Each group corresponds to an object face in this particular example. The line drawings of the individual faces are first recovered based on the information provided by the wing groups and the original range image. The recovered faces are then integrated, and edges are relabeled by referring to the discovered T junctions and the original wing map. The reconstructed line drawing of the object is shown in Figure 2.5(b). Other examples are shown in Figures 2.6 and 2.7, where an accidental view of a polyhedron and an object with a hole and multi-face vertices can both be reconstructed by the algorithm.

Figure 2.5. An example demonstrating the reconstruction algorithm: (a) wing map, (b) labeled line drawing.

Figure 2.6. (a) A polyhedral object, (b) an accidental view of the object, and (c) the reconstructed line drawing.

Figure 2.7. An object with a hole and multiple-face vertices: (a) wing map, (b) labeled line drawing.

2.6.4 Comments

Many previous line-drawing researchers started with a perfect line drawing and attempted to interpret it in terms of 3-D spatial structures. Unfortunately, this usually leads to a set of possibilities. Moreover, if the initial line drawings are imperfect, the problem becomes extremely difficult. In order to solve the problems of multiple interpretations and imperfections, assumptions and additional information have been employed. Although parts of these problems have been successfully solved by introducing extra knowledge, the overall schemes are not efficient. The reason is that most researchers have separated their schemes into two major modules, a reconstruction module and an interpretation module, with the additional information used only in the interpretation module to reduce ambiguity rather than in the reconstruction module. In fact, available information should be used anywhere in the whole scheme where it is helpful.

In this research, we use range values as additional information in both the reconstruction and interpretation modules. This not only greatly simplifies the design of the procedures but also increases the efficiency of processing. In addition, instead of following the conventional sequence, we separate the modules into several stages and rearrange the stages in a hierarchical manner. For instance, in our algorithm individual faces are first reconstructed and interpreted, and then the entire scene is considered. One apparent advantage of this hierarchical rearrangement is that it increases the flexibility of the algorithm in dealing with a wide variety of object classes.

Another important aspect of this research is the use of the wing representation as input data to the reconstruction algorithm. In contrast to perfect line drawings, wing representations are assumed to be imperfect. On the other hand, wing representations contain range values, and thus surface normals, information which is not available in line drawings. In practice, wing representations are more realistic than line drawings. The additional information not only compensates considerably for certain deficiencies of the wing representation in interpreting scenes but also leads the reconstruction procedure to a unique result. Judging the correctness of a labeled line drawing is also greatly simplified by this additional information.

It is our conjecture that after the wings are extracted, the line drawing can be constructed without further access to the underlying range image. The proposed algorithm applies the first three rules used previously.
It then moves into a state-space search for the polygonal circuits that form the faces in a given plane. At each step of the search one line segment is added to the line drawing and all constraints are checked. It is easy to see that the search is finite, although exponential. The key point to prove is that there is a unique set of circuits satisfying all the constraints, and that this set is the goal state of the search. While such an algorithm has less practical value than the one detailed here, it is of theoretical interest and will be pursued in future work.

Currently, we consider only polyhedral scenes. If a scene contains only one object, the proposed algorithm can handle more general objects than those defined in the current object domain. We need to study how to extend this work to include scenes with curved objects.

2.7 Summary

Starting from a general definition of wing primitives, we proceeded to construct a complete set of primitives. We then developed the concepts of the legal, collapsed and augmented sets of wings. By defining types for contours and regions, we were led to a set of 34 simple wings for practical implementation. Their properties were investigated by studying the structural constraints and the extraction rule of object wings. This set of primitives was then clustered to form composite wings under non-accidental relationships. Both simple and composite wings will enable us to explore the machinery of wing representation theory.

The basic idea of the feature proposal was originally motivated by the difficulties commonly encountered by current approaches to feature extraction and defect recovery. The notion of wing primitives stems from the need to restore detected features by integrating available contextual shape information. Through this integration of information, both the power to discriminate objects (Section 2.5) and the power to interpret scenes (Section 2.6) can be greatly increased.

So far we have not yet answered two important empirical questions: how to detect object wings in images, and how to construct internal representations of physical objects in terms of wing primitives. A computational framework for dealing with these two questions is given in the next chapter. The associated implementation techniques are presented in Chapter 4.

Chapter 3
Wing Representation: Computational Framework

Wing representation theory is a computational theory concerning the representation problem, characterized by a specific set of wing primitives. Computational theories in computer vision are somewhat unlike the usual mathematical theories. The latter generally involve substantial equations, derivations, theorems and proofs, while computational theories may instead include a set of physical principles, inference rules, algorithms, or simply descriptions. A computational framework in which associated techniques are developed is a realization of an abstract computational theory. In this chapter, a framework for constructing wing representations of real world objects is presented.

The last chapter implied that wing primitives are not all view-invariant. The representations constructed in terms of wing features, called wing representations, are therefore viewpoint dependent and are composed of a set of views. Wing representations are thus a kind of multiple-view representation, whose data structure will be referred to as the viewsphere throughout this thesis.
The major steps in the construction of wing representations are geometric modeling, model augmentation, view generation and view organization.

Following Marr [Mar82], the formulation of any computational theory should answer three questions: What is the goal of the theory? How can it be implemented? Why is it appropriate? In the last chapter, the primary objective and the appropriateness of wing representation theory were briefly addressed. In this chapter, after a description of secondary goals, we discuss how to construct wing representations for physical objects. The question of why wing representations are adequate is scrutinized in later chapters, where the implementation and testing are presented.

3.1 What is the Goal of Wing Representation Theory?

According to Shepard, Metzler and Cooper [She71, She84, Coo75, Coo84], mental operations are analogous to physical processes by which internal models are aligned with objects displayed in images through a series of transformations. We refer to such transformations as model transformations, because the models are rotated and translated in order to align them with objects. Looking at the problem from another perspective, we can envision that the mind attempts to discover a view of the model which is consistent with a retinal image produced by an object. This behavior (looking for instances of models that coincide with images) is clearly different from the previous one in that the models remain stationary while the mind moves freely; it is equivalent to switching the reference frame from the mental frame to the model frame. We refer to such operations as mental transformations. Undoubtedly both kinds of transformation, although different in manner, have the same influence on perception. However, when they are implemented in machine systems, both the representation schemes and the recognition processes differ.

In the model transformation paradigm, each object model can be represented by a set of object features together with their structural relations. Those features and relations are described with respect to a pre-defined model frame; it is, therefore, an object-centered representation. There are a variety of choices for selecting features and relations, and also a myriad of ways to organize them into models. A critical point is that both the features and the relations should be view-invariant. Generally speaking, in the model transformation paradigm objects are modeled as individual entities within which view-invariant features are connected through a set of view-invariant relations.
For a vision system with this representation scheme, the recognition task is typically dominated by searching rather than matching. In this research, the paradigm of mental transformation is adopted. It immediately gives rise to the problem of having infinite number of views for each object. How can all the views be derived? Of interest is a conclusion made by Cooper [C0084]: Although the identity of objects displayed in difi'ering orientations can require imagining a rotation through intermediate orientations, it doesn’t contend that the rotation is continuous in the strict mathematical sense, which requires that it sweep through all possible intermediate angles. Cooper’s argument indicates that a finite set of views may be used to represent an object. Let us refer to such views as the significant views of an object. More questions immedi- ately emerge. Given an object, how do we define the significant views for that object? Once the views have been defined, what kinds of features and relations can be used to represent views? We have answered this question by proposing a set of wing primitives and structural constraints. A final question is then how can we organize views in a way suitable for the recognition process. Answering these questions is the goal of wing 36 representation theory. 3.2 How can Wing Representation Theory be Implemented? The overall wing representation theory is depicted as a tree structure in Figure 3.1. The following discussion will start from the node of computational fi'amework and proceed through the tree in a depth-first fashion. To create the wing representation of a real world object, there are four intermediate tasks: construction of geometric models for objects, augmentation of geometric models, generation of view representation and view organization. 3.2.1 Geometric Modeling The simplest way to acquire different views for an object may be to put the object in front of a sensor and take images of the object from various directions. This method measures the relative position between the sensor and the object for every view. Without precise tools for measuring distances and orientations, this will produce considerable errors and inconsistency between views with one another. A representation constructed from these error-contaminated views will increase the difficulty of the recognition pro- cess. Thus we propose another method that first generates a geomeuic model for the object of interest; then various views of the object can be easily obtained by repetitions of rotation and projection of the geometric model. A notable advantage of this method is that once a geometric model is constructed, view generation can be done automatically. We define a geometric model of an object as a set of vertices and triangles formed via registering a set of different sensor views of the object. The triangles are used for convenience. The union of views should ideally cover the whole object surface. It is desirable that each model be a connected single entity composed of a set of vertices and triangles. It should be topologically equivalent to the object shape and geometrically approximate the object surface. Examples of four models and their physical counterparts are shown in Figure 3.2. The construction of a geometric model complying with the above criteria is nontrivial [Pot83, Hen85, Bha87, Wan87]. Models that will be used in 37 Wing Representation Theory Wing Computational Testing I urpose Primitives Framework 7 _ _ _ i _____ 1 /\ 5 Algorithms E I I I . 
: Implement-i View OrgvairCiZation : ation : R sentation I l ; cprc l ------ “I ----- 1: VCfifi- I A : tessellation :: cation I Geometric Augmenting View 5 I EL—_-u. ....4 Modeling Geometric Generation I . Models " J- \ l'CpI'CSant- .- ----------- .- ------------- e ------------- . -- ations . I View . Principal . I 3-D Data I : : E ACQuisition ' Registration : Component : g I I 5 Projection : r ___________ : 'bratr ' I I I I Pro'ection I : Cali on Tnangulatron : Characteristic : i J I : : II|ml 1.me min tion E Computation : E Segmentation E : 3° ............. i : . : : Processing Inconsistency i- u . W1“? . ; | ‘ Redulction : : Dctclcnon : ' Coordinate I i i I g - I Redundan I I Structural ' L Computanon .5 R em 0v afy : models : Cons . ts : - - - - ..... I I. .......... J u- E Gap CoTnection : 11 w 5 Hole filling : 6,, mp- ‘ ‘ i I : sentation : Refinement : I. a? E ii: Figure 3.1. The overall wing representation theory is depicted as a tree structure. some specific applications may fall short of these ambitions. For example, in our research of construction of wing representations, as long as models satisfy the criterion of geometric approximation, they are already enough for the purpose. Figure 3.2. Examples of geometrical models for a wooden block, a coffee cup, a toy tank and a clay cobra head. The construction of geometric models is commonly referred to as a task of model- building, which includes two major stages: 3-D data acquisition and view registration. During data acquisition, a set of object views are taken using a range sensor. Each view contains a set of distances to points sampled from the object surface. In general, for the purposes of representation and recognition, it is not necessary that sampling points should be dense in order to achieve models with high similarity to their corresponding objects. One example of a sensing system that satisfies the above constraints is the active structured light sensor, which is inexpensive and produces precise measurements. A structured light sensing system is composed of a camera, a projector, and an image pro- cessor. Before using the system, both camera and projector must be calibrated. With this particular sensing system, each view contains a small set of light stripe networks rather than a set of individual points. Figure 3.3 displays an example of the stripe network extracted in a structured light image. A computational model for deriving coordinates of stripe networks has been formulated by Chen & Stockman [Che85]. 39 After obtaining a set of views for an object, the next stage is to register this set of views to form a connected model. Although our current representation application requires only geometric similarity, the following description includes a more general method since these ;models may be used for other purposes in the future. One possible additional. purpose is automatic manufacturing. Seven procedures are proposed. The first two of these procedures will be enough for our application. These procedures include tri- angulation, transformation, inconsistency reduction, redundancy removal, gap connection and refinement. The objectives and methods for these procedures are described below. Implementation details and results will be given in the next chapter. Figure 3.3. An example of the stripe network extracted in a stripe image taken by a structured light sensing system (MSU PRIP Lab). 1. Triangulation: As shown in Figure 3.3, stripe networks are formed by a set of dis- torted meshes. 
Within a network there are sometimes holes, and the boundaries are usually not connected. There are therefore two essential tasks. The first is to sequentially connect the terminations on the network boundaries to form closed networks; a strategy previously developed by Hu and Stockman [Hu89], called the turn-right strategy, is employed for this purpose. The second task is to triangulate the networks by applying a 2-D hole filling algorithm to each mesh of the networks. The triangulation procedure is illustrated in Figure 3.4.

Figure 3.4. The intermediate results of the triangulation process.

2. Transformation: Since the networks from each sensor view are derived with respect to a predefined global frame known to the viewer (the viewer-centered frame), the networks have to be transformed to the model frame (the object-centered frame). The transformations are computed from a set of control points measured a priori. Assume that P_i is the subset of control points associated with view i; at least three points of P_i should be noncollinear in order to obtain a unique transformation. Suppose also that there are N different combinations C = {C_j, j = 1, ..., N} of three noncollinear points in P_i. For each combination C_j = {p_j1, p_j2, p_j3}, a transformation H_j from the global frame to the model frame can be computed.

Let r_1, r_2 and r_3 denote the coordinate vectors of the points of C_j with respect to the model coordinate system, and r'_1, r'_2 and r'_3 the same points with respect to the global coordinate system. Define v_o1 = r_2 - r_1 and v_g1 = r'_2 - r'_1. From Equation (A.2.2) in Appendix A, the first rotation R_1 is the rotation about the axis (v_g1 x v_o1)/|v_g1 x v_o1| through the angle alpha between v_g1 and v_o1; it brings v_g1 into alignment with v_o1. Next define v_o2 = r_3 - r_1 and v_g2 = R_1(r'_3 - r'_1). From Equations (A.2.2) and (A.2.3) in Appendix A, the second rotation R_2 is the rotation about v_o1 that brings v_g2 into alignment with v_o2. The two coordinate systems are now parallel; what remains is a translation T = T(v_t) to bring them into complete alignment, with

    v_t = r_1 - R_2 R_1 (r'_1).

Finally we obtain the transformation H_j for combination C_j,

    H_j = T R_2 R_1.

The resulting transformation H for view i is obtained by averaging all the H_j (a compact sketch of this computation is given below):

    H = \frac{1}{N} \sum_{j=1}^{N} H_j.
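The sketch below (assuming NumPy; not the thesis code) computes the rigid transform H_j from one triple of control point correspondences. Rather than composing the two explicit rotations R_1 and R_2 of the derivation above, it builds an orthonormal frame from each point triple, which yields the same transformation mapping global coordinates onto model coordinates.

    import numpy as np

    def frame(p1, p2, p3):
        """Orthonormal frame built from three noncollinear points."""
        e1 = (p2 - p1) / np.linalg.norm(p2 - p1)
        e3 = np.cross(p2 - p1, p3 - p1)
        e3 = e3 / np.linalg.norm(e3)
        e2 = np.cross(e3, e1)
        return np.column_stack((e1, e2, e3))

    def transform_from_triple(model_pts, global_pts):
        """model_pts, global_pts: 3 x 3 arrays, one control point per row."""
        R = frame(*model_pts) @ frame(*global_pts).T    # rotation: global -> model
        t = model_pts[0] - R @ global_pts[0]            # translation
        H = np.eye(4)
        H[:3, :3], H[:3, 3] = R, t
        return H                                        # homogeneous 4 x 4 transform

Averaging the H_j of all point triples, as in the text, then gives H for the view; since a direct average of rotation matrices is generally not itself a rotation, the averaged matrix may need to be re-orthonormalized.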
3. Inconsistency Reduction: Inconsistencies, which are undesirable displacements between networks, are usually observed after the frame transformations of the networks. An illustrative example of the displacement between two networks is shown in Figure 3.5. Inconsistency may result in shape distortions or spurious boundaries on the final model surface. An optimization technique is proposed to reduce the inconsistencies among networks (a small numerical sketch of this optimization is given after this list of procedures).

Figure 3.5. An inconsistency phenomenon between two networks.

Consider the two networks shown in Figure 3.5. Suppose there is a 4 by 4 transformation H which transforms network N2 so that the inconsistency between N1 and N2 is reduced. The transformation H is composed of a 3 by 3 rotation matrix R and a 3 by 1 translation vector t, so that the transformation possesses up to six degrees of freedom.

Let O1 and O2 denote respectively the overlapping portions of networks N1 and N2; each contains a set of triangles. The equations T = {T_i(n_i, d_i), i = 1, ..., m_1} of the planes containing the triangles of O1 are easily computed. Each plane equation is characterized by its surface normal n_i and the perpendicular distance d_i from the origin of the object-centered coordinate system to the plane. Let V = {v_j, j = 1, ..., m_2} be the set of vertices of O2. For each vertex v_j of V, compute the minimum distance d^j_min from the vertex to O1, determined as follows. Let {d^j_i, i = 1, ..., m_1} be the set of minimum distances from v_j to the triangles of O1, let d^j = min{d^j_i, i = 1, ..., m_1}, and let T_jk(n_jk, d_jk) be the plane containing the triangle which yields d^j. Then

    d^j_min = n_jk \cdot v_j - d_jk.    (3.1)

After network N2 is transformed, the minimum distance becomes d'^j_min, where

    d'^j_min = (R n_jk) \cdot v_j - ((R n_jk) \cdot t + d_jk).    (3.2)

Now we can formulate the optimization problem: determine the transformation that minimizes the sum of squared distances

    e = \sum_{j=1}^{m_2} (d'^j_min)^2 = \sum_{j=1}^{m_2} ((R n_jk) \cdot v_j - (R n_jk) \cdot t - d_jk)^2.    (3.3)

To solve Equation (3.3) [Gun87], one can use numerical methods such as the Lagrange formulation or the Newton-Raphson approach.

4. Redundancy Removal: After transformation and inconsistency reduction, the surface of the object, now located in the object-centered frame, should be completely covered by the resulting networks. There are, however, overlaps among them. To achieve a single connected model, the redundant triangles must first be removed. The computational techniques needed for this procedure have been developed by Chen & Stockman [Che85]. An example of redundancy removal is shown in Figure 3.6, where the dark area, which originally belonged to network N2, has been removed, leaving two smaller networks C1 and C2.

Figure 3.6. An example demonstrating the procedure of redundancy removal.

5. Gap Connection: The above procedures produce a set of nonoverlapping networks. Any gaps in this set must be bridged to completely connect the networks. Two problems have to be solved in this step: first, determining the sets of vertices which belong to different networks but are close enough to be connected; second, deciding how to connect them properly. These issues are addressed in the next chapter. The output of the gap connection procedure for the example in Figure 3.6 is displayed in Figure 3.7.

Figure 3.7. The result of gap connection for the example of Figure 3.6.

6. Hole Filling: Holes may occur in networks for two reasons. First, holes may appear in a network after triangulation due to the absence of stripes in some areas, as shown in Figure 3.4(c). Secondly, the gap connection procedure may also leave holes. For example, using our method, the gap between C1 and C2 in Figure 3.7 will eventually form a hole in the resulting network after they are connected to their neighborhood. A procedure similar to the 2-D hole filling algorithm of the triangulation procedure was developed to solve this problem; it is called the 3-D hole filling algorithm to distinguish it from the 2-D one.

7. Refinement: One of the major purposes of refinement is to smooth undesirable bumps on the surfaces of the constructed models, formed primarily during the connection of networks. Averaging over the nearest neighbors of each vertex can be used for this purpose. Also, due to computational errors, the resulting shapes of the models may be distorted. This defect can be attenuated by manual editing using a CAD system.
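As promised in step 3, here is a small numerical sketch related to the optimization of Equation (3.3) (assuming NumPy; not the thesis code). It uses the equivalent formulation in which the vertices of O2 are moved onto their nearest planes of O1 and linearizes the rotation for a small corrective motion (R approximated by the identity plus the cross-product matrix of a small rotation vector w), so that each iteration reduces to a six-unknown linear least-squares problem. Iterating such steps is one standard way to solve point-to-plane alignment problems; the thesis instead suggests Lagrange or Newton-Raphson methods.

    import numpy as np

    def alignment_step(vertices, normals, offsets):
        """vertices: N x 3 vertices v_j of O2; normals[j], offsets[j]: the plane
        (n_jk, d_jk) of O1 nearest to v_j.  Returns a small rotation vector w and
        a translation t that reduce the sum of squared point-to-plane distances."""
        A = np.zeros((len(vertices), 6))
        b = np.zeros(len(vertices))
        for j, (v, n, d) in enumerate(zip(vertices, normals, offsets)):
            A[j, :3] = np.cross(v, n)     # sensitivity to the rotation vector w
            A[j, 3:] = n                  # sensitivity to the translation t
            b[j] = d - n @ v              # current signed point-to-plane distance
        x, *_ = np.linalg.lstsq(A, b, rcond=None)
        return x[:3], x[3:]

Applying the recovered (w, t) to N2 and recomputing the nearest planes gives the next iteration, until the residual of Equation (3.3) falls below a chosen tolerance.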
3.2.2 Augmenting Geometric Models

Once a geometric model has been constructed, it consists of a set of vertices and triangles. One can augment it by adding surface curvatures, color, intensity or other features. For our purpose, each model vertex has surface curvature, surface type, surface roughness, fit error, edge magnitude, and the angle between principal vectors as additional features. These properties are collectively called the characteristics of the model surface; their mathematical definitions are given later in this section.

The computation of the surface characteristics at a vertex is performed by first finding the plane tangent to the model surface at that vertex and then projecting the object model onto this plane; the resulting projection is a range map. There are several methods to compute the desired surface characteristics directly from this range map: one can first fit it with a smooth patch such as a cubic spline, a Coons patch or a quadric surface, and all characteristics can then be computed analytically in terms of the surface function. We present the surface equations of B-splines and quadratics for this purpose.

B-splines: A spline surface patch can be defined as a combination of two spline curves, the definition of which was given in Chapter 2:

    r(t,s) = [t^3 \; t^2 \; t \; 1] \, A
             \begin{bmatrix} P_{00} & P_{10} & P_{20} & P_{30} \\
                             P_{01} & P_{11} & P_{21} & P_{31} \\
                             P_{02} & P_{12} & P_{22} & P_{32} \\
                             P_{03} & P_{13} & P_{23} & P_{33} \end{bmatrix}
             A^T \begin{bmatrix} s^3 \\ s^2 \\ s \\ 1 \end{bmatrix},

where the P_{ij} are control points, t and s are the parameters of the spline patch, and

    A = \frac{1}{6} \begin{bmatrix} -1 & 3 & -3 & 1 \\ 3 & -6 & 0 & 4 \\ -3 & 3 & 3 & 1 \\ 1 & 0 & 0 & 0 \end{bmatrix}.

Quadratics: A quadratic surface can be defined in terms of three discrete orthogonal polynomials [Bes86]:

    \phi_0(t) = 1, \quad \phi_1(t) = t, \quad \phi_2(t) = t^2 - \frac{A(A+1)}{3},

where A = (W - 1)/2, W is the window size, and the parameter t \in \{ -(W-1)/2, ..., -1, 0, 1, ..., (W-1)/2 \}. The normalized versions a_i(t) of the functions \phi_i(t) are

    a_0(t) = \frac{1}{\sqrt{W}}, \quad a_1(t) = \sqrt{\frac{3}{A(A+1)(2A+1)}} \, t, \quad a_2(t) = \frac{\phi_2(t)}{\sqrt{P(A)}},

where P(A) = \frac{8}{45}A^5 + \frac{4}{9}A^4 + \frac{2}{9}A^3 - \frac{1}{9}A^2 - \frac{1}{15}A. The surface function then becomes

    r(t,s) = \sum_{i,j=0}^{2} c_{ij} \, a_i(t) \, a_j(s), \quad \text{where} \quad c_{ij} = \sum_{t,s} r(t,s) \, a_i(t) \, a_j(s).

The surface characteristics are then defined as follows; a short computational sketch follows the definitions.

Fit error: the mean squared error between the fitted surface and the measured surface.

Edge magnitude: e = (r_t^2 + r_s^2)^{1/2}, where r_t and r_s are the first derivatives of the surface function r(t,s).

Surface roughness: R = r_{tt}^2 + 2 r_{ts}^2 + r_{ss}^2, where r_{tt}, r_{ts} and r_{ss} are the second derivatives of the surface function r(t,s).

Gaussian curvature: K = \frac{LN - M^2}{EG - F^2}, where E = r_t \cdot r_t, F = r_t \cdot r_s, G = r_s \cdot r_s, L = r_{tt} \cdot n, M = r_{ts} \cdot n, N = r_{ss} \cdot n, and n = (r_t \times r_s)/|r_t \times r_s| is the unit surface normal.

Mean curvature: H = \frac{EN + GL - 2FM}{2(EG - F^2)}.

Surface type: determined from the signs of the Gaussian and mean curvatures.

Angle between principal vectors: \theta = \tan^{-1}\!\left( \frac{-B + \sqrt{B^2 - AC}}{C} \right), where A = EM - FL, 2B = EN - GL, and C = FN - GM.
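The curvature-related characteristics can be computed directly from the partial derivatives of the fitted patch. The following sketch (assuming NumPy; not the thesis code) evaluates the fundamental-form coefficients and the Gaussian and mean curvatures defined above; the sign-based surface-type labeling at the end is a common convention and is only an assumption here, since it depends on the orientation chosen for the surface normal.

    import numpy as np

    def surface_characteristics(rt, rs, rtt, rts, rss):
        """rt, rs, rtt, rts, rss: first and second partial derivative vectors of
        the fitted patch r(t, s) at the vertex of interest."""
        n = np.cross(rt, rs)
        n = n / np.linalg.norm(n)                    # unit surface normal
        E, F, G = rt @ rt, rt @ rs, rs @ rs          # first fundamental form
        L, M, N = rtt @ n, rts @ n, rss @ n          # second fundamental form
        K = (L * N - M * M) / (E * G - F * F)        # Gaussian curvature
        H = (E * N + G * L - 2 * F * M) / (2 * (E * G - F * F))   # mean curvature
        return K, H

    def surface_type(K, H, eps=1e-6):
        # qualitative label used for wing labeling; sign convention follows the
        # normal orientation and is an illustrative choice, not the thesis table
        if abs(K) < eps and abs(H) < eps:
            return 'p'                               # planar
        if K < -eps:
            return 's'                               # saddle
        return '+' if H < -eps else '-'              # convex or concave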
Such shifts will not influence the extraction of wing features but will affect the computation of wing constraints. This is one possible source of errors in the final result. The surface type of connected regions from the preliminary segmentation, originally determined from signs of curvatures, are iteratively smoothed. The smoothing procedure replaces the region type of a pixel by the majority of its first order neighbors using a 3 by 3 window. This procedure refers to the contour map in order to prevent counting neighbors located on the other side of a contour. The procedure will stop when no changes of surface type can be made or a maximum number of iterations is reached. Based on the resulting contour and surface maps, wing primitives can be extracted through a labeling procedure, which is an implementation of the extraction rule defined in Section 2.3.1. Once object wings are determined, their structural constraints can be computed following the definitions described in Section 2.3.2. The object view is then represented in terms of wing features and structural constraints. 4‘ .v ' v“! u 1 5 w Viewpoints Figure 3.8. A unit viewsphere whose surface forms a 2-D manifold of visual directions. In practice, it is partitioned 'into a finite set of uniformly distributed viewpoints. 48 3.2.4 View Organization Refer to Figure 3.8. Since a 3-D object has an infinite number of 2-D views, a multiple-view configuration characterized by a unit sphere, called the viewsphere, is introduced. The viewsphere is composed of a set of viewpoints distributed on the surface of the sphere, which form a 2-D manifold of visual directions. It is similar to view characteristics [Dud77], property spheres [Fek83], and locus images [Goa83]. This notion is a natural creation of the mental transformation paradigm -- object recognition through a search of model views. Each object is initially represented by a set of object views created according to viewpoints of the viewsphere. Each view is generated by pro- jecting the geometric model of the object along the direction from the center of the view- sphere to a viewpoint. The representation of this view in terms of object wings and wing constraints is then constructed by the procedure presented in the last section. There are a variety of approaches to determine the viewpoints of an object. Some are object-independent [Ba588], and some are object—dependent [Kor87]. In the object- independent approach, the viewsphere is tessellated into fixed number of viewpoints. As a result, objects are represented by a fixed set of views. Although this method is quite straightforward, a large number of viewpoints is required to represent complex objects. This will make the representations of simple objects unnecessarily complex. For the object-dependent approach each object has to be processed separately. The determination of optimal viewpoints for an object is not only a formidable task but also inefficient when dealing with a large number of objects. To compromise, we first tessellate a continuous sphere surface into a set of uniform cells, called viewpoints. This step is similar to the object-independent approach. Then, viewpoints sharing some object properties are gather into groups called aspects [Koe76,79, Ike88]. The result of this step will be analogous to using an object-dependent method. In Appendix B, we present approaches to construct discrete viewspheres with 320 and 240 viewpoints respectively. 
Other viewspheres with 20, 80 or 140 viewpoints are intermediate results obtained when generating the above viewspheres.

In the last section, we presented a procedure for generating the wing representation of an arbitrary object view. Following that procedure, a set of view representations with respect to the viewpoints of the viewsphere can be created for each object. Since every view representation contains a set of wings and structural constraints, we can cluster viewpoints into aspects [Koe76, Koe79, Koe84, Koe87, Pla87, Gig88, Ike88] according to the topological structure of the wing sets detected at the viewpoints. The boundary between any two aspects corresponds to a visual event, which signifies a change of the topological structure of the wing sets across the boundary. A visual event is defined as an appearance or a disappearance of any wing primitive. This means that the views within an aspect possess the same set of wing primitives and are topologically equivalent. As a consequence, an object is finally specified by a set of aspects.
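A minimal sketch of this clustering step is shown below (it assumes each view representation has already been computed and is not the thesis code): viewpoints whose views contain identical wing sets are gathered into the same aspect.

    from collections import defaultdict

    def cluster_into_aspects(view_wings):
        """view_wings: dict mapping a viewpoint id to the collection of wing
        types detected in that view.  Returns {frozen wing set: [viewpoints]}."""
        aspects = defaultdict(list)
        for viewpoint, wings in view_wings.items():
            aspects[frozenset(wings)].append(viewpoint)
        return dict(aspects)

Neighboring viewpoints that fall into different groups straddle a visual event, that is, an aspect boundary.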
Implementation details will be presented in the next chapter. Experiments and testing of wing representations will be presented in Chapter 5. Wing representation theory is defined in terms of a set of wing primitives. In order to construct the wing representation of an object, the geometric model of the object is first constructed. There are two major stages involved in the task of geometric modeling: 3-D data acquisition and view registration. Data acquisition includes three steps: calibra- tion, image processing and coordinate computation. In view registration, there are seven steps: triangulation, transformation, inconsistency reduction, redundancy removal, gap connection, hole filling, and refinement. 51 tangent planes Figure 3.10. The octants of a 3-D space. Through the geometric model of an object, different views of the object can be readily generated. Techniques of wing detection and computation of structural con- straints are then applied to each view to obtain a view representation in terms of wing primitives and their relations. Views are then grouped into aspects according to their wing sets. An object is finally represented by a set of aspects. Chapter 4 Implementation of Wing Representation Theory In this chapter, algorithms for implementing wing representation theory are developed. The associated experiments start with physical objects and end with their wing representations. A clay cobra head (Figure 3.2) which has irregular shape will be used throughout this chapter as the primary example object. Later, a coffee cup (Figure 3.2) is used as another experimental object. Further experimental results will be presented in the following chapter. To construct the wing representation for an object, a multiple-view scheme, called the viewsphere, is used. At the beginning, objects are presumed to be located at the center of the sphere. Each objecr is initially represented by a set of views, which are generated by projecting a pre-constructed geometric model of the object along the directions from the center of the sphere to a set of viewpoints distributed on the surface of the sphere. Each object view is in trrrn represented by a set of wings and their structural properties. Intermediate results and difficulties of the experimentation will be described. Through this implementation, weaknesses of the theory originally overlooked are discovered and discussed. Remedies will be the basis for future work. 52 53 4.1 Model-Building In the experiment, a geometric model is constructed by integration of a set of sensor views taken by an active structured light sensing system. Views are then registered through a series of procedures to form a connected triangulated surface. Figure 4.1 shows a schematic diagram of the modeling process. 3-D Data Stripe - - Object Acquisition Networks ch‘smm Model Figure 4.1. The schematic of the modeling procedure. 4.1.1 Sensing System To acquire 3-D coordinates from object surfaces, an active structured light sensing system is used [Che85]. The configuration of this system is shown in Figure 4.2, which is composed of a camera, a projector and an image processor. A slide with a 10 x 10 square grid is used as the pattern mask and inserted in the front of the projector. It is best to arrange the orientation of the camera such that the optical axis of the camera intersects any light sheet at about 45 degrees in angle. The camera with 50mm focus length is located at about 175cm (the standoff) away fiom the origin of a global coordinate sys- tem. 
The global frame is fixed at the center of the light grid, with the x and y axes coin- ciding with the grid axes, and the z axis pointing upward. The projector is positioned above the workbench at about 145cm. Before the sensing system can be used, both the camera and projector have to be calibrated. The experimental object (the cobra head) is then put in the FOV (Field of View) of the sensor, so that the light sheets created by the projector can be properly pro- jected onto the object surface. Figure 4.3 shows four stripe images taken from scenes where the cobra head has been posed. To construct a complete model, every part of the object surface should be covered by at least one sensor view. Thresholding, thinning, clearing and clustering [Hu89] are applied to the images. Features of the light stripes can thus be extracted. The spatial coordinates of those features are readily computed by 54 projector camera System I I I I I l I I I I : Processrng : I I I I I I I I (x.y,z): global coordinate system (x',yl,zl): object coordinate system X object Figure 4.2 The configuration of an active structured light sensing system. solving the consistent labeling problems and triangulation presented in Appendix A. Figure 4.3 Stripe images of the cobra head with four different poses. 4.1.2 Registration Figure 4.4 displays networks derived from four stripe images corresponding to four different poses of the cobra head. They will be used as the input to the regisu'ation pro— cess which integrates the networks into a connected model. Procedures in the registration 55 process include triangulation, transformation, inconsistency reduction, redundancy remo- val, gap connection, hole filling and model refinement, and are depicted in Figure 4.5. Figure 4.4 Networks derived from four stripe images corresponding to four different object poses (the cobra head) are to be used as the input to the following registration process. 0".tringu-_uusfor-} m gap her._tncd.1 object W lain: union I? maul connection filling refinerneru model Figure 4.5. The diagram of the registration process. 4.1.2.1 Triangulation To triangulate a network the terminations of the network, which are endpoints of the suipes, are located and connected to form a closed network. Then a 2-D hole filling algo- rithm is applied to each grid cell. The hole filling algorithm proceeds as follows: (1) Start with any vertex, say v1, ofa grid cell. (2) If all vertices are collinear or the number of vertices equals two, stop. (3) Select the next two adjacent vertices from v 1 , say v2 and v3. (4) If the vertices v1, v2 and V3 are noncollinear, a triangle is constructed in terms of these three vertices. Otherwise start with the second vertex v2 (set v1 = v2) and go to step 2. 56 (5) If the constructed triangle overlaps its neighbors, start with the second ver- tex v2 and go to step 2. Otherwise the triangle is preserved and v2 is deleted from the list of vertices. Then, start with V3 (set v1 = v3) and go to step 2. A detailed algorithm appears below (Algorithm 1). Algorithm 1: 2-D hole filling Objective: Triangulates a 2-D polygon into triangles. Input: A 2-D polygon P1. 
Output: A triangular polygon P2 of the polygon P1, Procedure : begin V =- the vertex listofP1; n = the number ofvertices ofV; v1: V[1]; n1 = n; times =- 0; while (It > 2) begin if (n 0 n1) begin til-n: times-0; end; else begin times - times + 1; if (times 3 11) return; end: v2-thefirstadjacentvertex ofvl; v3 8 the second adjacent vertex of VI; if (vl,v2,v3 are collinear) begin v1 I V2; repeat; “Id; T a triangle(vl, v2, v3); if (T overlaps its neighbors) begin v1 a v2: repeat: end; add T ‘0 P2; delete #2 from V; nan-h v1 8 v3; end: end; (" end of procedure “) An overall procedure for triangulation of a stripe network appears below as Algo- rithm 2. The result of the example in Figure 4.4 is shown in Figure 4.6. 57 Algorithm 2: Triangulation Objective: Triangulates a connected network. Input: A connected network N1. Output: A triangular network N2, Procedure : begin look fa ruminations of network N1 using turn-right strategy; connect terminations sequentially to form a closed network N; for (each polygon P,- of the closed network N) apply 2-D hole filling algorithm to P1; add the triangularized P1 to N2; end: end; (" end of procedure ") Ill” c-uu" ' ' “ . .\"";z§:§~ IV; '5" ‘P v 4 « :¢:¢:¢;;'tv1§z$‘? ‘. V Inna?» i (YAV‘Ig'AV; 4 Figure 4.6. The triangulated networks derived from the example in Figure 4.4. 4.1.2.2 Transformation and Inconsistency Reduction The networks in Figure 4.6 are given in terms of the global coordinate system. They have to be transformed to the object—centered coordinate system so that their union can cover the object surface properly. The transformation of a view is computed through a set of control points detected in that view. It is necessary for at least three points in the set to be noncollinear so that the transformation can be uniquely determined. Control points should be well separated to reduce the effects of angular error because the computed transformation usually gives better results for the areas close to those points. It is best to mark control points at edges of an object so that they can be easily seen by the sensor from as many viewpoints as possible. If an object can be fixed on a regular external wire frame, then some feature points of the frame can be selected as 58 control points. One could even stick hatpins into the object to provide control points. A computational method of transformations based on a set of control points has been formulated in Chapter 3. Computed transformations are applied to their correspond- ing views. The resulting networks approximately cover the object surface. At this moment, there may be some positional shifts among transformed networks. Such shifts will be referred to as the inconsistency phenomenon. Serious shifts can result in shape distortion and spurious edges on the final model. Inconsistency can be improved using either CAD systems or a program to interactively adjust the networks. A mathematical model based on Equation (3.3) can be used to automatically calcu- late transformations between two inconsistent networks. This model assumes that incon- sistency among networks is initially small so that precise matching can be achieved by recursive positional refinements using optimization techniques. An algorithm based on the mathematical model is given as Algorithm 3 below. Figure 4.7 shows the networks after the procedures of transformation and inconsistency reduction. Algorithm 3: Inconsistency Reduction Objective: Reduces inconsistency between two networks. Input: Networks N1, N2, and error tolerance 8. 
Output: The rectified network N2, Procedure : begin set e to any value larger than 8; sete, toanyvaluelargerthan 0; repeat basin e, a e: compute the overlaps O1 and 01 of networks N1 and N1; compute plane equations '1‘ of the triangles of 01, T a [Ti(nitdi)o i=1,"l1}; letVbethesetofvertices ofOz; compute optimal transformation H by solving Equation (3.3): N2 3 H N2; compute e by substituting H into Equation (3.3); end; until ((e > e) and (e < e,)) end; C" end of procedure *"') 59 Figure 4.7. Networks after transformation and inconsistency reduction. 4.1.2.3 Redundancy Removal After transformation, the networks should overlap one another. Referring to the example in Figure 3.6(a), an algorithm called redundancy removal is developed and specified in Algorithm 4. In the example, networks N1 and N2 partially overlap. The overlapping area of N2 is called the redundancy R (hatched area) which is to be removed in order to obtain a larger connected network by connecting the nonoverlapping net- works. ‘ (1) For each triangle T13 of network N2, check whether there is a triangle T}- in network N1, that is close to T,3 and overlaps it. (2) If such a T} exists, T12 is a redundant triangle and is removed from N2. (3) Extract the neighboring triangles s}- of r}- on network N1, which also over- lap triangle r}. (4) The union of T}- and S} is referred to as the contribution set of triangle T1? (5) The contribution set of whole redundancy R is the union of the contribution sets of triangles in R. This set will be used in the gap connection procedure. In the above algorithm, to check the overlap between triangles, two constraints must hold. These two constraints stem from the assumption that if two triangles overlap, these two triangles should cover nearly the same part of the object surface. Therefore, their 60 positions and orientations should be similar. We examine both the maximum and minimum distances between two triangles. The computation of position and orientation change between two triangles in 3-D space are addressed elsewhere [Che85]. The thres- holds of the minimum distance and the angle subtended between two surface normals are 5 mm and 50 degrees respectively, which have been determined fiom an analysis of error in calibration and transformation to be presented in Section 4.1.3.2 and 4.1.3.3 respec- tively. Algorithm 4: Redundancy Removal Objective: Removes redundant triangles from the second network N2, and looks for the contribution set of the redundancy. Input: Networks N1 and N2. Output: A set of surviving components of network N2, and the connibution set S of the redundancy R. Procedure : begin for (each triangle T12 of network N2) for (each triangle T} of network N1) begin if (T? and T} do not satisfy angle constraint) continue the inner for loop; if (T,2 and 7’}- do not satisfy distance constraint) continue the inner for loop; if (r? and r} overlap) r? is a redundant triangle; delete it from network N2; look for neighbors S} of T} in N1; for (each triangle T}, of 8}) if (r? and r}, overlap) add T}, to the contribution set S? of T3; break 1009 it end: end; ‘ end; (*"' end of procedure ") Figure 3.6(b) shows the result of the example in Figure 3.6(a) after the removal algorithm. In a similar vein, the final result for all networks can be achieved by applying the above procedure in parallel to all pairs of networks. 61 4.1.2.4 Gap Connection Continuing the example of Figure 3.6, network N1 remains unchanged. In N2, removing the redundancy results in two disconnected components C1 and C2. 
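Before following the gap-connection example further, the two screening constraints used by the redundancy-removal step above can be sketched compactly. The 5 mm distance and 50 degree normal-angle thresholds come from the error analysis of Section 4.1.3; using the minimum vertex-to-vertex distance as the distance measure, and ignoring normal orientation, are simplifications of the full test in [Che85], and the names are hypothetical.

import numpy as np

DIST_THRESHOLD_MM = 5.0       # from the calibration/transformation error analysis
ANGLE_THRESHOLD_DEG = 50.0

def triangle_normal(tri):
    """Unit normal of a triangle given as a 3x3 array of vertex coordinates."""
    n = np.cross(tri[1] - tri[0], tri[2] - tri[0])
    return n / np.linalg.norm(n)

def passes_screening(tri_a, tri_b):
    """Screening constraints applied before the full overlap test: the two
    triangles must be close to each other and have similar orientations."""
    cos_angle = abs(np.dot(triangle_normal(tri_a), triangle_normal(tri_b)))
    angle = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    min_dist = min(np.linalg.norm(a - b) for a in tri_a for b in tri_b)
    return angle <= ANGLE_THRESHOLD_DEG and min_dist <= DIST_THRESHOLD_MM

tri1 = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0], [0.0, 10.0, 0.0]])
tri2 = np.array([[1.0, 1.0, 2.0], [11.0, 1.0, 2.0], [1.0, 11.0, 2.0]])
print(passes_screening(tri1, tri2))   # True: parallel triangles about 2mm apart

Triangle pairs that pass this screen would still go through the exact overlap test before being declared redundant.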
Consider connecting N1 and C1 into a single network. Before describing the algorithm, let us first define the boundary elements of a network. A triangle is a boundary triangle of a net- work if one of its vertices has an empty sector (no triangle). An edge is a boundary edge if it is an edge of a boundary triangle and not shared by another triangle. Finally, a vertex is a boundary vertex if it terminates a boundary edge. According to the above definitions of boundary elements, an algorithm (Algorithm 5), which looks for boundary triangles, edges and vertices of a network, was developed. It is called the network boundary algorithm Note that the boundary elements output from this algorithm are sequentially ordered. Let us refer to the boundary portion of a network to be connected as a connection boundary. The procedure will include two stages. The first stage is to determine the connection boundaries of both N1 and C1. The second stage is to fill the gap between the networks with triangles in terms of their connection boun- daries. Algorithm 5: Network Boundary Objective: Looks for bormdary triangles, edges, and vertices of a network N, and output them in order. Input: A netwu'k N. Output: Ordered boundary triangle list TL, edge list EL, and vertex list VL. Procedure : begin (" looks for boundary elements of network N ") for (each triangle 7',- of network N) for (each edge e} of triangle T.) if (e,- hasn’t been shared by another triangle of N) begin e ,- is a boundary edge: storeejinBE; recordl} inBT; end; C" determine the orders of boundary edges **) while (BB is not empty) begin start with the first edge of BB; move this edge to EL and move its corresponding triangle to TL; 62 store the edge vertices in VL; set the first edge vertex as HEAD; set the second edge vertex as TAIL; delete the edge from BE; while (TAIL not equal to HEAD) begin look for all boundary branches B of vertex TAIL; if (single branch) begin store the edge in EL; store its second vertex in VL; set its second vertex as TAIL; delete the edge from BE; end; else (multiple branches) return vertex TAIL and boundary branches B; end; end: for (each boundary vertex v,- of VL) begin look for all triangles T; sharing vertex v;; determine the orders of triangles T,- based on their edges. store the ordered T,- in TI... end: end; (“' end of pocedure ") Using the output of the above algorithm, the gap connection procedure is described below and details are presented in Algorithm 6. (1) Determine the connection boundary CCB of C1. (1a) Apply the boundary algorithm to C1. Let CBT, CBE and CBV denote the boundary triangles, edges and vertices of C1 respectively. (lb) Apply the boundary algorithm to redundancy R. Its boundary triangles RBT, edges RBE and vertices RBV can be derived. (1c) The connection boundary CCB of C1 is then determined as the intersec- tion of CBE and RBI-3. CCB = CBE n RBE. (2) Determine the connection boundary NCB of N1. (2a) Apply the boundary algorithm to N1. Let NBT, NBE and NBV denote the boundary elements of N1. 63 (2b) Look for the triangles CT from the set of redundant triangles R, such that each triangle of CI‘ is adjacent to at least one vertex of the connection boun- dary CCB Of C1 . (2c) Determine the contribution set ST of CT from the contribution set of R. Let SE be the set of edges of the contribution set ST. (2d) The connection boundary NCB of N1 is the intersection of SE and NBE. NCB = SE n NBE. 
(3) Connect the gap between N1 and C1 by sequentially filling the connection boundaries CCB and NCB with triangles. Algorithm 6: Gap Connection Objective: Connects surviving component C of network N; to network N1 after removing the redun- dancy intersection R from N2. Input: Networks N, C and R. Output: The connected network N. Procedure : begin (‘lookfortheconnectionboundaryofC *) apply the boundary algorithm to C; obtain the boundary elements CBT, CBE, and CBV of C; apply the boundary algorithm to R; obtain the boundary elements RBT, RBE, and RBV of R; theconnection boundaryofCisCCB = CBE nRBE; look for disconnected boundary pieces CBC of CCB: apply boundary algorithm to N1; obtain the boundary elements NBT, NBE, and NBV of N; for (each boundary piece CBC.) begin ("' look for the connection boundary of N, *) look for triangles CI} in R which possess at least one vertex Of CCBi; determine the contribution set ST,- of triangles CT}; let SE, be the set of edges of the contribution set SI}; the connection boundary of N, is NCB,- = SE,- n NBE. create triangles between NCB.- and CBC,- for filling their gap; delete created triangles, which intersect with their neighboring triangles; end: end: (""' end of procedure ") 4.1.2.5 Hole Filling Repeatedly applying Algorithm 6 to all pairs of networks, we can complete a con- nected network. Unfortunately, small holes may still exist in the resulting network. For example, in Figure 3.7, after C1 and C2 are connected to the other networks, a hole will eventually be formed between C1, C2 and N1. We need another algorithm to fill those holes, called the 3-D hole filling algorithm. This is different from the 2-D hole filling algorithm addressed in Section 4.1.2.1. The 3-D hole filling algorithm assumes that the input network is almost complete except for some small holes. Therefore, the boundaries of the network are equivalently the boundaries of the holes. The detail of this algorithm is specified in Algorithm 7. (1) Apply the boundary algorithm to the network to look for boundary vertices, which are the boundary vertices of holes. 2 (2) Individual holes are discovered by seeking disconnected sets of boundary vertices. (3) For each hole, patch it with triangles. Algorithm 7: 3-D hole filling Objective: Fills holes of a 3-D network. Input: A 3-D network N. Output: The netwu'k with holes filled. Procedure : begin apply the boundary algorithm to N; obtain boundary elements NBT, NBE, NBV; look for disconnected sets BVN of NBV; for (each disconnected set BVN,-) begin V a the vertex listofBVN;; n s the number ofvertices of V; v1 = V[1]; n1 = n; times = 0; while (It > 2) begin if (n 0 n1) begin n1 = n; times = 0; end; else 65 begin times =- times + 1; if (times a n) return; end; v2=thefirstadjacentvertexofvh v3 = the second adjacent vertex ofvl; if (v1.v2,v3 are collinear) begin vi a v2; repeat: end: T a triangle(vl, v2, v3); if (T intersects its neighbors) begin v1: v2; repeat; end; Md T to P2; delete v2 from V; n = n - 1; v1: v3; end: end; ("' end of procedure "') The final version of the constructed geometric model of the cobra head is shown in Figure 4.8 with three different orientations. '3‘! . 3:?r‘ .g ' _ \s ’1 o _¢.'.'9-t 1'14 '\ 4 'A :v bid ? I.) 7/ .4. Figure 4.8. The constructed geometric model of the cobra head displayed in different orientations. 4.1.3 Discussion of Model Building Since data acquired by the sensing system is a set of networks instead of discrete points, the global shape of objects is explicitly preserved in the networks. 
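As a brief aside before the discussion: the boundary-element bookkeeping on which Algorithms 5 through 7 rely reduces, in its simplest form, to finding edges that belong to exactly one triangle. The sketch below uses that simplification; the sector-based definition of boundary triangles and the ordering of the boundary produced by Algorithm 5 are omitted, and the function name is hypothetical.

from collections import defaultdict

def boundary_elements(triangles):
    """Find boundary edges, vertices and triangles of a triangulated network.
    `triangles` is a list of 3-tuples of vertex ids.  An edge is treated as a
    boundary edge if it belongs to exactly one triangle."""
    edge_owners = defaultdict(list)
    for t_idx, (a, b, c) in enumerate(triangles):
        for u, v in ((a, b), (b, c), (c, a)):
            edge_owners[frozenset((u, v))].append(t_idx)

    boundary_edges = [e for e, owners in edge_owners.items() if len(owners) == 1]
    boundary_vertices = {v for e in boundary_edges for v in e}
    boundary_triangles = {edge_owners[e][0] for e in boundary_edges}
    return boundary_edges, boundary_vertices, boundary_triangles

# Two triangles sharing edge (1,2): every other edge lies on the boundary.
edges, verts, tris = boundary_elements([(0, 1, 2), (1, 3, 2)])
print(len(edges), verts, tris)     # 4 boundary edges, vertices {0,1,2,3}, triangles {0,1}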
The registration process takes advantage of the connectivity of networks and greatly reduces the workload in the triangulation step. With the benefits of low cost and high precision, the structured light sensing system is suitable for geometric modeling, but it has some limitations. Below we present an analysis of error in calibration and transformation; this analysis provides the thresholds of distance and angle used by the redundancy removal procedure. Finally, the difficulties of model building and suggestions for future improvement are given.

4.1.3.1 Limitations of the Sensing System

Structured light sensing systems suffer from several limitations. For instance, even under all possible positions, some deep concavities on an object surface may not be simultaneously visible to both the camera and the projector. Such concavities are referred to as blind regions. To reduce the size of blind regions, one can move the camera and the projector closer to each other; however, this increases the error in computing the coordinates of 3-D points from triangulation.

The sensing system also has difficulty with some areas of an object surface, such as the top of a typewriter keyboard. We call an area with many texture-like elements a busy area. Stripes lying on a busy area will be badly fragmented, so their grid coordinates are difficult to identify. It is better to remove busy areas from consideration by filling gaps with material or covering them with paper.

For some narrow regions, such as the handle of a coffee cup or the barrel of a toy tank, points are either difficult to sample, or the sampled points are not sufficient to form a significant surface patch. This occurs where the feature widths are smaller than the intervals between light sheets. We call these regions dead patches. Grid slides with finer resolution may be used to compensate for this difficulty, or a CAD system can be used to add the missing features to the model.

4.1.3.2 Calibration Error Analysis

Suppose that the combined error caused by feature point detection and by modeling with the camera matrix is e pixels in the image plane. We want to predict the resulting error in the computed 3-D point location. First consider the one-camera case and refer to Figure 4.9. Suppose a point P is located on a given plane z = constant. The error $\varepsilon$ will be a function of the image coordinate c, the coordinate error e, the tilt angle $\theta$, the focal length f, and the standoff d. The following refers to the configuration in Figure 4.9 and considers the worst case of the error $\varepsilon$.

Since
$$\tan\theta_1 = \frac{c}{f}, \qquad \tan\theta_2 = \frac{c+e}{f},$$
we get
$$r = \frac{d\sin\theta}{\tan(\theta \pm \theta_1)}, \qquad r' = \frac{d\sin\theta}{\tan(\theta \pm \theta_2)}.$$
The error is
$$\varepsilon = |r' - r| = d\sin\theta \left| \frac{1}{\tan(\theta \pm \theta_2)} - \frac{1}{\tan(\theta \pm \theta_1)} \right|.$$
However,
$$\tan(\theta \pm \theta_1) = \frac{f\tan\theta \pm c}{f \mp c\tan\theta}, \qquad \tan(\theta \pm \theta_2) = \frac{f\tan\theta \pm (c \pm e)}{f \mp (c \pm e)\tan\theta}.$$
Thus we obtain, in the worst case,
$$\varepsilon = \left| \frac{dfe\,(1+\tan^2\theta)\sin\theta}{f^2\tan^2\theta - fe\tan\theta + c(c-e)} \right|.$$
This error equation can be further simplified by the fact that the standoff d is usually much larger than the object size, so that $\tan\theta$ can be approximated by $\sin\theta$. This reduces the equation for $\varepsilon$ to the much simpler form
$$\varepsilon \approx \frac{de}{f\sin\theta}. \qquad (4.1)$$

Suppose we have e ~ 2 pixels ~ 0.026mm, d ~ 1750mm, f = 50mm, and $\theta$ ~ 45 degrees. Substituting these values into (4.1), we get $\varepsilon$ ~ 1.28mm. Thus a 2-pixel error in the image plane causes about 1.28mm of radial error on the workbench 1750mm away. If we extend the consideration to different planar levels, an error cone is formed with the viewpoint as the vertex of the cone.
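Equation (4.1) is easy to check numerically; the following lines reproduce the worst-case radial error quoted above (the small difference from 1.28mm comes from rounding the pixel size).

import math

# Values quoted in the text: e ~ 2 pixels ~ 0.026mm, d ~ 1750mm, f = 50mm,
# theta ~ 45 degrees.
d, e, f, theta = 1750.0, 0.026, 50.0, math.radians(45.0)
eps = d * e / (f * math.sin(theta))
print(round(eps, 2))   # 1.29mm, consistent with the ~1.28mm figure in the text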
The previous discussion examined the error which is possible when imaging points on a known plane (the workbench surface). However, we would also like to take the projector into consideration. If we assume that the lens of the projector is similar to the lens of the camera, we should expect similar modeling errors in the calibrated projector matrix. The error cone corresponding to the projector can then be formed (refer to Figure 4.10). Suppose a point P is located within the fields of view (FOV) of both the camera and the projector. Let P_a be the image point of P on the camera plane, and P_b be the projected point of P on the projector plane. Then the 3-D coordinates of P can be computed by intersecting the camera ray, which passes through P_a, and the projector ray, which passes through P_b. The mathematical computation could fix the position of P anywhere within the intersection of the two error cones. It is neither simple nor necessary to derive an exact measurement of this 3-D error volume.

Figure 4.9. Error of e pixels in the image yields radial error r' on a given plane.

Figure 4.10. Points computed from triangulation can be anywhere in the intersection of the two error cones.

4.1.3.3 Transformation Error Analysis

The following analysis describes how initial errors of positions and angles are amplified by transformations. A similar analysis can be found in [Gri85]. The results of this section and the last section provide the thresholds on distance and angle needed by the redundancy removal procedure.

Theorem 4.1: If a line segment $\overline{o_1 o_2}$ with positional error of at most $r_0$ for each of its endpoints is rotated, the endpoints of the rotated line segment have the relative positional error bound

$$\rho' \le 2\sqrt{1 - (1-\rho^2)^{1/2}} + \rho\sqrt{3}, \qquad (4.2)$$

where $\rho = 2r_0/d$, $\rho' = 2r_0'/d$, $d$ is the length of the line segment, $d = |\overline{o_1 o_2}|$, and $r_0'$ is the new positional error after rotating.

Proof: See Appendix C.

To make this theorem more explicit, let us look at an example. Suppose the original relative error $\rho_0$ of the points is 1%. From Equation (4.2), the augmented error after one rotation is

$$\rho_1 = 2\sqrt{1 - (1-\rho_0^2)^{1/2}} + \rho_0\sqrt{3} = 2\sqrt{1 - (1-0.01^2)^{1/2}} + 0.01\sqrt{3} = 0.0315.$$

After two rotations, the error will have increased to $\rho_2 = 0.0944$, i.e. 9.4% error.

Next, let us consider the error propagation through a translation. As indicated in Section 3.2.1, the translation is computed by averaging three translation vectors, which are obtained by subtracting the positional vectors of the starting points from those of the target points. Suppose the errors of the starting point and the target point are respectively $e_1$ and $e_2$. Then, in the worst case, the resulting error increases to $e_1 + e_2$ after a translation.

Each transformation is composed of two rotations and one translation, so the final transformation error bound results from their combination:

- Suppose the original positional error of points is $r_0$.
- The relative error is $\rho_0 = 2r_0/d$.
- The relative error after performing the first rotation is $\rho_1 = 2\sqrt{1-(1-\rho_0^2)^{1/2}} + \rho_0\sqrt{3}$.
- The relative error after performing the second rotation is $\rho_2 = 2\sqrt{1-(1-\rho_1^2)^{1/2}} + \rho_1\sqrt{3}$.
- The error after performing the translation is $r' = \rho_2 d/2 + r_0$.
- $r'$ is the upper error bound, or worst-case error.
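The error-bound recipe above, together with Equation (4.2) and the angle-shift formula (4.3) given just below, can be checked with a few lines of Python; the script re-computes the worked example that follows. Small discrepancies from the quoted figures are due to rounding of intermediate values in the text.

import math

def rotated_error(rho):
    """Relative error bound after one rotation, Equation (4.2)."""
    return 2.0 * math.sqrt(1.0 - math.sqrt(1.0 - rho ** 2)) + rho * math.sqrt(3.0)

def normal_shift_deg(rho):
    """Maximum angular shift of a surface normal, Equation (4.3), in degrees."""
    return math.degrees(math.acos(math.sqrt(1.0 - rho ** 2)))

# Worked example that follows in the text: r0 = 1.5mm, object diameter d = 250mm.
r0, d = 1.5, 250.0
rho0 = 2.0 * r0 / d                 # 0.012, i.e. ~1.2%
rho1 = rotated_error(rho0)          # ~0.038 (~3.8%)
rho2 = rotated_error(rho1)          # ~0.119; the text, rounding, quotes ~11.2%
r_bound = rho2 * d / 2.0 + r0       # ~16mm worst case (~15.5mm in the text)

# Angle-shift example below: 7.6mm positional error on 20mm triangles.
print(round(normal_shift_deg(2.0 * 7.6 / 20.0), 1))   # 49.5 degrees, as quoted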
From the error analysis of calibration in the last section (1.28mm), let us assume that the positional error of points is about 1.5mm, i.e. $r_0 \approx 1.5$mm. Suppose the diameter of an object is about 250mm. Hence the initial relative error is $\rho_0 \approx 1.2\%$. After the first rotation, the error is raised to $\rho_1 \approx 3.8\%$, and up to $\rho_2 \approx 11.2\%$ after the second rotation. That is, after two rotation transformations, the initial error of 1.5mm will increase to 14mm in the worst case. Adding this error to the translation error (1.5mm), we get the error bound $r' \approx 15.5$mm. Note that the computation of positional error due to rotations depends strongly on the distances from points to the rotation center. The result of 15.5mm is the worst case; the actual error should be less than this value. In addition, the inconsistency reduction procedure may reduce the positional error.

Now let us consider how error in the initial angle is increased by the transformation. First we need to determine by how much the orientation of a line segment is shifted from its original direction if a positional error is introduced at its endpoints. Based on this result, we can estimate the angular shift of the surface normal of a triangle before and after a positional error is introduced at the triangle vertices.

Proposition 4.2: If the endpoints of a line segment possess a positional error of at most $r_0$, then the maximum angle shift $\Delta\theta$ from the true orientation is

$$\Delta\theta = \cos^{-1}(1-\rho^2)^{1/2},$$

where $\rho = 2r_0/d$ and $d$ is the length of the line segment, $d = |\overline{o_1 o_2}|$.

Proof: See [Che87].

Based on the above proposition and the fact that a triangle is composed of three line segments, we have the following result.

Proposition 4.3: If a plane is represented in terms of three points, each with positional error at most $r_0$, then the maximum angle shift of the surface normal from its true direction is

$$\Delta\theta = \cos^{-1}(1-\rho^2)^{1/2}, \qquad (4.3)$$

where $\rho = 2r_0/d$ and $d$ is the length of the line segment, $d = |\overline{o_1 o_2}|$.

Proof: See [Che87].

Note that the value d here represents the diameters of the triangles of the networks. Suppose the observed diameters of triangles range from 15mm to 25mm. With these values, the angle shift computed from (4.3) in the worst case would be disastrous, because the ratio of error to signal is too large. For general cases, the positional error is about 0.7mm. Let us choose 20mm as the typical diameter of a triangle. From the equation of the transformation error bound, the augmented positional error of points after transformation becomes 7.6mm. Therefore the relative positional error $\rho$ is 76%. Substituting these values into (4.3), the resulting angular shift of the surface normal is about 49.5 degrees in the worst case. Such large errors, however, are not typically observed. One reason is that worst cases are not frequent. Another is that error frequently affects points in a neighborhood systematically, yielding a cancelling effect, at least for the computation of normals. Shrikhande & Stockman [Shr88] showed that normals can be computed to within 3-6 degrees.

4.1.3.4 Difficulties and Suggestions

Since the model shown in Figure 4.8 was constructed using only four sensor views of the cobra head, there was a hole on the top back of the constructed model, which has been patched by the hole filling algorithm. To get rid of this problem, more sensor views are needed. Currently, the number of sensor views needed to construct a model is determined by the operator. There is no general strategy for deciding a priori the most suitable number of views. There is also no optimal method for deciding which views should be registered first.
A different ordering of views can result in a different final net- work. 73 Typically, using slides with finer grids should improve the quality of a constructed model. This may not be true for objects with simple shapes such as polyhedral objects and quadratic surfaces. The techniques proposed here are suited for objects with moderately complicated surfaces, which can not be easily handled by CAD systems. For such objects, the selection of grid slides becomes important. In general, it depends on aspects such as the complexity of the objects and the purposes of application. It is believed that the silhouettes of object views can provide clues for registration. Several researchers [W an87, Pot83] have worked in this direction. However, since their working objects possess symmetrical shapes such as bottles and cars, it is not clear if their techniques can be applied to arbiuarily-shaped objects. An approach developed by Henderson and Bhanu [Hen85, Bha87] using the k-d tree data structure of sampling points to connect points into a network has more applicability in dealing with general objects. Of the seven steps in our registration process, five of them are quite stable and reli- able when applied to our object database. However, the gap connection and 3-D hole filling procedures have encountered difficulties. In the gap connection algorithm, there is a subroutine which looks for boundary elements in a network. This routine has no prob- . lem finding all boundary elements but has trouble arranging them into ordered sequences if a network possesses multiple edge branches in its boundary. Unfortunately, this situa- tion often occurs at twisted regions on the object surfaces. Holes appearing in such areas are not filled completely using the 3-D hole filling algorithm. These difficulties are due in part to current approaches used in the gap connection and hole filling algorithms, both of which use straightforward heuristics. Methods based on computational geometry may provide better solutions to these problems. The procedures which follow the inconsistency reduction procedure could be per- formed by a single procedure which might avoid the aforementioned difficulties. The fundamental idea is that instead of removing redundant overlaps among networks, they could be directly webbed together by connecting the overlapping vertices to their nearest neighbors. A possible drawback of this strategy is that the resulting model will have a 74 different vertex distribution over the model surface. That means the sizes of triangles will be nonuniformly distributed over the model surface. It may be necessary to either rear- range the positions of vertices to adjust sizes of triangles, or to directly tessellate large triangles into smaller ones. 4.2 Augmenting Geometric Models The geometric models constructed in this research are useful in a variety of applica- tions such as computer graphics, collision avoidance, manipulation, and automatic manufacturing. They are not suitable as prototypes in most recognition tasks. For recog- nition purposes, objects are usually represented in terms of their notable features and structural relations. To extract features and relations directly from objects has its own difficulties. Geometric models provide an intermediate stepping-stone to facilitate the generation of recognition-oriented object representations. Model augmentation is one of the essential steps toward this end. A geometric model initially contains a set of vertices and a set of labeled triangles. 
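A minimal container for such a model might look as follows; the field names are illustrative only, and the per-vertex attribute slots anticipate the augmentation step described next.

from dataclasses import dataclass, field
import numpy as np

@dataclass
class GeometricModel:
    """Minimal container for a triangulated geometric model.  The per-vertex
    attribute dictionary is filled in by the augmentation step (surface type,
    curvatures, edge magnitude, and so on); names here are hypothetical."""
    vertices: np.ndarray                  # (n, 3) array of 3-D coordinates
    triangles: list                       # list of (i, j, k) vertex indices
    vertex_attributes: dict = field(default_factory=dict)   # vertex id -> properties

# A single-triangle toy model, before augmentation.
model = GeometricModel(vertices=np.zeros((3, 3)), triangles=[(0, 1, 2)])
model.vertex_attributes[0] = {"surface_type": None, "mean_curvature": None}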
To augment such a model, we need to include potential surface characteristics for each model vertex. The surface properties of triangles are later derived from those of the vertices via weighting strategies such as barycentric coordinates [Far87]. Surface properties include surface type, curvature, edge magnitude, surface roughness and the angle between principal directions. They have been precisely defined in the previous chapter and will serve as a guide to segment images generated from the model.

Consider a model vertex. An approximate surface normal at this point is estimated by averaging the surface normals of neighboring triangles. Then, we can construct a tangent plane passing through that vertex and perpendicular to the estimated surface normal. This plane is used as a projection plane onto which the object model is projected [Fol82], and a range image can be obtained. A quadratic surface function [Bes88] is then employed to fit the range values. The parameters of the equation are determined by minimizing a least squares error function. Surface characteristics at the model vertex can then be computed from the derived parameters. Model augmentation is achieved by applying the above procedure in parallel to all model vertices.

The method proposed here enjoys two advantages. The first is that current techniques of Computer Aided Geometric Design (CAGD) do not yet provide enough approaches to compute all the surface characteristics we need from a single connected network. Secondly, our method is analogous to the techniques commonly adopted in computing surface characteristics from a real image, so that the characteristic values of augmented models will be compatible with those computed from real images. However, different image resolutions can affect the computational results due to discretization error.

4.3 Object Representation

For representing objects, a multiple-view scheme called the viewsphere is adopted. Objects are presumed to be located at the center of the sphere. Each object is initially represented by a set of views. These are generated by projecting the augmented model of the object along the directions from the center of the sphere to a set of viewpoints distributed on the surface of the sphere. Each object view is in turn represented by a set of wings and their structural properties.

It is natural for us to attempt to use as few views as possible to represent an object. Researchers [Ike88] have tried to group views into so-called aspects such that each aspect possesses views with certain consistent properties, such as linear shape changes or the same set of features. This partitions the surface of a viewsphere. Object representations with such partitions may suffer from ambiguity in determining the optimal viewpoint at the boundaries of partitions.

4.3.1 View Representation

An object view is generated by projecting an augmented geometric model to a plane tangent to the viewsphere at a given viewpoint. The view is composed of a set of arrays containing the range values and the other surface characteristics available in the model. Figure 4.11(a) displays a range image (32 x 32) generated from the cobra head model. Based on these maps, the wing representation of that view can be derived. The procedure proposed for wing detection based on the characteristic maps is given as follows.

(a) range image (b) wing map
Figure 4.11.
(a) The range image of a View of the cobra head has been generated from the cobra headmodel. (b) The resulting wing map of this view. Given a set of maps corresponding to an object view (including a range image, a preliminary segmentation of the image, and maps of other surface properties) contours are first located based on the characteristics of range values, edge magnitudes, fit errors and angles between principal directions. Contour types of interest include object silhouettes, internal jump edges and crease edges. After extracting contour information, the algorithm proceeds to make the preliminary segmented images uniform by integrat- ing the contour information. This process currently uses a 3 by 3 window to replace the surface type of a region pixel by a new type which is determined as the majority set of the types adjacent to that pixel. Previously detected contours are used to prevent this pro- cess from counting neighboring pixels across contours. This process continues iteratively until no change of surface types can be made or the number of iterations exceeds a limit. Based on the resulting contour and surface maps, wing primitives can be found through a labeling procedure, which implements the extraction rule presented in Chapter 2. A 77 diagram of the entire procedure is depicted in Figure 4.12. The resulting wing map for the examme in Figure 4.11(a) is shown in Figure 4.11(b). iterate over all views in viewsphere geometric augmented , characteristic model model maps augmentation projection segmentation extraction Figure 4.12. The procedure diagram of wing detection. After deriving the wing features, their structural properties can be computed using the computational methods presented in Chapter 2. The results of computation for the wings in Figure 4.11(b) are illustrated in Figure 4.13. The view information is grouped into three classes: A, B and C. Class A contains the properties of the view in which the number of distinct wing types (6 in the example), the list of wing types (2,3,49,10,11, refer to Table 1.1) and the number of wing features (8 wings) detected from the view are indicated. Following Class A is the class B information, where each detected wing is specified by a set of feature properties including length, depth range, tangent magnitude, curvature and so on. This class concerns unary properties of individual wings. In Class C, the binary relations between wings were described. They include distances, ratios and angles defined in Chapter 2. In graphic terms, the data structure in Figure 4.13 is a specification of a relational attributed graph, and is the wing representation of the object view. 78 4.3.2 Views and Aspects In the last section, we computed the wing representation for an arbitrary viewpoint. However, a viewsphere is initially composed of 320 viewpoints. The tessellation pro- cedure presented in Section 3.2.4 generates these viewpoints on a unit sphere. Therefore, according to these viewpoints, 320 views of an object and their corresponding wing representations can be derived. Figure 4.14(a) displays 48 of the 320 views of the cobra head. The views in Figure 4.14(b) are those of the coffee cup. For each such object view, the procedures of wing detection and structure computation are applied. Thus, we obtain a set of 320 view representations characterized by wing features and their structures. Examples of range, segmented and wing images of model views of the cobra head and the coffee cup are shown in Figure 5.15. 
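Looking back at the wing-detection procedure of Section 4.3.1 that is applied to each of these views, its iterative 3 x 3 majority smoothing can be sketched as below. Treating any contour-marked neighbor as lying across a contour is a simplification of the actual rule, and the function name and iteration limit are assumptions.

import numpy as np
from collections import Counter

def majority_smooth(surface_type, contour_mask, max_iters=10):
    """Iteratively replace each pixel's surface type by the majority type in its
    3x3 neighborhood, ignoring neighbors marked as contour pixels.
    `surface_type` is an integer label image; `contour_mask` is a boolean image."""
    labels = surface_type.copy()
    rows, cols = labels.shape
    for _ in range(max_iters):
        changed = False
        new_labels = labels.copy()
        for r in range(rows):
            for c in range(cols):
                if contour_mask[r, c]:
                    continue                     # contour pixels keep their label
                votes = Counter()
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        rr, cc = r + dr, c + dc
                        if (dr, dc) != (0, 0) and 0 <= rr < rows and 0 <= cc < cols \
                                and not contour_mask[rr, cc]:
                            votes[labels[rr, cc]] += 1
                if votes:
                    winner, _ = votes.most_common(1)[0]
                    if winner != labels[r, c]:
                        new_labels[r, c] = winner
                        changed = True
        labels = new_labels
        if not changed:
            break
    return labels

The votes are always taken from the previous pass and written into a copy of the label image, so that a pixel's outcome is not influenced by changes made earlier in the same pass.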
The computation is obviously expensive but it can be done off line. Recall that the tessellation procedure starts with an icosahedron which has 20 tri- angular faces. Each face is then divided into four finer triangles to obtain 80 triangles. Thereafter, each triangle is further divided into four smaller triangles. A set of 320 trian- gles, which correspond to 320 viewpoints, is created. This procedure inherently provides a representational scheme with a hierarchy of three levels. The first level contains 20 faces of the icosahedron, the second level consists of four subtriangles for each of the 20 faces in level one and the last level contains four smaller triangles for each subtriangle in level two. Suppose we are given an unknown view. We want to look for a viewpoint from which the corresponding object view is consistent with the unknown view. We can start by comparing the unknown view with the 20 viewpoints in the first level. Then, the four viewpoints belonging to the best candidate of the last level, which is the most con- sistency with the unknown view, are compared. This procedure similarly continues to the third level. In order to reduce the number of object views, we cluster the views sharing the same set of wing primitives into groups called aspects. For the example of the cobra head, 139 aspects are obtained from the initial 320 views. However, 36 aspects are needed for the coffee cup. The number of aspects can reflect the complexity of an object 79 shape. The cobra head has higher complexity than the coffee cup. The above hierarchical organization of the viewsphere may be useful for views represented by object wings once there is wing property that complies with the inherent hierarchy of the viewsphere. We also found that some aspects whose sets of wing primitives are subsets of other aspects. Aspects could be arranged into the structure of lattices so that they can be tightly organ- ized. The performance of the recognition process might thus be improved. We will inves- tigate these problems in the next chapter. 4.3.3 Preliminary Discussion of Wing Representation There are several factors that can affect the final object representation. These factors include map resolution, tessellation of a viewsphere and convolution filters. In this sec- tion, a preliminary discussion about these factors is presented. A closer examination together with a series of experimental results will be given in the next chapter when we discuss the validity of the representation in terms of wing features. In this way, a better understanding of the characteristics of wing representation will be gained. During model augmentation, the fundamental task is to compute surface characteris- tics for model vertices. Computation is based on a set of maps generated by projecting a geometric model to the tangent planes at model vertices. However, different map resolu- tions will lead to different results in the computed characteristics. This problem also occurs during the generation of the representation for a view. In order to investigate the effects of resolution, we used various map sizes ranging from 32 x 32 to 128 x 128 to generate wing representations for the cobra head and the coffee cup. The experimental results and their comparisons are given in the next chapter. Although the number of aspects is directly related to the complexity of object shapes, it is also possible that the initial resolution of a viewsphere may influence the final representation. 
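The recursive tessellation recalled in Section 4.3.2 is straightforward to reproduce. The sketch below is an independent reconstruction, not the thesis code: it builds the 20-, 80- and 320-face levels, takes the re-normalized face centroids as viewpoints, and confirms the per-viewpoint solid angle of roughly 0.039 steradians quoted below.

import numpy as np
from scipy.spatial import ConvexHull

def normalize(v):
    return v / np.linalg.norm(v)

def icosahedron_triangles():
    """The 20 triangular faces of a unit icosahedron, each as a 3x3 array."""
    phi = (1.0 + np.sqrt(5.0)) / 2.0
    verts = [normalize(np.array(p))
             for s1 in (-1.0, 1.0) for s2 in (-1.0, 1.0)
             for p in ((0.0, s1, s2 * phi), (s1, s2 * phi, 0.0), (s2 * phi, 0.0, s1))]
    hull = ConvexHull(np.array(verts))
    return [np.array([verts[i] for i in simplex]) for simplex in hull.simplices]

def subdivide(triangles):
    """Split each spherical triangle into four, projecting edge midpoints back
    onto the unit sphere."""
    out = []
    for a, b, c in triangles:
        ab, bc, ca = normalize(a + b), normalize(b + c), normalize(c + a)
        out += [np.array([a, ab, ca]), np.array([ab, b, bc]),
                np.array([ca, bc, c]), np.array([ab, bc, ca])]
    return out

level0 = icosahedron_triangles()        # 20 faces
level1 = subdivide(level0)              # 80 faces
level2 = subdivide(level1)              # 320 faces -> 320 viewpoints
viewpoints = [normalize(t.mean(axis=0)) for t in level2]
print(len(level0), len(level1), len(level2))     # 20 80 320
print(round(4.0 * np.pi / len(level2), 3))       # ~0.039 steradians per viewpoint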
In order to understand this, we generated two viewspheres with dif- ferent tessellation resolutions (240 and 320 viewpoints respectively). For the case of 320 viewpoints, the solid angle of a viewpoint (or the view resolution) is about 0.039 steradi- ans (the angle interval between two adjacent viewpoints is about 12.5 degrees), whereas 80 0.052 steradians (angle interval about 14.5 degrees) for the case of 240 viewpoints. For a moderately complicated object, it is possible that different wing features could be obtained as the viewpoint varies within a small neighborhood in a triangulation of the viewsphere. In practice, one often finds that in spite of small differences many features remain the same. Viewpoint perturbation will be further discussed in the next'chapter. Another point of interest, which has been addressed by Besl & Jain [Bes88], is the fact that different sizes of convolution masks used to compute shape information will suffer from computational variations which are a function of mask size and the area covered by the mask. This problem is commonly referred to as a scale problem. It is pos- sible for us to construct a model to simulate these computational variations so that better results may be achieved using the estimated variations. We will present a general simula- tion model for this purpose and give a specific use of it in the next chapter. Several researchers [Low88] have used this concept to estimate computational variations and improve their results. 4.4 Summary A procedure for construction of wing representations of real world objects was presented. This procedure included four major steps: model building, model augmenta- tion, view generation and view organization. In model building, 3-D data acquisition and view registration were two essential steps. A geometric model was specified in terms of a set of triangles whose union was a topological and geometrical approximation to the object surface. Surface characteristics were computed at every vertex of a model. Geometric models containing this information were called augmented models. Various views of an object were generated by projecting its augmented model onto a viewsphere which was composed of a set‘ of uniformly distributed viewpoints. Object views were then clustered into aspects, where the views within an aspect possessed the same set of either distinct or distinguished wing primitives. As a consequence, an object was finally represented by a set of aspects which were in turn sets of object views. Each object view was characterized by wing features and their structural relations. The model 81 construction procedure was demonstrate on the cobra head object. More example objects and the properties of representations will be discussed in the next chapter. 
Figure 4.13. The structural properties of the wing features in Figure 4.11(b). The listing is organized into Class A (view properties: the number of distinct wing types, the list of wing types, and the number of wings), Class B (unary properties of each wing, such as length, depth range, angle span, turning angles, tangent magnitudes, curvatures and the list of 3-D points) and Class C (binary properties of wing pairs, given as tables of 3-D maximum and minimum distances, 2-D ratios of minimum to maximum distances, and 2-D minimum angles).

Figure 4.14(a). 48 of the 320 views of the cobra head.

Figure 4.14(b). 48 of the 320 views of the coffee cup.

Figure 4.15. Examples of range, segmented and wing images of model views.

Chapter 5
Testing the Wing Representation

A good object representation should have the properties of uniqueness, stability, compatibility, terseness and locality. The property of uniqueness requires that the representation of an object be distinguishable from the representations of other objects in a domain. The stability property demands that small variations in the input have little influence on the representation. As for the requirement of compatibility, the representations in a database should be comparable to the representations derived from real images. In addition, the representations should also be terse enough to make processing efficient. Finally, locality enables the representation to cope with partial occlusion.

Several procedures have been developed to examine the validity of wing representations with respect to the aforementioned properties. In the following sections, the properties of uniqueness, stability, and compatibility are closely investigated. The property of locality was shown in Chapter 2. However, the current representation scheme does not produce terse representations because it is based on multiple views. To avoid this drawback, we need to either reduce the number of aspects [Sho88] or use other representation schemes [Lam88]. Later in this chapter, we show that a strategy based on sets of distinguished wing primitives of individual models can further reduce the number of aspects.

First, the uniqueness of wing representation is investigated by examining the number of aspects, the distinct and distinguished wing sets, and the wing structures of
However, the current representation scheme does not produce terse representations because it is based on multiple views. To avoid this draw- back, we need to either reduce the number of aspects [Sho88] or use other representation schemes [Lam88]. Later in this chapter, we show that a strategy based on sets of dis- tinguished wing primitives of individual models can further reduce the number of aspects. First, the uniqueness of wing representation is investigated by examining the number of aspects, the distinct and distinguished wing sets, and the wing structures of 86 87 object models. A set of uniqueness criteria will be proposed for this purpose. They form necessary but not sufficient conditions. Next, we examine the property of stability, including investigations of the reliability of algorithms for constructing wing representa- tions under varying resolutions of characteristic maps, tessellations of the viewsphere and convolution filters. It is essential that object wings extracted from real images be comparable with those of internal representations. Otherwise, the recognition procedure will not succeed in identifying scene objects based on the representations. We call this property compatibil- ity. Our discussion of this property doesn’t attempt to present a complete procedure for wing detection. Instead, we will demonstrate that wings detected from real images can be comparable to model wings. While some broader issues are left for future research, for the time being, we con- clude based on the experimental results that for the object domain used wing representa- tion has the properties of uniqueness, stability, compatibility and locality, but not terse- ness. 5.1 Uniqueness The uniqueness of representation requires that the representation of an object be dis- tinguishable from the representations of other objects in a domain. Suppose we are given an object domain 0 to be represented. Let f be the mapping from the object domain 0 to an image domain R which is the set of object representations. To prove that object representation is unique in the domain, we need to prove f is a reversible function, or equivalently f is an one-to-one and onto mapping between the range and the domain of the function. Suppose R1 and R 2 are representations of two distinct objects 01 and 0 2. The uniqueness property requires that R1 at R2, iff 01 at 02. For wing representation, there are several ways to discriminate two representations. Recall that a wing representa- tion is composed of a set of object wings and spatial constraints. If two wing representa- tions are equivalent, they should possess the same set of wings and constraints. It is pos- sible that two objects have the same set of object wings, for example, in some cases of 88 polyhedral objects. However, it is very unlikely that two different shaped objects have exactly the same set of both wings and constraints. There are several criteria in terms of object wings and wing structures, which can be used to justify the uniqueness of wing representations. These criteria form a set of neces- sary but not sufficient conditions. A formal proof of uniqueness should come along with the algorithm of constructing multiple-view representation. Violation of any criterion will produce a penalty. A similarity function of two representations is then constructed based on the penalties. A similar notion could be employed to construct a priority func- tion for object recognition. 
For example, given a scene representation R, and a set of object representations, the object representation R," that has the largest similarity with the scene representation R, will be tested first. 5.1.1 Criteria of Uniqueness A set of criteria is presented to serve as evidence for the uniqueness of wing representations. Some of these criteria will later become important for the recognition procedure to index into a model database once sensed wings are available. Before describing the criteria, several terms and their symbols need to be defined. First, we want to distinguish between a wing primitive and a wing feature. A wing primi- tive p; simply indicates the symbolic feature type of a wing, whereas a wing feature f,- denotes a wing together with its unary properties. As mentioned in Chapter 2, the unary properties of a wing include wing type, length, depth range, angle span, curvatures, etc.. Therefore, wing features provide detailed attributes of wing primitives. An aspect of a model is a collection of model views which satisfies a grouping con- dition which states that views with the same set of wing primitives can be clustered. A more restricted definition of the grouping condition can be made by including the relative positions of views and considering the set of wing features rather than just wing primi- tives. By the relative positions of views, we mean that only adjacent views satisfying the grouping condition can be clustered into an aspect. 89 The appearance frequency apj of a wing primitive j is defined as the ratio of the number of viewpoints, in which the wing primitive appears, to the number of viewpoints of a viewsphere. A wing primitive with high appearance frequency in an object model means that the wing primitive is a key primitive of the model. The key primitives of a model k form the distinguished wing set D k of the model. The criteria for justification of the uniqueness of wing representations are listed in Appendix D in which the sequence of criteria is ordered with larger w,- ’s representing more rigorous criterion. We can also make 2w,- = 1 via normalization. Note that the cri- t teria of uniqueness are necessary but not sufficient conditions because their inverses may not be true. For example, in criterion 1, if R1 at R; it doesn’t imply that n1 at n2. Let S = 25,-, where s,- is the penalty of criterion i. Then, the larger S, the more distinguishable t the pair of representations. In order to make a similarity function with its value between 0 and 1, we define F = e", so that if F = 1, two representations R1 and R2 are exactly the same. On the other hand, if F is close to 0, the two representations are well separated. We believe that by introducing higher level wing features such as composite wings into the representation the probability that two wing representations constructed for two different objects satisfy all the criteria of uniqueness would be very small. 5.1.2 Distinct Wing Set, Appearance Frequency and Distinguished Wing Set In this section, the wing representations for the cobra head and the coffee cup are used as an example to illustrate that wing representations can be unique with respect to the criteria addressed in the last section. Table 5.1 displays the lists of distinct wing primitives with their appearance fre- quencies in 320 viewpoints (Appendix B) using map resolution 96 x 96. There are 27 dis- tinct wing primitives for the cobra head and 23 for the coffee cup. 
Let us look at the first row of the table in which the wing primitive with type 1 appears in 115 views for the cobra head and thereby has appearance frequency 0.359. In a similar vein, the wing prim- itive with type 1 is also possessed by the coffee cup and appears in 147 views for the 9O Viewsphere of 320 Viewpoints, Map Resolution 96 x 96 Cobra Head (27 distinct wing primitives) Coffee Cup (23 d'mtinct wing primitives) Wing Primitive #Appearances Frequency Wing Primitive #Appearances Frequency 1 115 0.359 1 147 0.459 2 303 0.947 2 320 1.000 3 259 0.809 3 145 0.453 4 275 0.859 4 210 0.656 5 27 0.084 5 1 0.003 6 127 0.397 6 181 0.566 7 64 0.200 7 19 0.059 8 72 0.225 8 11 0.034 9 294 0.919 9 155 0.484 10 212 0.663 10 124 0.387 1 1 256 0.800 1 1 76 0.237 12 182 0.569 12 42 0.131 14 115 0.359 14 5 0.016 15 119 0.372 15 66 0.206 16 101 0.316 16 6 0.019 17 17 0.053 19 268 0.837 18 25 0.078 20 1 0.003 19 278 0.869 21 10 0.031 20 120 0.375 22 4 0.013 21 175 0.547 24 15 0.047 22 160 0.500 25 38 0.119 24 158 0.494 27 9 0.028 29 15 0.047 32 42 0.131 30 3 0.009 - - - 31 104 0.325 - - - 32 54 0.169 - - - 34 43 0.134 - - - Table 5.1. The lists of distinct wing primitives and their appearance frequencies in 320 viewpoints using the map resolution 96 x 96 for the cobra head and the coffee cup respectively. coffee cup. Its appearance frequency is 0.459. Among distinct wing primitives, some have high appearance frequencies. There are two ways to define the set of distinguished wing primitives for a model. Either the primitives with highest appearance frequencies or I the primitives with fi‘equencies larger than a pro-defined threshold can be used as the dis- tin guished primitives of a model. For example, suppose we select wing primitives with 91 appearance frequencies larger than 0.6 as the distinguished wing primitive. Then the set of distinguished wing primitives of the cobra head is {2, 3, 4, 9, 10, 11, 19} and {2, 4, 19} for the coffee cup. On the other hand, if we select the first three primitives with higher frequencies as distinguished wing primitives, then the cobra head has the dis- tinguished wing set [2, 9, l9} and the coffee cup {2, 4, 19}. According to the results in Table 5.1 which only show distinct wing primitives, appearance frequencies, and distinguished wing primitives of the cobra head and the cof- fee cup, the computed similarity value based on these results and a set of criterion weights {1, 2, 3, ....} is about 0.258. In the next section, we will show that their numbers of aspects are different too, and the complete set of criteria produces a similarity value for the object representations that should be close to zero. 5.2 Stability The property of stability demands that small variations in the input data have little influence on the final results. There are several sources of variations which can affect the resulting wing representations. The first variation comes from using different tessella- tions of the viewsphere, called the variation of view resolution. To study its effect, we use two viewspheres with 240 and 320 viewpoints respectively. In Section 5.2.1, experi- mental results from the new sphere will be compared with the results in Table 5.1 and 5.2. The second variation is due to using different map resolutions related to scale space for the algorithms of model augmentation and view generation. A map here denotes a two dimensional array of values. Therefore, range images are a kind of map with depth values, while edge images are another kind of map with edge labels. 
There are other maps as well, such as characteristic maps whose values are curvatures, gradients, and so on. The stability of the algorithms can be drastically affected by map resolution. Although using a high-resolution map to construct object representations preserves the details of an object, the significant structures of the object may be obscured by too many details.

There is another question about what happens when a model is perturbed with respect to a viewpoint by a small angle before it is projected. If an object representation is sensitive to such view perturbations, a viewsphere with a fine tessellation will be needed. We refer to such perturbations as view variations; they are explored in Section 5.2.3.

Finally, the variation due to using different sizes of convolution filters may be the most important one; it exists in many algorithms ranging from smoothing and feature extraction to the computation of characteristic values. The mask size significantly influences the constructed wing representation, and a comprehensive investigation of this variation is a nontrivial task. We will present a general model for estimating the computational variations due to using different convolution masks in Section 5.2.4, and give examples of edge detection using LoG filters [Nal86] and smoothing using Gaussian filters [Low88] to illustrate the use of this model.

5.2.1 View Resolution

In order to study the influence of different tessellations of the viewsphere on the resulting representations, two viewspheres (with 240 and 320 viewpoints respectively) have been constructed (Appendix B). Table 5.2 shows the results obtained by repeating the uniqueness experiments of the last section using the viewsphere with 240 viewpoints and a map resolution of 128 x 128. Recall that the results shown in Table 5.1 were obtained using the viewsphere with 320 viewpoints and a map resolution of 96 x 96, so both the view and the map resolution differ between the two tables. We compare the two tables by computing the sum d of squared differences of their appearance frequencies f_i,

    d = [ \sum_{i=1}^{34} (f_i^1 - f_i^2)^2 ]^{1/2} ,                    (5.1)

which turns out to be 0.588 for the cobra head and 0.494 for the coffee cup. Note that the wing catalogue has 34 wing primitives; if a primitive does not appear in an object model, its appearance frequency is taken to be zero. Later, we will compare these results with another set (Table 5.3) obtained using different resolutions.

Viewsphere of 240 Viewpoints, Map Resolution 128 x 128

  Cobra Head (27 distinct wing primitives)      Coffee Cup (22 distinct wing primitives)
  Wing Primitive  #Appearances  Frequency       Wing Primitive  #Appearances  Frequency
        1              47         0.196               1             116         0.483
        2             157         0.654               2             240         1.000
        3             176         0.733               3              53         0.221
        4             182         0.758               4             187         0.779
        5              18         0.075               5               1         0.004
        6              78         0.325               6             135         0.562
        7              37         0.154               7              10         0.042
        8              52         0.217               8              11         0.046
        9             222         0.925               9             122         0.508
       10             181         0.754              10              98         0.408
       11             186         0.775              11             116         0.483
       12             163         0.679              12              41         0.171
       14             119         0.496              14              11         0.046
       15              89         0.371              15              52         0.217
       16              79         0.329              16               8         0.033
       17              19         0.079              19             214         0.892
       18              34         0.142              21              34         0.142
       19             222         0.925              22               5         0.021
       20             132         0.550              24              84         0.350
       21             155         0.646              25              26         0.108
       22             169         0.704              27               4         0.017
       24             164         0.683              32              44         0.183
       29              31         0.129               -               -            -
       30               1         0.004               -               -            -
       31              96         0.400               -               -            -
       32              56         0.233               -               -            -
       34              72         0.300               -               -            -

Table 5.2. Using the viewsphere with 240 viewpoints.

At present, all the computed d values tell us is that Tables 5.1 and 5.2 are not fully consistent with each other; some frequencies differ considerably.
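The comparison of Equation 5.1 is straightforward to compute. The sketch below assumes each table is given as a dictionary mapping wing primitive to appearance frequency; the names are illustrative and not part of any thesis software.

```python
import math

def representation_distance(freq_a, freq_b, catalogue_size=34):
    """Equation 5.1: root of the summed squared frequency differences over the
    whole wing catalogue; primitives absent from a model contribute frequency 0."""
    total = 0.0
    for primitive in range(1, catalogue_size + 1):
        diff = freq_a.get(primitive, 0.0) - freq_b.get(primitive, 0.0)
        total += diff * diff
    return math.sqrt(total)

# For the cobra head, d between the Table 5.1 and Table 5.2 frequencies comes out near 0.588.
```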
For example, the frequency of wing primitive 2 of the cobra head is 0.947 in Table 5.1, whereas it is 0.654 in Table 5.2. We do not know which resolution (view, map, or both) causes this difference. On the other hand, if we sort the wing primitives by their appearance frequencies, the first few primitives (eleven for the cobra head and any number for the coffee cup) form a set which is the same for both tables. This reveals that distinguished wing primitives are more stable than distinct wing primitives with respect to view and map resolutions.

5.2.2 Map Resolution

In many situations, map resolution is a key factor affecting the representation results. Although a high-resolution map preserves the details of an object, the significant structures of the object may be obscured by too many details. For this reason, it seems desirable that objects with different shape complexities be represented with different map resolutions. The problems are how to measure the shape complexity of an object and what the relationship is between shape complexity and map resolution. Is it appropriate to use hierarchical scales to represent objects? This section shows some experimental results for the cobra head and the coffee cup using different map resolutions.

Table 5.3 shows the results using a map resolution of 128 x 128 and the viewsphere with 320 viewpoints.

Viewsphere of 320 Viewpoints, Map Resolution 128 x 128

  Cobra Head (27 distinct wing primitives)      Coffee Cup (23 distinct wing primitives)
  Wing Primitive  #Appearances  Frequency       Wing Primitive  #Appearances  Frequency
        1              49         0.153               1             152         0.475
        2             212         0.663               2             320         1.000
        3             234         0.731               3              76         0.237
        4             239         0.747               4             250         0.781
        5              21         0.066               5               1         0.003
        6             101         0.316               6             179         0.559
        7              60         0.188               7              18         0.056
        8              62         0.194               8              21         0.066
        9             293         0.916               9             167         0.522
       10             238         0.744              10             127         0.397
       11             263         0.822              11             152         0.475
       12             200         0.625              12              57         0.178
       14             144         0.450              14              21         0.066
       15             118         0.369              15              66         0.206
       16             108         0.338              16               8         0.025
       17              21         0.066              19             290         0.906
       18              38         0.119              20               1         0.003
       19             297         0.928              21              47         0.147
       20             168         0.525              22               2         0.006
       21             198         0.619              24              90         0.281
       22             234         0.731              25              39         0.122
       24             228         0.712              27              11         0.034
       29              34         0.106              32              47         0.147
       30               2         0.006               -               -            -
       31             134         0.419               -               -            -
       32              69         0.216               -               -            -
       34              80         0.250               -               -            -

Table 5.3. Using the map resolution 128 x 128.

Computing d values (Equation 5.1) for Tables 5.3 and 5.1, we obtain 0.575 for the cobra head and 0.447 for the coffee cup. These values are close to those (0.588 and 0.494) for Tables 5.1 and 5.2, which indicates that the map resolution is an important factor affecting object representations. Furthermore, if we compute d values for Tables 5.3 and 5.2, we obtain 0.137 and 0.095 respectively, which indicates that the view resolution has only a small influence on the representations. We conclude that using different view resolutions has not led to prominent differences in the final representations, but using different map resolutions may have.

The numbers of distinct wing primitives for the cobra head and the coffee cup are listed in Table 5.4 along with the map resolution used to create the wing representations. In the experiment, map resolutions from 32 x 32 to 128 x 128 were used to investigate the effects of map resolution. As shown in the table, for the case of 320 viewpoints, using map resolutions larger than 80 x 80, both objects have stable sets of distinct wing primitives.
The cobra head possesses 27 (the same for the case of 240 viewpoints) out of the 34 wing primitives, whereas the coffee cup has 23 (22 for the case of 240 viewpoints). The variation of the distinct wing set due to different map and view resolutions is depicted in Figure 5.1. Note that the handle of the coffee cup changes the shape regularity of the object and makes the global shape of the coffee cup almost as complex as that of the cobra head. In addition, the handle is small with respect to the whole object, so the wing features located on it are small too; using either a low-resolution map or a coarsely tessellated viewsphere, these wing features are easily discarded. The high variation shown in Figure 5.1 for the coffee cup reveals this fact. Figure 5.1 also shows that the two curves for the cobra head completely overlap each other: although the cobra head has a complicated shape, its shape complexity is distributed uniformly over its surface.

Figure 5.1. The variation of the distinct wing set versus the map resolution (number of distinct wing primitives plotted against map resolutions from 32 to 128; solid curve: 320 viewpoints, dashed curve: 240 viewpoints).

                 Viewsphere of 320 Viewpoints      Viewsphere of 240 Viewpoints
  Map Resolution   Cobra Head    Coffee Cup          Cobra Head    Coffee Cup
    32 x 32            18            13                  18            14
    48 x 48            26            18                  26            18
    64 x 64            27            19                  27            17
    80 x 80            27            22                  27            18
    96 x 96            27            23                  27            22
   112 x 112           27            23                  27            22
   128 x 128           27            23                  27            22

Table 5.4. Numbers of distinct wing primitives present in views of the cobra head and the coffee cup versus different map resolutions.

Table 5.5 shows the influence of view and map resolutions on the resulting aspects. According to this table, the number of aspects increases as the map resolution increases. This tendency is explicitly revealed in Figure 5.2, which indicates that beyond a map resolution of 64 x 64 the number of aspects of the cobra head is stable (reaching 292 for 320 viewpoints and 224 for 240 viewpoints), while the coffee cup gradually attains stability (145 and 122) beyond resolution 96 x 96. Unfortunately, the numbers of aspects for both objects are too high, violating the terseness criterion. As shown in Figure 5.2, to reduce the number of aspects, low-resolution maps (for example 32 x 32) and coarsely tessellated viewspheres (for example 240 viewpoints) are preferred (122 aspects for the cobra head and 38 aspects for the coffee cup). A possible way to further reduce the number of aspects is to use weaker grouping conditions, for example the distinguished wing sets instead of the distinct wing sets. Another method is to select notable representatives from the set of aspects [Bas88]; other aspects can then be derived from these representatives as needed.

Figure 5.2. The stability of wing representations increases as the map resolution increases (number of aspects plotted against map resolutions from 32 to 128; solid curve: 320 viewpoints, dashed curve: 240 viewpoints).

                 Viewsphere of 320 Viewpoints      Viewsphere of 240 Viewpoints
  Map Resolution   Cobra Head    Coffee Cup          Cobra Head    Coffee Cup
    32 x 32           139            36                 122            38
    48 x 48           234            50                 178            48
    64 x 64           271            76                 206            66
    80 x 80           292           106                 224            86
    96 x 96           294           121                 224            94
   112 x 112          299           145                 226           122
   128 x 128          296           152                 228           128

Table 5.5. Different numbers of aspects resulting from using different map resolutions.
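The grouping condition behind the aspect counts of Table 5.5 can be stated compactly in code. The sketch below is a minimal version that clusters adjacent viewpoints having identical wing-primitive sets; it assumes the adjacency structure of the tessellated viewsphere is given, and it ignores the stricter variant based on full wing features.

```python
def group_into_aspects(view_wing_sets, adjacency):
    """view_wing_sets: dict viewpoint -> frozenset of wing-primitive labels.
    adjacency: dict viewpoint -> list of neighbouring viewpoints on the viewsphere.
    Returns a list of aspects, each a set of viewpoints that are connected on the
    viewsphere and share an identical wing-primitive set."""
    unvisited = set(view_wing_sets)
    aspects = []
    while unvisited:
        seed = unvisited.pop()
        aspect, frontier = {seed}, [seed]
        while frontier:
            v = frontier.pop()
            for n in adjacency[v]:
                if n in unvisited and view_wing_sets[n] == view_wing_sets[seed]:
                    unvisited.remove(n)
                    aspect.add(n)
                    frontier.append(n)
        aspects.append(aspect)
    return aspects
```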
From the wing maps of the object views and the sets of wing primitives of the individual aspects (not shown here), a map resolution of 32 x 32 is already good enough for both the cobra head and the coffee cup, because most salient structures of the objects are preserved. The discarded detailed wings are not only unnecessary but can even be harmful for the purpose of object recognition. For other applications in which the details of objects are important, we suggest the use of multiple or hierarchical resolutions to represent objects. Multiple resolution means using different map resolutions for different parts of an object according to their shape complexity; hierarchical resolution uses coarse-to-fine map resolutions for each view. We also suggest that a complicated object is better represented with low-resolution maps, in order to suppress noisy details, whereas objects with simple shapes can use any map resolution.

Although a viewsphere with coarse tessellation results in fewer aspects and better stability than one with finer tessellation, the constructed representation must then tolerate a large view range and view perturbation; otherwise some crucial object views may be missing from the final representation. This problem is investigated in the next section.

5.2.3 View Variation

As mentioned in Chapter 3, there are an infinite number of viewpoints on the continuous viewsphere. In practice, one discretizes it; representative views are then determined by grouping, so that the representation becomes more compact. It is desirable that the viewsphere initially be tessellated uniformly into discrete viewpoints. This immediately raises the question of what angle interval between viewpoints is adequate: large intervals may lose important views of an object, while small intervals require a large amount of storage. Understanding the effect of view variations may provide some clues to this question.

Figure 5.3. 17 views of the cobra head, with each pair of adjacent views differing by an angle of 3 degrees. The image resolution is 128 x 128.

17 Views of the Cobra Head, Image Resolution 128 x 128

  View  #Wing Primitives  Wing List
    0         10          3 4 9 10 11 19 20 22 24 34
    1         13          3 4 9 10 11 14 19 20 21 22 24 31 34
    2         11          3 4 9 10 11 19 21 22 24 31 34
    3         12          2 3 4 9 10 11 19 20 21 22 24 34
    4         14          2 3 4 9 10 11 19 20 21 22 24 31 32 34
    5         11          3 4 9 10 11 19 20 21 22 24 31
    6         12          3 4 9 10 11 19 20 21 22 24 32 34
    7         11          2 3 4 9 10 11 19 22 24 31 32
    8         11          4 9 10 11 19 20 21 22 24 32 34
    9         11          4 9 10 11 19 20 21 22 24 32 34
   10         11          2 3 4 9 10 11 19 22 24 31 32
   11         13          2 3 4 9 10 11 19 20 21 22 24 31 34
   12         12          4 9 10 11 12 19 20 21 22 24 32 34
   13         12          2 3 4 9 10 11 19 20 21 22 24 34
   14         11          2 3 4 9 10 11 12 19 20 22 24
   15         11          4 9 10 11 19 20 21 22 24 31 32
   16         12          2 3 4 9 10 11 14 19 20 21 22 24

Table 5.6. The lists of distinct wing primitives of the 17 views of the cobra head in Figure 5.3.

As mentioned in Chapter 4, for the viewsphere of 320 viewpoints the solid angle of a viewpoint (the view resolution) is about 0.039 steradians (the angle interval between two adjacent viewpoints is about 12.5 degrees); for the case of 240 viewpoints the solid angle is 0.052 steradians (an angle interval of about 14.5 degrees). It is desirable that most wing features detected at a viewpoint also be detectable at the adjacent viewpoints, so that viewpoints with highly overlapping wing sets can be grouped. By studying the impact of view variation, we hope to discover useful criteria for view grouping.

In order to investigate the effect of view variations, a set of model views of the cobra head (Figure 5.3) was constructed. The representation procedure was then applied to these views, and the resulting lists of wing primitives were examined.
Each pair of adjacent (including diagonal) views in Figure 5.3 differs by an angle of 3 degrees, so the largest angle variation in this test is 12 degrees, e.g. the angle between view 1 and view 16. The sets of distinct wing primitives of the views are listed in Table 5.6, in which three pairs of views (views 7 and 10, views 8 and 9, views 3 and 13) have exactly the same sets of wing primitives. Most other views have highly overlapping wing lists; the common overlap among the wing lists is {4, 9, 10, 11, 19, 22, 24}. Although the ideal situation would be for all views within a small angle variation to have the same wing set, we must remember that wing sets must vary between some neighbors of the discrete viewsphere whenever there are several aspects.

Figure 5.4. 17 views of the coffee cup, with each pair of adjacent views differing by an angle of 3 degrees. The image resolution is 128 x 128.

17 Views of the Coffee Cup, Image Resolution 128 x 128

  View  #Wing Primitives  Wing List
    0          7          2 4 9 10 11 12 19
    1          8          2 4 9 10 11 12 19 32
    2          9          2 4 9 10 11 12 19 21 32
    3          7          2 4 9 10 11 12 19
    4          8          2 4 9 10 11 12 19 32
    5          8          2 4 9 10 11 12 19 21
    6          7          2 4 9 10 11 12 19
    7          6          2 4 9 10 12 19
    8          7          2 4 9 10 11 19 21
    9          7          2 4 9 10 11 19 21
   10          6          2 4 9 10 12 19
   11          8          2 4 9 10 11 12 19 22
   12          7          2 4 9 10 12 19 32
   13          9          2 4 9 10 11 12 19 22 32
   14          7          2 4 9 10 11 19 32
   15          6          2 4 9 10 11 19
   16          6          2 4 9 10 12 19

Table 5.7. The lists of distinct wing primitives of the 17 views of the coffee cup in Figure 5.4.

The situation is much better for the coffee cup (Figure 5.4 and Table 5.7), where more views have identical wing sets (views 0, 3 and 6; views 1 and 4; views 7, 10 and 16; views 8 and 9) and the views share a more stable common wing list.

In order to further test the effect of view variation using larger angle perturbations, an experiment with an angle perturbation range of 50 degrees (Figure 5.5) was performed; each pair of views now differs by about 12.5 degrees. The result is shown in Table 5.8. The common overlap among the wing lists is now reduced to {9, 10, 11, 19, 22}. We conclude that both distinguished and overlapping wing sets can be used to further reduce the number of aspects.

Figure 5.5. 16 views of the cobra head, with each pair of adjacent views differing by an angle of about 12.5 degrees.

16 Views of the Cobra Head, Image Resolution 128 x 128

  View  #Wing Primitives  Wing List
    1         12          3 4 9 10 11 12 14 19 20 22 24 34
    2         12          2 3 4 9 10 11 12 14 19 21 22 31
    3         11          2 3 4 9 10 11 12 19 20 22 24
    4         11          2 3 4 9 10 11 12 19 22 31 34
    5         13          3 4 9 10 11 12 14 19 20 22 24 31 34
    6         11          2 3 4 9 10 11 14 19 22 24 34
    7         11          3 4 9 10 11 12 19 22 24 31 34
    8         14          2 3 4 9 10 11 12 14 19 21 22 30 31 32
    9         11          4 9 10 11 14 19 21 22 24 31 34
   10         12          2 3 9 10 11 12 14 19 22 29 31 34
   11         13          2 3 4 9 10 11 19 20 21 22 24 31 34
   12         10          3 4 9 10 11 19 20 22 24 34
   13         13          2 3 4 9 10 11 12 19 20 21 22 24 34
   14         11          3 4 9 10 11 12 14 19 22 31 34
   15         12          2 3 4 9 10 11 12 19 22 24 31 34
   16         14          2 3 4 9 10 11 12 14 19 20 21 22 24 31

Table 5.8. The lists of distinct wing primitives of the 16 views of the cobra head in Figure 5.5.
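The common-overlap computation used for Tables 5.6 through 5.8 is a plain set intersection; a minimal sketch follows, with the reported overlaps quoted from the tables above.

```python
def common_overlap(wing_lists):
    """Intersection of the wing-primitive sets of a group of views
    (for example, the rows of Table 5.6 or Table 5.8)."""
    sets = [set(w) for w in wing_lists]
    overlap = sets[0]
    for s in sets[1:]:
        overlap &= s
    return sorted(overlap)

# For the 17 lists of Table 5.6 this yields [4, 9, 10, 11, 19, 22, 24];
# for the wider perturbations of Table 5.8 it shrinks to [9, 10, 11, 19, 22].
```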
5.2.4 Convolution Masks

In this section we consider another factor that can affect the stability of the constructed representations. Many image processing procedures, ranging from data smoothing and edge detection to the computation of surface characteristics, are implemented as convolutions with masks (filters). Such convolutions have the problem that the input to a finite filter consists only of the data points covered by the filter, so different filter sizes produce different results; the difference is typically a function of the filter and of the area it covers. A small filter suffers by excluding distant feature points from consideration, while a large one suffers by covering points coming from different features. The latter is sometimes referred to as the problem of multiple features [Cal88]. Clearly, both situations can considerably degrade the performance of convolution, and there is a tradeoff between the two.

To handle this problem, researchers usually employ a set of multiple-scale filters, convolving different areas with filters of different scales. This approach has two difficulties: either the final result must be obtained through a complicated integration process, or the optimal filter must be determined a priori. Another method [Cal88] suggests using small, possibly overlapping, neighborhoods of fixed size, chosen so that the entire image is covered. However, the problems of what fixed size of covering area to use and how to cover the entire image (partitioning or overlapping) still exist. Regardless of these issues, both approaches share the problem that different filter mask sizes produce different results. This problem is also encountered in the construction of representations.

We present a general model for evaluating the computational differences due to using different filter sizes. Examples of edge detection using LoG filters and smoothing using Gaussian filters are given later to illustrate the use of the model. Let w_1 and w_2 be two square masks whose sizes differ by 2n, i.e. |w_2| - |w_1| = 2n, where n can be any integer. According to Brady et al. [Bra85], an n-times repeated convolution with a filter is approximately equivalent to convolving the data points with a Gaussian mask g whose standard deviation is proportional to the square root of n. We can therefore write w_2 = w_1 * g^n, where '*' denotes the convolution operator, g^n means an n-fold repetition of g, and g is a 3 x 3 Gaussian mask. Let A_1 and A_2 denote the areas covered by the two filters. The difference in the computation at an image point (x, y) of a map, resulting from convolving the two filters with their corresponding areas, is

    \Delta(x,y) = (A_2 * w_2)(x,y) - (A_1 * w_1)(x,y)
                = A_2 * (w_1 * g^n) - A_1 * w_1
                = (A_1 + A'_2) * (w_1 * g^n) - A_1 * w_1
                = A_1 * (w_1 * g^n) + A'_2 * (w_1 * g^n) - A_1 * w_1
                = A_1 * w_1 * (g^n - I) + A'_2 * (w_1 * g^n) ,           (5.2)

where the variables x and y have been dropped for conciseness. We assume that the smaller mask has been enlarged by padding zeros around its circumference where necessary. I represents the unit impulse mask, and A'_2 (of finite support) is the area difference between A_1 and A_2, i.e. A'_2 = A_2 - A_1. From this equation, the computational difference can be determined from the smaller mask w_1 and the larger area A_2 alone. Note that the filters w_1 and w_2 are task-dependent; before a closed form of Equation 5.2 can be derived, the filters must be specified.

Figure 5.6. (a) A continuous scale space, (b) a discrete scale space.
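A small numerical sketch of the relation w_2 = w_1 * g^n and of the pointwise difference in Equation 5.2 is given below. It assumes SciPy is available and uses the common 3 x 3 binomial approximation of the Gaussian for g; the particular masks are illustrative choices, not the ones used in the thesis experiments.

```python
import numpy as np
from scipy.signal import convolve2d

# A common 3 x 3 Gaussian (binomial) approximation, assumed here for illustration.
g = np.array([[1, 2, 1],
              [2, 4, 2],
              [1, 2, 1]], dtype=float) / 16.0

def repeat_convolve(mask, g, n):
    """w2 = w1 * g^n : enlarge a mask by n repeated convolutions with g."""
    out = mask
    for _ in range(n):
        out = convolve2d(out, g)        # 'full' mode grows the support by 2 each time
    return out

w1 = np.ones((3, 3)) / 9.0              # a small smoothing mask, chosen for the example
w2 = repeat_convolve(w1, g, n=2)        # the corresponding 7 x 7 mask
patch = np.random.rand(7, 7)            # A2: the area covered by the larger mask
# Difference of the two filter responses at the centre point, as in Equation 5.2:
delta = np.sum(patch * w2) - np.sum(patch[2:5, 2:5] * w1)
```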
Let us consider the familiar example of looking for zero crossings in the convolution of the Laplacian-of-Gaussian (LoG) operator with a range image. It has been shown that zero crossings are displaced when different filter scales are used [Asa86]. Figure 5.6 shows a continuous scale space and a discrete scale space. Much noise can be suppressed using coarse scales, whereas better localization of depth discontinuities is achieved using fine scales. Theoretically, an edge map with precise discontinuity locations and little noise can be obtained by tracking contours in the scale space from coarse to fine scales (Figure 5.6(a)). In practice, only a few discrete scales are used (Figure 5.6(b)), which makes the tracking operation difficult.

The following result, derived by Nalwa & Binford [Nal86] for a 1-D step edge, gives the displacement of the zero crossings produced by two filters with different scales sigma_1 and sigma_2. In practice, one implements a discrete LoG operator with a fixed width w proportional to the scale, so different mask widths correspond to different scales and in turn to different results from the LoG filters. Because of the truncation and discretization of the LoG function, zero crossings (which are usually regarded as the positions of edges [Gri86]) are shifted from their actual positions, and the shift differs from filter to filter. Let w_1 and w_2 denote a small and a large filter with widths w_1 and w_2 respectively, and let the step edge take values z_1 and z_2 (both larger than 0, z_1 not equal to z_2) on its two sides, with slopes a and b and step size d. The displacement then has the form

    \Delta = \frac{b - a}{c^2 d} (w_2^2 - w_1^2) ,                    (5.3)

where c is the constant of proportionality between mask width and scale. The slopes a and b and the step size d can be estimated from the range values.

To make the use of Equation (5.3) more explicit, consider the range image in Figure 5.7(a), a real image of the cobra head taken with the ERIM range finder. Figures 5.7(b), (c), (d) and (e) show the maps of zero crossings derived using LoG filter sizes 11, 13, 15 and 17 respectively. The result of registering these zero-crossing maps using the displacement equation (5.3) is shown in Figure 5.7(f).

Figure 5.7. (a) A range image of the cobra head taken using the ERIM range finder. (b),(c),(d),(e) Maps of zero crossings derived using sizes 11, 13, 15 and 17 of the LoG filter respectively. (f) The registered map of zero crossings using Equation 5.3.

Lowe [Low88] has used a similar idea to recover the shrinkage error introduced when detected contours are smoothed with a Gaussian filter. The amount of shrinkage X at a point is a function of the Gaussian scale sigma and the curve radius r at that point,

    X = r ( 1 - e^{-\sigma^2 / 2r^2} ) ,

where r is estimated from the measured second derivative of the smoothed curve,

    X'' = e^{-\sigma^2 / 2r^2} / r .

The point coordinates of the smoothed curve are then corrected by the computed shrinkage errors.
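As an illustration of the shrinkage recovery, the sketch below inverts the second relation numerically to obtain r and then evaluates the shrinkage X. It assumes r >= sigma, where the relation is monotonic, and it is only a sketch of the idea, not Lowe's implementation.

```python
import math

def lowe_shrinkage(x_second, sigma, r_max=1e4):
    """Recover the curve radius r from the measured second derivative of the
    smoothed curve, x'' = exp(-sigma^2 / (2 r^2)) / r, by bisection, and return
    the shrinkage X = r * (1 - exp(-sigma^2 / (2 r^2))).  Assumes r >= sigma."""
    f = lambda r: math.exp(-sigma ** 2 / (2.0 * r ** 2)) / r - x_second
    lo, hi = sigma, r_max                # f(lo) > 0 > f(hi) for a valid measurement
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) > 0.0 else (lo, mid)
    r = 0.5 * (lo + hi)
    return r * (1.0 - math.exp(-sigma ** 2 / (2.0 * r ** 2)))
```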
5.3 Compatibility of Wing Features

It is essential that object wings extracted from real images be compatible with those of the internal representations; otherwise, the recognition procedure will not succeed in identifying scene objects based on the representations. The discussion in the following sections does not attempt to present a complete procedure for wing detection. Instead, it only demonstrates that wings detected in real images can be comparable to the wings of internal models. Although many ad hoc techniques [Bes85, Lau86, Hof87, Fan87, Par87] for surface and curve detection have been developed, none of them is completely satisfactory.

To date, the problem of feature extraction is still nontrivial. Unlike wing detection in synthetic images used for generating object representations, which is a top-down process guided by geometric models, wing detection in real images can use only the sensory data. Without knowledge of models, computations of surface characteristics have to rely on the available data and general domain assumptions.

Current segmentation techniques developed for range images include edge-based [Lau86, Fan87, Par87] and region-based [Hof87, Bes88] methods. In the region-based approach, characteristic values are computed through mask convolutions. Pixels close to object boundaries (within a distance of about half the mask width) yield defective evaluations of the characteristics, while those far from boundaries give better results. Thus a region-growing procedure [Bes88] using the central areas of regions as seeds is necessary to compensate for this defect; object boundaries are then derived by intersecting the growing regions. Edges detected in this way are usually not perfect, and a local analysis [Hof87] of the acquired edges can be applied to verify them. On the other hand, the edge-based approaches look directly for object boundaries based on surface normal, slope and curvature, which seems simpler than the region-based approach. Unfortunately, the same problem of computing normals and curvatures is also encountered by the edge-based techniques. To mediate, masks with different scales and orientation preferences [Lau86, Fan87, Par87] are adopted, and the edges derived in this way are further refined using their contextual surface information.

Line drawings [Tur74, Wal75, Mal87, Cae82, Nal88] and shape from contour [Bin81, Bra85, Koe84, Hu88] have been systematically investigated. The relations between curves and surfaces are becoming clearer through a substantial number of inference rules, which are useful for deducing 3-D shape near curve segments. As a result, inference rules can compensate for some problems of the segmentation techniques, and the results of segmentation can be verified using these rules.

5.4 Wing Detection from Real Images

Instead of creating new procedures, we mostly use available techniques to carry out partial experiments in wing detection, including methods adopted from Hoffman & Jain [Hof87], Besl & Jain [Bes86], Laurendeau & Poussart [Lau86] and Marr & Hildreth [Mar80]. A primary purpose of this section is to look for feasible approaches to extracting wing features from real images. Feasible approaches should be easy (few thresholds involved), efficient (fast processing) and reliable (little variation with respect to the mask size used). On the other hand, they need not be perfect: we can tolerate incomplete and even fallacious wings in the final results, and the recognition procedure to be discussed in the next chapter should be capable of tolerating these defects.

In the wing detection experiment, a number of real images of the cobra head, the coffee cup and a polyhedral object have been used. Figure 5.8 shows these range images displayed in gray scale; they were taken using the ERIM sensor. Two alternative approaches to feature extraction have been tried in the experiment and are presented in the following.

5.4.1 Hoffman-Jain Technique

The first strategy includes steps of segmentation, jump edge detection and region unification. The Hoffman & Jain technique [Hof87] has been used to segment the range images.
The results of the segmentation are shown in Figure 5.9. They reveal that some regions contain too many small patches: the images are oversegmented. However, except for the triangular region at the right in the last segmented image of the polyhedral object, no patch in the images crosses a physical edge of an object. This is good, since a single merging procedure will then be enough to obtain most of the significant regions that are consistent with object surfaces. There is, however, another difficulty in using the present segmentation to extract wing features, especially from non-polyhedral objects: the boundaries between patches are ragged, which eventually results in many small wings that are thrown away because of their size or infrequent occurrence.

Here, instead of solving the problems of oversegmentation and raggedness at once, we simply detect the silhouettes and jump edges in the images and then apply a unification procedure to the segmented images. Contours are extracted using both the LoG operator mentioned in the last section and Laurendeau & Poussart's approach [Lau86]. The jump edges and silhouettes of the images are shown in Figure 5.10. Afterwards, a symbolic uniform procedure repeatedly reassigns each pixel to the most likely neighboring region according to the majority of its eight neighbors. The neighbors of a pixel may be bounded by contours, so that neighboring pixels on the other side of a contour are not considered.

Figure 5.8. Experimental real images of the clay cobra head, coffee cup and polyhedral object taken using the ERIM sensor.

Figure 5.9. Segmentation results using Hoffman & Jain's technique.

Figure 5.10. Jump edges of images.

Figure 5.11. The results of the first segmentation method.

Figure 5.11 shows the results of the segmentation constrained by contours. Comparing these results with those in Figure 5.9, it is clear that the boundaries between patches have been smoothed and that most noise pixels within patches have been removed. In the images in Figure 5.11 there are many internal edges, which are boundaries between patches. Some of them are true crease edges, whereas others are spurious. Spurious edges will increase the recognition time but should not affect the recognition results.
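A minimal sketch of the symbolic uniform procedure described above follows. It assumes the region labels and the contour mask are given as 2-D arrays and is written for clarity rather than speed.

```python
import numpy as np
from collections import Counter

def symbolic_uniform(labels, edges, iterations=5):
    """Reassign each pixel to the majority label of its 8 neighbours, ignoring
    neighbours that lie on a contour.  labels: 2-D integer array of region labels;
    edges: 2-D boolean array, True on contour pixels."""
    rows, cols = labels.shape
    for _ in range(iterations):
        out = labels.copy()
        for r in range(1, rows - 1):
            for c in range(1, cols - 1):
                if edges[r, c]:
                    continue
                votes = Counter()
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        if (dr or dc) and not edges[r + dr, c + dc]:
                            votes[labels[r + dr, c + dc]] += 1
                if votes:
                    out[r, c] = votes.most_common(1)[0][0]
        labels = out
    return labels
```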
5.4.2 Combined Signs of Directional Derivatives

In this section, feature extraction based on directional derivatives is studied. A directional derivative of a function at a point is the first derivative of the function in a given direction at that point. Unlike curvatures, directional derivatives are view-variant. One reason for using directional derivatives instead of curvatures is that wing primitives can themselves be view-variant; another is that directional derivatives are more noise-resistant than curvatures, which depend on second derivatives.

The method is based on the property of range values, which represent distances between the sensor and scene points. Suppose a range image contains an infinite plane whose normal is not perpendicular to the optical axis. If the entire image is scanned along one direction (for example the horizontal direction), the range values either increase, decrease or remain the same, so the signs of the first derivatives of the range values along that direction are all the same (+ for increasing range values, - for decreasing range values, 0 for equal range values). This means that the pixels of an image corresponding to a spatial plane have the same sign of directional derivative. Although different scanning directions may give the plane different signs, once a direction has been selected, the sign of the directional derivative over the plane is constant.

Suppose two intersecting half-planes are depicted in a range image and neither plane is perpendicular to the image plane. If we scan the image along a given direction and compute the first derivatives in this direction, then, by the analysis of a single plane above, the derivative signs of the pixels of each plane are all the same. However, the two planes may have different signs of directional derivative (Figure 5.12(a)), or they may have the same sign (Figure 5.12(b)). In the first case the intersection, or boundary, of the planes can easily be determined from the derivative signs. For the second case an interesting question arises: is there a scanning direction along which the computed derivative signs of the two planes differ? The following lemma answers this question.

Figure 5.12. (a) Two planes with different signs of directional derivative. (b) Two planes with the same sign of directional derivative.

Figure 5.13. Two intersecting half-planes depicted in a range image; neither is perpendicular to the image plane.

Refer to Figure 5.13 and let the image plane be the x-y plane. The equation of a plane in E^3 can be written a'x + b'y + c'z = d'. If the plane is not perpendicular to the image plane, then c' is nonzero and the plane equation can be simplified to

    z = a x + b y + d ,    where a = -a'/c',  b = -b'/c',  d = d'/c'.

Lemma 5.1: Suppose there are two intersecting half-planes P1 and P2 (as shown in Figure 5.13) such that neither P1 nor P2 is perpendicular to the image plane,

    P1:  z_1 = a_1 x + b_1 y + d_1 ,        P2:  z_2 = a_2 x + b_2 y + d_2 .

Then there is an angle range in the image plane within which any scanning direction r(theta) makes the derivative signs of the two planes distinguishable. The range is

    \pi/2 + \min(\theta_1, \theta_2) < \theta < \pi/2 + \max(\theta_1, \theta_2) ,        (5.4)

where

    \theta_1 = \cos^{-1}( a_1 / (a_1^2 + b_1^2)^{1/2} ) = \sin^{-1}( b_1 / (a_1^2 + b_1^2)^{1/2} ) ,
    \theta_2 = \cos^{-1}( a_2 / (a_2^2 + b_2^2)^{1/2} ) = \sin^{-1}( b_2 / (a_2^2 + b_2^2)^{1/2} ) ,

and theta_1 is not equal to theta_2.

Proof: Because the scanning direction r lies in the image plane, it can be represented by a unit vector r = (cos theta, sin theta, 0), where theta is the angle between the scanning direction and the x-axis. The directional derivative of a plane z in direction r is

    dz/dr = a cos(theta) + b sin(theta) .

For plane P1,

    dz_1/dr = a_1 cos(theta) + b_1 sin(theta)
            = (a_1^2 + b_1^2)^{1/2} ( cos(theta_1) cos(theta) + sin(theta_1) sin(theta) )
            = (a_1^2 + b_1^2)^{1/2} cos(theta - theta_1) ,

where theta_1 is the angle between the x-axis and the projection of the surface normal of plane 1 onto the image plane. Similarly, for plane P2,

    dz_2/dr = a_2 cos(theta) + b_2 sin(theta) = (a_2^2 + b_2^2)^{1/2} cos(theta - theta_2) ,

with theta_2 the angle of the projected surface normal of plane 2 with the x-axis. Because of the image formation, 0 < |theta_1 - theta_2| < pi. To make the derivative signs of the two planes distinct we require (dz_1/dr)(dz_2/dr) < 0, or equivalently cos(theta - theta_1) cos(theta - theta_2) < 0. By diagrammatic analysis, if

    \pi/2 + \min(\theta_1, \theta_2) < \theta < \pi/2 + \max(\theta_1, \theta_2) ,        (5.4)

then (dz_1/dr)(dz_2/dr) < 0.
Note that theta_1 is not equal to theta_2, so the range (5.4) always exists. For example, we can simply choose theta = pi/2 + (theta_1 + theta_2)/2, the central value of the range. Then

    cos(theta - theta_1) cos(theta - theta_2) = cos( \pi/2 + (\theta_2 - \theta_1)/2 ) cos( \pi/2 - (\theta_2 - \theta_1)/2 )
                                              = -\sin^2( (\theta_2 - \theta_1)/2 ) < 0 .

This completes the proof.

From the above lemma we conclude that, except for accidental cases (parallel or perpendicular projected normals), the crease edges of polyhedral objects in an image, both convex and concave, can be detected by trying an adequate set of scanning directions. Consider, however, the case

    P1:  z = -x ,        P2:  z = -x/2 .

Since theta_1 = theta_2, there is no scanning direction that can detect the edge formed by P1 and P2.

In the following sections, the implementation of the above idea is presented first; experiments on synthetic images then test some properties of the algorithm, after which we turn to real images.

5.4.2.1 Computation of Directional Derivatives

The approach first fits the data points with either a piecewise smooth quadratic surface [Bes86] or a cubic spline [Yan85]; directional derivatives are then computed from the fitted surface. For real images, the range data are first smoothed with a square median filter before the derivatives are computed. For synthetic images generated from geometric models, the range values need not be smoothed, because the projection process has implicitly smoothed the point data.

Let a range image be denoted by the function f(u, v), where u is the row variable and v the column variable. Two directional derivatives, f_u and f_v, are computed. Once f_u and f_v are available, the directional derivative in an arbitrary direction (cos theta, sin theta) is easily determined:

    df/dr = (\partial f / \partial u) cos(theta) + (\partial f / \partial v) sin(theta) = f_u cos(theta) + f_v sin(theta) .        (5.5)

Let us refer to the arrays containing the values of f_u and f_v as the f_u and f_v maps respectively. Before these maps are used to segment images, they can be smoothed again with a median filter to remove isolated labels. In each map, an entry with positive value is labeled 1 and an entry with negative value is labeled 2; all other entries are assigned zero. The results are the sign maps of the derivatives. Because of discretization error, the probability that a derivative is exactly zero is very low, so boundaries between regions are mostly one pixel wide (refer to Figure 5.14). With only one threshold, at zero, the method is easy to implement and use, and the results below show that it has practical promise.

A symbolic uniform procedure (mentioned in Section 5.4.1) is applied to each computed sign map to make the regions more homogeneous. Since a sign map contains three different values, 0 (background), 1 (positive derivative) and 2 (negative derivative), an adequate integration of the sign maps is needed in order to combine two maps into one while keeping their regions distinguishable. A linear combination has been employed: the first sign map is multiplied by a constant and added to the second sign map. The constant should not be 1 or 2, because these two values are already used by the sign maps to represent symbolic regions; the constant used in the experiments is 3. After combining the sign maps, the resulting map is the segmentation map. The entire procedure is sketched below, followed by a short implementation sketch.

1. Smooth the range image using a median filter.
2. Compute the directional derivatives f_u and f_v using Besl & Jain's fitting model.
3. Smooth the derivative maps using a median filter.
4. Derive the sign maps from the derivative maps.
5. Improve the sign maps using the symbolic uniform procedure.
6. Combine the directional derivative sign maps.
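A compact sketch of the six steps is given below. It assumes NumPy and SciPy, stands in simple central differences for the Besl & Jain surface fit, and replaces the symbolic uniform procedure of step 5 with a median filter, so it illustrates the structure of the pipeline rather than the exact implementation.

```python
import numpy as np
from scipy.ndimage import median_filter

def segment_by_derivative_signs(range_image, theta1=0.0, theta2=np.pi / 2, size=3):
    """Steps 1-6 of the procedure above, with central differences standing in for
    the Besl & Jain surface fit.  Returns the combined segmentation map."""
    f = median_filter(range_image.astype(float), size=size)      # step 1
    fu, fv = np.gradient(f)                                       # step 2: f_u, f_v
    maps = []
    for theta in (theta1, theta2):
        d = fu * np.cos(theta) + fv * np.sin(theta)               # Equation 5.5
        d = median_filter(d, size=size)                           # step 3
        sign_map = np.where(d > 0, 1, np.where(d < 0, 2, 0))      # step 4
        sign_map = median_filter(sign_map, size=size)             # step 5 (stand-in for
        maps.append(sign_map)                                     #   the symbolic procedure)
    return 3 * maps[0] + maps[1]                                  # step 6: linear combination
```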
5.4.2.2 Synthetic Images

Although the segmentation method is easy to understand theoretically, problems could arise with real data, which is noisy and which may come from non-polyhedral objects. To test the segmentation procedure, synthetic images generated by projecting geometric models with perturbations, rotations and added noise were first used as input data (model perturbation was defined in Section 5.2.3; rotating a model means rotating it about the projection direction). The experimental models are those of a polyhedral object and a clay cobra head (Figure 3.2).

The first experiment investigated the influence of view perturbations on the segmentation results. A set of images was produced by sequentially projecting a model after perturbing it by a small angle. Figure 5.14 shows the segmentation results of the polyhedral images with view perturbations within a range of 12 degrees. As can be seen, small view perturbations do not affect the segmentation results.

Figure 5.14. Perturbation test using the polyhedral model.

The second experiment explored what happens when a model is rotated about the projection direction (analogous to the optical axis of a sensor). The model was rotated sequentially by 30 degrees at a time, and twelve images of the model in the different rotated positions were generated. Figure 5.15 shows the segmentation results of the polyhedral images. Most edges were preserved; only one was missed in some images, and no fallacious edges appeared. A few missing edges will not harm object recognition, whereas spurious edges cause more trouble for the recognition procedure.

Figure 5.15. Rotation test using the polyhedral model (rotations of 0 to 330 degrees in 30-degree steps).

The next experiment tested the effect of the size of the convolution masks used in data smoothing and in the computation of directional derivatives; mask sizes range from 3 x 3 to 11 x 11. The results for the polyhedral images are shown in Figure 5.16. According to these results, multiple edges appear around some real edges as the mask size increases. However, the multiple edges corresponding to an actual edge are parallel and close to each other, so if multiple edges corresponding to a real edge are found in the segmented image, the actual edge can be approximately determined by averaging them.

Figure 5.16. Convolution size test using the polyhedral model (mask sizes from 3 x 3 to 11 x 11).

The final experiment examined how well the proposed method tolerates noise. A uniform random number generator was used to generate noise with ranges from (-0.1, 0.1) to (-0.9, 0.9) mm, which is about 0.1% to 1% noise with respect to the object diameter. The results of this test for the polyhedral object are shown in Figure 5.17. Although the boundary elements were gradually spoiled, the global shapes of the objects were preserved.

Figure 5.17. Noise test using the polyhedral model.

Figures 5.18 to 5.21 display the corresponding segmentation results for synthetic images of the cobra head. As shown in these figures, the conclusions drawn for polyhedral objects apply to curved objects as well. A prominent difference is that, unlike polyhedral objects, whose segmented contours and regions correspond to physical counterparts, those in the segmented images of curved objects may not.

Figure 5.18. Perturbation test using the cobra head model.
Figure 5.19. Rotation test using the cobra head model.
Figure 5.20. Convolution size test using the cobra head model.
Figure 5.21. Noise test using the cobra head model.
Furthermore, those regions and contours are similar to object silhouettes in the property of view-variance. Let us refer to the contours and regions extracted from the sign maps of directional derivatives as critical shape elements. As mentioned in Chapter 2, wings can be defined along any kind of contour and region as long as they can be reliably detected.

The experimental results reveal that the proposed segmentation approach possesses a reasonable ability to resist variations of perturbation, convolution and noise, but not rotation. In the rotation test (Figure 5.19), the same segmentation result appears every 90 degrees: for example, the results of the views with rotation angles 0, 90, 180 and 270 degrees are exactly the same, and the views with rotation angles 30, 120, 210 and 300 degrees form another group. This equivalence phenomenon arises because the segmentations result from integrating derivatives computed along two orthogonal directions (90 degrees apart); if an image were scanned along four uniformly separated directions (45 degrees apart), the result should repeat every 45 degrees. Let us refer to this phenomenon as the equivalence property of directional derivatives.

Figure 5.22 shows a set of 36 sign maps of the f_v derivative, obtained by sequentially rotating the cobra head by 10 degrees at a time. According to the equivalence property of directional derivatives, only 18 of the maps in this set are distinct. In fact, given two orthogonal views V1 and V2 and a rotation angle delta, the view away from view V1 by angle delta can easily be computed using Equation (5.5). This property, the equivalence property, and the rotational variance of the segmentation technique might provide some interesting possibilities for object representation.

Figure 5.22. The sign maps of the f_v derivative of the cobra head, sequentially rotated by 10 degrees.
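Equation 5.5 makes the sign map for an arbitrary scan direction a one-line computation from the f_u and f_v maps, which is what the observation about rotated views exploits. The sketch below is illustrative only, and the array names are assumptions.

```python
import numpy as np

def sign_map_at_angle(fu, fv, theta):
    """Directional-derivative sign map for a scan direction at angle theta,
    computed from the two orthogonal derivative maps via Equation 5.5:
    df/dr = f_u cos(theta) + f_v sin(theta)."""
    d = fu * np.cos(theta) + fv * np.sin(theta)
    return np.where(d > 0, 1, np.where(d < 0, 2, 0))

# The text's observation on rotated views amounts to re-evaluating the same
# f_u, f_v maps at a new angle, e.g. theta = np.radians(10).
```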
5.4.2.3 Real Images

In this section, we apply the proposed segmentation method to real images taken with the ERIM sensor. In order to verify the compatibility of the segmentation results of real images with those of synthetic images, the results are computed under different smoothing operators (median filters) and convolution masks (for computing the directional derivatives). Figure 5.23 displays the segmentation results of several real images using a 9 x 9 square median filter to smooth the images and a 7 x 7 convolution mask to compute the directional derivatives; the results are analogous to those for synthetic images.

What happens if different mask sizes are used for smoothing and computation? Figure 5.24 displays the results for real images using a 9 x 9 smoothing mask and a 3 x 3 computation mask. The contours are a little more ragged than those in Figure 5.23, where 9 x 9 and 7 x 7 masks were used, but the basic shapes are consistent. For further examples, Figure 5.25 shows the results of other mask sizes for the cobra head images: the first row of the figure was obtained using a 9 x 9 smoothing filter and a 5 x 5 convolution mask, the second row using a 7 x 7 smoothing filter and a 5 x 5 convolution mask, and the last row using a 7 x 7 and a 3 x 3 respectively. The mask size has not significantly affected the segmentation results; the global shapes are preserved, and it will not be difficult for a recognition procedure to tolerate such variations.

Finally, Figure 5.26 shows the sign maps of the derivatives f_u (column 1) and f_v (column 2), and of the mean curvature (column 3). Instead of fusing the maps into one, considering the individual maps is much easier than considering a fused map, because each contains few contours and regions; besides, other directional derivatives can easily be computed using Equation 5.5. According to the experimental results on perturbation, rotation, convolution and noise for the segmentation algorithm, it seems that wings can be nicely defined based on these critical contours and regions. To extract wing features, the derivative sign maps are first superimposed with their corresponding mean curvature maps (column 3 of Figure 5.26, where dark gray denotes concavity and light gray denotes convexity), and object wings are then defined.

Figure 5.23. The segmentation results of the real images in Figure 5.8 using a 9 x 9 smoothing filter and a 7 x 7 computation mask.

Figure 5.24. The segmentation results of the real images in Figure 5.8 using a 9 x 9 smoothing filter and a 3 x 3 computation mask.

Figure 5.25. The segmentation results of real images using different sizes of smoothing and computation masks. The first row shows the results using a 9 x 9 smoothing filter and a 5 x 5 computation mask. The results in the second row were obtained using a 7 x 7 smoothing filter and a 5 x 5 computation mask. The last row displays the results of using a 5 x 5 and a 3 x 3 respectively.

Figure 5.26. Sign maps of f_u, f_v and mean curvatures.

5.5 Summary

In this chapter, we investigated the properties of uniqueness, stability, compatibility, terseness and locality of wing representations. In the examination of uniqueness, a function for estimating the similarity between two representations was defined in terms of a set of criteria characterized by the distinct and distinguished wing sets of objects and the appearance frequencies of wing primitives. The stability property was investigated with respect to the impact of variations of resolution, perturbation and convolution in the input data on the constructed representations; the experiments showed that a small variation in the input data has little influence on the final results. Compatibility of the representation was explored by examining the consistency between wings detected in real images and those in the database. Two segmentation methods were studied: the first was based on the techniques of Hoffman & Jain [Hof87], Marr & Hildreth [Mar80] and Laurendeau & Poussart [Lau86]; the second was based on the directional derivatives of range values using Besl & Jain's computational model. A series of experiments studying the impact of perturbation, rotation, convolution and noise on the second segmentation approach was performed; the results showed that directional derivatives resist variations of perturbation, convolution and noise, but not rotation. We come to the conclusion that the properties of uniqueness, stability and compatibility are reasonably preserved by wing representations, while the property of terseness is not satisfied.

Chapter 6

Recognition Based on Wing Representation

Object recognition involves identifying and locating objects using sensory data. Primary difficulties in object recognition include noisy data and occlusion.
Sensory data are inevitably contaminated during acquisition, making feature extraction unreliable. Another difficulty is that scene properties are usually distorted during image formation. Recovery of complete object shapes for recognition is neither necessary nor advisable. As a result, object models play an important role: they provide additional knowledge that can compensate for the defects and irrelevant features of the sensory data. A recognition system using a set of models as a knowledge base is referred to as a model-based recognition system.

Models are commonly represented in terms of salient features and structural relations between features. In previous chapters, we presented an approach to representing objects that is characterized by model aspects; each aspect contains a set of views, and each view is in turn represented by object wings and their structural constraints. Techniques for recognition based on wing representation are developed in this chapter. The major steps are indexing, consistent labeling, parameter estimation and decision.

This chapter is organized as follows. In Section 1, related recognition techniques are reviewed. The problem and the procedure of model testing are defined in Section 2. In Section 3, a strategy for indexing into models and views based on detected wings is presented. The details of the interpretation tree search for consistent labeling are addressed in Section 4. In view of the multiple-view representation, each view portrays a 2-D aspect of an object; a computational model of view transformation in terms of 2-D curves is developed in Section 5. In Section 6, a clustering procedure for determining the class and the pose of an unknown object is presented. A summary of the recognition procedure is given in Section 7.

6.1 Object Recognition Techniques

The recognition problem has long been an important issue in computer vision. The problem can be classified, in terms of dimensionality, into three categories: 2-D to 2-D [Sto82, Bol82, Tsa85, Tur85, Aya86, Kno86, Sch87], 3-D from 2-D [Low85, Hut87, Lin88, Sho88, Ohm88] and 3-D to 3-D [Bol83, Goa83, Sha84, Bro84, Yan85, Fau86, Gri87, Che87, Sto87, Sto88, Ike88].

Several issues are involved in the design of a 3-D object recognition system. The first is the class of features on which the matching procedures are based: features can be points, curves, surfaces, volumes, characteristic parameters, shape descriptors or moments. The second factor is the representation scheme used to describe the internal models; relational tables [Sha84], trees [Mea81, Hen85], graphs [Won85], primal sketches [Mar82, Bra85, Gri86], scale spaces [Hum86], evidence rules [Hof87] and characteristic views [Dud77, Fek83, Goa83, Ike88] are a few of the possible schemes. The third design choice is the control paradigm: first guess [Hof84], hypothesize-and-test [Bro84], relaxation labeling [Wal75], local-feature-focus [Bol82], interpretation tree search [Gri87], the generalized Hough transform [Bal83], geometric hashing [Lam88] and RANSAC [Fis81] are some examples of control paradigms. Finally, the computational methods used are another aspect that influences the design of a recognition procedure.
Prominent approaches include least squares [Sch87], relaxation [Pri85], recursion [Low85, Low87], optimization [Gun87] and closed forms [Gri87, Che87]. Most model-based recognition procedures contain three fundamental components: feature correspondence, parameter estimation and decision. We now review some previous work in 3-D object recognition with respect to these three components. Comprehensive surveys of most current recognition techniques have been made by Besl & Jain [Bes85] and Chin & Dyer [Chi86].

6.1.1 Feature Correspondence

The simplest approach to the feature correspondence problem may be Random Sample Consensus (RANSAC), a paradigm for model fitting [Fis81]. Its basic idea is that, given a set of sensed features and model features, a small set of sensed-model feature pairs is randomly selected to instantiate the free parameters of a model. Sensed features consistent with this instantiation are then collected; if the number of collected features exceeds a predetermined threshold, the global correspondence is assumed to be correct, and an optimization procedure is applied to the collected features to determine a final estimate of the pose.

In the RANSAC approach, since the feature pairs are randomly selected, no examination of the relationships (constraints) among sensed and model features is applied before the parameters are estimated. Bolles & Cain [Bol82] presented a method, called local-feature-focus, that selects mutually consistent feature pairs using geometric coupling constraints to generate initial hypotheses. This method greatly reduces the number of feature combinations; however, it assumes that sensed features and their relations can be reliably computed.

The local-feature-focus approach uses only local information to confirm the consistency of the selected sets of features. In order to provide stronger evidence, further reduce the number of combinations and increase the reliability of the selected features, global consistency should be incorporated. A mechanism for accomplishing this is the paradigm of interpretation tree search. The interpretation tree (IT) was first formally proposed by Gaston & Lozano-Pérez [Gas83]. In this method, all sensed features are interpreted before they are used to estimate pose parameters. Since Gaston & Lozano-Pérez implemented the control mechanism with a breadth-first search, combinatorial explosion forces the system to employ only a small set of features. Grimson & Lozano-Pérez [Gri87] used depth-first search and introduced nil nodes into the IT. Nil nodes are necessary for finding all candidate interpretations when dealing with scenes containing multiple objects; unfortunately, they make the combinatorics worse.
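For reference, the RANSAC paradigm summarized at the beginning of this section reduces to a short skeleton. In the sketch below the functions instantiate, consistent and refine are application-supplied placeholders, not procedures defined in this thesis.

```python
import random

def ransac_match(sensed, model, instantiate, consistent, refine,
                 sample_size=3, trials=200, accept=10):
    """Skeleton of the RANSAC paradigm: repeatedly instantiate a pose from a small
    random set of sensed-model pairs, collect the sensed features consistent with
    it, and refine the pose once enough support has been found."""
    best = None
    for _ in range(trials):
        pairs = random.sample([(s, m) for s in sensed for m in model], sample_size)
        pose = instantiate(pairs)
        if pose is None:
            continue
        support = [s for s in sensed if consistent(s, model, pose)]
        if len(support) >= accept and (best is None or len(support) > len(best[1])):
            best = (pose, support)
    if best is None:
        return None
    return refine(best[1], model, best[0]), best[1]
```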
6.1.2 Parameter Estimation

Once a correspondence between sensed and model features has been made, a set of parameters having to do with the environment, the sensor or the model is usually computed. This parameter estimation problem has been mathematically termed the generalized inverse set mapping [Bes85]. Here we are concerned only with the estimation of rotation, translation and scaling as pose parameters.

The least squares method is the most common technique for estimating parameters. However, as pointed out by Fischler & Bolles [Fis81], least squares is not robust: it assumes the input data are smooth enough to fit a model, and it has no ability to reject gross errors a priori but simply averages deviations over the data set. The same drawback occurs in many optimization techniques that maximize or minimize a set of predefined criterion functions. In order to deal with this problem, techniques of iteration and relaxation are used. The iterative approach requires an initial parameter estimate, often obtained through least squares; the estimated parameters, or an error-correction vector of parameters, are then resubstituted into the criterion function [Low85] and the optimization procedure is repeated. Through several iterations, better results can often be achieved. The relaxation methods [Wal75, Pri85] are in some sense similar to iterative approaches; they attempt to remove bad data from the data set using statistical updating functions or heuristic constraints. To date, error recovery in parameter estimation remains a nontrivial problem.

6.1.3 Decision

The parameter estimation step often produces several candidate solutions for the pose parameters, and a variety of approaches [Chi86] have been proposed for selecting a solution from the set of possibilities. The generalized Hough transform [Bal83, Gri88], clustering [Sto82, Sto87] and geometric hashing [Lam88] have been used to look for so-called optimal solutions. Since the final result is only the most feasible one, the matching is often referred to as inexact matching [Boy88].

In the generalized Hough transform approach, the parameter space is divided into discrete bins; computed parameter vectors are quantized and entered into the corresponding bins. Parameter vectors resulting from feature pairs of a correct match of a model to the sensory data are assumed to be approximately equal, so the center of gravity of the largest cluster in the parameter space is taken to represent the correct parameters. There are several difficulties with Hough transform methods. The first is the selection of the bin size, which is a function of noise and processing error. The second is that, for a parameter space of high dimension, the memory usage is very large, and accessing and searching entries in a large table is cumbersome. In addition, according to Grimson & Huttenlocher [Gri88], the probability of returning false solutions can be very high if the sensory data possess even moderate levels of noise, occlusion and clutter.

Some of the problems of the Hough transform can be partially removed by working directly in the continuous parameter space instead of quantizing it; such methods are generally referred to as clustering approaches. Clustering techniques and their comparisons can be found in Dubes & Jain's classic work [Dub76], which presents three categories of clustering methods: minimizing squared error, hierarchical clustering and graph-theoretic clustering.

Stockman [Sto87] introduced a clustering procedure using overlapping volumes in the continuous parameter space. This method simply counts the number of parameter vectors (patterns) within a fixed-size volume centered at each pattern; the pattern producing the largest count, together with its neighbors, forms the most feasible cluster. This method has time complexity O(N^2), where N is the number of patterns, whereas the Hough transform approach has time complexity O(N'), where N' is the number of bins; the number of patterns is usually less than the number of bins.
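Stockman's fixed-volume counting procedure is easy to sketch. The version below uses an axis-aligned box as the fixed-size volume and assumes the pose parameters have already been scaled to comparable units; both choices are illustrative assumptions.

```python
import numpy as np

def densest_cluster(patterns, radius):
    """Count, for each pattern (parameter vector), the patterns lying within a
    fixed-size box centred on it, and return the centre of gravity of the winning
    pattern and its neighbours, plus the count.  O(N^2) in the number of patterns."""
    patterns = np.asarray(patterns, dtype=float)
    diffs = patterns[:, None, :] - patterns[None, :, :]
    within = np.all(np.abs(diffs) <= radius, axis=2)     # box of side 2 * radius
    counts = within.sum(axis=1)
    best = int(np.argmax(counts))
    members = patterns[within[best]]
    return members.mean(axis=0), int(counts[best])
```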
The minimum spanning tree method, which is a graph-theoretic clustering technique [Zah71], sequentially connects patterns in the parameter space to their unvisited nearest neighbors according to a similarity matrix of patterns. The resulting structure is called the minimum spanning tree. Links in the tree with lengths larger than a pre-defined threshold are deleted, resulting in a set of clusters. The center of gravity of the largest cluster is then regarded as the solution for the pose parameters. In the k-means approach, patterns are first divided into k groups. Then, points are moved from one group to another in order to optimize a distance metric. Clearly, both the minimum spanning tree and k-means approaches need a metric function for pairs of points. However, in the case of transformations, each pattern has three types of parameters: rotation, translation and scaling. Their units are not consistent with each other. Without either normalizing parameters a priori or considering them separately, the metric function is nontrivial to design.

6.2 The Definition of the Problem and the Procedure of Model Test

The recognition problem starts with a wing representation of the scene. The scene can be a jumble of objects (Figure 1.1) in which objects may occlude one another. The scene representation may thus be a combination of several incomplete object representations or an incomplete single object representation. Our purpose is to determine the class (identity) and the pose (location) of unknown objects in a scene based on the scene representation.

The proposed recognition procedure includes four stages: indexing, consistent labeling, parameter estimation and decision. In the indexing stage, candidate models and views are predicted based on the set of detected wings extracted from the image. For each candidate view, an interpretation tree search using wing constraints is applied to the sensed wings. Once a consistent labeling of sensed wings has been found, transformations of models to objects are readily computed. Each transformation comes with a goodness of match and mean and maximum errors, which indicate how good the match from a model view to an object view is. An object pose is then determined in terms of the predicted viewpoint and the computed transformation. In the final stage of decision, the "best" identity and pose of an object are determined through a clustering procedure. A set of clustering statistics is computed to indicate the confidence of recognition. Thus the final result of an object recognition includes the prediction of the class and location of the object, the measurements of goodness of match and mean and maximum errors, and the confidence in this result.

Once a model match has been made, the matched portion is removed from the scene picture. This process could be done in parallel for all models. Model testing will stop when the scene picture is empty or no more models can be matched to the remaining portions of the scene picture.

6.3 Indexing into Models and Views

In the early stages of recognition, wing features are used to index into the knowledge base, which consists of a set of models and views. As mentioned before, both simple and composite wings possess considerable potential for discriminating objects. In this step, most of the models will be rejected. Only a few models will be considered in the next step.

Suppose we are given a set of sensed wings detected in an image. This set of wings may include some spurious wings because of noise.
It may also miss some actual wings due to occlusion or imperfect feature extraction. Let S denote the set of sensed wings. Suppose in our model base M there are m models: { M_i, i = 1,...,m }. Each model may be examined using a preliminary model-test procedure based on wing sets. Models will be ignored if their wing sets do not intersect the sensed wing set. The remaining models are then queued according to a measurement of priority.

Recall that when discussing the uniqueness property of wing representations, a similarity function was introduced for estimating the proximity between two object representations. A similar concept can be used here to construct a priority function to arrange models for testing. A primary difference between the priority function and the similarity function is that the representations used by the similarity function are all complete object representations, whereas the representation used by the priority function is a scene representation, which may be an incomplete object representation (single object) or a combination of several object representations (multiple objects). Despite this difference, most criteria defined for the similarity function can be used by the priority function.

Rather than directly deriving the priority function, we simply examine the criteria of the similarity function (listed in Appendix D) one by one. There are eleven criteria used by the similarity function. The first criterion (the numbers of aspects of models) will be ignored by the priority function because a scene representation, which reflects a single scene view, is comparable to only one aspect. Similarly, criteria 7, 8, 9 and 11 deal with full model aspects and will be omitted. The second criterion of the similarity function should be replaced by the sets of distinct wing primitives of models corresponding to the set of sensed wing primitives, instead of just the sets of distinct wing primitives of models. According to this modified criterion, a model with a distinguished wing set that is more consistent with the sensed wing set will possess higher priority and be tested earlier. In a similar vein, criteria 3, 4, 5 and 6 all need to be modified by including this additional condition of correspondence to the sensed wing set. Finally, for the structural relationship criterion, sensed wings are usually not exactly equivalent to model wings; the criterion should be modified to consider the compatibility of relationships subject to tolerance ranges of parameters. After deletion and modification, we obtain six criteria for constructing the priority function $P = e^{-\sum_{i=1}^{6} s_i}$, where $s_i$ is a weighted penalty for violating criterion $i$.

Let us use an example to illustrate the computation of the priority function. For simplicity, assume that the priority function is only based on the criterion of appearance frequencies of distinct wing primitives. The definition of the appearance frequency of a wing primitive has been addressed in Chapter 5, where the cobra head and coffee cup examples were shown in Table 5.1. The priority of a model can then be computed by summing the appearance frequencies of the model wings corresponding to the distinct sensed wings. To make this explicit, let us look at a simple example. Suppose the distinct sensed wing set is S = {11, 18, 27}, where numbers represent wing primitives. Suppose also that there are two candidate models. The first model M1 has the distinct wing set {5, 11, 18, 22, 27, 31} with corresponding appearance frequencies {0.3, 0.2, 0.1, 0.4, 0.3, 0.2}.
The second model M2 has the distinct wing set {3, 11, 15, 18, 24, 27, 34} with corresponding appearance frequencies {0.2, 0.8, 0.4, 0.5, 0.1, 0.6, 0.7}. For model M1, the summation of appearance frequencies of the model wings corresponding to the sensed wings is 0.2+0.1+0.3 = 0.6, whereas for model M2 the summation is 0.8+0.5+0.6 = 1.9. The second model M2 has higher priority than the first model M1 because the wing set of M2 is more consistent with the sensed wing set than that of M1. M2 will be tested first based on this simple priority function.

The above example only considered the case of missing wings. To consider spurious wings, other criteria such as 2, 4, and 6 should be included in the measurement of priority. Consider the models whose wing sets are not consistent with the set of sensed wings S. Models with less inconsistency will be examined first. By less inconsistency, we mean that there are fewer sensed wings which are not in the set of model wings. For example, suppose that we have a set of sensed wings S = {3, 5, 11, 22, 31}. Using the previous model examples, the measurement of inconsistency for model M1 is 1, and 3 for model M2. Therefore, M1 will be tested first in this case. It is clear that to determine the priority of a model one should combine the results of many criteria.

Let us refer to any sequence arranged according to some priority measurement as a queue. During the indexing process, models are first queued. Since each model contains a set of aspects, the test sequence of aspects is then arranged. Thereafter, each view in a candidate aspect is tested by a view transformation and a clustering procedure, to be addressed in Sections 6.5 and 6.6 respectively. Views in an aspect need to be queued before they are tested. This procedure will continue until either the unknown object is recognized or failure is reported when the set of candidate models is used up.

6.4 Interpretation Tree Search

An approach to implementing the control of the interpretation tree search is presented in this section. Before proceeding, let us look at the example in Figure 6.1, which shows a correspondence problem to solve and a tree structure to solve it. In this example, a pyramid having 6 edges and 4 faces is shown on the left side within the square and is used to label the set of sensed features displayed on the right side of the square. Sensed features include 2 edges (e1 and e2) and 2 surface patches (p1 and p2). The problem asks which model edge should correspond to sensed edge e_i (i = 1, 2), and which model surface should correspond to sensed surface patch p_j (j = 1, 2).

To solve the problem, first we list all possible correspondences between model features and sensed features according to their feature types (edge and face). These correspondences are portrayed in terms of the tree depicted in Figure 6.1. At the first level of the tree, sensed edge e1 may correspond to any one of the 6 model edges or none at all. Therefore, there are 7 branches spreading out from the root; each indicates a possible interpretation of the sensed edge e1. Proceeding to the next level, each node in the first level spreads out into 7 branches, where each branch again specifies a possible interpretation for the second sensed edge e2. Therefore, up to this level, there are a total of $7^2$ paths (a path is defined as a sequence of nodes starting from the root node and terminating at a leaf node).
Continuing this expansion process for sensed patches p1 and p2, we finally obtain $7^2 \times 5^2$ paths. If we consider a model with more features or a larger set of sensed features, the number of paths will be very large. To alleviate this problem, we introduce constraints to prune some branches of the tree as it grows.

6.4.1 Pruning Algorithm

An algorithm for interpretation tree search which solves the feature correspondence problem for any number of feature types is developed in this section. The algorithm has storage complexity O(N), where N is the number of sensed features. Adopting Dewey coding, it can be easily implemented by a hardware device similar to a digital counter with a special carry notation.

Figure 6.1 An example of the consistent labeling problem: a pyramid model (6 edges, 4 faces), the sensed features (edges e1, e2 and patches p1, p2), and tree pruning based on feature constraints.

Suppose we replace the nil nodes in an IT with 0's. Referring to the previous example in Figure 6.1, the first path of the tree can then be specified by the 4-digit number (0000) and the second path by (0001). Following this coding strategy, all paths can be indexed as follows.

(0000) (0001) (0002) (0003) (0004)
(0010) (0011) (0012) (0013) (0014)
(0020) (0021) (0022) (0023) (0024)
(0030) (0031) (0032) (0033) (0034)
(0040) (0041) (0042) (0043) (0044)
(0100) (0101) (0102) (0103) (0104)
   ...
(0600) (0601) (0602) (0603) (0604)
(0610) (0611) (0612) (0613) (0614)
   ...
(0640) (0641) (0642) (0643) (0644)
(1000) (1001) (1002) (1003) (1004)
   ...
(6630) (6631) (6632) (6633) (6634)
(6640) (6641) (6642) (6643) (6644)

There are four digits in every path number, one for each of the sensed features. Look at path number (0642). It means that the first sensed feature corresponds to none of the model features (0), the second sensed feature corresponds to model edge 6, the third sensed feature, which is a surface patch, corresponds to model face 4, and the last sensed feature corresponds to model face 2. Clearly, the above set of path numbers is a set of sequential numbers whose digits have number bases (7, 7, 5, 5) respectively. For this example, we need an array, called the carry array, to store the base numbers "7755". To prune the tree (skip some path numbers), we simply change the corresponding digit at some position of a path number. In Appendix E, an implementation of the interpretation tree search for the general case is presented.

The pruning algorithm finds the entire path once a path number has been generated. There is no pointer tracking involved. This not only greatly eases the design of the algorithm but also saves tracking time. It can also be easily implemented by hardware devices. The time complexity of the algorithm is clearly exponential. However, heuristic strategies can be embedded into the procedure to improve its performance. For instance, to compute a transformation we usually need only a small set of feature pairs. Let n be the minimum number of pairs needed to determine a unique transformation; paths with lengths less than n will be useless and can be skipped. Larger features, or features of particular types, are generally easier and more reliable to detect, so the performance of the algorithm can also be improved by arranging the input sequence of sensed features a priori. In addition, an efficient data structure for storing feature constraints, such as a hashing table, may reduce the access time during the path checking procedure.
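To make the carry-array bookkeeping concrete, the following sketch enumerates path numbers as a mixed-radix counter and skips a whole subtree whenever a constraint check fails at some digit position. It is an illustration of the idea, not the implementation given in Appendix E; the function names and the interface of the constraint test are illustrative.

    def next_path(path, pos, carry):
        # Advance the mixed-radix path number: clear the digits to the right of
        # `pos`, then increment the digit at `pos`, carrying leftward as needed.
        # carry[i] is the number base of digit i (the number of model features of
        # that type, plus one for the nil interpretation).
        # Returns None when all paths are exhausted.
        path = path[:pos + 1] + [0] * (len(path) - pos - 1)
        i = pos
        while i >= 0:
            path[i] += 1
            if path[i] < carry[i]:
                return path
            path[i] = 0
            i -= 1
        return None

    def enumerate_paths(carry, consistent):
        # Enumerate interpretation paths, pruning with a user-supplied test
        # `consistent(path)` that returns the digit position where a constraint
        # check fails, or -1 if the labeling is acceptable.
        n = len(carry)
        path = [0] * n
        while path is not None:
            fail = consistent(path)
            if fail < 0:
                yield list(path)                         # a surviving interpretation
                path = next_path(path, n - 1, carry)     # move to the next path number
            else:
                path = next_path(path, fail, carry)      # skip the failing subtree

For the pyramid example the carry array is [7, 7, 5, 5], and a constraint test might, for instance, report the position of a digit that assigns the same model edge to both sensed edges.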
6.4.2 Estimating the Expected Number of Surviving Paths

An analysis of an upper bound on the expected number of nodes at some level of the interpretation tree has been done by Grimson et al. [Gri87] based on the distance constraint. Instead of estimating an upper bound on nodes using a single constraint, in this section we present a probabilistic model for estimating the expected number of surviving paths $E[N_p]$ characterized by the pruning algorithm and the wing constraints. Wing constraints include eight unary and five binary constraints (Chapter 2). This set of constraints may have redundancy. The importance of the following analysis is that it provides a way to estimate the pruning power of individual constraints. Thus, less important constraints may be abandoned and the remaining constraints can be arranged according to their power. The analysis will first consider a simple case with one unary and one binary constraint. Later it will be extended to the general case.

Suppose we have detected a set of object wings in which $n_1$ wings are of type $t_1$, $n_2$ wings are of type $t_2$, ..., and $n_w$ wings are of type $t_w$. The number $N_s$ of sensed wings is $N_s = \sum_{i=1}^{w} n_i$, where $w$ is the number of distinct wing primitives. Suppose also that a model is to be tested, which has $m_1$ wings of type $t_1$, $m_2$ wings of type $t_2$, ..., and $m_w$ wings of type $t_w$. The number $N_m$ of model wings is $N_m = \sum_{i=1}^{w} m_i$.

Let $p_u$ denote the probability that a pair consisting of a sensed wing and a model wing satisfies the unary constraint $u$, and $p_b$ denote the probability that two pairs of sensed and model wings pass the binary constraint $b$. The following proposition provides the expected number $E[N_p]$ of surviving paths, where $N_p$ denotes the random variable representing the number of surviving paths.

Proposition 6.1: Given parameters $p_u$, $p_b$, $N_s$, $n_i$, $m_i$ and $w$ (as defined above), the expected number $E[N_p]$ of surviving paths with respect to the unary constraint $u$, the binary constraint $b$ and the given pruning algorithm is

$$E[N_p] = \left( \prod_{i=1}^{w} m_i^{n_i} \right) p_u^{N_s}\, p_b^{N_s(N_s-1)/2}.$$

Proof: Without loss of generality, assume the sequence of sensed wings has been arranged in advance so that model wings of type $t_1$ are branched first, then the wings of type $t_2$, and so forth. Let $a_i$ denote the expected number of branches at level $i$. Therefore,

$a_1 = m_1 p_u$
$a_2 = [m_1 p_u p_b]\, a_1$
$a_3 = [m_1 p_u p_b^2]\, a_2$
$\vdots$
$a_{n_1} = [m_1 p_u p_b^{\,n_1-1}]\, a_{n_1-1}$
$a_{n_1+1} = [m_2 p_u p_b^{\,n_1}]\, a_{n_1}$
$\vdots$
$E[N_p] = a_{N_s} = [m_w p_u p_b^{\,N_s-1}]\, a_{N_s-1}$

By iterative substitution of the expected number of branches at each level, we obtain

$$E[N_p] = \left( \prod_{i=1}^{w} m_i^{n_i} \right) p_u^{N_s}\, p_b^{\sum_{i=1}^{N_s-1} i} = \left( \prod_{i=1}^{w} m_i^{n_i} \right) p_u^{N_s}\, p_b^{N_s(N_s-1)/2}. \qquad (6.1)$$

From the above proposition, it is clear that the expected number of branches at the last level (the expected number of surviving paths) is exponential. In order to make $E[N_p]$ small, one or both of $p_u$ and $p_b$ should be small. Since $N_s(N_s-1)/2$ is larger than $N_s$ whenever $N_s > 3$, binary constraints play a more important role than unary constraints in pruning the interpretation tree.

To estimate the probabilities $p_u$ and $p_b$, divide both sides of Equation (6.1) by $\prod_{i=1}^{w} m_i^{n_i}$ and take the logarithm of the resulting equation. We obtain a linear equation in $\log p_u$ and $\log p_b$:

$$\log \frac{E[N_p]}{\prod_{i=1}^{w} m_i^{n_i}} = N_s \log p_u + \frac{N_s(N_s-1)}{2} \log p_b. \qquad (6.2)$$

Since the left side of the above equation and $N_s$ can be found experimentally, the probabilities $p_u$
and $p_b$ can thereby be evaluated by solving a system of linear equations.

For the general situation, suppose there are $n_u$ unary constraints with pruning probabilities $\{p_{u_j},\, j=1,\dots,n_u\}$ and $n_b$ binary constraints with probabilities $\{p_{b_k},\, k=1,\dots,n_b\}$. Then the expected number of surviving paths for the general case can be obtained by simply replacing the probabilities $p_u$ and $p_b$ in Equation (6.1) by the products of the probabilities $p_{u_j}$ and $p_{b_k}$:

$$E[N_p] = \left( \prod_{i=1}^{w} m_i^{n_i} \right) \left( \prod_{j=1}^{n_u} p_{u_j} \right)^{N_s} \left( \prod_{k=1}^{n_b} p_{b_k} \right)^{N_s(N_s-1)/2}. \qquad (6.3)$$

To estimate the probability of a constraint $i$, or equivalently the pruning power of that constraint, set all probabilities of constraints except $i$ to one. Note that this disables the pruning capability of the other constraints. As a result, we can concentrate on the constraint of interest. If we are interested in the combined pruning capability of a set of constraints, keep the probabilities of those constraints unchanged and set the other probabilities to one.

The following experiment uses the housing [Gri85] as an example to illustrate the power of feature constraints in pruning the interpretation tree. The same idea can be used to study wing constraints. Referring to Table 6.1, the results were obtained using the binary constraints of distance and angle ($n_b = 2$, $p_b = p_{b_1} \cdot p_{b_2}$), no unary constraints ($n_u = 0$) and edge evidence only ($w = 1$). Ten synthetic edges ($N_s = 10$ and $n_1 = 10$) and a model with 15 edges ($m_1 = 15$) have been employed for this experiment.

Level | #Paths reaching this level | #Die | #Survive | #Survive to this level without nil | #Survive to this level with nil | #Checks at this level
1     |    16 |     0 |    16 | 15 |    1 |     0
2     |   256 |   195 |    61 | 30 |   31 |   225
3     |   976 |   778 |   198 | 25 |  173 |   950
4     |  3168 |  2856 |   312 |  3 |  309 |  3197
5     |  4992 |  4215 |   777 |  1 |  776 |  5900
6     | 12432 | 11495 |   937 |  1 |  936 | 12188
7     | 14992 | 13729 |  1263 |  1 | 1262 | 14889
8     | 20208 | 18267 |  1941 |  1 | 1940 | 22008
9     | 31056 | 27983 |  3073 |  1 | 3072 | 33755
10    | 49168 | 44729 |  4439 |  1 | 4438 | 56618

Table 6.1 The number of paths at levels using edge evidence.

The number of surviving paths to level 10 (with nil) is 4438 ($E[N_p] = 4438$). Substituting these values into Equation (6.2), we obtain an equation in $\log p_b$:

$$\log 4438 - 10 \log 15 = 45 \log p_b,$$

and $p_b = 0.66$. Clearly, this probability is too large because we considered the paths with nil nodes, which were assumed to satisfy all constraints. For paths without nil, the number of surviving paths to level 10 is 1 ($E[N_p] = 1$). Replacing 4438 by 1 in the above equation, we obtain $p_b = 0.548$. Unfortunately, this is still large. A possible reason is that by level 5 the number of surviving paths has already been narrowed down to 1, which is the correct interpretation path. Constraint checks after this level will all be satisfied and make contributions to the probability $p_b$. This situation can also be seen in Equation (6.2): after level 5, $E[N_p]$ remains 1, whereas both $n_i$ and $N_s$ keep increasing, so the estimated probability $p_b$ increases too. Therefore, using a large number of sensed features may degrade the apparent power of constraints and may not always be helpful in saving time finding the correct path. An adequate set of a few sensed features will be good enough for quickly finding the "correct" path.

Substituting $N_s = 5$ and $n_1 = 5$, we get $p_b = 0.258$. If we further consider going only to level 4 (4 sensed features employed), the number of paths is 3. Let $N_s = 4$, $n_1 = 4$ and $E[N_p] = 3$. We obtain $p_b = 0.197$.
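As a quick check on Equation (6.2), the edge-evidence probabilities quoted above can be reproduced with a few lines of Python. This is only an illustration (the helper name is ours); since no unary constraints are active, the unary term of the equation vanishes.

    import math

    def binary_constraint_probability(E_Np, N_s, m1):
        # Solve Equation (6.2) for p_b with p_u = 1 and a single feature type,
        # so that the product of model counts is m1 ** N_s.
        lhs = math.log(E_Np) - N_s * math.log(m1)
        return math.exp(lhs / (N_s * (N_s - 1) / 2))

    # Edge-evidence estimates quoted in the text (and in Table 6.4):
    print(binary_constraint_probability(4438, 10, 15))   # ~0.66  (level 10, with nil)
    print(binary_constraint_probability(1, 10, 15))      # ~0.548 (level 10, without nil)
    print(binary_constraint_probability(1, 5, 15))       # ~0.258 (level 5)
    print(binary_constraint_probability(3, 4, 15))       # ~0.197 (level 4)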
Although this result is already much better than the previous ones, we really expect a smaller value for $p_b$, indicating better power of the constraint. Unfortunately, we do not know in advance which and how many features will be appropriate. In applications, several unary and binary constraints are used simultaneously. Multiple constraints can increase the performance of tree pruning.

Level | #Paths reaching this level | #Die | #Survive | #Survive to this level without nil | #Survive to this level with nil | #Checks at this level
1     |     16 |      0 |    16 | 15 |     1 |      0
2     |    256 |    185 |    71 | 40 |    31 |    225
3     |   1136 |    915 |   221 | 40 |   181 |   1324
4     |   3536 |   3049 |   487 | 19 |   468 |   3865
5     |   7792 |   6947 |   845 |  2 |   843 |   8533
6     |  13520 |  11431 |  2089 |  4 |  2085 |  17902
7     |  33424 |  28231 |  5193 |  8 |  5185 |  44668
8     |  83088 |  74783 |  8305 |  3 |  8302 |  96019
9     | 132880 | 119808 | 13072 |  3 | 13069 | 171333
10    | 209152 | 191489 | 17663 |  2 | 17661 | 247096

Table 6.2 The number of paths at levels using face evidence.

Several experiments have been performed. Table 6.2 shows the results using face evidence only. In Table 6.3, mixed edge and face evidence was used. Substituting these results into Equation (6.2), the combined probabilities of binary constraints (distance and angle) using different feature evidence are shown in Table 6.4. Mixed evidence is a little better than edge evidence, and edge evidence is in turn better than face evidence. We conclude that some constraints are suitable for certain types of evidence, and that using many inadequate constraints or many sensed features may not be helpful in tree pruning. To study the integrated influence of multiple constraints, more data will be needed to set up an overdetermined system of linear equations for computing the probabilities.

Level | #Paths reaching this level | #Die | #Survive | #Survive to this level without nil | #Survive to this level with nil | #Checks at this level
1     |     16 |      0 |   16 | 15 |    1 |      0
2     |    256 |    155 |  101 | 70 |   31 |    225
3     |   1616 |   1457 |  159 | 12 |  147 |   1567
4     |   2544 |   2300 |  244 |  2 |  242 |   2466
5     |   3904 |   3282 |  622 |  1 |  621 |   4253
6     |   9952 |   8680 | 1272 |  1 | 1271 |  11543
7     |  20352 |  17889 | 2463 |  1 | 2462 |  25980
8     |  39408 |  34860 | 4548 |  1 | 4547 |  51085
9     |  72768 |  65370 | 7398 |  1 | 7397 |  85749
10    | 118368 | 108429 | 9939 |  1 | 9938 | 133737

Table 6.3 The number of paths at levels using mixed evidence of edge and face.

To This Level | Nil Nodes | Edge Evidence | Face Evidence | Mixed Evidence
10 | with    | 0.660 | 0.680 | 0.672
10 | without | 0.548 | 0.556 | 0.548
5  | without | 0.258 | 0.276 | 0.258
4  | without | 0.197 | 0.269 | 0.184

Table 6.4 The probabilities of combined binary constraints (distance and angle) computed to different levels, with or without nil nodes, using different feature evidence.

6.4.3 Time Complexity

We define the time complexity of the pruning algorithm as the expected number $E[N_c]$ of checks for both unary and binary constraints. Following the definitions in the last section, the expected numbers of checks at the levels of the IT are

$b_1 = m_1$
$b_2 = (m_1 + m_1 p_u)\, a_1$
$b_3 = (m_1 + 2 m_1 p_u)\, a_2$
$b_4 = (m_1 + 3 m_1 p_u)\, a_3$
$\vdots$
$b_{n_1} = [\,m_1 + (n_1 - 1)\, m_1 p_u\,]\, a_{n_1-1}$
$b_{n_1+1} = [\,m_2 + n_1 m_2 p_u\,]\, a_{n_1}$
$\vdots$
$b_{N_s} = \left( \prod_{i=1}^{w} m_i^{n_i} \right) p_u^{N_s-1}\, p_b^{(N_s-1)(N_s-2)/2}\, [\,1 + (N_s - 1)\, p_u\,]$

The expected number $E[N_c]$ of checks is simply the summation of the $b_i$'s:

$$E[N_c] = \sum_{i=1}^{N_s} b_i. \qquad (6.4)$$

Note that the nil nodes will not affect the above result because no constraint checks are needed for them; nil nodes are assumed to comply with all constraints (both unary and binary). In order to reduce the number of checks, the probabilities should be small. In other words, the feature constraints should be powerful. For the general case, replace $p_u$ and $p_b$ respectively by the products of their probabilities for the individual constraints.
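The recurrences above can be evaluated level by level. The following sketch is an illustration rather than code from this thesis; the function name and the example parameter values on the last line are ours.

    def expected_paths_and_checks(counts, p_u, p_b):
        # Level-by-level evaluation of the recurrences behind Equations (6.1)
        # and (6.4).  `counts` is a list of (n_i, m_i) pairs: n_i sensed wings
        # and m_i model wings of wing type t_i.  p_u and p_b are the (possibly
        # combined) unary and binary pass probabilities.
        a = 1.0          # expected surviving partial paths (a_0 = 1, the root)
        checks = 0.0     # running total of expected constraint checks, E[N_c]
        level = 0
        for n_i, m_i in counts:
            for _ in range(n_i):
                checks += (m_i + level * m_i * p_u) * a     # b_(level+1)
                a = (m_i * p_u * p_b ** level) * a          # a_(level+1)
                level += 1
        return a, checks                                    # (E[N_p], E[N_c])

    # One wing type, N_s = 10 sensed and m_1 = 15 model wings, unary
    # constraints disabled (p_u = 1), combined binary probability 0.548:
    paths, checks = expected_paths_and_checks([(10, 15)], 1.0, 0.548)

With the edge-evidence setting of Table 6.1 and the estimate $p_b = 0.548$ from Table 6.4, the returned $E[N_p]$ is close to 1, matching the single nil-free path that survives to level 10.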
Experimental results are shown in the last columns of Tables 6.1, 6.2 and 6.3. The number of constraint checks at a given level is about 5 to 14 times the number of surviving paths (with nil) at that level. Equations (6.1) and (6.4) show that both the number of checks and the number of paths should increase exponentially as the tree level increases, and so should their ratio. In practice, however, the ratio increases only linearly as the level increases (Tables 6.1, 6.2 and 6.3). This reveals that an exponential number of paths has been pruned at each level.

6.5 View Transformation

Suppose we are given two views: a model view and a scene view. Let M be the set of wings derived from the model view and S be the set of wings derived from the scene view. An example is shown in Figure 6.2; part (a) shows the model view and part (b) shows the scene view. Suppose also that the correspondence between model wings and scene wings is known from the IT algorithm. Now, the problem is to determine the transformation T by which the scene view can be matched to the model view.

Figure 6.2. A model view and a scene view.

6.5.1 Curve Matching

Schwartz and Sharir [Sch87] presented an optimization method to solve the problem of 2-D curve matching. Their method only considered the problems of rotation and translation, and assumed the scale problem had already been solved. The following method extends the Schwartz and Sharir work by including the scale factor.

Consider a consistent pair consisting of a model wing L and a detected wing segment l, as shown in Figure 6.3. Let l' be that portion of L which is the corresponding segment of l under transformation T. Each curve is represented by a set of uniformly spaced points (equal intervals):

L = {u'_1, u'_2, ..., u'_n},
l = {u_1, u_2, ..., u_m}, where m <= n,
l' = {v_1, v_2, ..., v_m}.

Figure 6.3. A consistent pair of a model wing and a detected wing segment.

As shown in Figure 6.3, we assume that the sensed wing will typically be a subset of the model wing, and hence the transformation is "into" and not "onto". The following transformation computation needs to be applied to each segment of length m of L, so that it results in 2(n-m+1) transformations computed from the pair l and L. The constant 2 here is due to the need to match both directions. Note that the true transformation is included in this set of computed transformations, but there may be several viable candidates because one pair of wings will not be enough to determine a unique transformation. There are usually several consistent wing pairs available. Having computed candidate transformations for all wing pairs, a "best" transformation can be determined through a clustering procedure [Sto82,87, Gri88] to be discussed in Section 6.6.

The matching problem can then be formulated as an optimization problem,

$$\delta = \min_{T} \sum_{j=1}^{m} |T u_j - v_j|^2, \qquad T u_j = s R_\theta u_j + a,$$

where $R_\theta$ is the rotation, $a$ is the translation and $s$ is the scale factor. Our purpose is to look for a transformation T that minimizes the $\delta$ function. To simplify the derivation, assume the scale factor $s$ is a constant. The scale factor is a nonlinear function of depth, but it can be approximated by a constant when the depth variation is small relative to the standoff. By following the derivation presented in Appendix F, the rotation angle which minimizes the $\delta$ function is the negative of the polar angle of $\sum_{j=1}^{m} u_j V_j$, where $u_j$ and $V_j$ are the complex values corresponding to the vectors $u_j$ and $v_j$.
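The complex-arithmetic step just described can be sketched as follows. This is an illustrative fragment rather than the Appendix F derivation: it ignores the translation term (for instance, it assumes the point sets have been centered), it makes the conjugation convention explicit (an assumption about the notation above), and the helper name and toy data are ours.

    import cmath

    def rotation_angle(u, v):
        # Rotation angle of the similarity transform taking the sensed points u
        # toward the model points v (both given as lists of complex numbers).
        # Minimizing sum |s*exp(i*theta)*u_j - v_j|^2 over theta gives
        # theta = arg( sum_j conj(u_j) * v_j ), i.e. the negative of the polar
        # angle of sum_j u_j * conj(v_j).
        corr = sum(uj.conjugate() * vj for uj, vj in zip(u, v))
        return cmath.phase(corr)

    # Toy check: v is u rotated by 30 degrees about the origin.
    u = [complex(1, 0), complex(2, 1), complex(3, 3)]
    rot = cmath.exp(1j * cmath.pi / 6)
    v = [rot * uj for uj in u]
    print(rotation_angle(u, v))          # ~0.5236 rad (30 degrees)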
We obtain the minimized value $\delta'$ of $\delta$ in closed form; as derived in Appendix F, it can be expressed in terms of the sums $\sum_{j=1}^{m}|u_j|^2$, $\sum_{j=1}^{m}|v_j|^2$, $|\sum_{j=1}^{m}u_j|^2$, $|\sum_{j=1}^{m}v_j|^2$ and the correlations $\sum_{j=1}^{m}u_j\cdot v_j$ and $\sum_{j=1}^{m}u_j\times v_j$.

Figure 6.4. An abstract 2-D feature space of transformations for illustration.

The clustering procedure uses overlapping volumes distributed according to the patterns in the continuous feature space. This method simply counts the number of patterns within a fixed-size volume centered at each pattern. The pattern having the most neighbors forms the most feasible cluster, whose center of gravity is regarded as the correct transformation parameters. An implementation of this procedure can be found in [Che87]. Since the size of the volumes is an important factor which can substantially influence the final result, a closer investigation of how the cluster radius is chosen is the topic of the following sections. A similar analysis can be found in [Gri88].

6.6.1 Radius of Cluster

Referring to Figure 6.4, it is clear that choosing an adequate radius for the cluster is very important. Several situations, shown in Figure 6.5, can occur due to an inadequate radius selection. A small cluster radius can result in multiple clusters (Figure 6.5(a)) and cause selection of transformation parameters that look good but are incorrect. Also, with either a too small (Figure 6.5(b)) or a too large radius (Figure 6.5(c)), the resulting center of gravity of the cluster may shift away from the true position. In order to study the effect of different cluster radii, we use the housing [Gri85] as an example model. Synthetic features are generated from this model to serve as sensed features.

Figure 6.5. Situations that can occur due to inadequate cluster radii: (a) a small cluster radius can result in multiple clusters; (b) with a too small radius, the resulting center of gravity of the cluster may shift away from the true position; (c) with a too large radius, the same shift can occur.

The experiments proceed as follows.

1) Add noise to the sensed features. Noise is generated by a uniform random number generator.

2) Based on the model and the generated sensed features, a set of transformations is computed. The details of this computation have been addressed elsewhere [Che87].

3) Apply the clustering procedure to the above set of transformations to determine the optimal one. The cluster sizes for the rotation and translation components are also recorded. The size of a cluster is defined as the cardinality of the cluster.
Table 6.17 shows the matching errors using difi'erent cluster radii of translation component by fixing the radius of rotation component at 0.088 (about 5 degrees) and Table 6.18 shows the corresponding cluster sizes. Note that the values of 3.0 and 0.088 are estimated based on the error analysis of calibration and transformations presented in Chapter 4, which coin- cide with our observed experimental errors. Several critical points of interest in this experiment are summarized below. (1) As shown in Figure 6.6, for a given noise level, there is always a minimum match error which occurs at some cluster radius, called the adequate radius. Let us refer to a curve of matching error versus radius of cluster as an error curve. The point that corresponds to the minimum error is called the trough of the curve. For example, in Table 6.15, for the noise level 1.5, the minimum match error is 0.699, which occurs at radius 0.08. Thus, 0.08 is the adequate cluster radius for the rotation component at noise level 1.5. Using radii either smaller or larger than the adequate radius will result in a larger match error. 163 Matchlng Error Cluster Radius of Rotation Component 0.03 0.023 0.023 0.5 1.432 0.669 0.604 0.570 0.899 0.987 1.332 1.243 1.194 1.194 1.0 0.573 1.097 1.320 1.234 1.234 1.223 1.071 1.059 1.059 1.059 1.5 0.966 0.972 0.988 0.846 0.841 0.800 0.701 0.699 1.049 1.049 2.0 1.568 1.461 0.849 0.844 0.840 0.839 0.952 0.910 0.887 1.362 3.0 2.728 2.652 2.522 2.553 2.361 2.240 2.185 2.145 2355 2.253 3.5 2.982 2.982 2.982 2.982 2.973 3.877 3.877 3.877 3.233 2.784 4.0 2.787 2.787 3.095 3.591 3.587 3.207 3.206 3.167 3.167 3.334 Table 6.15. The matching errors using different radii of rotation cluster byfixingtheradiusofuanslationclusteratB.0nnn. Cluster Size " Cluster Radius of Rotation Component Noise Level 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.1 ——-r—=_e—-t——-= 0.0 168 168 168 168 168 168 168 168 168 168 0.5 26 47 52 56 103 110 157 164 164 164 1.0 52 84 144 148 148 148 153 153 153 153 15 32 35 37 41 46 49 51 52 74 74 2.0 29 34 64 67 70 70 72 75 77 92 3.0 24 27 28 28 29 30 33 37 42 44 3.5 25 2.5 25 3 26 49 49 49 57 62 4.0 24 24 27 52 74 74 74 75 75 98 Table 6.16. The cluster sizes for rotation component when fixed the cluster radius of translation component at 3.0mm. (2) If we draw error curves for all different levels of noise in a figure as depicted in Fig- ure 6.7, the troughs of the curves should theoretically tend toward larger radii as the noise level increases because computed transformations get random. This phenomenon hasn’t prominently been observed in our experimental results. A possible reason is that the experimental ranges of both noise level and cluster radius hasn’t been wide enough. (3) Matching errors tend to increase as noise increases (look at the tables of match error vertically). 
There are only a few cases whose matching errors are larger than their 164 Matching Error Cluster Radius of Translation Component Noise Level 5.5 L1; 7.0 7.5 8.0 8.5 9.0 9.5 10.0 a = == 0.0 0.790 0.997 0.767 0.943 0.943 0.726 0.726 0.726 1.097 1.088 0.5 1.232 1.267 1.243 1.322 1359 1.338 1372 1315 1.354 1335 1.0 0.994 0.989 1.002 1.024 1.054 1.052 1.055 1.095 1.119 1225 15 2.106 2.019 1.983 2.106 2.086 1.988 1.969 1.969 2.039 2.017 2.0 0.746 0.757 0.772 0.909 0.859 1.033 1.033 0.848 1.275 1.054 3.0 1.974 1.943 2.003 1.984 1.984 1.984 1.984 1.984 1.984 1.988 3.5 3.340 3.436 3.436 3.546 3523 3.385 3.385 3.416 3.416 3.440 4.0 3.542 3.534 3.750 3.815 3.657 3.550 3.583 3.196 3.298 3.227 Table 6.17.Thematchingerrorsusingdifferentradiioftranslanon‘ cluster by fixing the radius of rotation cluster at 0.088 (about 5 degrees). Cluster Size Cluster Radius of Translation Component Noise Level 55 6.0 6.5 7.0 7.5 8.0 8.5 9.0 9.5 10.0 0.0 137 139 141 146 146 148 148 148 150 153 0.5 85 90 91 95 98 100 106 113 115 117 1.0 111 114 117 120 124 126 128 130 131 134 1.5 38 39 40 42 42 48 49 49 50 59 2.0 45 47 48 50 53 55 55 56 58 59 3.0 19 23 27 29 29 29 29 29 29 30 3.5 18 20 20 22 24 28 28 28 28 30 4.0 27 31 32 36 38 42 44 46 48 51 Table 6.18. The cluster sizes for translation component when fixed the clusta radius ofrotation component at 0.088 (about 5 degrees). initially given noise, most are smaller (look at the tables horizontally). This situation is particularly notable at high levels of noise. It means that the initial noise (or equivalently the sensory errors) haven’t been amplified in the final results. they have been slightly attenuated by the clustering procedure. (4) Both Table 6.16 and 6.18 reveal that the cluster size (cardinality) is increased as the cluster radius increases. According to the analysis of computational error in Chapter 4, the errors caused by translation are larger than those of rotation given the same initial 165 A error curve match error adequate : trough radius . radius of cluster Figure 6.6. For a given noise level, there is always a minimum match error which occurs at some radius of cluster, called the adequate radius. The curve is called the error curve and the point corresponds to the minimum error is called the trough of the curve. match V V radius of cluster Figure 6.7. The troughs of error curves should tend toward larger radii as misc level increases. error in feature points. Clusters of the rotation components should be more compact than clusters of the translation components. This has been observed in our experimental results (Table 6.16 and 6.18). As a consequence, clusters of the rotation components pro- vide stronger evidence than the translation components in determining the true cluster. (5) Another aspect of interest is that the size of a cluster can suddenly increase at some radius. For example, in Table 6.16, at noise. level 0.5, there is a jump from 56 to 103 corresponding to radii 0.04 and 0.05. Figure 6.8 shows an example of cluster size jump. The radius at which the cluster size changes abruptly usually coincides with the adequate 166 radius determined in the figure of error curves. For instance, in Table 6.15, looking at the noise level 0.5, there is a minimum matching error (0.57) at radius of 0.04. ll size abruptly 01' increased cluster ’ radius Figure 6.8. The size of a cluster suddenly increases at some radius. (6) If we connect the troughs of error curvesat distinct levels of noise, a trough line may be obtained. 
An analytical equation of the trough line can be calculated by regression methods. Later given a matching error related to noise, an estimate of the adequate radius can be determined immediately and provide some suggestion of where an adequate radius should be. We leave the research on automatic determination of adequate radius for future study. 6.7 Summary This chapter started with a general review of current recognition techniques and then defined the problem and the steps of object recognition based on wing representa- tions. These steps include: indexing into the model base, consistent labeling, parameter estimation and decision. During indexing, wing primitives play an important role in discriminating objects and predicting sensory views. Each candidate view is then exam- ined by a model-test procedure which is a combination of the interpretation tree search, view transformation and a clustering procedure. A pruning algorithm has been presented to implement the IT search. The expected number of surviving paths and the time complexity of the algorithm have been closely investigated. It is this investigation that provides a way to study the power of proposed 167 pruning constraints. The pruning power of a constraint is specified by the probability that object features pass the constraint. Probabilities can be obtained by solving a linear sys- tem of equations and the smaller a probability is, the more powerful the constraint will be. A computational model was given to estimate the parameters of a view transforma- tion. Each computed transformation comes with measurements of goodness of match, mean and maximum errors. Object pose is computable for sculptured objects provided that a finely tesselated viewsphere is used. At this point in time we do not have a pro- cedure for refining a pose gotten via indexing to a coarse viewsphere. The class and pose of an unknown object were determined by a clustering procedure which also reports a set of statistics to indicate the confidence in recognition results. Finally, a study of cluster radius indicates that the clustering procedure hadn’t amplified the initial sensory errors in the final results. Chapter 7 Summary, Conclusions and Future Extensions The entire work is summarized and conclusions are drawn from the experimental results. Major contributions are then discussed and possible extensions of the research are addressed. 7.1 Summary and Conclusions The research in this thesis was concerned with a restricted recognition problem which can be stated as follows. Given a range image of a scene, which can be a jumble of either distinct or similar rigid objects, specific objects in the scene are to be recognized based on the given single image and a set of object models. Two major tasks were included: identification (recognize the object class) and localization (determine the object pose). The objects could be of arbitrary shape. To solve a recognition problem, researchers must first understand the object domain and create a database containing the representations of these objects. Then algorithms are developed to extract object featmes in sensory data and construct a representation com- parable to the internal representations in terms of the detected features. Finally, pro- cedures are developed to recognize objects based on their representations. There are three important tasks which have traditionally been considered to be formidable in a recogni- tion problem. The first is the representation of objects, especially irregularly shaped objects. 
The second is the development of feature extraction procedures. The third is to 168 169 develop a recognition procedure based on the object representations. In this thesis, we have paid more attention to the issues of representation and recognition, and less to the feature extraction. In order to represent objects, a new set of primitives, called object wings was pro— posed. There are two categories of wings: simple wings and composite wings. A simple wing is defined as a triple including a pair of surface patches separated by a contour seg- ment, while composite wings are formed by grouping simple wings through non- accidental relationships such as cotermination, symmetry, parallelism, connectivity, col- linearity and curvilinearity. Thirty-four simple wings were defined to serve as prototypes for practical use. This set of new primitives possesses a higher level of structure than other commonly used features such as vertices, edges, surfaces and critical points. We thus make the conclusion that object wings together with their spatial structures provide more information than the aforementioned features for both indexing into a model data- base and computing object pose. After closely examining the properties of wing primitives and their structural rela- tions, we studied the adequacy of wing representation for polyhedral scenes. That is, given the wing representation of a scene containing polyhedral objects, we can recon- struct all visible parts of objects based on the given wing representation - that is the coor- dinates of all visible vertices, the equations of all visible faces, and the completely labeled line drawing of the scene. To this end, we demonstrated that if a wing representa- tion has been derived from the image of a polyhedral scene, the spatial structure of the scene could be uniquely and correctly recovered from the wing representation. Then, we turned to the central issues of how to represent objects using the proposed wing primitives. A set of objects, whose surfaces range from polyhedral to arbitrary shapes, were used as examples for object representation. This set of objects included a block, a coffee cup and a man-made clay cobra head. Since wing primitives are not view-invariant, a multiple-view scheme called a viewsphere was adopted. Several view- spheres with different tessellations of 20, 80, 140, 240 and 320 viewpoints were studied for the purpose of multiple-view representation. It was proposed that each object be 170 represented in terms of hierarchical aspects which are characterized by "scaled" wing sets: common wing set, distinguished wing set and distinct wing set. We examined the properties of uniqueness, stability, compatibility, terseness and locality of wing representations. The testing of wing representation with respect to these criteria was done by investigating the influence of convolution filters, view resolution and map resolution on the resulting representations. In order to inspect the compatibility of wing representations, we have implemented several segmentation techniques and developed methods to extract wings from sensory data. The results revealed that the wings detected in real images are comparable to those of internal representations in the database which were made from models. We concluded that wing representations have the properties of uniqueness, stability, locality and compatibility for the object domain tested. 
However, the terseness property is not satisfied by current wing representation due to the multiple-view scheme of viewsphere. It seems possible that this problem can be solved by using low map resolutions and coarse viewspheres. Also, the inherent hierar- chy of the viewsphere and the "scaled" wing sets can be combined to achieve more com- pactness of representation. In order to show that objects can be identified and located through matching wing representations of scenes and models, we developed a recognition procedure that has been divided into four stages: indexing, consistent labeling, parameter estimation and decision. Techniques for dealing with these stages included feature comparison, interpre- tation tree search, curve matching and clustering. Several properties of these algorithms were also investigated. The experimental results encourage us to make the conclusion that 3-D object recognition can be achieved based on wing representations. 7.2 Contributions The most important contribution of this thesis was the introduction of wing representation theory and the associated computational framework. It is within this framework that techniques of representation and recognition of arbitrarily-shaped 3-D rigid objects were developed. A major purpose for proposing this theory is that previous 171 theories were either too restricted to deal with general objects or too idealized to imple- ment in machine vision systems. To compensate for these two difficulties, wing represen- tation theory was proposed, which includes three major components: a set of wing primi- tives, a set of wing constraints and a multiple-view data structure called viewsphere. This theory was designed to (l) handle general rigid 3-D objects, (2) overcome imperfect feature extraction, (3) index to object models and (4) compute object pose. Based on wing representation theory, a series of experiments were performed to answer questions on how to extract wing features from synthetic images for consu'uction of the object representation, what kind of geometrical structure is suitable for wing features, how to compute these structures, and how to detect comparable object wings from real images. Although current results are by no means exceptional, they are good enough to show that the construction of representations in terms of object wings is feasi- ble from both theoretical and practical viewpoints. Another contribution in this research is the design of a procedure to recognize scene objects through their wing representations. There are two requirements which make the development of the recognition procedure dificult. The first was that both problems of identification and localization must be handled. The second requirement was that the recognition procedure should be able to deal with the multiple-view representation scheme together with multiple object features. It is desirable that based on this prototype recognition procedure, better ones could be designed by either improving the current method or designing more advanced approaches such as automatic programming [Goa83,Ike88] and connectionism [Fe181,Sab82]. In wing representation theory, eight unary structural constraints and five binary rela- tional constraints have been defined. A viewsphere combined with this set of wing con- straints and wing primitives form the essential ingredients of the theory. 
In order to test the appropriateness of wing constraints, the probabilistic models for estimating the expected numbers of surviving paths and constraint checks were constructed in terms of a pruning algorithm. These models provided a way to study the pruning power of wing constraints so that redundant and weak constraints may be discarded. These probabilistic 172 models formed a contribution to the design of wing representation theory. Finally, we discovered that image segmentation based on the signs of the directional derivative has several properties of interest. Since the computation of directional deriva- tives is easy and the signs of the derivative are stable, segmentation results are fairly reli- able. Although contours and regions extracted by this method may not always correspond to physical counterparts, like object silhouettes, these contours and regions are sensitive to view perturbation. Object silhouettes have long been regarded as useful clues for object recognition. Derivatives in any direction can be easily computed from a pair of orthogonal directional derivatives using the following equation 3% = fucose +fzsin0, where r is the scanning vector and 0 is its directional angle. The sign map of the deriva- tive with respect to any direction can be efficiently predicted. This property together with other properties of rotational equivalence (Section 5.4.2.2) and computational reliability reveal the potential that object wings may suitably be defined based on those contours and regions extracted by this particular segmentation technique. To this end, further study of directional derivatives will be needed. Secondary contributions of this research include the following: (1) the reconstruction of labeled line drawings from wing representations of polyhedral scenes, a byproduct of demonstrating that wings are an adequate representation for polyhedral objects, (Section 2.6, Appendix G), (2) the use of a structured light sensing system for constructing geometric models (Che85, Section 4.1.1), (3) the development of an approach to augmenting geometric models such that model vertices contain 3-D coordinates and a set of surface characteristics (Section 4.2), (4) the analysis of errors resulting from sensor calibration and uansformation computa- tion (Sections 4.1.3.2, 4.1.3.3), 173 (5) the introduction of a general model for estimating computational errors caused by dif- ferent sizes of convolution masks (Section 5.2.4), (6) the extension of Schwartz & Sharir’s curve matching by including a scale factor (Sec- tion 6.5.1, Appendix F), (7) the empirical demonstration that initial sensory errors are not amplified in the final results of pose computation through the clustering procedure (Section 6.6). 7.3 Future Work Experimental results support the thesis that simple wings can be reasonably extracted from both synthetic images (for representation) and real images (for recogni- tion), and that objects can be recognized through their wing representations. These results encourage us to go flu-ther in the proposed representation theory and its implementation. In the previous work, we have deliberately impoverished object wings during experimen- tation, considering only a collapsed set of thirty four simple wings. Other categories of wings, through either augmenting or collapsing the current wing set, need to be investi- gated. In order to improve object representation and object recognition, another part of wing theory, concerning composite wings, should be carried out. 
The use of composite wings in representation can reduce the storage complexity of the current representation in terms of simple wings because composite wings are higher level features than simple wings. In addition, indexing models using composite wings will be more effective than using simple wings so as to increase the efficiency of recogni- tion. The detection of non-accidental relationships among simple wings, such that simple wings can be grouped into composite wings according to some relationships, will form an important research topic which will be closely related to perceptual organization. It is known that if object segmentation (high level image segmentation) can be com- pleted for an image, object recognition based on the segmented image will become much easier. Of course, recognition and segmentation can proceed simultaneously. Since current segmentation techniques have difficulties with feature extraction, wing maps derived from images are typically imperfect. By imperfect wing maps, we mean that the 174 wing maps have either missing wings or spurious wings or both. Based on such a kind of wing maps, there are two directions to follow such that high level segmentation of images taken from scenes with multiple objects might be achieved. The first direction to follow is to look for composite wings in a given wing map. As mentioned in Section 2.4, a composite wing is composed of a set of simple wings which belong to a single physical object. Once composite wings can be successfully extracted, they will be further clustered into groups so that each group corresponds to a single phy- sical object. Object segmentation of images can then be done by recovering visible parts of objects based on groups of composite wings. The second line to follow is to reconstruct labeled line drawings from wing maps. Since a labeled line drawing represents the important spatial structure of scene, object segmentation can be accomplished by interpreting the labeled line drawing. In this thesis, we have studied the reconstruction of labeled line drawings from wing maps of polyhedral scenes (Chapter 2 and Appendix G). Further study of scenes containing curved objects will be necessary. Unlike line drawings, wing maps suffer from imperfect object wings. On the Other hand, wing maps contain such additional information as range values and surface normals, information which is not available for line drawings. In prac- tice, wing maps possess more powerful capability in interpreting scenes and are more realistic than pure line drawings. There are other potential tasks that will also require significant effort. The first task may be how to remove the terseness problem currently existing in a multiple-view representation scheme. Using "scaled" wing sets along with the hierarchical scheme inherently existing in the viewsphere (Chapter 5) or using fewer aspects as representa- tives [Bas88] are two potential ways to resolve the terseness problem. The second research topic is how to automatically construct wing representations in terms of multiple views (aspects) for 3-D physical objects. Many researchers [Pla87, Gig88] have suc- ceeded in automatically constructing the aspect graphs for polyhedral objects but only a few [Koe76,79,87, Bow89] have considered curved objects. Another topic of research involves looking for efficient controls and strategies for indexing into a model database. 
175 Tactics from connectionism [Fe181, Sab82], geometric hashing [Lam88], or precompiling and automatic programming [Goa83, Ike88] may provide us with the ideas of how to organize modules in our recognition system using object wings. Another interesting topic of research is to explore methods of pose computation with fewer steps. The technique of 3-D fi'om 2-D [Sh088], which does not need depth recovery, might accomplish this. APPENDIX A Computational Methods A.l Computation of 3-D Coordinates In a structured light sensing system, given the calibration matrices of the camera C and the projector P, and the coordinates of bOth image (x;,y,-) and grid (x,,y,) of a point located at a light stripe, the 3-D coordinates (.1:w ,ymzw), of that point can be computed as follows. 011012013614 P11P12P13P14 C= 021022023024 . P= P21P22P23P24 - 031032 033 034 P31 P32 P33 P34 The following relation holds. F q -x q tx; xw sxg w _ Yw _ Yw ty; - C , and syg — P zw Zw t S L l l After extension and rearrangement, the scale factors s and t can be removed, and the fol- lowing linear system of equations is obtained. (C31xg-c11)x,,+(C32x,-—c12)y,.,+(C33x,--c13)z,,, = 014-634xi (C31yi-C 21 )xw+(C 32)’i“C 22))’w+(c 33yi-C23)Zw = 624-6 34% 176 177 (Pails-P 11)xw+(P321s-P12))’w+(P33xs-P13)zw = P 14-P 341s (Pan’s-P 21 )xtfiCP 32ys-P22)yw+(P 331:“st )Zw = P 24-P 34%- Here we have four equations to solve the three unknows x,,., y,,., and 2,... However, some- times a feature point, such as a termination, only provides grid coordinates of either x, or y,. Only three equations are available, but still enough to solve for the three unknowns. A.2. Homogeneous Transformation Let T(v) denote a translation, which translates a geometrical element fi'om its current position r to a new position r + v, where v = (v,,vy,v,). , . 1 0 0 Vx v T(v) = 8(1) (1) v: (A.2.1) 0 0 0 1 a V2 V1 (a) perpendicular case (b) nonperpendicular case Figure A.1. Rotation transformations. Referring to Figure A.1, let 0 be the subtended angle between vectors v1 and v2, and R(a, v1, v2) be the rotation, which rotates VI to v2 about the axis a = (ax, a,, a,). If the rotation axis a is perpendicular to the plane spanned by the vectors v1 and v1 (Figure A. 1(a)), the rotation transformation [Fau79, Pau81] is R(a,V1,V2) = where ’11 "21 ’31 ’12 '22 ’32 r13 ’23 ’33 178 '711 "12 V13 0' r21 ’22 ’23 0 '31 '32 '33 0 0001 a2 ( 1-cos0 ) + c059 axa, ( l—cosG ) + azsin0 a,.,az ( 1-cosO ) - aysin0 aya, ( 1-cos0 ) - azsine a; ( l—cosG ) + c056 aya, ( l-cosG ) + axsinO aza, ( l-cose ) + aysinO aza, ( l-cosG ) - axsine a3 ( 1-cosG ) + cose a (A.2.2) However, if the rotation axis a is not perpendicular to the plane spanned by the vec- tors v1 and V; (Figure A.1(b)), the 0 in the above equations should be replaced by 0'. . [V2-(V2'k)k][V1"(V1'k)/€] case = Ivz-(vz-k)k||v1-(v1-k)k| (A.2.3) APPENDIX B Construction of Viewspheres B.1. Viewsphere with 320 Viewpoints Without loss of generality, assume the viewsphere is a unit sphere. The tessellation starts with an icosahedron [Ba182, Bas88], which has 12 vertices, 20 faces and 30 edges. Define \/ r l _ l _ = Ef/L4- , b = W , C — 3 , d - a + b , . . 1 + 315 . . where gr rs the golden ratro; gr = ———2—-. The coordlnates of 12 vertices are [or as b]: [Os as -b]a [.09 be 0]: [—b9 09 .0]: [.09 _b9 0], [or .09 —b]9 [09 .09 b]: [as -b’ 0]: [b9 09 a], [as b: 0]: [-ba 09 a]: [b9 09 ‘0]. For each triangle of the icosahedron, a geodesic tessellation of frequency two splits it into four small triangles. 
B.2 Viewsphere with 240 Viewpoints

Starting with a dodecahedron [Yan85] that has 12 pentagonal faces and 20 vertices, define $a = 0.7136$, $b = 2a\cos 36^{\circ}\cos 18^{\circ}$, $c = \sin(\cos^{-1} g)$ and $d = a \sin 54^{\circ}$. The coordinates of the 20 vertices follow immediately from these constants; for example, $V_7 = (-d, d, d)$.

APPENDIX C

With $r_0' = \Delta R\, p_1 + R'\, r_0$, we have

$$|r_0'|_2 = |\Delta R\, p_1 + R'\, r_0|_2 \le |\Delta R\, p_1|_2 + |R'\, r_0|_2 \le |\Delta R|_2\, |p_1|_2 + |R'|_2\, |r_0|_2.$$

Since the rotation matrix is orthonormal, $|R'|_2 = \sqrt{3}$. Let $\rho' = \dfrac{2 r_0'}{d}$ and $\rho = \dfrac{2 r_0}{d}$. Then

$$\rho' \le |\Delta R|_2 + \rho\sqrt{3}.$$

Since $|\Delta R|_2 = \Big(\sum_{i=1}^{3}\sum_{j=1}^{3}\Delta r_{ij}^2\Big)^{1/2}$, combining this with Equation (C.1) gives the final bound on $\rho'$ in terms of $\rho$ alone.

APPENDIX D

Criteria of Uniqueness

In this appendix, the criteria and their penalties for judging the uniqueness property of wing representation are listed; a small computational sketch follows the list. A similarity function (in Chapter 5) and a priority function (in Chapter 6) are constructed based on these penalties. The terminology was defined in Section 5.1.1.

Let $s_i$ denote the penalty for two representations that violate criterion $i$ of uniqueness, and let $w_j$ be the weight pertaining to criterion $j$. Let $R_k$ denote the wing representation of an object $k$, and let $A$ be the entire set of wing primitives. Two representations $R_1$ and $R_2$ are not the same if one of the following items differs between them.

(1) The numbers $n_i$ of aspects of models $i$: penalty $s_1 = w_1 |n_1 - n_2|$.

(2) The sets $D^i$ of distinguished wing primitives of models $i$: penalty $s_2 = w_2 |(D^1 \cup D^2) - (D^1 \cap D^2)|$.

(3) The sets $AD^i$ of appearance frequencies corresponding to the sets $D^i$ of distinguished wing primitives of models $i$: for $p_j \in A$, if $p_j \in D^i$ then $A^i_j = AD^i_j$; otherwise $A^i_j = 0$. Penalty $s_3 = w_3 \sum_{j} |A^1_j - A^2_j|$.

(4) The sets $P^i$ of distinct wing primitives of models $i$: penalty $s_4 = w_4 |(P^1 \cup P^2) - (P^1 \cap P^2)|$.

(5) The sets $AP^i$ of appearance frequencies corresponding to the sets $P^i$ of distinct wing primitives of models $i$: for $p_j \in A$, if $p_j \in P^i$ then $A^i_j = AP^i_j$; otherwise $A^i_j = 0$. Penalty $s_5 = w_5 \sum_{j} |A^1_j - A^2_j|$.

(6) The sets $F^i$ of wing features of models $i$: penalty $s_6 = w_6 |(F^1 \cup F^2) - (F^1 \cap F^2)|$.

(7) The sets $SD^k$ of distinguished wing primitives of the aspects of models $k$: let $SD^k_i$ be the set of distinguished wing primitives of an aspect $i$ of the model $k$. Penalty $s_7 = w_7 |(SD^1 \cup SD^2) - (SD^1 \cap SD^2)|$.

(8) The sets $SP^k$ of distinct wing primitives of the aspects of models $k$: penalty $s_8 = w_8 |(SP^1 \cup SP^2) - (SP^1 \cap SP^2)|$.

(9) The sets $SF^k$ of wing features of the aspects of models $k$: let $SF^k_i$ denote the set of wing features of an aspect $i$ of the model $k$. Penalty $s_9 = w_9 |(SF^1 \cup SF^2) - (SF^1 \cap SF^2)|$.

(10) The sets $R^k$ of structural relationships of models $k$: penalty $s_{10} = w_{10} |(R^1 \cup R^2) - (R^1 \cap R^2)|$.

(11) The sets $SR^k$ of structural relationships of the aspects of models $k$: penalty $s_{11} = w_{11} |(SR^1 \cup SR^2) - (SR^1 \cap SR^2)|$.
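The penalties above are, for the most part, weighted sizes of set symmetric differences. The following minimal Python sketch evaluates criteria (1) through (5); the Model container, the field names and the weight vector are illustrative assumptions, not structures from the thesis code, and the remaining criteria follow the same pattern.

    from dataclasses import dataclass

    def sym_diff_size(x, y):
        """|(X union Y) - (X intersect Y)|, the size of the symmetric difference."""
        return len(set(x) ^ set(y))

    @dataclass
    class Model:
        n_aspects: int
        distinguished: set        # D: distinguished wing primitives
        distinct: set             # P: distinct wing primitives
        freq_distinguished: dict  # AD: primitive -> appearance frequency
        freq_distinct: dict       # AP: primitive -> appearance frequency

    def frequency_penalty(primitives, sel1, freq1, sel2, freq2):
        """Sum of |A1_j - A2_j|, where A_j is the frequency if primitive j is
        selected for that model and 0 otherwise."""
        total = 0
        for p in primitives:
            a1 = freq1.get(p, 0) if p in sel1 else 0
            a2 = freq2.get(p, 0) if p in sel2 else 0
            total += abs(a1 - a2)
        return total

    def uniqueness_penalty(m1, m2, primitives, w):
        """Weighted sum of criteria (1)-(5); w is indexed by criterion number."""
        s = w[1] * abs(m1.n_aspects - m2.n_aspects)
        s += w[2] * sym_diff_size(m1.distinguished, m2.distinguished)
        s += w[3] * frequency_penalty(primitives, m1.distinguished, m1.freq_distinguished,
                                      m2.distinguished, m2.freq_distinguished)
        s += w[4] * sym_diff_size(m1.distinct, m2.distinct)
        s += w[5] * frequency_penalty(primitives, m1.distinct, m1.freq_distinct,
                                      m2.distinct, m2.freq_distinct)
        return s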
APPENDIX E

Interpretation Tree Search

Consider a general case of feature types $\{t_i,\ i = 0, \ldots, N\}$. Suppose we are given a set of sensed features: $S_0$ with type $t_0$, $S_1$ with type $t_1$, $\cdots$, and $S_m$ with type $t_m$. Suppose also that we want to test this set of sensed features against a given model which has $n_0$ features with type $t_0$, $n_1$ features with type $t_1$, $\cdots$, and $n_m$ features with type $t_m$. Thus, the carry array is $(n_m+1,\ n_{m-1}+1,\ \cdots,\ n_1+1,\ n_0+1)$. A sketch of the path generation function is given below.

Function: Path Generator
Objective: Generates the next candidate path from the current path.
Input:
  PATH: the current path array.
  POS: the position of the path array at which the digit will be increased by one.
  CARRY: the carry array.
Output:
  PATH: the generated path.
  MARK: the most significant position of the path array where the digit is modified.
  COMPLETE: reports whether or not the current path is the last path.
Procedure:
begin
  if (the current path is the last path)
    COMPLETE = yes;
  else
    COMPLETE = no;
    increase the digit at position POS of array PATH by one, referring to the CARRY array;
    mark the most significant position of PATH where the digit is changed;
end; (** end of procedure path generator **)

Suppose the constraints on both sensed and model features are available. There are two kinds of constraints: unary constraints and binary constraints. Unary constraints specify the properties of individual features, and binary constraints describe the relational properties between a pair of features. Properties involving more features are not considered here. A function based on feature constraints to examine the legality of a path is sketched as follows.

Function: Path Check
Objective: Checks the legality of a path.
Input:
  PATH: the path array.
  MARK: the highest position of the path array where the digit has been modified.
  LENGTH: the length of the path.
Output:
  POS: the position of the path array where the consistency check failed.
  LEGAL: reports whether or not the path is a legal interpretation.
Procedure:
begin
  POS = LENGTH - 1;
  for (i = MARK to LENGTH - 1) do
  begin
    check consistency between sensed and model features using unary constraints;
    if (consistency fails) { POS = i; LEGAL = no; return; }
    for (j = i - 1 downto 0) do
    begin
      check consistency between pairs of sensed and model features using binary constraints;
      if (consistency fails) { POS = i; LEGAL = no; return; }
    end;
  end;
  LEGAL = yes;
end; (** end of procedure path check **)

Using the path generator and the path check procedures, the pruning algorithm for interpretation tree search is given as follows; an equivalent recursive formulation follows the listing.

Algorithm: Interpretation Tree Search
Objective: Looks for candidate interpretations.
Input:
  MODEL: model features.
  SENSE: sensed features.
Output:
  NO: number of candidate interpretations.
  PATHS: candidate interpretations.
Procedure:
begin
  LENGTH = number of sensed features;
  POS = LENGTH - 1;
  COMPLETE = no;
  set up CARRY array;
  set up initial PATH array;
  while (COMPLETE is no)
  begin
    call procedure PATH GENERATOR(PATH, POS, CARRY, MARK, COMPLETE);
    call procedure PATH CHECK(PATH, MARK, LENGTH, POS, LEGAL);
    if (LEGAL is yes) store PATH in PATHS;
  end;
end; (** end of algorithm interpretation tree search **)
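The carry-array enumeration above can equivalently be expressed as a recursive depth-first search. The Python sketch below is that recursive reformulation, not the thesis implementation: the sensed features are assumed to carry a type attribute, the model features are grouped by type, and the unary and binary constraint tests are supplied as functions. A None assignment plays the role of the wildcard digit.

    def interpretation_tree_search(sensed, model_features, unary_ok, binary_ok):
        """Depth-first equivalent of the pruning search above.

        sensed         : list of sensed features, each with a .type attribute (assumed)
        model_features : dict mapping a feature type to the model features of that type
        unary_ok(s, m) : unary-constraint test for a sensed/model pairing
        binary_ok(s1, m1, s2, m2) : binary-constraint test for two pairings
        Returns every complete, consistent interpretation (one model feature
        or None per sensed feature)."""
        interpretations = []

        def extend(level, partial):
            if level == len(sensed):
                interpretations.append(list(partial))
                return
            s = sensed[level]
            for m in [None] + model_features.get(s.type, []):   # None = wildcard
                if m is not None and not unary_ok(s, m):
                    continue                                    # prune: unary failure
                if m is not None and any(
                        pm is not None and not binary_ok(s, m, sensed[k], pm)
                        for k, pm in enumerate(partial)):
                    continue                                    # prune: binary failure
                partial.append(m)
                extend(level + 1, partial)                      # search one level deeper
                partial.pop()

        extend(0, [])
        return interpretations

Pruning happens exactly as in the Path Check procedure: a pairing is abandoned as soon as a unary constraint fails, or a binary constraint fails against any earlier pairing on the path.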
APPENDIX F

Curve Matching

Consider a consistent pair of a model wing $L$ and a detected wing segment $l$, as shown in Figure 6.3. Let $l'$ be that portion of $L$ which corresponds to $l$ under the transformation $T$. Each curve is represented by a set of uniformly spaced points (equal distance intervals):

$$L = \{u'_1, u'_2, \cdots, u'_n\}, \qquad l = \{u_1, u_2, \cdots, u_m\},\ m \le n, \qquad l' = \{v_1, v_2, \cdots, v_m\}.$$

The matching problem can then be formulated as an optimization problem,

$$\varepsilon = \min_{T} \sum_{j=1}^{m} |T u_j - v_j|^2, \tag{F.1}$$

with

$$T u_j = R_\theta\, s\, u_j + a,$$

where $R_\theta$ is the rotation, $a$ is the translation and $s$ is the scale factor. Our purpose is to look for a transformation $T$ that minimizes the function $\varepsilon$. To simplify the derivation, assume the scale factor $s$ is a constant. In a perspective projection it is a nonlinear function of depth, but it can be approximated by a constant when the depth variation is small relative to the standoff.

First translate the geometrical center of $l$ to the origin of a reference coordinate frame by

$$a_u = \frac{1}{m}\sum_{j=1}^{m} u_j. \tag{F.2}$$

Without loss of generality, we use the same symbols as before this translation, so that

$$\sum_{j=1}^{m} u_j = 0. \tag{F.3}$$

Equation (F.1) then becomes $\varepsilon = \min_{\theta, s, a} \sum_{j=1}^{m} |R_\theta s\, u_j + a - v_j|^2$. Expanding this and using Equation (F.3),

$$\varepsilon = \min_{\theta, s, a}\Big[\, s^2 \sum_{j=1}^{m}|u_j|^2 + m|a|^2 + \sum_{j=1}^{m}|v_j|^2 - 2a\cdot\sum_{j=1}^{m} v_j - 2s\sum_{j=1}^{m} R_\theta u_j \cdot v_j \,\Big]. \tag{F.4}$$

Examining this equation, the translation $a$ is independent of $\theta$ and $s$ and can be determined immediately:

$$a = \frac{1}{m}\sum_{j=1}^{m} v_j. \tag{F.5}$$

After substituting Equation (F.5) into (F.4), in order to minimize the resulting expression the following conditions should hold:

$$\nabla \varepsilon = 0, \qquad \det\begin{bmatrix} \dfrac{\partial^2 \varepsilon}{\partial s^2} & \dfrac{\partial^2 \varepsilon}{\partial s\,\partial\theta} \\[6pt] \dfrac{\partial^2 \varepsilon}{\partial \theta\,\partial s} & \dfrac{\partial^2 \varepsilon}{\partial \theta^2} \end{bmatrix} > 0.$$

Therefore, we obtain the intermediate scale factor

$$s = \frac{\displaystyle\sum_{j=1}^{m} R_\theta u_j \cdot v_j}{\displaystyle\sum_{j=1}^{m} |u_j|^2}. \tag{F.6}$$

Substituting Equation (F.6) into (F.4), we obtain

$$\varepsilon = \min_{\theta}\Big[\, \sum_{j=1}^{m}|v_j|^2 - \frac{1}{m}\Big|\sum_{j=1}^{m} v_j\Big|^2 - \frac{\big(\sum_{j=1}^{m} R_\theta u_j \cdot v_j\big)^2}{\sum_{j=1}^{m}|u_j|^2} \,\Big].$$

Now the problem is reduced to finding an $R_\theta$ that minimizes this function. Since a rotation in the Euclidean plane can be represented by a matrix, if we map 2-D Euclidean vectors into the complex plane, the rotation can be represented simply by the factor $e^{i\theta}$. Through this complex mapping, the function becomes a scalar function of $\theta$. The rotation angle which minimizes the function is the negative of the polar angle of $\sum_{j=1}^{m} u_j \bar{v}_j$, where $u_j$ and $v_j$ are the complex numbers corresponding to the vectors $u_j$ and $v_j$. We obtain the minimum $\varepsilon^*$ of $\varepsilon$:

$$\varepsilon^* = \sum_{j=1}^{m}|v_j|^2 - \frac{1}{m}\Big|\sum_{j=1}^{m} v_j\Big|^2 - \frac{\big(\sum_{j=1}^{m} u_j\cdot v_j\big)^2 + \big(\sum_{j=1}^{m} u_j\times v_j\big)^2}{\sum_{j=1}^{m}|u_j|^2},$$

and

$$s = \frac{\Big(\big(\sum_{j=1}^{m} u_j\cdot v_j\big)^2 + \big(\sum_{j=1}^{m} u_j\times v_j\big)^2\Big)^{1/2}}{\sum_{j=1}^{m}|u_j|^2}, \qquad a = a_v - s\,R_\theta\, a_u,$$

where $a_u$ and $a_v$ denote the geometrical centers of $l$ and $l'$ respectively. The inverse of $\varepsilon^*/m$ can be used as a goodness-of-match measure.

As shown in Figure 6.3, we assume that the sensed wing will typically be a subset of the model wing and hence the mapping is "into" and not "onto". The above transformation computation needs to be applied to each segment of length $m$ of $L$, so that $2(n-m+1)$ transformations are computed from the pair of $l$ and $L$. The factor 2 here is due to the need to match in two directions. Note that the true transformation is included in this set of computed transformations, but there may be several viable candidates because one pair of wings is not enough to determine a unique transformation. There are usually several consistent wing pairs available. Having computed candidate transformations for all wing pairs, a "best" transformation can be determined through a clustering procedure.
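The closed-form solution above is short enough to state directly in code. The following Python sketch, written under the same assumptions (points represented as complex numbers, constant scale), is illustrative only; the function names align_segment and candidate_transforms are placeholders, not routines from the thesis.

    import cmath

    def align_segment(u, v):
        """Rotation angle theta, scale s and translation a minimizing
        sum_j |s*e^{i*theta}*u_j + a - v_j|^2 for equal-length lists of
        complex points, following the derivation above."""
        m = len(u)
        a_u = sum(u) / m                      # geometrical center of the sensed segment
        a_v = sum(v) / m                      # geometrical center of the model segment
        uc = [p - a_u for p in u]
        vc = [q - a_v for q in v]
        corr = sum(p * q.conjugate() for p, q in zip(uc, vc))
        theta = -cmath.phase(corr)            # negative polar angle of sum u_j * conj(v_j)
        s = abs(corr) / sum(abs(p) ** 2 for p in uc)
        rot = s * cmath.exp(1j * theta)
        a = a_v - rot * a_u                   # combined translation
        residual = sum(abs(rot * p + a - q) ** 2 for p, q in zip(u, v))
        return theta, s, a, residual

    def candidate_transforms(model, sensed):
        """Slide the sensed segment along the model curve in both directions,
        producing the 2*(n-m+1) candidate transformations mentioned above."""
        n, m = len(model), len(sensed)
        out = []
        for curve in (sensed, list(reversed(sensed))):
            for k in range(n - m + 1):
                out.append(align_segment(curve, model[k:k + m]))
        return out

The inverse of the returned residual divided by m serves as the goodness-of-match measure, and the candidate transformations would then be passed to the pose-clustering step.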
APPENDIX G

Reconstructing Line Drawings from Wing Representations

In this appendix, we consider the adequacy of wing representation for polyhedral scenes. To this end, we demonstrate that if a wing representation has been derived from the image of a polyhedral scene, the spatial structure of the scene can be recovered from the wing representation. In other words, given the wing representation of a scene containing polyhedral objects, we can reconstruct all visible parts of the objects based on the given wing representation, that is, the coordinates of all visible vertices, the equations of all visible faces, and the completely labeled line drawing of the scene.

As already pointed out by many line-drawing researchers [Huf71, Clo71, Mac73, Kan80, Bar81, Dra81, Sha79, Fuk83, Sug86, Mal87], without additional information, analysis of line drawings based on categories of vertices and edges [Huf71, Clo71], or using relaxation with consistency constraints [Wal75], usually leads to a number of coherently labeled line drawings. Geometrical properties, such as the range values of vertices, edge lengths, angles between faces, face sizes and surface normals, are commonly used as additional information to reduce the ambiguity of interpretations.

Here our purpose is not to interpret the 3-D meaning of a 2-D line drawing, but rather to construct the 2-D line drawing and its 3-D interpretation from appropriate fragments of the 3-D scene. It is natural to start with the surface equations before reconstructing line drawings, because the surface information available in object wings is enough to compute the equations of the polyhedral object surfaces that are part of the wings. In the following subsections, we study a theoretical approach to interpreting polyhedral scenes based on given wing representations.

G.1 Assumptions and the Object Domain

Assume that
A1) the object domain contains polygons (2-D) and polyhedra (3-D),
A2) scenes can be composed of multiple objects which may occlude one another,
A3) the entire outer boundary of a scene is in the field of view,
A4) the wing "sensor" produces, for each visible edge (in some arbitrary but fixed viewpoint), at least one wing segment associated with the edge,
A5) there are no spurious wings (noise) in wing representations derived from scene images,
A6) the range images from which wing representations are derived are available, and
A7) measurements and computations are infinitely accurate.

Clearly, among these assumptions, A4, A5 and A7 are unrealistic, but this is a theoretical study. There are interesting practical applications which we leave to future study.

Figure G.1. Example objects that are not permissible in our domain: (a) an object composed of a polygon (2-D) and a polyhedron (3-D); (b) an object composed of a set of intersecting polygons.

Basically, the approach is based on faces, which may arise from either 2-D or 3-D objects. Not all scenes and not all views can be handled. Although the object domain includes both 2-D and 3-D objects, an object cannot be composed of both 2-D and 3-D components; in other words, objects must be either polygons or polyhedra. Such composed objects are not permissible; an example, made of both a polygon and a polyhedron, is shown in Figure G.1(a). Planar panels [Sug86], Origami objects [Kan80], and solid polyhedra are some classes of objects that can be handled by the reconstruction scheme given below. However, objects formed by intersecting a number of polygons, like the one shown in Figure G.1(b), are not considered in this study. Objects may have polygonal holes. Some examples of objects which can be handled are shown in Figure G.2.
Figure G.2. Examples contained in the object domain: a planar panel with a triangular hole, an Origami object with two holes, a polyhedron with a vertex formed by four faces, and a polyhedron with a hole.

We now impose three more assumptions on our object domain. A vertex of a polyhedron can be shared by a number of faces and edges (see vertex v in Figure G.3), while
A8) a vertex of a polygon must be shared by exactly two noncollinear edges,
A9) each edge connects exactly two vertices, for both polygons and polyhedra, and
A10) an edge of a polyhedron is formed by two noncoplanar polygons.

Figure G.3. A vertex of a polyhedron can be shared by a number of faces.

A face of a polyhedron is a polygon. A polygon is a connected planar area which may have inner polygonal holes. A hole of a face must lie within the interior of the face. The interior of a face is an open set bounded by the edges of the face but excluding the edge points. Thus, holes do not touch the boundaries of faces. Refer to the example in Figure G.4, in which area a is a hole of face F, whereas areas b, c, d and e are, according to our definition, not holes because they touch the boundary of F at either a vertex or an edge. An object face (or polygon) can be represented by a set of simple circuits of edges in a line drawing. In graph terms, a circuit is a path whose start and end vertices are the same. A circuit is called simple if no vertex, other than the start-end vertex, appears more than once, and the start-end vertex does not appear elsewhere in the circuit [Eve79]. Note that a hole itself is also a simple circuit of edges.

Figure G.4. Area a is a hole of face F, whereas areas b, c, d and e are not holes of F because these areas touch the boundary of F (violating A8).

We consider images taken from general viewpoints for 2-D objects (polygons); however, no restriction on viewpoints is imposed for 3-D objects (polyhedra). Thus, a polygon (2-D) cannot project into a straight line in the image plane, whereas a polyhedron (3-D) can by accident project into a polygon in the image plane. In short, we demand that for each visible edge at least one face that forms it is visible. For simplicity, if an image is taken of a jumble of objects (including both 2-D and 3-D objects), the image is assumed to be taken from a general viewpoint for all objects. We also assume that there is no accidental arrangement in a jumble of objects; in other words, we only consider general scenes. Accidental arrangements include situations such as touching vertices, touching edges and touching faces, which would violate the vertex-edge-face restrictions A8, A9 and A10 stated above. An example of an accidental alignment is shown in Figure G.5, where two polyhedral objects have their top faces coplanar and their edges touching each other.

Figure G.5. An accidental arrangement: two polyhedral objects have their top faces coplanar and at the same time have their edges touching each other (violating A10).

Suppose the background of a scene is a large plane. Any line segment in a wing representation must then be shared by exactly two distinct regions whose corresponding spatial faces either separate (jump edges) or connect (crease edges) each other.
Therefore, to recognize whether a line segment present in a wing representation map is a real object edge or just a virtual line, we simply inspect points (in both the original range image and the wing representation) on both sides of the line segment to see whether they are located on the same plane. The wing representation of a polyhedron depicted in Figure G.6(a), purposely made imperfect, is an example of the input data to the reconstruction procedure addressed in Section G.2. In this figure, the surface types of the wings are not shown because they are all planar faces, and the small dots indicate the boundaries of the wings. Figure G.6(b) shows the labeled line drawing reconstructed from the wing representation, which is the expected output of the reconstruction procedure. Here we follow the convention of line-drawing researchers [Sug86] that a scene is successfully interpreted when the coordinates of the vertices, the face equations, and the labeled line drawing of the scene, or equivalently the spatial structure of the scene, are reconstructed. Note that line-drawing researchers [Huf71, Clo71, Mal87] have used {->, +, -} to represent occluding, convex and concave edges respectively; our wing representation, however, uses {|, <->, +, -} to denote silhouette, jump, convex and concave edges. In a polyhedral world, the occluding edge (->) of line drawings denotes both the silhouette (|) and the jump edge (<->) of the wing representation.

Figure G.6. (a) A wing representation of a scene containing a single polyhedral object; (b) the reconstructed labeled line drawing of the object. This is an example of the input and output data of the proposed reconstruction procedure.

G.2 Reconstruction Rules

The basic idea of reconstructing labeled line drawings from wing representations of polyhedral scenes (or simply from "wing maps") is first to recover the line drawings of individual visible object faces; the entire scene present in an image is then reconstructed by integrating the recovered individual faces. During reconstruction, only local information is employed to recover individual faces. Global constraints are not necessary, provided that we can take sparse samples from the original range image. It is because of this property that a variety of objects, and multiple-object scenes with occlusion, can be handled by the proposed reconstruction algorithm. By relying more heavily on the general assumptions, a global search procedure can be formulated which does not need to reference the original range image.

Refer to the example wing map shown in Figure G.6(a). Recall that an object wing is composed of a pair of surface patches separated by a contour segment. In a polyhedral world, an object wing is composed of two planar patches separated by a straight line segment. According to assumption A6, the range images from which wing maps are derived are available, so one can immediately compute the equations of the planes containing the object faces which form the wings. Let a plane $ax + by + cz + d = 0$ be represented as a vector $(a, b, c, d)$. The data structure of an object wing $i$ can then be represented as

$$W_i = \big((x_{i1}, y_{i1}),\ (x_{i2}, y_{i2}),\ (a_{i1}, b_{i1}, c_{i1}, d_{i1}),\ (a_{i2}, b_{i2}, c_{i2}, d_{i2})\big),$$

where $(x_{i1}, y_{i1})$ and $(x_{i2}, y_{i2})$ are the two endpoints of the object wing. A wing representation (wing map) of a polyhedral scene is a set of object wings,

$$\{\,W_i \mid i = 1, \cdots, n\,\},$$

where $n$ is the number of object wings.
Because of assumption A7, measurements and computations are infinitely accurate, so the computed plane equations are assumed to be perfectly precise. The following rule clusters wing segments according to their plane equations, so that wing segments in the same cluster lie on the same plane.

Rule 1: Wings can be clustered into a group, called a wing group, such that wings in the same group lie on the same plane.

Clearly, it is not necessary that the wings in a group correspond to the same object face; they may come from several distinct object faces which lie on the same plane. According to assumption A4, the wing "sensor" produces at least one wing for each visible object edge. Thus, all planes containing object faces will be known, as will all lines containing visible edges. Additionally, by assumption A5 there are no spurious wings (noise) in wing maps, and hence no superfluous planes are derived when computing plane equations from wing segments. In other words, the derived planes are exactly the planes on which object faces lie: no plane is missing and none is extra.

Consider the example in Figure G.6(a). By following Rule 1, we obtain the wing groups g1 = {3, 4, 9, 10, 11, 12, 13, 14, 15}, g2 = {1, 2, 3, 4, 5}, g3 = {17, 18, 19, 20}, g4 = {5, 6, 7, 8, 9} and g5 = {8, 10, 11, 12, 16, 17, 18}. Each wing group corresponds to exactly one object face. However, it is not always so simple. For instance, look at the example shown in Figure G.7(a). After grouping wing segments according to their plane equations, eight wing groups are obtained: h1 = {1, 2, 3, 4, 5}, h2 = {10, 11, 12}, h3 = {33, 34, 39, 40, 18, 19, 24, 25, 29}, h4 = {4, 13, 14, 15, 16, 17, 28, 29, 26, 27}, h5 = {21, 22, 23, 24}, h6 = {7, 8, 9, 11}, h7 = {5, 6, 35, 37, 38, 39, 33, 32, 31, 30, 26, 25, 23, 22, 20, 14, 13, 7, 8, 9, 10, 12} and h8 = {36, 37, 38, 40, 20, 21, 19, 18, 17, 16}. Among these groups, h3 has wing segments from faces f3 and f14, and h8 has wing segments from faces f32 and f33. There is no trivial way in which a wing group can be separated into subgroups such that each subgroup corresponds to only one object face. The following rules provide the way to reconstruct, from a given wing group, the line drawings of the object faces which lie on a single plane.

Figure G.7. A plane may contain several distinct object faces. For example, faces f32 and f33 lie on the same plane, and f3 and f14 lie on another plane.
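As a concrete illustration of the wing data structure and of Rule 1, the following Python sketch groups wing segments by the planes of their surface patches. It is an illustration only: the names Wing, canonical and wing_groups are assumptions, and the rounding tolerance merely stands in for assumption A7 (exact measurements).

    from collections import defaultdict
    from typing import NamedTuple, Tuple

    Plane = Tuple[float, float, float, float]      # (a, b, c, d) with ax + by + cz + d = 0

    class Wing(NamedTuple):
        # W_i = (endpoint 1, endpoint 2, plane of patch 1, plane of patch 2)
        p1: Tuple[float, float]
        p2: Tuple[float, float]
        plane1: Plane
        plane2: Plane

    def canonical(plane, ndigits=6):
        """Scale a plane vector to a unit normal with a fixed sign so that equal
        planes compare equal; the rounding stands in for exact arithmetic (A7)."""
        a, b, c, d = plane
        n = (a * a + b * b + c * c) ** 0.5
        a, b, c, d = a / n, b / n, c / n, d / n
        if (a, b, c, d) < (0, 0, 0, 0):            # fix the overall sign
            a, b, c, d = -a, -b, -c, -d
        return tuple(round(x, ndigits) for x in (a, b, c, d))

    def wing_groups(wings):
        """Rule 1: cluster wing segments whose surface patches lie on the same
        plane; a wing joins the group of each of its two planes."""
        groups = defaultdict(list)
        for idx, w in enumerate(wings):
            for plane in (w.plane1, w.plane2):
                groups[canonical(plane)].append(idx)
        return dict(groups)

Because each wing carries two planes, a wing index appears in two groups, exactly as wings 3, 4, 5 and 8 do in the groups g1 through g5 above.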
Suppose the given group is g1 of the example in Figure G.6. Figure G.8 shows the picture of g1. According to assumption A3, the entire outer boundary of the scene is in the field of view; without loss of generality, we assume that there is a picture frame corresponding to the boundary of the wing map, as shown in Figure G.8. Then, for each wing segment in g1, construct a straight line within the picture frame. The broken lines in the figure are the constructed lines and the black line segments are the given wing segments. Note that an object boundary may contain several wing segments, for instance wing segments (3,4), (11,12) and (14,15). Due to assumption A7, collinear wing segments lie on the same constructed line. By assumption A4, we also know that there are no extra lines generated and no missing lines; all visible object edges must lie on these constructed lines. We then compute the intersections of the lines within the picture frame. This results in a set of intersections and line segments. Some are real object features (vertices and edges); some are artifacts. Each visible object vertex on the given 3-D plane must correspond to one of these 2-D intersections. The intersections are labeled by the letters (a, b, c, ..., g) in the figure. Let a line fragment be a portion of a line, and a line segment be a line fragment with two intersections as its endpoints. For convenience, we also define a semi-line segment as a line fragment with an intersection at one end and its other end located on the picture frame. We now select only those line fragments which form the visible boundaries of the given 3-D object faces in the given plane.

Figure G.8. Straight lines are constructed along the wings in g1.

Recall that each visible object edge has at least one wing segment associated with it (assumption A4). We can therefore remove semi-line segments, because such fragments have no wing associated with them and correspond to no object edges.

Rule 2: Line fragments with an endpoint located on the picture frame are removed; or simply, semi-line segments are removed.

Figure G.9. (a) The resulting picture of the example in Figure G.8 after removing semi-line segments. (b) Each remaining line segment is preserved if it overlaps at least one wing segment.

After removing semi-line segments, the resulting picture of the example is shown in Figure G.9(a). Then, each remaining line segment is preserved if it overlaps at least one wing segment. This is because of assumption A5: there are no spurious wings (noise) in wing maps, so a line segment overlapping a wing segment must lie on some object edge.

Rule 3: Preserve each line segment that overlaps a wing.

The recovered object edges (denoted by continuous line segments) of the example are shown in Figure G.9(b). There are also several undefined line segments (denoted by broken line segments). As we can see, for this particular case all object edges have been recovered. However, consider the example in Figure G.7(a); after applying Rules 1, 2 and 3 to group h4, the resulting picture is shown in Figure G.10(a), where line segment kl, which is a portion of an object edge, has not been recovered because no wing segment is associated with it. Fortunately, such missing edge segments must lie on some of the broken line segments mentioned earlier in this section. We need to examine the broken line segments in order to recover the missing edge segments.
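Rules 2 and 3 reduce to two simple geometric tests: an endpoint-on-frame test and a collinear-overlap test between a candidate segment and a wing segment. The Python sketch below is one way these tests could look; the function names and the tolerance values are illustrative assumptions, not the thesis code.

    def collinear(p, q, r, eps=1e-9):
        """True if point r lies on the infinite line through p and q."""
        return abs((q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])) < eps

    def overlaps(segment, wing_seg, eps=1e-9):
        """Rule 3 test: the wing segment is collinear with the candidate segment
        and their intervals along the common line overlap."""
        (a, b), (c, d) = segment, wing_seg
        if not (collinear(a, b, c, eps) and collinear(a, b, d, eps)):
            return False
        ux, uy = b[0] - a[0], b[1] - a[1]
        length2 = ux * ux + uy * uy

        def param(p):                            # position along ab, 0 at a, 1 at b
            return ((p[0] - a[0]) * ux + (p[1] - a[1]) * uy) / length2

        lo, hi = sorted((param(c), param(d)))
        return lo < 1.0 and hi > 0.0             # the wing interval overlaps [0, 1]

    def rules_2_and_3(candidates, on_frame, wings):
        """Rule 2: drop fragments with an endpoint on the picture frame.
        Rule 3: of the rest, mark as edges those overlapping a wing segment;
        the remaining 'undefined' segments are passed on to Rule 4."""
        remaining = [s for s in candidates if not (on_frame(s[0]) or on_frame(s[1]))]
        edges = [s for s in remaining if any(overlaps(s, w) for w in wings)]
        undefined = [s for s in remaining if s not in edges]
        return edges, undefined

Here on_frame is assumed to be a predicate identifying points on the picture frame, and the undefined segments returned are exactly the broken line segments examined by Rule 4 below.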
Therefore, the line segment between k and l in Figure G.10(a) can be successfully recovered, while segments 52, e7, g7; 37 and 77:. are identified as non-edge segments. Rule 4: A line segment is removed from the line drawing unless it has exactly one side with points on the plane of the current wing group. The result of applying this rule to the data structure of the example in Figure G.10(a) is displayed in Figure G.10(b). Note here that this rule makes Rule 3 redundant because the edge segments recovered by Rule 3 can also be recovered by Rule 4. How- ever, in an implementation, the operation of Rule 3 will run faster than that of Rule 4 in view that Rule 3 only refers to the wing segments in the wing map, whereas Rule 4 needs to refer to the range values of points in the original image and examine whether or not 205 points are located on a given plane. Due to assumption 4, a large portion of object edges can be recovered by Rule 3. Return to Figure G. 10(b). Intersections e, h and k are clearly not vertices of the polygon. As mentioned in the last section, a vertex of a polygon should be shared by exactly two noncollinear edges. However, in some situations, after removing false line segments, some intersections may become isolated. Thus, we have the following rule. Rule 5: Delete intersections which are not shared by exactly two noncollinear line seg- mcnts. After this rule, we are led to the following situation: a) every remaining intersection (vertex) has exactly two noncollinear line segments sharing the intersection (incidence degree 2), b) every line segment (edge) connects exactly two intersections, and c) line segments form a set of simple circuits. These conditions are in fact coincident with the definition of a polygon. Recall that a wing group may contain wing segments coming from several different object faces. The above rules can reconstruct distinct object faces from a given wing group; or equivalently, the rules can recover all disconnected object faces lying on'the same plane. An example of group kg of the wing map in Figure G.7(a) is shown in Figure G. 1 1, where the results of intermediate steps are presented. So far, all edges and vertices of faces are recovered. The remaining work is to label the edges with types of [1, <->, +, -}. Rule 6: A line segment is labeled with the wing type(s) which appear(s) on the line segment. By this rule, a line segment may have a compound type. A compound type means that it is composed of a set of distinct types. For example, a jump wing (<->) and a silhouette wing (1) may appear on the same line segment; in this situation, the line seg- ment is labeled with the compound type (<->l). It is possible that on the same line seg- ment several wing segments with different wing types may appear simultaneously. In practice, only jump wing and silhouette wing can alternatively appear on an image line. Other combinations of wing types are impossible. Figure G.12 shows the result for wing group h7, where a line segment is labeled (<->!) because there are two different wing 206 P ”0 -e x“. e— 7 - , -“" I ‘° 4 / rt M to 19 M g .f / u . I. / / ‘4 1 I : / ’.a I ‘ ’ ,2? I. : . c ., .. iv, u, . .9. . 1‘ ,4; f a . + 3 7 -/ a 4 2 . a..__.. __---..:: a.--“ a i 1’7 ' u ’ "l 1 ‘4 z: I, ,,/’i . ‘2 ’ . laé-_-—v—---.-r<—--§-—-—-$ E Q vm . c.- _' . x n 4' ,4, I? m . ; ,rr ‘ I ’1’ d .d C 5" C O V t ‘ .Q a ar_i/] 5‘——e fin Figure G.11. 
An example demonstrates that different object faces can be reconstructed from a wing group containing wing segments coming from different object faces. types appearing on that line segment. For a line segment with compound type, we don’t know where the boundaries of object wings are for the time being. This problem can be easily solved by putting all faces of objects together so that all boundaries of wings can be unveiled. Suppose that applying the above rules to all wing groups, all individual object faces are recovered and their edges are labeled too. To complete the line labeling of a scene, the following rule is invoked after individual faces are integrated, which will need to rely on the wing map and the recovered T junctions. The wing map will tell which parts of a line segment should be labeled by which types. 207 U » + + I o Figure G.12. The result of wing group In contains a line segment with compound type (<->!) which has been highlighted by a circle. Rule 7: If a line segment possesses a compound type, the components of the compound type are separated and assigned to sub-line segments according to the recovered T junctions and the wing map. The reconstructed labeled line drawing of the example in Figtn'e G.7(a) is shown in Figure G.7(b) in which unique labeling, or equivalently, unique interpretation of the line drawing is obtained in terms of an imperfect wing representation. The reconstructed labeled line drawing of another example in Figure G.6(a) is shown in Figure G.6(b). 6.3 Reconstruction Algorithm An algorithm characterized by the above set of rules, which reconstructs the labeled line drawing of a polyhedral scene from a wing representation will proceed as follows. The input data include the range image and its associated wing representation of a polyhedral scene. The output result includes a labeled line drawing, the equations of visi- ble faces and the coordinates of visible vertices. (1) Compute the equations of planes containing object faces using the wing segments in the given wing representation; (2) Cluster wing segments into wing groups according to their associated plane equations; (Rule 1); 208 (3) For each wing group (3.1) generate a line for each wing in the wing group; (3.2) compute the intersections among the generated lines: (3.3) remove line segments with their endpoints located on the picture frame (Rule 2); (3.4) preserve line segments which overlap a wing (Rule 3); . (3.5) remove any line segment unless points on exactly one side of the line segment are located on the given plane (Rule 4); (3.6) delete intersections which are not shared by exactly two noncollinear line segments (Rule 5); (3.7) record the 3-D coordinates corresponding to the remaining intersections by intersecting line equations; (3.8) label line segments with the wing types which appear on the line segments (Rule 6); (4) Integrate reconstructed line drawings of individual object faces in order to discover T junctions; (5) Ifalinesegmentinmemtegratedlinedrawingpossessesacompmmdtype,thecomponentsof mecompomdtypeamsepmawdmrdmeassignedmdisfimtpamofmeedgeaccmdingm thediscoveredeunctionsandtheoriginalwingrnapmuleD. (6) Output the resulting line drawing with edge labels, the plane equations corresponding to object faces, and the coordinates of intersections corresponding to 3-D vertices and 2-D T junctions. 6.4 Further Examples Since we haven’t imposed any restrictions on viewpoints for polyhedra, we may by accident have the situation as shown in Figure G. 
G.4 Further Examples

Since we have not imposed any restrictions on viewpoints for polyhedra, we may by accident have the situation shown in Figure G.13(b) for the polyhedron displayed in Figure G.13(a). The above procedure can deal with this accidental case. In addition, objects with holes and multi-face vertices are also permissible. An example of such an object is shown in Figure G.14, which has been taken from Sugihara's book [Sug86]. A few objects from the Origami world [Kan80] have also been used to study the rules and the procedure (Figure G.15). We claim that the proposed rules and procedure can uniquely reconstruct, from wing representations, the labeled line drawings of a wide variety of polyhedral objects. Due to the 3-D information in the wings, a wider variety of objects and viewpoints can be handled than is typical of past research. In fact, the current algorithm can interpret the objects in Figure G.1, even though we have not allowed them in the domain.

Figure G.13. (a) A polyhedral object, (b) an accidental view of the object, and (c) the reconstructed line drawing.

Figure G.14. An object with a hole and multi-face vertices: (a) wing map, (b) labeled line drawing.

Figure G.15. An example of Origami objects: (a) wing map, (b) labeled line drawing.

G.5 Existence, Uniqueness and Correctness of Reconstruction

To justify the validity of the proposed rules for reconstructing line drawings of polyhedral scenes under the assumptions made in Section G.1, we need to cover three aspects of reconstruction: existence, uniqueness and correctness. Existence states that there is at least one line drawing which can be extracted from the given wing representation. Uniqueness demands that exactly one line drawing be extracted from the given wing representation. Correctness requires that the extracted line drawing reflect the spatial structure of the scene. In this section, we discuss how and why the proposed rules and algorithm reconstruct a unique and correct line drawing from a given wing representation.

Recall that the proposed algorithm reconstructs the line drawing of a scene from its wing representation by first recovering all individual visible faces of the scene; the entire scene is then reconstructed by integrating the recovered faces. During the recovery of an object face, the algorithm identifies its edges first and then recovers the face in terms of the identified edges. Note also that this hierarchical reconstruction process, from edges to faces to scene, uses only local information; there is no global constraint involved. In other words, every feature (both edge and face) can be recovered independently. This fact, together with assumptions A4 and A5, makes the following sequence of proof applicable: if one can show that every object edge present in an image is uniquely recovered, one can prove that every visible object face can be uniquely recovered, and in turn prove that the entire scene can be uniquely reconstructed. Consequently, the proof of existence and uniqueness of scene reconstruction can be narrowed down to a proof of the existence and uniqueness of edge recovery.

In the reconstruction algorithm, step 3, which is related to the recovery of edges, includes eight substeps. The major purpose of this step is to reconstruct all object faces lying on a plane.
The substeps of step 3 are quite straightforward. Once the object edges have been uniquely recovered from a wing group, if the regions formed by the recovered edges satisfy the following conditions, these regions can correspond to real-world object faces: a) every vertex of the regions has incidence degree exactly 2, b) every edge of the regions connects exactly two vertices, and c) the recovered edges form a unique set of simple circuits which bound the interiors of the regions. So far, we can say that solutions exist for the face reconstruction. We still need to show that the face reconstruction is unique. As mentioned earlier in this section, the uniqueness of face reconstruction is a natural consequence of the faces being formed by a unique set of edges. In a similar vein, the reconstruction of the entire scene is also unique, because it is formed by integrating the unique set of object faces.

G.6 Summary

Many previous line-drawing researchers started with a perfect line drawing and attempted to achieve an interpretation of the line drawing in terms of 3-D spatial structures. Unfortunately, this usually leads to a set of possibilities. Moreover, if the initial line drawings are imperfect, the problem becomes extremely difficult. In order to solve the problems of multiple interpretations and imperfection, assumptions and additional information have been employed. Although parts of these problems have been successfully solved by introducing extra knowledge, the overall schemes are not efficient. The reason is that most researchers have separated their schemes into two major modules, a reconstruction module and an interpretation module, and the additional information is used only in the interpretation module to reduce ambiguity, rather than also in the reconstruction module. In fact, available information should be used everywhere in the whole scheme, wherever it is helpful.

In this research, we used range values as additional information in both the reconstruction and interpretation modules. This not only greatly simplifies the design of the procedures but also increases the efficiency of processing. In addition, instead of following the conventional sequence, we separate the modules into several stages and rearrange the stages in a hierarchical manner. For instance, in our algorithm the individual faces are first reconstructed and interpreted, and then the entire scene is considered. One apparent advantage of this hierarchical rearrangement is that it increases the flexibility of the algorithm in dealing with a wide variety of object classes.
It is our conjecture that after the wings are extracted, the line drawing can be con- structed without further access to the underlying range image. The proposed algorithm applies the first three rules used previously. It then departs into a state-space search for the polygonal circuits that form the faces in the given plane. At each step of the search one line segment is added to the line drawing and all constraints checked. It is easy to see that the search is finite although exponential. The key point to prove is that there is a unique set of circuits satisfying all the constraints that is the goal state of the search. While such an algorithm has less practical value than the one detailed here, it is of theoretical interest and will be pursued in future work. Currently, we only consider polyhedral scenes. If a scene contains only one object, the proposed algorithm can handle more general objects than those defined in the current object domain. We need to study how to extend this work to include scenes with curved objects. [Asa86] [Aya86] [Baj871 [Ba183] [BarS4] [Bar8l] [Bas88] [BesSS] [Bes86] [Be888] [Bha87] [Bie87] List of References Asada,H., and Brady,M., "T he Curvature Primal Sketch", IEEE PAMI, Vol.8, No.1. 11132-14, 1986. Ayache,N., and Faugeras,O.D., "HYPER: A New Approach for the Recognition and Positioning of Two-Dirnensional Objects", IEEE PAMI, Vol.8, No.1, pp44-54, 1986. Bajcsy,R.. and Solina,F., "Three Dimensional Shape Representation Revisited", lst ICCV, London, England, pp23l-240, 1987. Ballard.D.l-I., and Sabbah,D., "Viewer Independent Shape Recogni- tion", IEEE PAMI, Vol.5, No.2, pp653-659, Man, 1983. Barr,A.H., "Global and Local Deformations of Solid Primitives", Computer Graphics 18(3). PP21-30, 1984. Barrow,H.G., and TenenbaumJ.M., "Interpreting Line Drawings as Three-Dimensional Surfaces ", Artificial Intelligence, 17, pp75-l 16, 1981. Basri,R., and Ullman,S., "The Alignment of Objects with Smooth Sur- faces", Proc. 2nd ICCV, Tampa, Florida, pp282-288, Dec. 1988. Besl,P.J., and Jain,R.C., Three-Dimensional Object Recognition", Computing Survey, Vol.17, No.1, pp75-145, 1985. Besl,P.J., and Jain,R.C., "Invariant Surface Characteristics for 3D Object Recognition in Range Images", CV GIP 33, pp33-80, 1986. Besl,P.J., and Jain,R.C., "Segmentation through Variable-Order Sur- face Fitting", IEEE PAMI, Vol.10, No.2, ppl67-192, 1988. Bhanu,B., Ho,C.C., and Henderson,T.C., "3D Model Building for Computer Vision", Pattern Recognition Letters, pp349-356, May 1987. Biederman,I, "Recognition-by-Components: A Theory of Human Image Understanding", Psychological Review, Vol.94, No.2, ppllS- 147, 1987. 213 [Bin71] [Bin81] [B0182] [B0183] [Bow89] [B0y88] [Bra85] [Br084] [Cae82] [Ca188] [Che85] [Che85] 214 Binford,T.O., "Visual Perception by Computer", Proc. IEEE Systems and Control, Miami, Dec. 1971. Binford,T.O., "Inferring Surfaces from Images", Artificial Intelligence 17. PP205-244, 1982. Bolles,R.C., and Cain,R.A., "Recognizing and Locating Partially Visi- ble Objects: the Local-Feature-Focus Method", Int. J. Robotics Res. 1, 3. PP57-82, Fall, 1982. Bolles,R.C., H0raud,P, and Hannah,M.J., "3DPO: A Three- Dimensional Part Orientation System", Proc. 8th Int. Joint Conf. on Artificial Intelligence. Karlsruhe, West Germany, pplll6-1120, Aug. 1983. B0wyer,K., Eggert,D., StewmanJ., and Stark L., "Developing the Aspect Graph Representation for Use in Image Understanding", DARPA, IUW, Palo Alto, pp23-26, May, 1989. 
[Boy88] Boyer,K.L., and Kak,A.C., "Structural Stereopsis for 3-D Vision", IEEE PAMI, Vol.10, No.2, pp144-166, 1988.
[Bra85] Brady,M., Ponce,J., Yuille,A., and Asada,H., "Describing Surfaces", CVGIP 32, pp1-28, 1985.
[Bro84] Brooks,R.A., "Model-Based Computer Vision", UMI Research Press, Ann Arbor, Michigan, 1984.
[Cae82] Caelli,T., Visual Perception: Theory and Practice, Pergamon Press, New York, pp93, 1982.
[Cal88] Califano,A., "Feature Recognition using Correlated Information Contained in Multiple Neighborhoods", Exploratory Computer Vision Group, IBM Thomas J. Watson Research Center, Yorktown Heights, New York, 1988.
[Che85] Chen,S.W., and Stockman,G., "Experiments in 3-D Data Acquisition", PRIP Lab. TR, MCS-8304011, Computer Science Dept., Michigan State Univ., East Lansing, Michigan, 1985.
[Che85] Chen,S.W., and Stockman,G., "Constructing Constraint Tables for Model-Based Recognition and Localization", MSU-ENGR-85-34, Computer Science Dept., Michigan State Univ., East Lansing, Michigan, 1985.
[Che87] Chen,S.W., Stockman,G., and Shrikhande,N., "Computing a Pose Hypothesis from a Small Set of 3-D Object Features", MSU-ENGR-87-001, Computer Science Dept., Michigan State Univ., East Lansing, Michigan, 1987.
[Chi86] Chin,R.T., and Dyer,C.R., "Model-Based Recognition in Robot Vision", ACM Computing Surveys, 18(1), pp67-108, 1986.
[Clo71] Clowes,M.B., "On Seeing Things", Artificial Intelligence 2, pp79-116, 1971.
[Coo75] Cooper,L.A., "Mental Rotation of Random Two-Dimensional Shapes", Cognitive Psychology, Vol.7, No.1, pp20-43, 1975.
[Coo84] Cooper,L.A., and Shepard,R.N., "Turning Something Over in the Mind", in The Mind's Eye, Freeman and Company, New York, 1986, pp102-108, 1984.
[Dra81] Draper,S.W., "The Use of Gradient and Dual Space in Line-Drawing Interpretation", Artificial Intelligence, 17, pp461-508, 1981.
[Dub76] Dubes,R., and Jain,A.K., "Clustering Techniques: The User's Dilemma", Pattern Recognition, Vol.8, No.4, pp247-260, Oct. 1976.
[Dud77] Dudani,S.A., Breeding,K.J., and McGhee,R.B., "Aircraft Identification by Moment Invariants", IEEE Computer, Vol.C-26, No.1, pp39-45, Jan. 1977.
[Eve79] Even,S., Graph Algorithms, Technion, Haifa, Israel, 1979.
[Fan87] Fan,T.J., Medioni,G., and Nevatia,R., "Segmented Descriptions of 3-D Surfaces", IEEE J. Robotics and Automation, Vol.RA-3, No.6, Dec. 1987.
[Far87] Farin,G.E., Geometric Modeling: Algorithms and New Trends, The Society for Industrial and Applied Mathematics, Philadelphia, Pennsylvania, 1987.
[Fau86] Faugeras,O.D., and Hebert,M., "The Representation, Recognition and Location of 3D Objects", Intl. J. Robotics Research, 5(3), pp27-52, 1986.
[Fau79] Faux,I.D., and Pratt,M.J., Computational Geometry for Design and Manufacture, Ellis Horwood Ltd., pp71-73, 1979.
[Fek83] Fekete,G., "Generating Silhouettes of Polyhedra", NB83-NADA-4036, Computer Science Dept., Univ. of Maryland, 1983.
[Fel81] Feldman,J.A., and Ballard,D.H., "Computing with Connections", TR72, Computer Science Department, University of Rochester, 1981.
[Fis81] Fischler,M.A., and Bolles,R.C., "Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography", Graphics and Image Processing, editor Foley,J.D., pp726-740, 1981.
[Fol82] Foley,J.D., and van Dam,A., Fundamentals of Interactive Computer Graphics, Addison-Wesley Publishing Company, Massachusetts, 1982.
[Fuk83] Fukui,Y., Hirotani,T., Ohira,T., and Kishi,Y., "An Input Method for Solid Models by Drawing", preprint, Industrial Products Research Institute, Ministry of International Trade and Industry of Japan, 1983.
[Gas83] Gaston,P.C., and Lozano-Pérez,T., "Tactile Recognition and Localization Using Object Models", AI Memo 705, MIT, 1983.
[Gib50] Gibson,J.J., The Perception of the Visual World, Houghton Mifflin Comp., Boston, 1950.
[Gig88] Gigus,Z., and Malik,J., "Computing the Aspect Graph for Line Drawings of Polyhedral Objects", CVPR, Ann Arbor, Michigan, pp654-661, June 1988.
[Gig88] Gigus,Z., Canny,J., and Seidel,R., "Efficiently Computing and Representing Aspect Graphs of Polyhedral Objects", 2nd ICCV, Tampa, Florida, pp30-39, Dec. 1988.
[Gil80] Gillam,B., "Geometrical Illusions", in The Mind's Eye, Freeman and Company, New York, 1986, pp87-94, 1980.
[Goa83] Goad,C., "Special Purpose Automatic Programming for 3D Model-Based Vision", Proc. ARPA Image Understanding Workshop, Arlington, VA, 1983.
[Gri85] Grimson,W.E.L., "Sensing Strategies for Disambiguating among Multiple Objects in Known Poses", AI Memo 855, MIT, Aug. 1985.
[Gri88] Grimson,W.E.L., "The Combinatorics of Object Recognition in Cluttered Environments using Constrained Search", 2nd ICCV, Tampa, Florida, Dec. 1988.
[Gri86] Grimson,W.E.L., From Images to Surfaces, The MIT Press, Cambridge, Massachusetts and London, England, 1986.
[Gri87] Grimson,W.E.L., and Lozano-Pérez,T., "Localizing Overlapping Parts by Searching the Interpretation Tree", IEEE PAMI, Vol.9, No.4, pp469-482, 1987.
[Gri88] Grimson,W.E.L., and Huttenlocher,D.P., "On the Sensitivity of the Hough Transform for Object Recognition", AI Memo 1044, MIT, 1988.
[Gun87] Gunnarsson,K.T., and Prinz,F.B., "CAD Model-Based Localization of Parts in Manufacturing", IEEE Computer Society Magazine, pp66-74, Aug. 1987.
[Hen85] Henderson,T.C., "Efficient 3D Object Representations for Industrial Vision Systems", IEEE PAMI, Vol.5, pp609-617, 1983.
[Hof84] Hoffman,D.D., and Richards,W., "Parts of Recognition", Cognition, 18, pp65-96, 1984.
[Hof87] Hoffman,R., and Jain,A.K., "Segmentation and Classification of Range Images", IEEE PAMI, Vol.9, No.5, pp608-620, 1987.
[Hor77] Horridge,G.A., "The Compound Eye of Insects", in The Mind's Eye, Freeman and Company, New York, 1986, pp24-35, 1977.
[Hu89] Hu,G., and Stockman,G., "3-D Surface Solution using Structured Light and Constraint Propagation", IEEE PAMI, Vol.11, No.4, pp390-402, 1989.
[Huf71] Huffman,D.A., "Impossible Objects as Nonsense Sentences", Machine Intelligence 6, edited by Meltzer,B. and Michie,D., Elsevier, New York, pp295-323, 1971.
[Huf77] Huffman,D.A., "A Duality Concept for the Analysis of Polyhedral Scenes", Machine Intelligence 8, Ellis Horwood, England, pp475-492, 1977.
[Hum86] Hummel,R.A., "Representations Based on Zero-Crossings in Scale Space", Proc. IEEE CVPR, Miami Beach, Florida, pp204-209, June 1986.
[Hut87] Huttenlocher,D.P., and Ullman,S., "Object Recognition using Alignment", 1st ICCV, London, England, pp102-111, 1987.
[Ike88] Ikeuchi,K., and Kanade,T., "Towards Automatic Generation of Object Recognition Programs", CMU-CS-88-138, May 1988.
[Jai88] Jain,A.K., and Hoffman,R., "Evidence-Based Recognition of 3-D Objects", IEEE PAMI, Vol.10, No.6, pp783-802, Nov. 1988.
[Kan80] Kanade,T., "A Theory of Origami World", Artificial Intelligence, 13, pp279-311, 1980.
[Kan76] Kanizsa,G., "Subjective Contours", in The Mind's Eye, Freeman and Company, New York, 1986, pp82-86, 1976.
[Kno86] Knoll,T.F., and Jain,R.C., "Recognizing Partially Visible Objects using Feature Indexed Hypotheses", IEEE Journal of Robotics and Automation, Vol.RA-2, No.1, pp3-13, 1986.
[Koc87] Koch,M.W., and Kashyap,R.L., "Using Polygons to Recognize and Locate Partially Occluded Objects", IEEE PAMI, Vol.9, No.4, pp483-494, July 1987.
[Koe76] Koenderink,J.J., and van Doorn,A.J., "The Singularities of the Visual Mapping", Biological Cybernetics, Springer-Verlag, 1976.
[Koe79] Koenderink,J.J., and van Doorn,A.J., "The Internal Representation of Solid Shape with Respect to Vision", Biological Cybernetics, Springer-Verlag, 1979.
[Koe84] Koenderink,J.J., "What Does the Occluding Contour Tell Us about Solid Shape", Perception, Vol.13, pp321-330, 1984.
[Koe87] Koenderink,J.J., "The Internal Representation for Solid Shape based on the Topological Properties of the Apparent Contour", Chapter 9 in Image Understanding, edited by Richards,W., and Ullman,S., 1987.
[Kor87] Korn,M.R., and Dyer,C.R., "3-D Multiview Object Representations for Model-Based Object Recognition", Pattern Recognition, Vol.20, No.1, pp91-104, 1987.
[Lam88] Lamdan,Y., and Wolfson,H.J., "Geometric Hashing: A General and Efficient Model-Based Recognition Scheme", Proc. 2nd ICCV, Tampa, Florida, pp238-251, Dec. 1988.
[Lau86] Laurendeau,D., and Poussart,D., "3D Model Building using a Fast Range Finder", Proc. IEEE CVPR, Miami Beach, Florida, pp424-426, 1986.
[Lin88] Linnainmaa,S., Harwood,D., and Davis,L.S., "Pose Determination of a Three-Dimensional Object using Triangle Pairs", IEEE PAMI, Vol.10, No.5, pp634-647, 1988.
[Low85] Lowe,D.G., Perceptual Organization and Visual Recognition, Kluwer Academic Publishers, 1985.
[Low87] Lowe,D.G., "Three-Dimensional Object Recognition from Single Two-Dimensional Images", Artificial Intelligence 31, pp355-395, 1987.
[Low88] Lowe,D.G., "Organization of Smooth Image Curves at Multiple Scales", Proc. 2nd ICCV, Tampa, Florida, pp558-567, Dec. 1988.
[Mac73] Mackworth,A.K., "Interpreting Pictures of Polyhedral Scenes", Artificial Intelligence, 4, pp121-137, 1973.
[Mal87] Malik,J., "Interpreting Line Drawings of Curved Objects", International Journal of Computer Vision, Vol.1, No.1, pp73-103, 1987.
[Mar82] Marr,D., Vision, Freeman and Company, New York, 1982.
[Mar80] Marr,D., and Hildreth,E.C., "Theory of Edge Detection", Proc. R. Soc. Lond. B207, pp187-217, 1980.
[Mea81] Meagher,D.J., "Geometric Modeling Using Octree Encoding", Computer Graphics and Image Processing 19, 2, pp129-147, June 1981.
[Mur87] Murray,D.W., "Model-Based Recognition Using 3D Shape Alone", CVGIP, 40, pp250-266, 1987.
[Nal86] Nalwa,V.S., and Binford,T.O., "On Edge Detection", IEEE PAMI, Vol.8, No.6, pp699-714, 1986.
[Nal88] Nalwa,V.S., "Line-Drawing Interpretation: A Mathematical Framework", International Journal of Computer Vision, Vol.2, No.2, pp103-124, 1988.
[Ohm88] Ohmura,K., Tomono,A., and Kobayashi,Y., "Method of Detecting Face Direction using Image Processing for Human Interface", SPIE, Vol.1001, Visual Communications and Image Processing, pp625-632, 1988.
[Par87] Parvin,B., and Medioni,G., "Adaptive Multiscale Feature Extraction from Range Data", Proc. IEEE Computer Society Workshop on Computer Vision, Miami Beach, Florida, pp23-28, Nov. 1987.
[Pau81] Paul,R.P., Robot Manipulators: Mathematics, Programming and Control, The MIT Press, Cambridge, Massachusetts, pp25-28, 1981.
[Pen87] Pentland,A.P., "Perceptual Organization and the Representation of Natural Form", Readings in Computer Vision, Morgan Kaufmann Pub., California, pp680-699, 1987.
[Pla87] Plantinga,W.H., and Dyer,C.R., "Visibility, Occlusion, and the Aspect Graph", TR No.736, Dept. of Computer Science, Univ. of Wisconsin, Madison, Dec. 1987.
[Pot83] Potmesil,M., "Generating Models of Solid Objects by Matching 3D Surface Segments", Proc. 8th Int'l. Joint Conf. Artificial Intelligence, Karlsruhe, West Germany, Aug. 1983.
[Pre85] Preparata,F.P., and Shamos,M.I., Computational Geometry, Springer-Verlag, New York, 1985.
[Pri85] Price,K.E., "Relaxation Matching Techniques - A Comparison", IEEE PAMI, Vol.7, No.5, pp617-623, 1985.
[Req80] Requicha,A.A.G., "Representation for Solids: Theory, Method, and Systems", ACM Computing Surveys, 12, 4, pp437-464, 1980.
[Rob66] Roberts,L.G., "Machine Perception of Three Dimensional Objects", in Tippett,J.T., et al., Optical and Electro-Optical Information Processing, MIT Press, Cambridge, MA, 1966.
[Sab82] Sabbah,D., "A Connectionist Approach to Visual Recognition", Ph.D. Dissertation, Computer Science Department, University of Rochester, 1982.
[Sch87] Schwartz,J.T., and Sharir,M., "Identification of Partially Obscured Objects in Two Dimensions by Matching Noisy Characteristic Curves", Intl. J. Robotics Research, Vol.6, No.2, pp29-44, 1987.
[Sek85] Sekuler,R., and Blake,R., Perception, Alfred A. Knopf, Inc., New York, 1985.
[Sha79] Shapira,R., and Freeman,H., "The Cyclic Order Property of Vertices as an Aid in Scene Analysis", Communications of the ACM, Vol.22, pp368-375, 1979.
[Sha84] Shapira,R., and Freeman,H., "The Use of Objects' Faces in Interpreting Line Drawings", IEEE PAMI, Vol.6, pp789-794, 1984.
[Sha84] Shapiro,L.G., Moriarty,J.D., and Haralick,R.M., "Matching Three-Dimensional Objects using a Relational Paradigm", Pattern Recognition, Vol.17, No.4, pp385-405, 1984.
[She84] Shepard,R.N., "Ecological Constraints on Internal Representation: Resonant Kinematics of Perceiving, Imagining, Thinking and Dreaming", Psychological Review, Vol.91, No.4, pp417-447, 1984.
[She71] Shepard,R.N., and Metzler,J., "Mental Rotation of Three-Dimensional Objects", Science, Vol.171, No.3972, pp701-703, 1971.
[Sie77] Siegel,R.K., "Hallucinations", in The Mind's Eye, Freeman and Company, New York, 1986, pp109-116, 1977.
[Sho88] Shoham,D., and Ullman,S., "Aligning a Model to an Image using Minimal Information", Proc. 2nd ICCV, Tampa, Florida, pp259-263, Dec. 1988.
[Sto82] Stockman,G., Kopstein,S., and Benett,S., "Matching Images to Models for Recognition and Object Detection via Clustering", IEEE PAMI, Vol.4, No.3, pp229-241, 1982.
[Sto87] Stockman,G., "Object Recognition and Localization via Pose Clustering", CVGIP, 40, pp361-387, 1987.
[Sto88] Stockman,G., Chen,S.W., Hu,G., and Shrikhande,N., "Recognition of Rigid Objects using Structured Light", IEEE Control Systems Magazine, pp14-22, June 1988.
[Sug78] Sugihara,K., "Picture Language for Skeletal Polyhedra", CGIP, Vol.8, pp382-405, 1978.
[Sug86] Sugihara,K., Machine Interpretation of Line Drawings, The MIT Press, Cambridge, Massachusetts, 1986.
[Tho87] Thompson,D.W., and Mundy,J.L., "Three-Dimensional Model Matching from an Unconstrained Viewpoint", Proc. IEEE Conf. on Robotics and Automation, Raleigh, North Carolina, pp208-220, 1987.
[Tsa85] Tsai,W.H., and Yu,S.S., "Attributed String Matching with Merging for Shape Recognition", IEEE PAMI, Vol.7, No.4, pp453-462, 1985.
[Tuc86] Tuceryan,M., "Extraction of Perceptual Structure in Dot Patterns", Ph.D. Dissertation, University of Illinois at Urbana-Champaign, 1986.
[Tur85] Turney,J.L., Mudge,T.N., and Volz,R.A., "Recognizing Partially Occluded Parts", IEEE PAMI, Vol.7, No.4, pp410-421, 1985.
[Tur74] Turner,K., "Computer Perception of Curved Objects using a Television Camera", Ph.D. Dissertation, School of Artificial Intelligence, Univ. of Edinburgh, 1974.
[Tve84] Tversky,B., and Hemenway,K., "Objects, Parts, and Categories", J. of Experimental Psychology: General, 113, pp169-193, 1984.
[Vis85] Vistnes,R., "Detecting Structure in Random-Dot Patterns", Proc. of Workshop on Image Understanding, DARPA, pp350-362, Dec. 1985.
[Wal75] Waltz,D., "Understanding Line Drawings of Scenes with Shadows", in The Psychology of Computer Vision, edited by Winston,P.H., McGraw-Hill Book Comp., New York, 1975.
[Wan87] Wang,Y.F., and Aggarwal,J.K., "On Modeling 3-D Objects using Multiple Sensory Data", Proc. IEEE Conf. Robotics and Automation, Raleigh, NC, pp1098-1103, 1987.
[Whi79] Whiteley,W., "Realizability of Polyhedra", Structural Topology, Vol.1, pp46-58, 1979.
[Whi55] Whitney,H., "Singularities of Mappings of Euclidean Spaces. I: Mappings of the Plane into the Plane", Ann. Math. 62, pp374-410, 1955.
[Wol86] Wolfe,J.M., The Mind's Eye, Freeman and Company, New York, 1986.
[Won85] Wong,A.K.C., and You,M., "Entropy and Distance of Random Graphs with Application to Structural Pattern Recognition", IEEE PAMI, Vol.7, No.5, pp599-609, 1985.
[Yan85] Yang,H.S., and Kak,A.C., "Determination of the Identity, Position and Orientation of the Topmost Object in a Pile", Proc. 3rd Workshop on Computer Vision: Representation and Control, Bellaire, Michigan, pp38-48, 1985.
[Zah71] Zahn,C.T., "Graph-Theoretical Methods for Detecting and Describing Gestalt Clusters", IEEE Trans. Computers, Vol.C-20, No.1, pp68-86, 1971.
[Zuc85] Zucker,S.W., "Early Orientation Selection: Tangent Fields and the Dimensionality of Their Support", CVGIP, Vol.32, pp74-103, 1985.