Hierarchical Token Grouping In Knowledge-Based Tubular Object Extraction

By

Qian Huang

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Computer Science Department

1994

ABSTRACT

Hierarchical Token Grouping In Knowledge-Based Tubular Object Extraction

By Qian Huang

This dissertation addresses the issues in extracting tubular objects from 2D and 3D images using an integrated model-based approach under a problem solving architecture called hierarchical token grouping.

Tubular objects exist in important application domains: blood vessels, plant roots, bacteria, and roads are all tube-shaped. Recognizing this class of objects from images, quantifying them, and understanding their networks will be of great benefit to many domains. Unfortunately, a robust recovery of tubular objects from noisy images cannot be achieved using simple segmentation methods. An integrated, hierarchical, and descriptive approach to the problem is proposed.

Both modularity and integration are indispensable aspects of vision problem solving. A set of tools for a set of problems has to be effectively integrated to achieve an overall solution. We propose a formalism called hierarchical token grouping (HTG) for such an integration purpose. Using the central theme of grouping, we exploit the homogeneity in vision problem solving. Heterogeneous modules can be treated as homogeneous operational units that systematically aggregate perceptual tokens across all levels of abstraction. A formalism is developed that establishes a consistent and systematic framework for integrating modules, cues, and knowledge, all in a globally coherent mechanism. The formalism is used in, but not limited to, developing a model-based recognition system for tubular objects.

An integrated model-based recognition strategy is adopted to extract tubular objects. The geometric shape information is combined with the imaging information in a generalized stochastic tube model, a parameterized form of a class of Generalized Cylinders. This model renders the recognition problem as a parameter estimation problem, which is subsequently solved in a hierarchical fashion using optimal filters. Due to the descriptive nature of the model, a new scheme is used to visualize a sweeping along a 3D object trajectory.

The model-based tubular object recognition system developed under the framework of HTG is tested and evaluated using a total of 53 2D images and 59 3D volumes, including synthetic data and real data consisting of blood vessels, plant roots, bacteria, and wires.
Experimental results demonstrate the effectiveness of the integrated model-based recognition strategy and the usefulness of the framework of hierarchical token grouping.

To my parents Hongyi Liu and Shu Yun Huang

ACKNOWLEDGMENTS

Many people have, in many ways, made this dissertation possible. I gratefully acknowledge their contributions.

Throughout the years of my stay at Michigan State University, each and every member of my committee has provided me with their continuous support, encouragement, and help. I sincerely thank Professor George Stockman for what I have learned from him, for his sharing many hours of discussions with me, and for always urging students to do a better job in research. I am very grateful for what Professor Anil K. Jain and the late Professor Richard C. Dubes have taught me through their excellent courses in Computer Vision and Pattern Recognition as well as their insightful questions during seminars and meetings. They inspired my interest in this field. My special thanks go to both of them also for the opportunity they presented to me and for their confidence in me through all these years. I sincerely thank Professor John Weng for generously spending time to discuss various research issues with me. I appreciate very much what I learned from Professor Alvin J. M. Smucker, who got me engaged in a good yet difficult computer vision problem at the early stage of my research, which has led to many fruitful thoughts and part of the results reported in this dissertation. I thank Professor Richard Enbody for advising me on various issues when he served on my committee.

Thanks also go to Professor Betty Cheng, Professor John Geske, and Professor Lionel Ni for their support, guidance, and particularly for their helpful answers to my cross-field questions. With this opportunity, I especially would like to express my deep respect to the late Professor Richard C. Dubes, who had influenced many graduate students with his wisdom, decency, and sincerity toward science. He had always generously volunteered his help and guidance to me. My experience of doing research with him on the subject of fractals was invaluable.

With my love, I thank my parents, Hongyi Liu and Shu Yun Huang, for giving me the sense of the value of education, for providing me the strength to survive hardships, and for always being with me even though we may physically be so far apart; with my love, I thank my sisters, Wen Liu and Yi Huang, for constantly standing by me and being so patient and supportive; with my love, I thank my brother, Xian Liu, for sharing with me many inspiring thoughts and talks and for always setting good examples for me; with my love, I thank my aunts, Shu Jun Huang and Shu Shao Huang, for sacrificing so much to help raise us during the very difficult period of our lives. I thank my husband, Eric J.
Byrne, with all my heart, for his love, understanding, encouragement, patience, and support. His faith in me has pulled me through many difficult times. I also sincerely acknowledge the contribution of Mr. Thorn Stone, who helped me tremendously at the initial stage of my stay in the United States. Without these people, many things would not have been possible, including the completion of this dissertation.

I sincerely appreciate the Pattern Recognition and Image Processing Laboratory at Michigan State University. Professor Richard C. Dubes, Professor George C. Stockman, Dr. Mihran Tuceryan, Professor John Weng, and especially Professor Anil K. Jain have made our PRIP lab an excellent environment for research. A special thank-you also goes to our laboratory manager John Lees. His capability and support for the research activities in the lab have made a difference.

Lastly, but not the least, my thanks go to all my fellow graduate students, including the former and the current Prippies. I thank Patrick Flynn, Narayan Raja, Deborah Trytten, Sateesha Nadabar, Greg Lee, Hans Dulimart, and Timothy Newman for their assistance, warmth, and friendship. Special thanks go to Sally Howden for her always being there, listening, sharing, and helping. Thanks also go to current graduate students in the lab: Shaoyun Chen, Jinlong Chen, Yuntao Cui, Chitra Dorai, Marie-Pierre Dubuisson, Sally Howden, Kalle Karu, Jianchang Mao, Sharathcha Pankanti, Nalini Ratha, Dan Swets, Marilyn Wulfekuhler, and Yu Zhong. I enjoyed very much our activities together, both academic and social, and I will always value the friendship we built at Michigan State University. The experience we shared as graduate students, full of joys and frustrations, is very special and unforgettable.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
1 Introduction
1.1 Machine Vision Problem Solving
1.1.1 Tubular Object Recognition
1.1.2 Defining Vision Problem Solving
1.1.3 Computational Problems
1.1.4 Characteristics
1.2 Motivations
1.3 Homogeneity Behind Heterogeneity
1.3.1 Homogeneity in Operation
1.3.2 Homogeneity in Representation
1.3.3 Homogeneity in Operational Principle
1.4 Contributions of the Thesis
1.5 Organization of the Thesis
2 Hierarchical Token Grouping
2.1 The Architecture
2.2 Grouping Hierarchy
2.2.1 General Representation: Token
2.2.2 Uniform Operation: Grouping
2.2.3 Unified Grouping Principle: Homomorphism
2.2.4 Grouping Agent
2.2.5 Generic Token Grouping Algorithm
2.3 Properties
2.3.1 A General Constructive Paradigm
2.3.2 A Homogeneous Architecture
2.3.3 Object-Oriented
2.3.4 Systematic Behavior
2.3.5 Consistent Interaction Interface
2.3.6 Cohesive Integration Environment
2.3.7 Opportunistic Problem Solving
2.3.8 Distributed and Parallelism
2.4 Practical Potentials
2.4.1 Formal Method To Build Complex Vision Systems
2.4.2 Automation and Quick Prototyping
2.4.3 Software Reusability and Quick Prototyping
2.5 Theoretical Implications
2.5.1 Process Modeling
2.5.2 Dynamic System Behavior Description
2.6 Summary
3 Background
3.1 Representations
3.1.1 Taxonomy
3.1.2 Point Representations
3.1.3 Region-Based Representations
3.1.4 Boundary-Based Representations
3.1.5 Relational Representations
3.1.6 Hierarchical Representations
3.2 Recognition Methodologies
3.2.1 Three Levels of Processing
3.2.2 Taxonomy of Recognition Methodologies
3.2.3 Abstraction of Recognition Methodologies: Token Grouping
3.3 Interaction and Integration
3.3.1 Neighborhood Interaction
3.3.2 Interframe Interaction
3.3.3 Intermodule Interaction
3.4 Computational Paradigms For Vision
3.4.1 Marr's Computational Model
3.4.2 Lowe's Computational Model
3.4.3 Model-Based Hierarchical Paradigm
3.4.4 A Concrete Mechanism: Hierarchical Token Grouping
3.5 Summary
4 Tubular Object Recognition
4.1 Object Domain
4.2 Geometric Shape Modeling: Generalized Stochastic Tube Model
4.2.1 Survey
4.2.2 Modeling Local Shape
4.2.3 Modeling Global Shape Dynamics
4.3 Modeling Imaging Processes
4.3.1 Photometric Sensor Modeling
4.3.2 MRA Sensor Modeling For Blood Vessels
4.4 Model-Based Tubular Object Extraction
4.4.1 4-Stage Recognition Strategy
4.4.2 Initial Segmentation
4.4.3 Automatic Seeding
4.4.4 Local Recognition
4.4.5 Global Recognition
4.5 Summary
5 Tubular Object Recognition Via HTG
5.1 Representation Hierarchy
5.1.1 Object Model Decomposition
5.1.2 Tokens and Token Hierarchy
5.2 Knowledge Hierarchy
5.2.1 Acquiring Dynamic Environment Information
5.2.2 Knowledge Representation
5.2.3 Knowledge Hierarchy for Tubular Object Recognition
5.3 Grouping Hierarchy
5.3.1 4-Stage System Diagram in HTG Design
5.3.2 Initial Classification
5.3.3 2D Seeding
5.3.4 3D Seeding
5.3.5 3D Tubes
5.3.6 3D Tubular Objects
5.3.7 Grouping Hierarchy for Recognizing Tubular Objects
5.4 Interactions
5.4.1 Interaction Models
5.4.2 Data Flow Models
5.5 HTG Design For Recognizing Tubular Objects
5.5.1 HTG System Architecture
5.6 Summary
6 Experiments and Results
6.1 Performance Evaluation
6.1.1 Performance Accuracy and Robustness
6.1.2 Two Types of Errors
6.1.3 Parameter-Based Evaluation
6.1.4 Boundary-Based Evaluation
6.1.5 Region-Based Evaluation
6.2 Synthetic Data Generation
6.2.1 2D Synthetic Data Generation
6.2.2 3D Synthetic Data Generation
6.3 3D Tubular Object Visualization
6.4 Experimental Results From 2D Images
6.4.1 Results and Evaluation on Synthetic Data
6.4.2 Results and Evaluation on Real Data
6.5 Experimental Results From 3D Volumes
6.5.1 Results and Evaluation on Synthetic Data
6.5.2 Results and Evaluation on Real Data
6.5.3 3D Visualization
6.6 Summary
7 Conclusions, Contributions, and Future Work
7.1 Conclusions
7.1.1 Hierarchical Token Grouping
7.1.2 Integrated Model-Based Object Recognition
7.2 Contributions
7.2.1 A Formalism for Vision Problem Solving
7.2.2 Model-Based Tubular Object Recognition Via Hierarchical Token Grouping
7.3 Future Research
7.3.1 Formalisms in Vision Problem Solving
7.3.2 Model-Based Object Recognition
A Distance Transformation: Chamfering Technique
A.1 2D Chamfering
A.2 Extended 3D Chamfering
BIBLIOGRAPHY

LIST OF TABLES

3.1 Taxonomy of representation schemes and examples.
3.2 The relationship between the three level processing paradigm and the five issues in recognition methodologies.
5.1 Bottom-up information flow on the interaction channels within GH.
5.2 Information flowing on interaction channels between GH and KH.
6.1 Ground truth for 2D synthetic test images.
6.2 The results for 2D straight synthetic objects.
6.3 The averaged results from all 2D straight synthetic objects.
6.4 The results for 2D curved synthetic objects.
6.5 The region-based evaluation of performance on plant root images.
6.6 Boundary-based evaluation of system performance on plant root images.
6.7 Ground truth for 3D synthetic volumes.
6.8 Ground truth for 3D synthetic helixes.
6.9 Results from 3D curved synthetic objects.
6.10 Results from 3D synthetic helixes.
6.11 Summarized evaluation of the system performance on 30 real volumes.

LIST OF FIGURES

1.1 Example of tubular object recognition.
1.2 Hierarchical system diagram for tubular object recognition.
2.1 The framework of hierarchical token grouping.
3.1 Two viewpoints of an image representation. (a) Intrinsic images, (b) point representation.
3.2 Implicit and explicit spatial relationships among cells. (a) Implicit spatial relationship representation implemented as an array. (b) Explicit representation for a 4-connectivity relation. (c) Explicit representation for an 8-connectivity relation. (d) Explicit representation for spatial relationship "left".
3.3 Marr's computational model for vision. The picture is taken from "Perceptual Organization and Visual Recognition" by David Lowe.
3.4 Lowe's computational model for vision. The picture is taken from "Perceptual Organization and Visual Recognition" by David Lowe.
4.1 2D digital images of (a) 2D blood vessels, (b) 2D plant roots, (c) 2D bacteria, (d) 3D blood vessels from MRA scan.
4.2 A tubular object.
4.3 A rotated and translated tube model.
4.4 Rotational relationship between UVW and XYZ.
4.5 The images of a cylindrical surface (left), its vmin's (middle), and vmax's (right).
4.6 Intensity profiles along the cross sections of plant roots. (a) unnormalized profiles over 15 cross sections, (b) normalized profiles over 15 cross sections.
4.7 Ideal blood flow.
4.8 The sensor measurements configuration on a cross section within a blood vessel in (a) real situation, (b) theoretical model.
4.9 The model-based 4-stage tubular object recognition system diagram.
4.10 An example of initial segmentation on a 2D intensity image of plant roots.
(a) An intensity image of plant roots, (b) intensity histogram of (a), (c) initial segmentation result on (a), (d) initial edge detection on (a).
4.11 An example of 3D subvolume of blood vessels from MRA scan. (a) A 3D subvolume projected using MRI, (b) intensity histogram of (a).
4.12 Initial segmentation result (b) from a 3D blood vessel volume (a).
4.13 The images of initially detected surfaces from an MRI subvolume, viewed from front, left, back, and right, respectively.
4.14 Examples of overlapping (a) and enclosed (b) relationships.
4.15 Examples of the detected principal curvatures from the front view of the vessel shown previously. Left: depth image of the surface; Middle: minimum curvature direction vectors; Right: maximum curvature direction vectors.
4.16 Examples of the detected principal curvatures from the back view. Left: depth image of the surface; Middle: minimum curvature direction vectors; Right: maximum curvature direction vectors.
4.17 Range of optimization.
4.18 Sensitivity of SNR to (a) radius, (b) orientation.
4.19 An example of the empirical distribution of p.
5.1 Decomposed object model.
5.2 Hierarchy of decomposed object model.
5.3 Examples of spatial relationships among point tokens. (a) 4-connectivity relation among input point tokens, (b) 8-connectivity relation among 3 boundary point tokens.
5.4 Examples of overlap regions and the four corners of the region. (a) Region generated by an overlap relation. (b) Region generated by an enclose relation.
5.5 Bounding boxes for (a) a 2D ribbon, (b) a 3D cross section.
5.6 Bottom-up hierarchy of tokens.
5.7 Knowledge hierarchy for tubular object recognition.
5.8 Relationship between 2D and 3D tubular object recognition and hierarchical token grouping.
5.9 The grouping hierarchy for tubular object recognition.
5.10 Communication channels within GH.
5.11 All the communication channels in the system.
5.12 Tubular object recognition system diagram under HTG.
6.1 Examples of the distance distribution DB for image R2 at (a) initial stage, (b) intermediate stage, and (c) final stage.
6.2 Examples of helixes generated using different parameters, (a) w = 36, c = 1.0, b = 0.1778, (c) w = 20, c = 3.5, b = 0.0508.
6.3 Examples of 2D synthetic tubular objects, (a) two straight objects in S23, (b) one curved object in S26, (c) two curved objects in S27.
6.4 Effects of noise on the error measures for 2D synthetic straight objects.
6.5 Effects of noise on the error measures for 2D synthetic curved objects.
6.6 Estimated local tube centers superimposed on input images (a) S23, (b) S26, (c) S27.
6.7 Examples of man-made tubular objects of pipes.
6.8 Examples of man-made tubular objects of wires.
6.9 Examples of organic tubular objects of bacteria.
6.10 Examples of organic tubular objects of plant roots.
6.11 Detected tubes for test images (a) R1, (b) R2, (c) R3, (d) R4, (e) R5.
6.12 Detected tubes for test images (a) R6, (b) R7, (c) B1, (d) B2, (e) B3.
6.13 Average improvement in the boundary and region error measures.
6.14 Examples of the distance distribution DB for image R5 at (a) initial stage, (b) intermediate stage, (c) final stage.
6.15 Superimposed center points of local tubes for (a) MM7, (b) MM8.
6.16 The recognition result for input image MM4, (a) input image, (b) detected boundary, (c) the tubes found.
6.17 Examples of 3D synthetic volumes. (a) volume S32, (b) volume S34, (c) volume S35.
6.18 Three synthetic helical objects. (a) helix-1, (b) helix-3, (c) helix-6.
6.19 Values of figure of merit along objects (a) S34-1, (b) S35-1 and S35-3.
6.20 The estimated radius for helix-2 under different noise situations.
6.21 Sensitivity of the figure of merit to noise (n). (a) helix-2, (b) helix-6.
6.22 Experimental results on a helix. (a) the helix without noise, (b) the estimated helix axis from (a), (c) the helix with noise degree n = 20.0, (d) the estimated helix axis from (c).
6.23 Estimated local tube radii with n = 0 along (a) helix-2, (b) helix-6.
6.24 The estimated tube lengths for synthetic objects helix-1 and helix-6.
6.25 Volumetric blood vessel data displayed using MIP, (a) subvolume sub2, (b) subvolume sub14, (c) subvolume sub16, (d) subvolume sub33.
6.26 Initial classification results from four subvolumes (sub30, sub26, sub5, and sub21). The first column ((a),(d),(g),(j)) shows the original volumes. The second column ((b),(e),(h),(k)) shows the initial segmentation. The third column ((c),(f),(i),(l)) displays all the initially detected surface points using the MIP technique.
6.27 Result of initially detected surface points from subvolume sub5 (rendered as range images), viewed from front, left, back, and right, respectively.
6.28 Seeding results for volumes sub16 ((a)-(c)), sub7 ((d)-(f)), sub29 ((g)-(i)), and sub5 ((j)-(l)). The first column: original input volumes. The second column: initially detected surface points. The third column: seeds detected and superimposed on input.
6.29 Seeding results for volumes sub14 ((a)-(d)) and sub21 ((e)-(i)). The first column: original input volumes. The second column: initially detected surface points. The third column: initially detected interior points. The fourth column: seeds detected and superimposed on input.
6.30 Effectiveness of the model-based automatic seeding approach. For all three input volumes, even with very noisy surface detection, there is no seed detected. The first row: original input volumes. The second row: initially detected surface points.
6.31 Effectiveness of the model-based automatic seeding.
The initial surface detection results ((b) and (f)) for input volumes sub17 and sub13 ((a) and (e)) are very noisy. Seeds are generated only near the objects in these volumes ((c) and (g)).
6.32 Effectiveness of the model-based verification of seeds from data sub14. (a) the detected seeds, (b) the seeds that are verified, the brighter the seed, the higher the confidence in the seed, (c) the seeds that are invalidated using the tube model, (d) the seeds that actually activate global sweepings.
6.33 Effectiveness of the model-based verification of seeds from data sub34. (a) the detected seeds, (b) the seeds that are verified, the brighter the seed, the higher the confidence in the seed, (c) the seeds that are invalidated using the tube model, (d) the seeds that actually activate global sweepings.
6.34 Sweeping results for volumes sub26 ((a)-(c)), sub7 ((d)-(f)), sub29 ((g)-(i)), and sub5 ((j)-(l)). The first column: original input volumes. The second column: seeds detected. The third column: recovered object axes.
6.35 Sweeping results for volumes sub21 ((a)-(c)) and sub14 ((d)-(f)). The first column: original input volumes. The second column: seeds detected. The third column: recovered object axes.
6.36 Sweeping results for volumes sub34 and sub33; (a),(c): original input volumes; (b),(d): recovered object axes.
6.37 A real volume with the portion of visualized segment marked.
6.38 Twenty-five visualized cross sections (frames of a "movie").
6.39 3D visualization of nine of the cross sections, starting from the 14th in the previous figure.

CHAPTER 1

Introduction

The problem of recognizing or extracting tubular objects from both 2D intensity and 3D volumetric data is addressed using a proposed vision problem solving architecture called hierarchical token grouping. Tubular objects exist in various application domains: blood vessels in the medical field, bacteria in biology, plant roots in agriculture, roads in remote sensing, and wires and paths on circuit boards in an industrial inspection environment. Tubular objects sometimes form complex networks. Biological tubular objects are often deformable. Automatically recognizing this class of objects from images, precisely quantifying important features associated with these objects such as their geometric shape, volume, or branching frequency, reliably describing the shape dynamics of the objects, and understanding their networks will be of great benefit to all these application domains.

In attacking this practical vision problem, homogeneous characteristics exhibited at multiple levels of problem solving are observed. The homogeneity is then examined carefully from a more general viewpoint with the motivation of developing a useful architecture for vision problem solving that facilitates the integration of machine vision systems in a more systematic and consistent framework.
By exploiting the homogeneity, a constructive and hierarchical architecture that employs knowledge at as many levels as possible is proposed, under which vision problems can be solved using a uniform operational scheme, a consistent representational interface, and a coherent integration environment of top-down and bottom-up information. This proposed architecture is motivated by application problems and applied to develop a hierarchical model-based recognition system that extracts tubular objects from input data. The architecture of hierarchical token grouping has potential as a general vision problem solving paradigm.

1.1 Machine Vision Problem Solving

Vision is a highly sophisticated sense of human beings. It is achieved in an eye-brain biological process where eyes provide powerful sensors and the brain forms a marvelous analysis system. To some, the ultimate goal of machine vision is to accomplish, by means of machine processing of images, at least the same capability of perception as human beings. Sometimes, however, even more is expected of a machine vision system, such as the quantification of objects, something not usually provided by the human visual system.

How exactly do human beings achieve the task of vision? What is the distinction between the eye and the brain? What is the explanatory model that correctly characterizes human vision? While machine vision became a subject of interest and study several decades ago, researchers in human visual behavior have been trying to answer these questions for centuries. Even today, unfortunately, answers to these questions are still fragmentary. A systematic analysis of human vision is not yet available.

Nevertheless, results from human vision research have greatly influenced the research in machine vision. With the dramatic success of digital computers in almost every corner of modern society, computer scientists hope to achieve the same in the field of machine vision. Much effort has been made toward methodologies that attempt to make computers possess the capability of perception. Lacking an exact model of the human visual system, these methodologies are basically engineering approaches that correlate input signals with perceptual output.

In this thesis, in search of solutions to application problems, we had to study an essential issue in building machine vision systems. Integration is a significant aspect in designing and building computer vision systems. Realizing that the major obstacle in integrating a machine vision system is the heterogeneity in modules, in data, and in knowledge, we exploit the inherent homogeneous characteristics in vision problem solving and utilize them to seek a more consistent and systematic way of assembling machine vision systems.
We propose a vision problem solving architecture, called Hierarchical Token Grouping (HTG), that possesses a set of desirable properties. This proposed architecture is intended to provide a formalism for designing, developing, and describing machine vision systems. It does not provide solutions for particular vision problems. The proposed architecture is applied to the problem of extracting tubular objects using the integrated model-based approach. A computer vision system designed for tubular object recognition is developed under the paradigm of hierarchical token grouping.

In this chapter, we discuss the problem of tubular object recognition and present our philosophical view toward vision problem solving that establishes the foundation for the proposed problem solving architecture. In Section 1.1, the problem of extracting tubular objects from 2D images is introduced. A set of important observations about how objects of interest are to be recognized in a hierarchical paradigm is made. Based on these observations, the process of general vision problem solving is defined as a series of inverse mappings, each of which corresponds to the recovery of some meaningful perceptual information. The central operational theme of grouping is identified, and its relationship with the two essential computational problems involved in vision problem solving is established. The characteristics of the vision problem solving process so defined are examined. Based on these discussions, in Section 1.2, we describe the motivations of the proposed paradigm. In Section 1.3, inherent homogeneous characteristics of vision problem solving are presented. The major contributions of this thesis are summarized in Section 1.4, and its organization is explained in Section 1.5.

1.1.1 Tubular Object Recognition

The problem of recognizing or extracting tubular objects is to (1) identify instances of the objects of interest, (2) describe the geometric shape of the extracted objects, and (3) quantify the extracted objects by providing a set of measurements.¹ Examples of tubular objects can be blood vessels in medical images, bacteria in biological pictures, plant roots in images taken from underground, roads in remote sensing imagery, and wires or paths of circuit boards from images taken under a controlled environment for inspection purposes. Features that characterize the shape of tubular objects can be their symmetric axes, their cross section parameters such as radius, the deformation characterizations, or branching frequencies. Measurements that quantify tubular objects can also be made, such as the area (in the 2D case) or the volume (in the 3D case) an object occupies. While this research solves the problem of tubular object recognition in both 2D and 3D cases, only the 2D case is presented in this chapter for the purpose of illustrating our observations toward vision problem solving. The complete formulation of the problem is given in Chapter 4.

Let us consider the problem of extracting tubular objects from 2D intensity images.
A given input image is a set of discrete points, each of which carries an intensity measurement. The recognition task is to organize these input points according to some criterion. Such a criterion must be established based on our knowledge about both tubular objects and the nature of the input. From experience (knowledge), we know that the 2D projection of a 3D tubular object usually produces two parallel, symmetric curves formed by the shadow of the object. Due to the smoothness of the object surface, such curves should also be smooth. That is, the object silhouette, or "ribbon", corresponds to two smooth curves in a 2D image plane. A curve can be detected from a set of connected edge points, and a tubular object may be enclosed within a pair of parallel curves. Note, this is a top-down knowledge decomposition process which correlates what we know about the objects to the data that a computer vision system takes as input. From the vision problem solving point of view, such a decomposition describes a constructive approach to perform tubular object recognition. Figure 1.1 shows such a constructive viewpoint in recognizing tubular objects.

¹In this thesis, we use "extracting" and "recognizing" interchangeably because our viewpoint is that no extraction can be done without recognition.

The task of a computer vision system for recognizing tubular objects is to organize the input in a meaningful way using the knowledge discussed above. The recognized objects need to be represented in a descriptive way so that the computation of volumes, areas, and branching structures can be facilitated. From experience, we know that simple segmentation processes cannot provide a robust solution to this problem, and an incremental processing scheme is required. Each step of processing in Figure 1.1 can be considered as one level of abstraction performed by an individual module. For example, at the lowest level, an edge detection module can extract the edge points. Based on its result, another module at the next level may recognize the straight lines, which can be further grouped (aggregated) to form polylines that approximate curves. Pairs of parallel curves can be identified through an even higher level module. The question of whether a pair of parallel curves corresponds to a 2D tubular object needs to be answered by yet another higher level module whose functionality may be to verify certain intensity configurations based on the knowledge about the shape of tubular objects and the imaging condition under which the input is acquired. This is a bottom-up information recovery process.

Figure 1.1. Example of tubular object recognition.

Combining both the top-down knowledge decomposition and bottom-up information recovery process, Figure 1.2 gives a diagram of a computer vision system with two hierarchies, one for the decomposed knowledge (on the left) and the other for the incremental information reconstruction process (on the right).
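As a purely illustrative sketch of this bottom-up hierarchy, the Python fragment below folds a sequence of grouping stages over input tokens, so that the output of one level becomes the input of the next. The two toy stages stand in for real modules such as edge detection, line grouping, curve grouping, and parallel-curve pairing; they are assumptions for illustration, not the algorithms of this dissertation.

```python
from functools import reduce

def run_hierarchy(tokens, stages):
    # Fold the grouping stages over the input: tokens produced at one
    # level of abstraction are the raw material for the next level.
    return reduce(lambda toks, stage: stage(toks), stages, tokens)

def group_runs(points):
    # Toy stage: group sorted 1D points into maximal consecutive runs.
    groups = []
    for p in points:
        if groups and p == groups[-1][-1] + 1:
            groups[-1].append(p)
        else:
            groups.append([p])
    return groups

def keep_long(groups, min_len=3):
    # Toy verification stage: keep only groups long enough to be
    # plausible structure, discarding isolated points as noise.
    return [g for g in groups if len(g) >= min_len]

print(run_hierarchy([1, 2, 3, 7, 9, 10, 11, 12], [group_runs, keep_long]))
# -> [[1, 2, 3], [9, 10, 11, 12]]
```

The point of the sketch is architectural rather than algorithmic: every stage has the same shape (tokens in, coarser tokens out), which is exactly the homogeneity the proposed framework exploits.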
The recovery process is a continuous inverse mapping, from a set of discrete points to a partition of the image plane, using the knowledge from the left hierarchy. Each module on the different levels of the recovery process has its own functional role, and none of them is isolated. They interact, and together they carry out the recognition task, using the top-down knowledge, in a bottom-up fashion.

Figure 1.2. Hierarchical system diagram for tubular object recognition.

We observe that (1) different types of perceptual entities are extracted at different levels, and they are essentially grouped (aggregated) from lower level entities; (2) conventionally, both the methodologies and the representations adopted by the modules at different levels can be heterogeneous. While the heterogeneity does not affect each individual module, it could cause difficulty when these modules are organized to form a system in which they interact. It is conceivable that the more homogeneous the individual modules are, the easier it is to integrate them, and hence the more manageable the overall system becomes.

1.1.2 Defining Vision Problem Solving

In general, from a computational point of view, a vision system is to recover three-dimensional information from a discrete set of points that carry sensor measurements. Such a problem solving process can be informally defined as follows.
‘ r ‘ “new re ‘9 t), l\‘-«,l_.1 “nulliu‘x‘ §“ . . . .\ -tjtu C ‘I‘v- ._1I (f ‘l J a \‘pP (1‘ ' W... l‘ - 4 ~0_ V“‘-:_ll‘"\. .' Vlkl‘c" ‘Jr:, b‘. l"? f. ._ Ltd?“ : v .1 013.,“ -1 .A; > ‘ .1“ Stills—WI “a , ‘u I he. (‘9‘, . .‘_ . P- J" . 0 "if .1, .‘:p ., g ‘ f1" . “elixir-<1 .rg.‘ lr J \4‘ v. ‘A :N ‘Y:\_ we.» ‘ufm . , “GL1, .,‘_ D t - mappings, each of which carries out one type of data transformation: M: MIOMQO...OMm, 01‘ f = M1(M2(./\/l3(...(.Mm(I))..))). Using the same notation, the process of vision problem solving is formally defined as an inverse mapping: 01‘ Similarly, such an inverse mapping can be accordingly decomposed into a concatena- tion of a series of inverse mappings, each of which performs one type of inverse data transformation: M” : M" 0M" m m-I o ...0M,-I, 01‘ I = M7.’(M;’_1( ;’.2(~-(M7’(f))~)))- Each of the decomposed inverse mappings recovers a certain type of information. For example, certain intrinsic properties of the 2D image can be retrieved in some early step of the entire recovery process. These properties can then be utilized to recover other observable features that may be still non-geometric. Based on these features, information about the 3D geometric structures may be inferred which can then be mapped to some 3D entities. The linkage between 3D entities and concepts yields an interpretation of the input image. Knowledge plays an indispensable role in this process of vision problem solving. That is, the inverse mappings defined above apply ms kinds of l Bull] the int} l~ L ‘r ( _;.- new the ~,m< .I v '. hardy what K11 v,‘ y I ~13...» O '1 ] .‘.‘t‘....'?(l. l3 mt. 1.1.3 Com 3 35212303 car; Add," ’1‘ ‘ av‘v _ ‘ > ' ”min? [13:13!“ i! 1 9" ‘ rpm“ , ~‘ ualy..r .,_ Sc 3“ rf . . J1 . ,9 . ‘1' . \“i.'(» v- -~- \ -1 t if [fl-lain”... ‘1.‘t .- . - P “~' ‘-.Cv . i". A)“ f . ‘ ‘4 g.‘ ‘ 1 IE r .‘ ‘ J.lr \ ~. 1,8". ‘ til"? fl E“ ‘ ,\»~ A 4.; (‘4 ‘ a lF’.-F_1 CLK 91__ \ A'HE‘ .3 x. F‘;-,; h‘ Il‘Y \ _ lak) \ ‘K 10 various kinds of knowledge along every step of the recovery. Both the informal and the formal definitions for vision problem solving provide merely the specification of the input and output relationship of a black box system. Exactly what kinds of computations are performed within the black box is left un— specified. In the next section, this aspect of vision problem solving is examined. 1.1.3 Computational Problems A partition can be achieved in either a divisive or an agglomerative fashion. A divisive method starts with the entire data set as one entity and then successively divides each set into smaller sets according to some criteria, often dissimilarity. It is a decomposition process in a coarse-to-fine hierarchy. An agglomerative method starts with the individual data point as an entity and then combines or groups entities recursively into larger ones until some criteria, often similarity, are violated. This approach is a bottom-up grouping process in a fine-to-coarse hierarchy. Either one of the methods can be used alone to solve the partition problem adequately. In practice, however, the conceptual difference between a divisive and an agglom- erative method, that one is a coarse-to-fine operation while the other is a fine-to-coarse operation, will undoubtedly affect the implementation of a computer vision system. On the other hand, the choice of which method to use may be constrained by the nature of the available data and other considerations. 
In both cases, however, one thing is common: the effect of the operational criteria, similarity or dissimilarity, used to perform either the decomposition or the grouping is crucial.

In our view, the agglomerative grouping method is preferred from the points of view of both human vision and practical computation. Evidence from psychological studies reveals the grouping phenomena in human perception. Observations in human perception indicate that similarity among non-geometric attributes is generally correlated to geometric structures[93], i.e., geometric structures can be detected from those subpopulations that have similar non-geometric attributes (the simplest being intensity or intensity changes) observed from image data. This suggests a bottom-up grouping operation, with the attributes of individual entities at a finer level to be computed before a subpopulation of these entities can be determined. Research in computational vision has been influenced by the findings of such grouping processes in human vision. From the computational point of view, the input data to a computer vision system is typically a discrete array. What a computer immediately encounters are the values at individual quantized points, i.e., what is available at this point in a computer is not an entirety but the finest details that a digitizer allows. From these two considerations, it seems more natural to perform the task of vision problem solving in a fine-to-coarse hierarchy, i.e., to start with the finest level and then gradually achieve an overall partition through subsequent abstractions by grouping.

How can grouping achieve the overall task of machine perception? The experiments conducted by Stevens et al. in human perception showed that there exist two kinds of scales in human visual processes[93]. One is the scale of the constituent intensity changes and the other is the scale of local geometric structure. Marr also noticed the indispensable roles of these two scales in perception[62]. Based on their observations, Stevens et al. propose that these two scales are substantially independent of each other. Therefore, there are "two distinct computational problems": detecting intensity changes across spatial-temporal scales, and detecting structures across spatial scales [93].

The first computational problem is to detect perceptual information at certain levels of abstraction across multiple images, either from a multiple resolution scheme or from a time sequence. The goal along this dimension is to recover the accurate properties of a certain type of perceptual entity. One example is the detection of zero-crossings using the Laplacian-of-Gaussian at different scales proposed by Marr[62], as sketched below. Another example is the recovery of motion and structure information from a sequence of images. This computational problem has been viewed as a grouping problem[62, 34]. In this thesis, we call it grouping across multiple images.
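The sketch below illustrates this first computational problem in a generic form, assuming NumPy and SciPy are available: an image is filtered with a Laplacian-of-Gaussian at several scales, and sign changes of the response are marked as candidate intensity changes. It is only an illustration of Marr-style multi-scale detection, not the detector used in this dissertation; the choice of scales is arbitrary.

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

def zero_crossings_at_scales(image, sigmas=(1.0, 2.0, 4.0)):
    """Detect candidate intensity changes across spatial scales by
    marking zero-crossings of the Laplacian-of-Gaussian response."""
    maps = {}
    for sigma in sigmas:
        response = gaussian_laplace(image.astype(float), sigma=sigma)
        sign = response > 0
        crossings = np.zeros(image.shape, dtype=bool)
        # A pixel is a zero-crossing if the response changes sign
        # between it and its right or lower neighbor.
        crossings[:, :-1] |= sign[:, :-1] != sign[:, 1:]
        crossings[:-1, :] |= sign[:-1, :] != sign[1:, :]
        maps[sigma] = crossings
    return maps
```

Comparing the maps across scales is what allows stable intensity changes to be distinguished from noise, which is the essence of grouping across multiple images.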
The second computational problem concerns the recovery of 3D structural information which may or may not be directly revealed in the image data due to the limitation of local intensity information, as pointed out by Marr[62]. To reassemble the structural pieces, "global organization is detected, in part, by 'bootstrapping' from local organization"[93]. This constructive viewpoint of human perception is supported by many other studies in human vision[57]. Perception seems to be achieved through a continuous process of such bootstrapping or, from the viewpoint of this thesis, a continuous process (hierarchy) of grouping. We call grouping along this dimension grouping across multiple levels of abstraction.

Therefore, to solve the computational problems involved in vision tasks, grouping is performed along two dimensions: one is the dimension of spatial-temporal scales and the other is the dimension of geometrical scale. A continuous process of grouping in a space along these two dimensions leads to the perception, hence the interpretation, of a perceived scene. Although the two computational problems are distinct, they may also interact. In the literature, some researchers have also exercised such an interaction of the two types of computational problems in perceptual grouping across multiple resolutions [59, 80, 87, 88, 30, 29]. The process of vision problem solving is a process in which groupings can be performed alternately along the two dimensions.

Having identified the computational problems and the grouping behavior involved in solving these computational problems, what characteristics does vision problem solving possess? We discuss this topic in the next section.

1.1.4 Characteristics

Different characteristics of vision problem solving have been demonstrated in studying the human visual system. They are discussed separately below and will be explored in developing the architecture we propose to solve a wide range of machine vision problems.

Constructive and Hierarchical

Physiological observations of the human visual system suggest that human perception is achieved in a multi-level process. Evidence exists that demonstrates that vision is constructive and hierarchical[57]. From a computational point of view, a constructive vision leads to a divide-and-conquer scheme which reduces the complexity of the individual computational units that comprise a vision system. Perceptual information is recursively recovered, and perception is achieved in an incremental fashion.

Modularity

Studies in psychophysics, anatomy, and physiology provide evidence for the modularity of human vision[99, 49]. Individual modules with specialized functional roles in the human visual system can be identified[57]. Such identifiable individual modules in the human visual system have led to the shift of research attention to the isolated study of individual modules.

Integration, Cooperation, and Competition

Although evidence for modularity exists, modules are not independent; rather, they interact. The integration behavior among individual modules has long been observed.
In machine vision, even though the research results from studying individual modules will eventually contribute to the success of flexible computer vision, merely a bag of tools for a bag of problems will not provide an adequate overall solution to machine perception. The cooperation and competition among individual modules is crucial to achieve perception. In the context of constructive and hierarchical vision, different modules act both individually and cooperatively. Cooperation and competition behavior is the outcome of integrating individual modules. An overall visual task is accomplished by integrating the capabilities of different computational units.

Role of Knowledge

The human visual system is not just a simple feed-forward mechanism. It seems that whenever there is a forward path, there is also a corresponding feedback path[8]. It has been observed that top-down feedback physically exists in the hierarchy of the human visual system[57]. These observations indicate that knowledge plays an indispensable role in perception and is applied whenever possible. While the role of knowledge has been acknowledged since the beginning of machine vision research, there has been no coherent way to apply knowledge in the process of vision problem solving.

Homogeneity

The human brain can perform many heterogeneous tasks with essentially homogeneous and simple computational units (neurons). These computational units are homogeneous in the sense that every one of them computes in a similar fashion and the computational results are represented in a consistent way. The networks connected by such homogeneous computational units are therefore homogeneous, consistent, systematic, and coherent biological computers.

In machine vision research, in order to achieve the same set of different visual tasks, both the methodologies and the representations adopted are usually also heterogeneous. This may not be a concern in research related to studying isolated individual modules. However, such heterogeneity will introduce much difficulty in integrating these individual modules, causing serious problems in building computer vision systems.

1.2 Motivations

Identifiable individual modules in the human visual system have led to the shift of research attention to the isolated study of individual modules. As mentioned before, merely a bag of tools for a bag of problems certainly will not provide an overall solution to computer vision. While the view of constructive and hierarchical vision prevails, the issues of cooperation and competition among individual modules become crucial in achieving machine perception.
This relates to an important issue in computer vision: how to integrate individual modules, multiple sources of information, and different types of knowledge. It is impossible for computer vision to succeed without an adequate solution to this problem. Kanal recently emphasized the need for a formalism for integration [54]:

Formalisms for developing algorithms and parallel implementations for many of the individual tools have received much attention in recent years. But the integration of heterogeneous computational components, multiple sensors producing different types of data, and heterogeneous knowledge bases, is a significant system design problem for which we currently have only ad hoc techniques. Clearly more systematic methods and formalisms need to be developed for the design of complex multilevel systems consisting of heterogeneous modules performing specialized local computations while interacting with other modules at the same and different levels of a hierarchical organization. Such interaction involves information and decision flowing back and forth, with competition and cooperation, all in the context of global constraint satisfaction.

Minsky also pointed out the importance of developing the discipline of organizing managerial machine vision systems so that the advantages of different methodologies can be fully exploited and the disadvantages can be compensated [65]:

It is time to stop arguing over which type of pattern classification technique is the best because that depends on our context and goal. Instead we should work at a higher level of organization and discover how to build managerial systems to exploit the different virtues and evade the different limitations of each of these ways of comparing things.

Cooperative and competitive behavior is the outcome of integration, while integration is achieved through interaction. In the literature, interaction has been attempted at several scales. The first scale is local neighborhood interaction. The second scale is the interaction across multiple images. The third scale is the interaction among individual modules across levels of abstraction. To date, there is no general paradigm that supports flexible interactions among individual modules in a consistent environment that allows integration of heterogeneous types of information and knowledge in a globally coherent way.

In our view, the major obstacle is the heterogeneous nature of the data, of the knowledge, and of the techniques employed to solve heterogeneous visual tasks. Heterogeneity exists due to the inevitably different nature of the visual tasks involved. We need to eliminate as much unnecessary heterogeneity as possible so that its negative effect on integration can be minimized. In the meantime, we need to explore the homogeneous characteristics in vision problem solving and to utilize them in developing a more consistent and systematic integration environment.
To do so, the following questions need to be answered:

• What kind of homogeneous characteristics exist?

• What do they offer and why are they useful?

• How do we utilize them in the context of integration?

In the next section, we try to answer the first two questions. Chapter 2 is devoted to the answer to the third question.

1.3 Homogeneity Behind Heterogeneity

Homogeneous characteristics can be explored in three aspects of vision problem solving: homogeneity in operation, homogeneity in representation, and homogeneity in the principles that govern operations. We examine them separately below.

1.3.1 Homogeneity in Operation

As discussed in Section 1.2, the process of vision problem solving is a process of continuous grouping along, perhaps alternately, the two dimensions corresponding to the two computational problems involved in vision. With this perspective, different modules, which may superficially seem heterogeneous, perform groupings consistently to solve the visual problems at different levels of abstraction. Here, each module is defined as a computational unit that possesses the functionality of data transformation or unit transformation from its input to its output.

This uniform viewpoint toward the heterogeneous individual modules immediately leads us to a potentially homogeneous architecture for vision problem solving with a hierarchy of grouping processes. Such a role for the grouping operation in a complex computer vision system has been speculated upon by several researchers [106, 34, 93, 17]. Brady indicated explicitly in 1981 [17]:

It is equally clear that grouping operations need to be defined at each level of resolution of each representation in the visual system, in order to impose hierarchical structure upon the representation.

The most important advantage offered by a homogeneous architecture for vision problem solving is the systematic behavior of all individual modules.

1.3.2 Homogeneity in Representation

A constructive problem solving scheme implies that a solution is achieved incrementally. Perceptual entities derived at one level of abstraction bridge the solutions for the visual tasks that are below and above in the hierarchy. They may provide constraints or evidence that are useful for other levels of processing and can be propagated to both higher and lower levels along the hierarchy. Therefore, the functionality of representations for the perceptual entities at different levels is mainly to provide the interface among individual modules.
Such an interface should facilitate different information requests and efficient information delivery. For complex vision systems with flexible interactions among modules, heterogeneous representations at different levels will undoubtedly introduce difficulty for different modules to communicate with each other.

Is the heterogeneity in representations inherent in vision problem solving? Considering the nature of perception, visual data can be adequately characterized by limited types of information. For example, any perceptual entity can be characterized by its properties, intrinsic or extrinsic, such as surface type or some confidence measure, and by its relationships with other entities, which can be spatial, temporal, or organizational. In our view, no matter how heterogeneous the underlying perceptual entities to be represented are, a homogeneous representational scheme is not only possible but also beneficial in order to provide a consistent interface among different modules.

When the homogeneous operation, grouping, is coupled with a homogeneous representation, a process of grouping operations systematically imposes a hierarchical structure on the representation at all levels.

1.3.3 Homogeneity in Operational Principle

When heterogeneous modules are viewed from the operational perspective as homogeneous grouping units, the behavior of each grouping module is determined by the rules that govern the grouping operations. Take the addition operation in algebra as an example. The computational process that performs the addition takes two or more numbers from its domain and produces one number in its range. Although behaving the same way for different domains and ranges, an addition operation can be governed by either a set of rules for binary addition or a set of rules for decimal addition. That is, the functionality of the addition is determined by the rules that govern it. With different ruling principles, an addition process may perform either binary or decimal addition. Therefore, addition is an abstract operation defined at an axiom level, while binary and decimal additions are operations defined at the concrete level.

Similarly, in this thesis, grouping is defined as an abstract aggregation operation. The functionality of a grouping process is determined by the grouping principles or grouping criteria it employs, because a grouping operation is performed if and only if its grouping criterion is satisfied. Ultimately, the inherent heterogeneity of individual modules lies in the grouping criteria. The inherent heterogeneity is now preserved in a way such that it affects only the internal behavior of individual modules.
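To make the analogy concrete, the following sketch (ours, in Python, with hypothetical criterion names) shows a single abstract grouping operation whose concrete behavior, like binary versus decimal addition, is fixed entirely by the rules handed to it:

    # Illustrative sketch only (hypothetical names): one abstract grouping
    # operation whose concrete behavior is determined entirely by the
    # criterion it is given.
    from typing import Callable, Iterable, List, TypeVar

    T = TypeVar("T")  # a perceptual token of any level

    def group_if(domain: Iterable[T],
                 criterion: Callable[[T, T], bool]) -> List[List[T]]:
        """Aggregate tokens into groups; a token joins a group whenever
        the criterion holds between it and some existing member."""
        groups: List[List[T]] = []
        for tok in domain:
            for g in groups:
                if any(criterion(tok, member) for member in g):
                    g.append(tok)
                    break
            else:
                groups.append([tok])
        return groups

    # The same operation performs edge linking or region merging depending
    # only on the rules supplied, e.g. (criterion names are hypothetical):
    #   edges = group_if(edgels, criterion=are_collinear)
    #   blobs = group_if(pixels, criterion=similar_intensity)

The loop is the same for every module; only the criterion differs, which is exactly the homogeneity being claimed here.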
Is there any homogeneity among different grouping criteria? If so, such homogeneity must indicate some more abstract principle behind the grouping criteria for different problems, and the grouping operations are then all, explicitly or implicitly, governed by this generalized principle. In the literature, researchers have attempted to generalize the principles of organizing perceptual data. Lowe has used the principle of nonaccidentalness for organizing 2D perceptual data under the assumption of viewpoint invariance [58]. Haralick has considered the general principle of homomorphism [41]. In their paper discussing the role of structures, Witkin and Tenenbaum have proposed a unified organizational principle called least distortion or fuzzy identity [106]. In this thesis, we argue that an extended concept of homomorphism, presented in Chapter 2, can be used as a generalized grouping principle.

The homogeneity in operational principle identifies the general information flows, top-down and bottom-up, that determine a unified syntax for specifying flexible grouping criteria.

1.4 Contributions of the Thesis

The primary contributions of this thesis are in two areas of computer vision research: the area of studying formalisms for vision problem solving and the area of studying model-based recognition methodologies. Our contributions to the area of vision problem solving are:

• We explored, using the central theme of grouping, the homogeneous characteristics behind seemingly heterogeneous machine vision techniques.

• We exploited the above identified homogeneity and proposed a homogeneous problem solving architecture for vision, called Hierarchical Token Grouping. Through this proposal, we established a formalism for developing complex computer vision systems.

• We showed that this architecture possesses a set of desirable properties that indicate both the practical and theoretical potential of the architecture.

• We demonstrated that the proposed vision problem solving architecture can be applied to real-world problems.

Our contributions to the area of model-based recognition are:

• We developed a Generalized Stochastic Tube Model that describes both the salient and dynamic shape properties of tubular objects.

• We developed an integrated model-based approach for recognizing tubular objects, including plant roots, wires, bacteria, pipes, and blood vessels.

• We designed an automatic multi-level reconstruction strategy and developed a system under the framework of Hierarchical Token Grouping.

• We tested the developed system and evaluated its performance and robustness. The experimental results show that the integrated model used is effective in recognizing the class of objects we are interested in, that the hierarchical recognition scheme allows a more robust object recovery, and that the proposed machine vision problem solving architecture is useful in developing machine vision systems that solve real problems.

Chapter 7 will give more detailed discussion and concrete conclusions about these contributions.

1.5 Organization of the Thesis

The remainder of this thesis is organized as follows. In Chapter 2, using the central theme of grouping, the homogeneous architecture of Hierarchical Token Grouping is formally defined. The desirable properties of the proposed architecture are examined. Both practical and theoretical potentials of this architecture are discussed. We claim that the proposed vision problem solving architecture is a generalization of many existing computer vision techniques that are designed to solve different problems using seemingly heterogeneous methodologies. A literature survey is presented in Chapter 3.
Four aspects of computer vision that are related to vision system integration are surveyed: representation, recognition methodologies, interaction and integration, and computational models and paradigms for vision. Techniques in each of these aspects are classified into several categories. Within the perspective of token grouping, we demonstrate that each category of techniques can be treated as a token grouping problem. Such a demonstration supports the claim that the proposed architecture is a generalization of a wide range of existing techniques and can be used for a wide range of vision problems. As yet another demonstration, a real-world problem of tubular object recognition is considered in Chapter 4 and its solution is posed as a hierarchical token grouping problem in Chapter 5.

The problem of tubular object recognition is examined in detail in Chapter 4. A model-based approach is proposed for recognizing tubular objects from both 2D intensity and 3D volumetric data. The modeling includes (1) a generalized stochastic tube model characterizing the structural properties of tubular objects, and (2) the imaging process models, predicting the expected cross-sectional sensor measurements. An automatic multi-level recognition strategy is proposed that exploits the power of the models at different levels of abstraction. In our solution to the problem of tubular object recognition, many classical vision problems are encountered, such as segmentation, perceptual organization, and matching. In Chapter 5, we demonstrate how these individual problems can be formulated as token grouping problems and how conventional heterogeneous techniques can be organized in a consistent and systematic way to form a computer vision system.

In Chapter 5, the problem of tubular object recognition presented in Chapter 4 is posed as grouping problems at different levels of abstraction. The strategy designed for solving the problem is realized under the paradigm of Hierarchical Token Grouping. An object model is decomposed into different levels of detail which correspond to the hierarchy of object reconstruction. This hierarchy is mapped to a set of grouping agents, each of which is responsible for one type of perceptual entity. The specifications for this set of grouping agents define the tokens involved, the interaction models, the integration of multiple sources of information, and the architecture of the entire system. Based on these specifications, various descriptions of the system are generated, including its knowledge hierarchy, its grouping hierarchy, and its communication channels.
Since the problem of tubular object recognition involves many classical computer vision tasks, Chapter 5 serves as a demonstration of how a specific problem can be solved under the paradigm of hierarchical token grouping.

Experiments and results are reported in Chapter 6. To evaluate the system designed and developed in Chapters 4 and 5, three performance evaluation methods are proposed. Both parametric synthetic data and real data are used to test the system. The real data includes man-made objects such as wires and pipes, and biological objects such as plant roots, bacteria, and blood vessels. System performance and robustness are evaluated based on the data for which ground truth is available. A new scheme for visualizing the recognized 3D objects is also proposed and applied in Chapter 6. The experimental results show the effectiveness of both the model-based approach proposed in Chapter 4 and the system realized under the framework of hierarchical token grouping. The benefits of the descriptive methods are demonstrated.

The major contributions highlighted in the previous section of this chapter are restated in Chapter 7. Finally, the conclusions and a discussion of future research are also given in that chapter.

CHAPTER 2

Hierarchical Token Grouping

In this chapter, a paradigm for computer vision problem solving, called Hierarchical Token Grouping, is proposed. The paradigm is based on the homogeneous characteristics of vision problem solving presented in Chapter 1 using the central theme of grouping. Such a paradigm is not intended to provide particular or general solutions to vision problems; rather, it provides a formalism by which heterogeneous solutions for vision problems at different levels can be organized consistently and systematically, supporting integration of modules, cues, and knowledge, all in a globally coherent mechanism.

2.1 The Architecture

The architecture of Hierarchical Token Grouping is shown in Figure 2.1. It has two distinct hierarchies: the left one represents the knowledge hierarchy, denoted by KH, and the right one is the grouping hierarchy, denoted by GH. The imaging model, denoted by IM and located in the middle of the two hierarchies, emphasizes the indispensable role of the sensor in perception.

Figure 2.1. The framework of hierarchical token grouping (vertical axis: level of abstraction, L).

Within the grouping hierarchy, various grouping agents are hierarchically organized to incrementally recover what is seen in the visual data. In GH, grouping is extended along two conceptual dimensions: one is the dimension of spatial-temporal scales, denoted by M, along which multiple images exist from either multiple resolutions or a time sequence, and the other is the dimension of geometrical scale, denoted by L, along which multiple levels of abstraction span. The two conceptual dimension axes form a grouping space, denoted by V. At any level of abstraction, say $l \in L$, a certain type of perceptual entity is to be recovered from the grouping domain of perceptual entities extracted from other levels. A perceptual entity can be characterized by a set of features, some of which may be obtained by grouping corresponding perceptual entities across multiple images, either multiple resolutions or a time sequence, along dimension M.
The two computational problems of perception (discussed in Chapter 1) are solved incrementally and alternately along the two conceptual dimensions by utilizing the knowledge stored in KH.

While each of the three major components (KH, GH, and IM) in the framework has its own role, they interact. Through the imaging model, the two hierarchies communicate at various levels by either retrieving knowledge from KH or updating the contents of KH. Knowledge is organized as a hierarchy in KH and flows downward in the form of expectations or expected events at different levels of abstraction. During the process of problem solving, information is reconstructed incrementally in GH, and such a recovery makes use of the knowledge stored in KH, such as decomposed object models. Depending on the type of sensor used, the knowledge retrieved from KH needs to be rendered into a form in which it can be appropriately applied in GH. Therefore, while information flows bottom-up in GH and top-down in KH, information also flows between the two into each other's hierarchy through the imaging model IM.

In this architecture, the information recovery process (GH) is separated from the knowledge (KH and IM). This implies that a solution to a vision problem directly depends on the solutions to three subproblems. The first subproblem is related to the information recovery process, which is treated as grouping problems in this framework. The second subproblem is related to knowledge engineering, including knowledge generation, retrieval, and updating. The third subproblem concerns the use and learning of knowledge during problem solving, or the interaction between the above two. Each subproblem is itself a separate and yet difficult problem. The emphasis of the current study is mainly on the various aspects of the first subproblem.

2.2 Grouping Hierarchy

In Figure 2.1, grouping hierarchy GH consists of a set of grouping agents that are organized hierarchically along the two conceptual dimensions. Each grouping agent, identified by one block in GH in Figure 2.1, is responsible for solving a specific local visual task via grouping, such as extracting straight lines or extracting a certain type of texture. A grouping agent acts on a domain consisting of the perceptual entities extracted at other levels and organizes the data in the domain. The output of each grouping agent includes a new set of perceptual entities, each of which is an organized subset of the domain entities, representing the abstraction of its input. The output can be accessed by other grouping agents so that grouping operations continue until the derived perceptual entities correspond to meaningful objects. This information recovery process is the inverse mapping described in Chapter 1.
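As an illustration of this control structure, a minimal sketch (with hypothetical names; not the system of Chapters 4 and 5) might wire the three components together as follows:

    # Minimal sketch of the three-component architecture (hypothetical names).
    # kh: level -> expected events; im renders knowledge into sensor terms;
    # agents: level -> grouping agent. Real agents are defined in Section 2.2.
    from typing import Any, Callable, Dict, List

    Level = int

    def run_htg(levels: List[Level],
                kh: Dict[Level, Any],                    # knowledge hierarchy
                im: Callable[[Any], Any],                # imaging model
                agents: Dict[Level, Callable[[List[Any], Any], List[Any]]],
                tokens: List[Any]) -> List[Any]:
        """One bottom-up sweep: each agent groups the tokens so far, guided
        by the expectation retrieved from KH and rendered through IM."""
        for lv in levels:                      # lowest abstraction first
            expectation = im(kh.get(lv))       # top-down flow, sensor-rendered
            tokens = agents[lv](tokens, expectation)  # bottom-up grouping
            # a full system would also post new expectations back into kh here
        return tokens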
There are several important issues associated with every grouping agent: the representation of both domain and output perceptual entities, the grouping operation and the associated principles that govern the grouping behavior, and the agent's interaction model with other grouping agents. If a grouping agent is viewed as a black box, then the groupings describe its behavior; grouping principles affect only the internal functionality of the box; the interaction model determines how the box is connected to the other black boxes, that is, the other grouping agents; and the representation of both domain and output entities serves as the interface between the agent and the outside world. In the following sections, these issues are formally addressed and are generalized across the grouping space V.

2.2.1 General Representation: Token

In the context of constructive vision, grouping agents interact through exchanging information. Therefore, the issue of representation is directly related to the issue of interfacing among different agents. In this section, a homogeneous representation, called token representation, is defined for the perceptual entities of all levels.

A token represents a general perceptual entity or a meaningful event in perception, visual or conceptual. It can be a point at the finest image resolution, such as a pixel, or it can be an object at a conceptual level. Each token can be adequately characterized by its organization, its properties, which may be intrinsic, extrinsic, relational, or even functional, and its deformation. Every token can be organized with others to form one or more new tokens. Within the grouping hierarchy, the token is the uniform representation employed by all grouping agents.

Tokens are scattered in an (n+2)-dimensional space, called the token space, with n spatial dimensions and 2 conceptual dimensions, where n is the dimensionality of the input image(s). Within the grouping space V formed by the two conceptual dimensions L and M, tokens form a hierarchy along each dimension, representing the perceptual entities abstracted at different stages of perception.

Formally, let $T_i^v$ be token $i$ at $v$, where $v = (l, m) \in L \times M = V$. A token is a 3-tuple:

$$T_i^v = \{C_i^v, F_i^v, D_i^v\},$$

where $C_i^v$ is a set of component tokens from other levels that specifies the organization of $T_i^v$, $F_i^v$ is the feature set that characterizes token $T_i^v$, and $D_i^v$ denotes the deformation characterization of $T_i^v$ at $v$. Specifically, denoting the grouping domain at $v$ by $G_d^v$ and the grouping principle at $v$ by $G^v$, the organization of $T_i^v$ is defined as:

$$C_i^v = \{T_{j_m}^{v_m} \mid 1 \le m \le n,\ v_m \in V,\ G^v(C_i^v) = T\}, \qquad (2.1)$$

where $n$ is the number of tokens to be aggregated, $T_{j_m}^{v_m}$ is token $j_m$ from $v_m \in V$, and the predicate $G^v(C_i^v) = T$ means that a grouping criterion, denoted by $G^v$, is satisfied by $C_i^v$ at $v$.

A feature set $F_i^v$ is a union of two classes of features:

$$F_i^v = P_i^v \cup R_i^v, \qquad (2.2)$$

where $P_i^v$ denotes the set of property features that characterize $T_i^v$ and $R_i^v$ is the set of relational features that describe the relationship between $T_i^v$ and other tokens $T_j^v$, $i \ne j$. Property feature set $P_i^v$ consists of, say, $N_P$ feature measurements:

$$P_i^v = \{m_1, m_2, \ldots, m_{N_P}\}. \qquad (2.3)$$

Each property feature can be intrinsic, such as surface orientation, or extrinsic, such as a confidence measure. The relational feature set is defined as:

$$R_i^v = \{(N^n, \{(k_j, v', a_{i,k_j})\}) \mid N^n \in N_R\}, \qquad (2.4)$$
where $N_R$ is a set of relation names, $(N^n, \{(k_j, v', a_{i,k_j})\})$ describes relationship $N^n$ between $T_i^v$ and other tokens, with $k_j$ and $v'$ together pointing at the $k_j$th token at $v'$, and $a_{i,k_j} = \{a_{i,t}\}$, $0 \le t \le q_n$, being a set of $q_n$ attributes that describe the relationship between tokens $T_i^v$ and $T_{k_j}^{v'}$. Therefore, $R_i^v$ defines a set of named relationships, with $N^n \in N_R$, between token $T_i^v$ and other tokens.

The deformation characterization $D_i^v$ is defined by measures that capture deformation by describing the deviations of the actually observed perceptual events from the corresponding expected perceptual events. Such explicitly represented deformation information offers a basis from which a tolerance model for extracting the perceptual entities at $v$ may be established.

Token representation is homogeneous and general. It provides adequate descriptive power for characterizing perceptual entities at any $v \in V$ and offers a consistent interface among different grouping agents. The feature set of a token determines what information is accessible from $v$. The information carried by a token, upon being accessed, gets propagated to other grouping agents. All tokens that are related according to the component descriptions $C_i^v$ for different $v$'s form a hierarchical representation for the perceptual entity at the top level. By tracing $C_i^v$ down to the bottom of GH, descriptions with various degrees of detail about the perceptual entity can be obtained.

2.2.2 Uniform Operation: Grouping

Grouping is defined as an abstract aggregation operation. Token grouping is a composition of a set of tokens into a new token. Consider a grouping operation at an arbitrary $v \in V$. Let $T^v$ be the set of all tokens formed at $v \in V$. Assume the grouping domain at $v$, denoted by $G_d^v$, is a union of sets of tokens from various levels $v' \in V$:

$$G_d^v = \bigcup_{v' \in L^v} T^{v'}, \quad L^v = \{v_1, \ldots, v_I\}. \qquad (2.5)$$

A grouping operation at $v \in V$ is defined as:

$$\{T_j^{v'}\} \mapsto T_i^v, \quad T_j^{v'} \in G_d^v. \qquad (2.6)$$

The tokens that are aggregated, $\{T_j^{v'}\}$, become the components of the new token, $C_i^v = \{T_j^{v'}\}$, that determine its organization and collectively specify an observed event extracted from the visual field. A grouping operation is constructive due to the nature of aggregation.

Recall that a token is characterized by its feature set. Characterizing the newly aggregated token $T_i^v$ is part of a grouping operation. Feature set $F_i^v$ of $T_i^v$ can be derived by a feature aggregation:

$$\Pi : C_i^v \cup T^v \mapsto F_i^v,$$

where $\Pi$ gathers the information about a set of individual tokens, which may include non-component tokens, and establishes the feature set of $T_i^v$.

2.2.3 Unified Grouping Principle: Homomorphism

Grouping criteria specify the principles of grouping. Define a grouping agent to be a process that carries out grouping operations according to given grouping criteria. For a particular grouping agent at $v \in V$, denoted by GA(v), its grouping criterion $G^v$ can be generally defined as

$$G^v = G_s^v \wedge G_c^v, \qquad (2.7)$$

where $G_s^v$ and $G_c^v$ are two predicates connected by an "AND" operator. Specifically, $G_s^v$ specifies a grouping subspace or domain of GA(v). A grouping subspace is where the candidate tokens to be grouped reside. The term $G_c^v$ defines the criterion used to determine whether the evaluated agreement between an observed event and an expected event signals a significant similarity. Therefore, $G_c^v$ defines the rules (principles) that govern the grouping operation at $v$.
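As a concrete, purely illustrative rendering of the token 3-tuple and its feature sets, definitions (2.1)-(2.4) might be carried by a record such as the following (field names are ours, not from the thesis):

    # Sketch of the token 3-tuple T_i^v = {C, F, D} with F = P union R,
    # following (2.1)-(2.4). All field names are illustrative assumptions.
    from dataclasses import dataclass, field
    from typing import Dict, List, Tuple

    Coord = Tuple[int, int]          # v = (l, m) in the grouping space V

    @dataclass
    class Token:
        v: Coord                                              # position in V
        components: List["Token"] = field(default_factory=list)    # C_i^v
        properties: Dict[str, float] = field(default_factory=dict) # P_i^v
        # R_i^v: relation name -> list of ((k_j, v'), attribute dict)
        relations: Dict[str, List[Tuple[Tuple[int, Coord],
                                        Dict[str, float]]]] = \
            field(default_factory=dict)
        deformation: Dict[str, float] = field(default_factory=dict) # D_i^v

        def trace(self, depth: int = 0) -> None:
            """Walk C_i^v down toward the bottom of GH, printing the
            hierarchical description at each level."""
            print("  " * depth, self.v, sorted(self.properties))
            for c in self.components:
                c.trace(depth + 1)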
A set of tokens from the grouping domain can be grouped to form a new token at $v$ provided that the set of tokens satisfies criterion $G_c^v$. Note that definition (2.7) is a general form for defining the grouping criteria of the grouping agents at all levels.

As mentioned earlier, the general form of the grouping domain of GA(v), specified by $G_s^v$, is a union of sets of tokens from different levels (along either L or M):

$$G_s^v :\ G_d^v = \bigcup_{v' \in L^v} T^{v'}, \qquad (2.8)$$

where $L^v$ is a list of levels in $V$ and $T^{v'}$ is the set of all tokens from $v' \in L^v$. Since $L^v$ explicitly lists the levels with which GA(v) will interact, it defines the interaction model at $v$, either an intermodule interaction model or an interframe interaction model, depending on along which conceptual dimension (L or M) GA(v) performs the grouping. The interaction channels between GA(v) and GA(v'), $v' \in L^v$, can be established according to (2.8) so that all the cues accessible from these levels can be propagated into GA(v). Which specific cues are to be used and how they are to be integrated are further made explicit in criterion $G_c^v$. In the following sections, we present a unified criterion: homomorphism. We first introduce the mathematical concept of homomorphism and its previous use in computer vision. Then we examine how this concept is to be used as a generalized criterion for grouping.

Mathematical Homomorphism and Exact Matching

Mathematically, a homomorphism is a mapping function [28, 41, 90]. Let $[G; *]$ be an algebraic structure where $G$ is the domain and $*$ is an operation or relation that acts on $G$. One example of an algebraic structure is a graph that consists of a set $G$ of nodes and a relation $*$ among nodes, represented as arcs connecting nodes. To define a homomorphism, assume another algebraic structure, say $[G'; \#]$. Then the mapping

$$h : G \mapsto G' \qquad (2.9)$$

is a homomorphism if

$$h(g_i * g_j) = h(g_i) \,\#\, h(g_j), \quad \forall g_i, g_j \in G. \qquad (2.10)$$

Note that this definition does not require $h$ to be an injection (one-to-one). Therefore, the existence of a homomorphism, or for two structures to be homomorphic, merely claims a similarity but not necessarily identity. When the mapping is injective, $h$ is a monomorphism. The strongest case is an isomorphism, when $h$ is bijective (one-to-one and onto) [28, 90]. One example of a homomorphism is discussed by Doerr and Levasseur [28]. Considering a "picture-taking" process, a camera is a mapping function that transfers something three-dimensional onto a photograph, something two-dimensional. The mapping $h : \mathbb{R}^3 \mapsto \mathbb{R}^2$ corresponds to a homomorphism. Even though one dimension is lacking, we are still able to recognize much of what is present in the 2D image due to the similarity. But a question arises: how similar is similar? This is a question that has to be answered when the concept of homomorphism is to be applied.

In discussing matching problems in computer vision, Shapiro and Haralick used the notion of relational homomorphism to describe both exact and inexact structural matching [41, 90]. We briefly introduce this concept of relational homomorphism and examine how it is used in defining exact, and later inexact, matching problems.
Let $D = (P, R)$ be a structural description of a perceptual entity, where $P = \{P_1, P_2, \ldots, P_n\}$ is a set of primitives, one for each of the $n$ parts of the entity, and $R$ represents the interrelationships among the parts. Each primitive is a binary relation $P_i \subseteq A \times V$, where $A$ is a set of attributes and $V$ is a set of values. $R = \{PR_1, PR_2, \ldots, PR_K\}$ is a set of named $N$-ary relationships over $P$. Each $PR_k$, $k = 1, 2, \ldots, K$, is a pair $(NR_k, R_k)$ where $NR_k$ is the name for relation $R_k$ and $R_k \subseteq P^{M_k}$ for some positive integer $M_k$.

Assume an $N$-ary relation $R \subseteq P^N$ over set $P$. A function $h : P \mapsto Q$ maps elements of $P$ into a set $Q$. Define

$$R \circ h = \{(q_1, \ldots, q_N) \in Q^N \mid \exists (p_1, \ldots, p_N) \in R \text{ with } h(p_i) = q_i,\ i = 1, \ldots, N\}. \qquad (2.11)$$

Let $S \subseteq Q^N$ be an $N$-ary relation over $Q$. A relational homomorphism from $R$ to $S$ is a mapping $h : P \mapsto Q$ that satisfies $R \circ h \subseteq S$. That is, under a relational homomorphism, each $N$-tuple of $R$ is mapped to an $N$-tuple in $S \subseteq Q^N$.

Based on the above definitions, an exact matching can now be formally defined. Assume a stored model or prototype structural description $D_P = (P, R)$ and a candidate structural description $D_C = (Q, S)$. Let

$$P = \{P_1, P_2, \ldots, P_n\}, \quad Q = \{Q_1, Q_2, \ldots, Q_m\},$$

and

$$R = \{(NR_1, R_1), \ldots, (NR_K, R_K)\}, \quad S = \{(NS_1, S_1), \ldots, (NS_K, S_K)\}.$$

$D_C$ matches $D_P$ if there exists a mapping $h : P \mapsto Q$ satisfying

$$1)\ h(P_i) = Q_j \Rightarrow P_i \subseteq Q_j, \text{ and} \qquad (2.12)$$

$$2)\ NR_i = NS_j \Rightarrow R_i \circ h \subseteq S_j. \qquad (2.13)$$

The first condition (2.12) describes a mapping function that gives the correspondence between the primitives of the two structural descriptions. The second condition (2.13) states that the mapping function $h$ must be a relational homomorphism from each relation of one description to the relation with the same name in the other description. By this definition, an exact match is achieved through finding a relational homomorphism $h$.
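For binary relations, the test $R \circ h \subseteq S$ of conditions (2.12)-(2.13) is only a few lines of code; the sketch below, with names of our choosing, illustrates it on a toy adjacency structure:

    # Sketch: exact relational homomorphism test (R o h) subset-of S for
    # binary relations, following (2.11)-(2.13). Names are illustrative.
    from typing import Dict, Hashable, Set, Tuple

    Pair = Tuple[Hashable, Hashable]

    def is_relational_homomorphism(h: Dict[Hashable, Hashable],
                                   R: Set[Pair], S: Set[Pair]) -> bool:
        """True iff every related pair in R maps, under h, to a pair in S."""
        return all((h[p1], h[p2]) in S for (p1, p2) in R)

    # Example: a model triangle maps onto a candidate structure.
    R = {("a", "b"), ("b", "c"), ("c", "a")}     # model adjacency
    S = {(1, 2), (2, 3), (3, 1), (1, 3)}         # candidate adjacency
    h = {"a": 1, "b": 2, "c": 3}
    assert is_relational_homomorphism(h, R, S)   # exact match holds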
ε-Homomorphism and Inexact Matching

The notion of exact matching is only appropriate for perfect data. In real situations, an exact match of structural descriptions should never be expected. The concept of inexact matching is necessary.

In an inexact matching, possible missing or distorted parts and untrue interrelationships among the parts in a candidate structure are taken into account. Define a weighting function:

$$W = (W_P, W_R), \quad W_P = \{w_1, w_2, \ldots, w_n\}, \quad W_R = \{W_{R_1}, \ldots, W_{R_K}\}. \qquad (2.14)$$

Here, $W_P$ defines a primitive weighting function, $W_P : P \mapsto [0, 1]$, satisfying $\sum_{i=1}^{n} w_i = 1$, that assigns a weight to each primitive in $P$. $W_R$ is a set of $N$-tuple weighting functions in which each $W_{R_k}$, $1 \le k \le K$, is a function $W_{R_k} : R_k \mapsto [0, 1]$, satisfying $\sum_{r \in R_k} W_{R_k}(r) = 1$, that assigns weights to the $M_k$-tuples of relation $R_k$ for each $k$, $1 \le k \le K$.

Suppose $R$ is an $N$-ary relation over $P$, $S$ is an $N$-ary relation over $Q$, $W_P$ is a weighting function for $P$, and $W_R$ is a weighting function for $R$. Define an $\epsilon$-homomorphism from $R$ to $S$ with respect to the weighting functions $W_P$ for $P$ and $W_R$ for $R$ as a mapping

$$h : P \mapsto Q \qquad (2.15)$$

such that

$$\sum_{r \in R,\ h(r) \notin S} W_R(r) \le \epsilon. \qquad (2.16)$$

That is, the total weight on those $N$-tuples that are not satisfied by $h$ with respect to $S$ is no greater than threshold $\epsilon$. Additionally, for each corresponding pair $P_i$ and $Q_j$ such that $h(P_i) = Q_j$, the following should be satisfied: with $(a_m, v_{m,i}) \in P_i$ and $(a_m, v'_{m,j}) \in Q_j$,

$$|v_{m,i} - v'_{m,j}| \le a_m^t, \quad a_m^t \in a^t = \{a_m^t \mid a_m \in A\}, \qquad (2.17)$$

where $a_m^t$ is a threshold on attribute $a_m \in A$ by which the value of attribute $a_m$ from a candidate primitive $Q_j$ can differ from the value of $a_m$ from a model primitive $P_i$.

Examining conditions (2.16) and (2.17), an $\epsilon$-homomorphism is an inexact or fuzzy mapping with respect to the specified $\epsilon$ and $a^t$, where $\epsilon$ is the tolerance for all the untrue interrelationships among parts while $a^t$ is the tolerance for the total distortion on primitive attribute values. An inexact matching can now be defined based on the above defined $\epsilon$-homomorphism. Let $D_P^w$ be a weighted model structural description defined as

$$D_P^w = (P, W_P, R, W_R),$$

where $P$ and $R$ are defined as before, $W_P$ indicates the importance of each part $P_i \in P$, and $W_R$ specifies the importance of each relation $R_k \in R$. A candidate structural description $D_C = (Q, S)$ inexactly matches a model structural description $D_P^w$ with respect to (1) the tolerance for attribute distortion, $a^t$, (2) the tolerance for a missing part, denoted by $p^t$, and (3) the tolerance for structural distortion, represented by $r^t = \{\epsilon_k \mid R_k \in R\}$, if there exists a mapping

$$h : P \mapsto Q \cup \{null\} \qquad (2.18)$$

that satisfies

$$1)\ h(P_i) = Q_j \in Q \text{ with respect to } a^t, \qquad (2.19)$$

$$2)\ \sum_{P_i \in P,\ h(P_i) = null} w_i \le p^t, \qquad (2.20)$$

$$3)\ NR_i = NS_j \Rightarrow h \text{ is an } \epsilon_i\text{-homomorphism} \qquad (2.21)$$

with respect to $W_{R_i}$ from $R_i$ to $S_j$.

In this definition, the tolerance models $a^t$, $p^t$, and $r^t$ specifically define "how similar is similar" in matching, or indicate the degree of "dissimilarity" (inexactness) that can be tolerated. Typically, a matched candidate structure will be considered an identified instance of the model structure. Therefore, in effect, a homomorphism serves as the criterion for deciding whether a set of primitives should be organized, or grouped. Extending the precise concept of homomorphism to the concept of $\epsilon$-homomorphism allows the inexact handling of various visual tasks.

A slight revision to the constraints on the weighting functions provides the capability of also identifying non-structural or non-relational similarity using the same notion of homomorphism. For example, we can relax the constraint $\sum_{r \in R_k} W_{R_k}(r) = 1$ to $0 \le \sum_{r \in R_k} W_{R_k}(r) \le 1$. Then, by letting all $W_{R_k} = \{0\}$ (weighting all relations zero, which means that the interrelationships among parts play no role in matching), the inexact matching defined in (2.18) reduces to a thresholding operation based on some attributes.
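Condition (2.16) is equally direct to check; the following sketch (again with illustrative names, and omitting the attribute test (2.17)) accumulates the weight of the violated tuples and compares it against ε:

    # Sketch: epsilon-homomorphism test (2.16) for binary relations with
    # weights. Hypothetical names; the attribute tolerances of (2.17)
    # would be checked separately for each corresponding primitive pair.
    from typing import Dict, Hashable, Set, Tuple

    Pair = Tuple[Hashable, Hashable]

    def is_eps_homomorphism(h: Dict[Hashable, Hashable],
                            R: Set[Pair], S: Set[Pair],
                            w: Dict[Pair, float], eps: float) -> bool:
        """True iff the total weight of R-tuples not carried into S
        is at most eps."""
        violated = sum(w[r] for r in R if (h[r[0]], h[r[1]]) not in S)
        return violated <= eps

    # With one missing arc weighted 0.2, the match survives eps = 0.25.
    R = {("a", "b"), ("b", "c"), ("c", "a")}
    S = {(1, 2), (2, 3)}                         # arc (3, 1) is missing
    w = {("a", "b"): 0.4, ("b", "c"): 0.4, ("c", "a"): 0.2}
    h = {"a": 1, "b": 2, "c": 3}
    assert is_eps_homomorphism(h, R, S, w, eps=0.25)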
To what extent does the principle of seeking homomorphisms generalize the organizational principles across both problem solving dimensions? In the context of computer vision, a homomorphism is a mapping from an observed visual event to a model event. The principle of homomorphism states: perform a grouping if and only if a homomorphism, or a mapping, can be found. Haralick has considered various structural matching problems at different levels of abstraction and formally proved that they can be formulated as homomorphism finding problems [41]. During the process of homomorphism finding, multiple sources of information and knowledge are integrated under the framework of structural matching. The original concept of homomorphism was extended to the ε-homomorphism in such a way that both the homomorphism in structure and in non-structural attributes can be simultaneously evaluated and assessed. Such an extension further equips this principle with the capability of generalizing the grouping criteria across all levels of abstraction. For the grouping along the spatial-temporal dimension, various criteria for grouping perceptual entities have been proposed [61, 34, 87, 88, 30, 29]. Since these criteria are defined no differently than many of the criteria used by Haralick in proving them equivalent to homomorphism satisfaction problems, they can also be formulated as homomorphism satisfaction problems. Therefore, the principle of homomorphism finding governs the grouping processes along both conceptual problem solving dimensions. This is the topic of the next section.

Homomorphism in Token Grouping

As we introduced earlier, $G_c^v$ gives the grouping principle, or the criterion for deciding when an observed event and an expected event signal a significant similarity. Employing the concept of homomorphism as a unified grouping principle across the entire GH, we call $G_c^v$ the homomorphism criterion, describing a homomorphism validity test for the grouping at $v \in V$. There are three components defined in $G_c^v$: the perceptual events involved, the homomorphism evaluation function, and the validity test. The first component identifies the two events between which the homomorphism is evaluated. The second component defines a certain measure that quantitatively reflects the degree of homomorphism or similarity. The third component is necessarily related to a test that assesses the significance of the evaluated homomorphism.

In our context, the homomorphism is evaluated between an observed event, occurring in or inferred from the input and represented by a set of tokens from the grouping subspace, and an expected event, or a model instantiated from the observable evidence detected from the visual input. Let us denote these two events by $E_o^v$ and $E_e^v$, meaning the observed event and the expected event at $v$, respectively. The specific type of homomorphism to be identified is determined by the types of features of the events to be used in evaluating the homomorphism. Since each token is described by a set of property features and a set of relational features, three types of homomorphism may be defined. The first type is structural homomorphism. The second is homomorphism in property. The third is the combination of both.
If only the relational features of the events are used to evaluate the homomorphism, a pure structural similarity between the two is to be identified, which may correspond to some subisomorphism. If only property features are used, the similarity to be identified among two sets of tokens is with respect to certain properties, such as surface orientation, without concern for how the involved tokens are spatially related. The third type is to identify some structural homomorphism and in the meantime demand that certain properties hold on the component tokens involved.

Since an observed event seldom matches exactly with an expected event, homomorphism has to be evaluated in an inexact fashion [90]. An evaluation function is needed to measure the degree of agreement between the observed and the expected. Given $E_o^v$ and $E_e^v$, a homomorphism evaluation function, denoted by $H$, measures the homomorphism between the two based on two parallel sets of features from the events. Let $S_P^v$ be a property selection function at $v$,

$$S_P^v = \bigcup_{v' \in L^v} \{s_{m_1}, s_{m_2}, \ldots, s_{m_{N_P}}\}, \quad s_{m_i} : [0, 1],$$

and $S_R^v$ be a relation selection function at $v$,

$$S_R^v = \bigcup_{v' \in L^v} \{s_{n_1}, s_{n_2}, \ldots, s_{n_{N_R}}\}, \quad s_{n_i} : [0, 1].$$

These two functions select a set of features from each level $v' \in L^v$ on which the grouping domain is defined. That is, $L^v$ defines all the levels that the grouping agent at $v$ will interact with, while $S_P^v$ and $S_R^v$ define what features, properties or relations, will be utilized by the grouping agent at $v$. Based on these two selection functions, $H$ is defined as a function of the chosen features of the domain tokens involved. That is,

$$H(E_o^v, E_e^v) = H\big(\,[(S_P^v \times P^{L^v}) \cup (S_R^v \times R^{L^v})]^o,\ [(S_P^v \times P^{L^v}) \cup (S_R^v \times R^{L^v})]^e\,\big),$$

where $(S_P^v \times P^{L^v})$ is a set of selected property features of the involved tokens and $(S_R^v \times R^{L^v})$ is a set of selected relations among the involved tokens. $[\,\cdot\,]^o$ is the set of all chosen cues from the bottom-up flow, specifying the observed event $E_o^v$; $[\,\cdot\,]^e$ is the set from the top-down flow, specifying the expected event $E_e^v$. Information from the bidirectional flows is integrated in the homomorphism evaluation function $H$, and the outcome of the integration influences the grouping decisions at $v$.

Note that the selection functions here differ from the weighting functions used by Shapiro and Haralick in defining inexact matching problems. The property selection function chooses a set of features that characterize certain aspects of the component tokens that are important in deciding whether they can be abstracted into a higher level perceptual entity. The relation selection function chooses the specific interrelationships that have to be satisfied among the component tokens of the potential new token. In effect, these two selection functions together determine an attributed graph to be used in identifying a homomorphism. On the other hand, the weighting function described in [90] weighs the importance of each node and each arc in a given graph. Therefore, they play different roles and both are important.
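One simple reading of $H$, restricted to property features, is a selected-feature deviation score; the sketch below is an assumption of ours rather than the thesis's definition, but it shows how the selection mask $S_P^v$ and the threshold test interlock:

    # Sketch of H restricted to property features: a selection mask chooses
    # which features participate, and agreement is measured as the mean
    # absolute deviation. Names and the specific measure are assumptions.
    from typing import Dict

    def h_evaluate(observed: Dict[str, float],
                   expected: Dict[str, float],
                   select: Dict[str, float]) -> float:
        """Mean selected-feature deviation; 0.0 means perfect agreement."""
        chosen = [f for f, s in select.items() if s > 0.0]
        if not chosen:
            return 0.0
        return sum(abs(observed[f] - expected[f]) for f in chosen) / len(chosen)

    # The criterion G_c^v then reduces to a threshold test against H_T^v:
    observed = {"orientation": 0.82, "width": 4.1, "contrast": 0.30}
    expected = {"orientation": 0.80, "width": 4.0, "contrast": 0.90}
    select   = {"orientation": 1.0, "width": 1.0, "contrast": 0.0}   # S_P^v
    assert h_evaluate(observed, expected, select) <= 0.1  # complies with H_T^v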
The degree of agreement between $E_o^v$ and $E_e^v$ is specified by the evaluation outcome of $H$. The homomorphism exists if and only if the evaluation outcome is significant. Let $H_T^v$ denote the validity threshold of $H$ at $v$. Then the grouping criterion $G_c^v$ can be expressed by a generic homomorphism test:

$$G_c^v :\ H(E_o^v, E_e^v) \models H_T^v, \qquad (2.22)$$

where $\models$ means "comply with" or "satisfy". That is, an observed event $E_o^v$ is homomorphic, either in structure or in properties or in both, to an expected event $E_e^v$ if and only if the homomorphism evaluation outcome $H(E_o^v, E_e^v)$ complies with the validity threshold $H_T^v$. A new token $T_i^v$ can be formed if and only if $E_o^v$ is homomorphic to $E_e^v$ with respect to the given criterion $G_c^v$. Once the homomorphism test is passed, all the tokens involved in $E_o^v$ become the components of $T_i^v$ and together form the component set $C_i^v$.

An intuitive interpretation of definition (2.22) is: a new perceptual entity ($T_i^v$) is extracted if the visual data ($E_o^v$) agrees with ($\models H_T^v$) what we are looking for (if $E_e^v$ is from a focus of attention) or what we think we see (if $E_e^v$ arises from evidence in the data). The expected event $E_e^v$ is stored in and accessed from the knowledge hierarchy. Note that the validity threshold $H_T^v$ can be a set of thresholds with respect to the chosen features. These thresholds are similar to the ones used in defining inexact matching problems and specify the tolerance for the inexactness between the observed data $E_o^v$ and the expected $E_e^v$. $H_T^v$ is part of the knowledge associated with $E_e^v$ and can be learned from training. In the case study presented in Chapters 4 and 5, we will show, by examples, how this can be done.

The grouping criterion definition (2.7) is uniform across all grouping agents. At any $v \in V$, it is where the bottom-up ($E_o^v$) and top-down ($E_e^v$) information flows meet and get integrated. It specifies simultaneously (1) the interaction model at $v$ ($L^v$ in defining $G_s^v$), (2) the sources of information that are propagated into GA(v) ($G_d^v$, $E_e^v$, and $H_T^v$), (3) the specific cues to be used ($S_P^v$ and $S_R^v$), and (4) how the cues are to be integrated ($H$). Notice that while different grouping agents can define flexible interaction and integration schemes, they all use exactly the same specification form as defined in formulae (2.7), (2.8), and (2.22).

2.2.4 Grouping Agent

Having defined the representation, the operation, the operational principles, and the interaction model (the issues associated with every grouping agent), we now formally define a grouping agent. A grouping agent is a computational unit that (1) takes a set of inputs, (2) organizes its inputs via grouping according to some principles, and (3) generates a set of outputs. Denote a grouping agent at $v \in V$ by GA(v). It is formally defined as:

$$GA(v) = (I_v, G^v, O_v), \qquad (2.23)$$

with its input $I_v$ and output $O_v$ given by

$$I_v = \{G_d^v, E_e^v, H_T^v\}, \quad O_v = \{T^v, E_e^{v'}, H_T^{v'}\},$$

and $T^v$, the token set formed at $v$:

$$T^v = \begin{cases} \{T_i^v\} & \text{if } C_i^v \subseteq G_d^v \text{ and } G^v(C_i^v) \\ \emptyset & \text{otherwise,} \end{cases}$$

where $i \in [1, N_v]$ and $N_v$ is the size of $T^v$. The input of GA(v) is specified by two general sources of information, the bottom-up flow ($G_d^v$) and the top-down flow ($E_e^v$ and $H_T^v$).
The output is specified by (1) $T^v$, which contributes to the bottom-up flow so that new tokens can be carried out of GA(v), and (2) the new expectation $E_e^{v'}$ at $v'$ (if any) and the revised validity threshold $H_T^{v'}$ (both of which will be posted in KH). In (2.23), the homomorphism criterion $G_c^v$ determines the internal behavior of GA(v). For different $v$'s, inherent heterogeneity exists in $G_c^v$, but it affects only the internal grouping decisions of GA(v). Therefore, it does not interfere with the interaction among the GA(v)'s (which takes place through $I_v$ and $O_v$).

Every token in $T^v$ is abstracted from either the tokens from lower levels along conceptual dimension L or the corresponding tokens from multiple images along conceptual dimension M. That is, a token in $T^v$ may be a composition of tokens from lower levels of abstraction, or it may be a token that represents a refined version, via grouping, of a set of corresponding tokens collected along conceptual dimension M. For example, a set of corresponding curves extracted at multiple resolutions can be grouped to form a refined curve at the same abstraction level [59, 80, 87, 88, 30, 29]. Another example is to group a set of point correspondences to extract motion and structure features for a given pixel token.

The definition $GA(v) = (I_v, G^v, O_v)$ is concise and adequate for describing a grouping agent. More importantly, it is uniform for any $v$, serving as a general form of formal specification for all GA(v), $v \in V$. With this general form, even though different grouping agents can be individually responsible for different types of perceptual entities, they can all be uniformly defined as a grouping process. Therefore, a vision problem solution can be described by a set of such specifications, each of which defines one grouping agent and the collection of which describes an interconnected hierarchy of token grouping, or GH.

2.2.5 Generic Token Grouping Algorithm

With the generally defined token representation, grouping operation, and grouping criteria, we now give a generic token grouping algorithm for any GA(v), $v \in V$.

Generic Token Grouping Algorithm

    GA(v) = {G^v, T^v}                            - Token grouping agent at v
    {
      GSSPACE <= G_d^v ;                          - Form grouping domain
      T^v <= {} ; N_v = 0 ;                       - Initialize output token set
      Loop {
        {E_e^v, H_T^v} <= K_Retrieve(G^v) ;       - Retrieve knowledge
        C_i^v <= O_Event(G^v) ;                   - Choose candidate tokens
        F_i^v <= Pi(C_i^v) ;                      - Compute features
        E_o^v <= {C_i^v, F_i^v} ;                 - Form observed event
        H <= H_Evaluate(E_o^v, E_e^v) ;           - Homomorphism evaluation
        if (H |= H_T^v)                           - Homomorphism validation
          D_i^v <= Deform(C_i^v, F_i^v, E_e^v) ;  - Collect deformation
          T_i^v <= {C_i^v, F_i^v, D_i^v} ;        - Form a new token
          E_e^v' <= K_Instantiate(T_i^v) ;        - Generate hypothesis
          T^v <= T^v U T_i^v ; N_v <= N_v + 1 ;   - Add the new token
        endif
      } while (NOT Terminate) ;
    } End of GA(v).
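An executable rendering of this loop might look like the following sketch; the six module names mirror the pseudocode, but the callables, the tuple-based token, and the reading of the compliance test as "score at most threshold" are placeholder assumptions of ours:

    # Sketch: the generic grouping loop in Python. The modules mirror the
    # pseudocode above; the callables passed in are placeholders.
    from typing import Any, Callable, List, Tuple

    def grouping_agent(k_retrieve: Callable[[], Tuple[Any, float]],
                       o_event: Callable[[], Any],
                       features: Callable[[Any], Any],       # Pi
                       h_evaluate: Callable[[Any, Any], float],
                       deform: Callable[[Any, Any, Any], Any],
                       k_instantiate: Callable[[Any], None],
                       terminate: Callable[[], bool]) -> List[Any]:
        tokens: List[Any] = []                    # T^v
        while not terminate():
            e_exp, h_t = k_retrieve()             # expected event, threshold
            c = o_event()                         # candidate components C_i^v
            if c is None:
                break                             # grouping domain exhausted
            f = features(c)                       # F_i^v
            if h_evaluate((c, f), e_exp) <= h_t:  # homomorphism validation
                d = deform(c, f, e_exp)           # D_i^v
                token = (c, f, d)                 # new token T_i^v
                k_instantiate(token)              # post hypothesis to KH
                tokens.append(token)
        return tokens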
In this algorithm, GSSPACE denotes the grouping subspace determined by $G_s^v$; $T^v$ is the set of $N_v$ newly formed tokens at $v$; K_Retrieve is a knowledge retrieval module that accesses an expected event and its associated validity condition from the knowledge hierarchy; O_Event is a grouping candidate selection module that determines the observed event $E_o^v$; H_Evaluate conducts the homomorphism evaluation between $E_o^v$ and $E_e^v$; $\Pi$ is a feature extraction module; Deform is the module that computes the deviation of the newly formed token from the expected event $E_e^v$ and collects it as deformation information about the tokens at $v$; and, finally, K_Instantiate is a module that dynamically instantiates a new expected event $E_e^{v'}$, which may be evidenced by the new token, at level $v'$ of KH. The condition Terminate is satisfied when a consistent interpretation of the scene is reached at the top level and such a decision is cascaded in the form of a termination signal to all the lower level grouping agents. As we can see, before each grouping agent is terminated, it keeps generating tokens, some of which may represent alternative interpretations of the perceptual entities at that level. Since a grouping agent does not stop until it is suppressed by higher level decisions, it keeps all the grouping options open, which facilitates possible backtracking.

This generic algorithm is structured. It decomposes the solution to a visual task at any level into six subsolutions, each of which has a distinct and well defined functional role. Therefore, the algorithm establishes, within the context of token grouping, the principle of decomposing the solution for a specific visual task at any level: decomposition by functional roles (six roles). Due to the uniformity among grouping agents, this principle is applicable to all levels. By following this decomposition principle, any vision problem solution can be posed as a grouping problem.

The algorithm describes how a grouping agent is assembled from distinct functional modules. For any $v$, the algorithm shows exactly what functional modules should be working where in a grouping agent. The identical algorithmic structure for all grouping agents indicates that it is feasible that (1) the creation of grouping hierarchy GH can be automated and (2) all the interaction channels can be established automatically based on the specified $G_s^v$, $\forall v \in V$. In this way, computer vision systems can be effectively integrated or assembled in a globally coherent and structured way.

Hierarchical Token Grouping: Vision Problem Solving

Having all the notations defined, let us now go back to Figure 2.1 to reexamine grouping hierarchy GH. Each grouping agent is responsible for a specific visual task. It takes input from both the bottom-up ($G_d^v$) and top-down ($E_e^v$, $H_T^v$) information flows, where the bottom-up flow provides the set of so-far extracted perceptual entities and the top-down flow provides expectations in the form of an instantiated expected event with its associated criterion for such an observed event to be identified, and integrates them in the homomorphism evaluation function H_Evaluate. The homomorphism evaluation governs the internal behavior of a grouping agent, which ultimately determines how the data in the grouping domain should be organized. If a new entity ($T_i^v \in T^v$) is formed via grouping, it is made available to other grouping agents. A newly formed token may provide evidence for another perceptual entity. In this case, a new expectation ($E_e^{v'}$) can be instantiated in the knowledge hierarchy KH.
2.3 Properties

The goal of the proposed framework is not to provide solutions to vision problems; rather, it provides a general architecture under which vision problem solutions can be realized in a principled way. The proposed architecture possesses certain properties that make the process of realizing a machine vision system more standard, more efficient, more flexible, and more manageable.

2.3.1 A General Constructive Paradigm

The hierarchical architecture lends itself to a paradigm of constructive problem solving for machine vision. It is general because (1) it is established based on the abstraction of properties of vision problem solving, (2) it is a generalization of most existing recognition methodologies and recognition paradigms (this will be demonstrated in the next chapter), and (3) GH can have an arbitrary number of levels. It can be viewed as a unified architecture for the paradigms of both conventional symbolic processing and artificial neural networks.

2.3.2 A Homogeneous Architecture

The perspective of grouping establishes the basis for a homogeneous problem solving architecture. It is homogeneous in representation, in operation, and in the structures of all grouping agents. Such homogeneity is achieved by exploiting the common properties in solving vision problems at different levels of abstraction and fully utilizing them. The homogeneity leads to overall vision systems that are structured, more manageable, and easier to build. The inherent heterogeneity is preserved in such a way that it only affects the internal behavior of individual modules but does not hinder the interconnection or integration of overall systems.

2.3.3 Object-Oriented

The modern concept of object-oriented software development means that software is organized as a collection of discrete objects that incorporate both data structure and behavior. Therefore, in an object-oriented system, the data structure hierarchy is identical to the operation hierarchy. Object-oriented development is a conceptual process that is independent of a specific implementation. That is, it is fundamentally a way of thinking, not a programming technique [83]. Based on the abstraction or generalization of the essence of vision problem solving, the theme of hierarchical token grouping presents a methodology of object-oriented modeling of machine vision systems.
Each token grouping agent is an object (in an object-oriented modeling sense, not in a physical object sense) with its token representation as data structure and its grouping operation as behavior. In object-oriented modeling methodology, a system is described by three kinds of models:

• Object model, which describes the objects in the system and their static relationships,

• Dynamic model, which describes the interactions among objects in the system, and

• Functional model, which describes the data transformation within the system.

Examining the architecture of hierarchical token grouping, the grouping hierarchy GH gives the object model of the system; the collection of $I^v$, $\forall v \in V$, which describes the interactions among grouping agents (the objects here), provides the dynamic model of the system; and the collection of $G^v$, $v \in V$, each of which defines how a set of domain tokens is to be transformed into another set of output tokens, describes the functional model of the system. Such an object-oriented modeling methodology for vision problem solving benefits the development of computer vision systems in practice.

2.3.4 Systematic Behavior

The uniform grouping operation results in systematic behavior. Such systematic grouping behavior naturally imposes a hierarchical structure on object representation.

2.3.5 Consistent Interaction Interface

Token representation is consistent at all levels. It provides a homogeneous interface among grouping agents that facilitates the interactions at all scales. Within the context of perception, a token is a generalized scheme for representing perceptual entities and possesses adequate descriptive power for characterizing perceptual entities at all levels of abstraction. We will demonstrate in Chapter 3 that token representation contains most of the representation schemes used in the literature, hence a generalization.

2.3.6 Cohesive Integration Environment

In the proposed architecture, the integration of modules, cues, and knowledge is facilitated in a cohesive environment. A grouping agent is an encapsulated entity that separates its external aspects, which are accessible by others, from its internal properties, which can be hidden from others. This encapsulation is the natural outcome of combining data structure and behavior within one single entity and makes the interaction among grouping agents more easily defined.

Since integration is achieved through interaction, a cleaner encapsulated interaction environment will support integration more effectively. As indicated earlier, because the interaction models for different grouping agents are all defined in a general form of formal specifications, interaction channels can be established automatically from concise specifications. Therefore, the integration can be facilitated more efficiently within a globally cohesive environment.

2.3.7 Opportunistic Problem Solving

Vision problem solving is opportunistic by nature.
The hierarchical token grouping architecture offers an opportunistic problem solving paradigm. Grouping agents start to function whenever appropriate opportunities arise. In general, such opportunities are described by the readiness of domain tokens. Such a data dependent synchronization scheme can be extracted from the interaction models of all the grouping agents. Once the data dependency is guaranteed, an entire machine vision system organized as a hierarchy of grouping agents is synchronized on the flow of tokens, a scheme similar to a data flow architecture.

2.3.8 Distributed and Parallel

The proposed architecture can be implemented in a distributed environment to exploit the inherent parallelism. Often, some grouping agents can work independently. One example is the grouping agents that are responsible for connected components and straight lines. Both are grouping results from pixel tokens, but they do not rely on each other. Such independent grouping agents can be sent to distributed processors so that the inherent parallelism can be exploited. Given a set of grouping agents, the classification of the independent sets of grouping agents can be derived from the data dependency relationships extracted based on the specifications for the grouping agents. As long as the data dependency can be ensured through synchronization, even dependent grouping agents can work in a distributed computing environment.

2.4 Practical Potentials

2.4.1 Formal Method To Build Complex Vision Systems

The framework of Hierarchical Token Grouping presents a disciplined, engineering-oriented way of building complex machine vision systems. It is developed based on the nature of vision problem solving. While the behavior of different modules may not be an important issue in studying individual modules, it becomes a crucial matter when integrating individual modules to produce a larger system that achieves a bigger task. The proposed framework introduces a formalism that captures the essence of the behavior of different modules and utilizes it in developing a homogeneous method for integrating machine vision systems.

2.4.2 Automation and Quick Prototyping

Since a grouping agent at $v$ can be completely defined by $\{I^v, G^v, O^v\}$ and the interconnections among a set of grouping agents can be extracted from their corresponding input/output definitions, the structure of an entire system can be described by such formal specifications. Therefore, it is feasible to develop an environment that compiles user specifications and automatically generates the desired system architecture, including all the specified grouping agents and the interaction channels among them.
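The sketch below illustrates how such a compilation might work: each agent is specified only by the token types it consumes and produces, and the interaction channels fall out by matching outputs to inputs. The agent names and token types are hypothetical examples of ours, not a catalog from this dissertation.

    # Hypothetical compilation of grouping-agent specifications into
    # interaction channels: a channel exists wherever one agent's output
    # token type feeds another agent's input.

    specs = {
        "pixel":  {"inputs": [],                              "outputs": ["pixel_token"]},
        "edge":   {"inputs": ["pixel_token"],                 "outputs": ["edge_token"]},
        "region": {"inputs": ["pixel_token"],                 "outputs": ["region_token"]},
        "curve":  {"inputs": ["edge_token"],                  "outputs": ["curve_token"]},
        "tube":   {"inputs": ["curve_token", "region_token"], "outputs": ["tube_token"]},
    }

    def compile_channels(specs):
        """Derive (producer, consumer, token_type) channels from I/O specs."""
        channels = []
        for consumer, spec in specs.items():
            for needed in spec["inputs"]:
                for producer, pspec in specs.items():
                    if needed in pspec["outputs"]:
                        channels.append((producer, consumer, needed))
        return channels

    for channel in compile_channels(specs):
        print("%s -> %s : %s" % channel)

The same table carries the data dependencies needed for synchronization: here "edge" and "region" share no channel, so they may run in parallel, echoing the discussion in Section 2.3.8.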
Such a highly automated system development environment will benefit the development of machine vision systems, including quick prototyping, debugging, and testing.

2.4.3 Software Reusability and Quick Prototyping

Object-oriented development methodologies promote information sharing and offer the prospect of reusing designs and code. Because of the emphasis on abstraction and encapsulation (concentrating on object structure instead of on the procedural structure of problem solving), common structures can be shared, even by different systems. Therefore, ultimately, a system can be built by assembling reusable components from libraries. Consider solving vision problems under the framework of hierarchical token grouping: if a generic set of perceptual entities is identified (such as the set of geons) and a corresponding set of grouping agents that extract these generic perceptual entities via grouping is developed, it is conceivable that many computer vision tasks can be solved by assembling different grouping agents together to form vision systems that are suitable for particular application needs.

2.5 Theoretical Implications

2.5.1 Process Modeling

Even though the proposed architecture possesses practical potential in terms of efficient development of vision systems, it is essentially an architecture for modeling vision problem solving processes rather than an implementation by itself. From a theoretical point of view, a grouping agent, say $GA(v)$, can be mathematically modeled as an algebraic system with domain $G_c^v$ and the grouping operation performed on the elements in the domain. Grouping operations are associative. If a null token is appropriately introduced, this algebraic system becomes a monoid, which is a class of algebraic systems defined at an axiomatic level with an associative operation and an identity element [28]. Since the direct product of the monoids defined using grouping as their operation is still a monoid, the entire grouping hierarchy GH is also a monoid; hence the process of vision problem solving can now be modeled (abstracted) as an algebraic system at an axiomatic level, a monoid. This agrees with what the theory dictates: the direct product of algebraic subsystems allows us to create larger systems. Such a formal modeling method for vision problem solving allows the utilization of the rich literature on algebraic theories, especially its applications in formalizing the behaviors of discrete event dynamic systems.
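The monoid claim is easy to exercise on a toy instance. In the sketch below a token is modeled simply as the frozen set of image cells it covers and grouping as set union; this concrete choice is ours, made only so that the associativity and identity laws can be checked mechanically.

    # Toy check of the monoid structure: tokens as frozen sets of cells,
    # grouping as union, the null token as the identity element.

    NULL = frozenset()                 # the null token

    def group(t1, t2):
        """Grouping operation: aggregate the cells covered by two tokens."""
        return t1 | t2

    a = frozenset({(0, 0), (0, 1)})
    b = frozenset({(1, 1)})
    c = frozenset({(2, 2), (2, 3)})

    assert group(group(a, b), c) == group(a, group(b, c))   # associativity
    assert group(a, NULL) == group(NULL, a) == a            # identity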
2.5.2 Dynamic System Behavior Description

A machine vision system may be distributed. Different computational units may operate both synchronously and asynchronously, depending on the dynamics of the system. The state transition of a system can be opportunistic, depending on the input data. Describing these different aspects of the behavior of a dynamic system will be of great benefit.

There are various formal methods in the fields of concurrent systems, software engineering, and parallel computing that can be used to describe systems consisting of computing agents that communicate through channels. These formal methods provide mathematical semantics for processes, concurrency, nondeterminism, and communication. For example, Hoare's CSP (Communicating Sequential Processes), Petri nets, and I/O automata are good candidates to provide behavior descriptions for complex systems. Based on the specifications for grouping agents, different types of information can be compiled, such as interaction models, data dependencies, and state transition conditions, and used to establish descriptions about different aspects of the system. For example, Petri nets can be built from a set of grouping agent specifications. At run time, dynamic system behavior can be visualized by letting tokens flow within the Petri nets built. Data dependency relations can be established based on the input/output specifications. Control points can then be set accordingly to ensure the necessary synchronizations. Parallelism can also be maximized once the data dependency is known. All of these can be compiled automatically from the formal specification of a vision system.

2.6 Summary

In this chapter, using the central theme of grouping, the homogeneous architecture of Hierarchical Token Grouping is formally defined. There are three major components in this paradigm: the Grouping Hierarchy, the Knowledge Hierarchy, and the Imaging Model. Each of these three components has a distinct role in vision problem solving, but they also interact. The knowledge hierarchy has essentially the same organization as the grouping hierarchy and the two interact at various levels of abstraction, sometimes through the imaging model.

The Grouping Hierarchy consists of a set of grouping agents, each of which is an encapsulated object that combines both data structure and behavior, extracting a certain type of perceptual entity via grouping. From the perspective of grouping, heterogeneous visual modules can now be viewed as homogeneous operational units. A token representation is designed to represent the perceptual entities at different levels of abstraction. Such a generalized representation scheme offers a consistent interface, facilitating the interactions among different modules. The concept of homomorphism is extended to be used as a unified grouping principle across the entire Grouping Hierarchy. It identifies two general sources of information flow so that a general syntax of formal specification for defining flexible grouping criteria is established. This specification provides (1) the interaction models among grouping agents, (2) the sources of information to be integrated, and (3) the knowledge to be used.

The proposed architecture possesses a number of desirable properties with respect to efficient system design, quick prototyping, testing, and system component sharing. Both the practical and the theoretical potentials that the proposed architecture may offer are speculated upon. The proposed architecture is a generalization of many existing computer vision techniques that are designed to solve different problems using seemingly heterogeneous methodologies. In Chapter 3, we will survey the literature from the perspective of token grouping and demonstrate the generality of the proposed architecture.

CHAPTER 3

Background

In this chapter, we survey four important aspects of vision problem solving: representation, recognition methodologies, integration, and computational paradigms. Representation is concerned with how a computer denotes or symbolizes objects and relationships.
Recognition methodologies discuss how individual visual tasks can be achieved using a computer. Integration deals with the issues of how to combine different kinds of information and methodologies in solving complex vision problems. Computational paradigms are related to general theories about what needs to be computed and when during the process of vision problem solving.

Instead of merely surveying the literature related to these areas, we will also put the literature into the perspective of hierarchical token grouping. The goal is to show that the proposed architecture of hierarchical token grouping provides an adequate paradigm. Specifically, we will argue that (1) the token representation possesses adequate descriptive power for all the representation schemes surveyed in this chapter, (2) token grouping is an abstraction of most recognition methodologies, (3) integration at different scales is coherently addressed in the generic grouping criteria proposed in this thesis, and (4) hierarchical token grouping offers a concrete and systematic mechanism under which various theoretical computational paradigms for vision problem solving can be realized in a globally homogeneous way. The literature survey related to the area of tubular object recognition is treated in Chapter 4.

3.1 Representations

Representation is one of the most important aspects of vision. It deals with the issues of how a perceptual entity is abstracted, stored, and utilized during the process of perception. To organize the presentation, we classify different representation schemes into several categories and examine them from the perspective of token representation.

3.1.1 Taxonomy

In the literature, various representation schemes have been proposed for representing heterogeneous types of perceptual entities. They essentially fall into five categories: point representation (for 2D pixels and 3D voxels), region representation (for 2D areas and 3D volumes), boundary representation (for 2D curves and 3D surfaces), relational representation, and hierarchical representation. Table 3.1 lists these five categories and provides a few examples for each category.

Table 3.1. Taxonomy of representation schemes and examples.

    Category                        Examples
    Point representation            Intrinsic images
    Region-based representation    Connected components, Quad/Oct trees,
                                    Axial representation, GC, Superquadrics
    Boundary-based representation  Freeman chain codes, B-splines, GC,
                                    Superquadrics, Fourier descriptor, EGI
    Relational representation      Aspect graphs, CSG, Cell decomposition,
                                    Quad tree, Oct tree
    Hierarchical representation    CSG, Cell decomposition, Quad tree, Oct tree

Some of the representations may belong to more than one category. For example, the Generalized Cylinder (GC) can be either a region (3D volumetric) or a boundary (3D surface) representational scheme, depending on how it is used. Constructive Solid Geometry (CSG) can be classified as either a relational or a hierarchical representation because it describes both the relationships and the hierarchical organization of the components of an object to be represented. We now examine each of the above five categories individually.

3.1.2 Point Representations

Conventionally, a digital image function $f_0$ is described as a discrete representation of a scene being imaged. It is defined in a sample space, denoted by $S$, of either two or three dimensions.
In this sample space, each quantized cell $\mathbf{x}$, either a pixel at $\mathbf{x} = (x, y) \in S$ or a voxel at $\mathbf{x} = (x, y, z) \in S$, has some digitized sensor measurement, which can be intensity, depth, or speed [4]. Such an image $f_0$ can be processed to derive other useful features for each cell, such as gradient magnitude, curvature, surface normal, or velocity of optical flow. Thus, a set of equal-sized images, called intrinsic images [6], can be formed, each of which represents one feature across all the discrete cells where $f_0$ is defined. Denote this set of intrinsic images by

$$\mathbf{f} = \{f_0, f_1, f_2, \ldots, f_d\}, \qquad (3.1)$$

where $d$ is the number of features computed for every cell and each feature image $f_i$, $1 \le i \le d$, is computed based on the original image function $f_0$. Using this definition, the original image function $f_0$ is a special case with $d = 0$. Such a representation is depicted in Figure 3.1(a).

Instead of considering $\mathbf{f}$ as a collection of $d + 1$ feature images, each of which describes one aspect of the 3D scene being imaged, it can be viewed as a collection of $N$ cell representations, each of which characterizes a discrete cell using $d + 1$ features. Here, $N$ is the number of discrete cells in the sample space. This viewpoint is visualized in Figure 3.1(b). Each cell $\mathbf{x}$ in Figure 3.1(b) is represented by a point representation

$$\mathbf{f}(\mathbf{x}) = \{f_0(\mathbf{x}), f_1(\mathbf{x}), \ldots, f_d(\mathbf{x})\}, \quad \mathbf{x} \in S. \qquad (3.2)$$

Figure 3.1. Two viewpoints of an image representation. (a) Intrinsic images. (b) Point representation.

Organizing information this way reflects the perspective of treating each cell as an independently manipulatable entity with its own characteristics. This exercises the concept of object encapsulation of object-oriented design, starting from the smallest data unit.

Although organized differently, information is preserved. The features captured in intrinsic images are all obviously preserved in the point representation. What is more implicit is the spatial relationship among individual cells. In an intrinsic image representation (3.1), spatial relationships are embedded in the coordinates of cells. For example, the spatial adjacency relationship $n$-connectivity is a binary relation over $\{\mathbf{x}\} \in S$ which can be expressed by

$$\mathbf{x} \bowtie_n \mathbf{x}' \quad \text{iff} \quad |\mathbf{x} - \mathbf{x}'| \le l,$$

where $\bowtie_n$ denotes an $n$-connectivity relation and $l$ is a distance measure whose value depends on the value of $n$. The expression states that a cell $\mathbf{x}$ is $n$-connected to cell $\mathbf{x}'$ if and only if the Euclidean distance between the two is less than or equal to the value $l$. For example, if $\mathbf{f}$ is defined in a 2D space, when $n = 4$, $l = 1$; when $n = 8$, $l = \sqrt{2}$. In a 3D space, when $n = 6$, $l = 1$; when $n = 26$, $l = \sqrt{3}$. Another example of a spatial relationship is "left". Suppose the width of $S$ is defined along the $X$ axis. In this case, the relation "left" can be inferred using

$$\mathbf{x} \bowtie_L \mathbf{x}' \quad \text{iff} \quad x < x',$$

where $\mathbf{x} = [x, y, z]^T$, $\mathbf{x}' = [x', y', z']^T$, and $\bowtie_L$ denotes the "left" relationship. Since the coordinates of cells are not changed from representation (3.1) to representation (3.2), the spatial relationships among individual cells can be inferred in the same way from (3.2). Hence, spatial relationships are also preserved in the point representation. Due to the preservation of information, the two representations ((3.1) and (3.2)) are equivalent.

The implicit representation for spatial relationships can be made explicit. Figure 3.2 gives one example in which (a) is a set of 2D cells in a 4 x 4 array (the conventional array representation of a 2D image); (b) is the explicit representation for relationship $\bowtie_4$ among the cells in (a); (c) is that for relationship $\bowtie_8$; and (d) is that for $\bowtie_L$ based on only two rows of (a). The process of making a relation, say $\bowtie_n$, explicit is to assign a set of indices to every cell $\mathbf{x}$, representing relation $\bowtie_n$ between $\mathbf{x}$ and the other indexed cells. For example, to make $\bowtie_4$ explicit, cell (2,2) is assigned four indices pointing to cells (1,2), (2,3), (3,2), and (2,1).

Figure 3.2. Implicit and explicit spatial relationships among cells. (a) Implicit spatial relationship representation implemented as an array. (b) Explicit representation for a 4-connectivity relation. (c) Explicit representation for an 8-connectivity relation. (d) Explicit representation for spatial relationship "left".
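The following short sketch makes this construction concrete for the 4 x 4 array of Figure 3.2; representing a cell's indices as a Python list is an implementation choice of ours for the illustration.

    # Make the 4-connectivity relation explicit for a rows x cols grid:
    # every cell is assigned the indices of the cells it is 4-connected to.

    def explicit_n4(rows, cols):
        """Map each cell (r, c) to the list of its 4-connected neighbors."""
        relation = {}
        for r in range(rows):
            for c in range(cols):
                neighbors = [(r - 1, c), (r, c + 1), (r + 1, c), (r, c - 1)]
                relation[(r, c)] = [(nr, nc) for nr, nc in neighbors
                                    if 0 <= nr < rows and 0 <= nc < cols]
        return relation

    n4 = explicit_n4(4, 4)
    print(n4[(2, 2)])    # [(1, 2), (2, 3), (3, 2), (2, 1)], as in the text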
F ig- Ure 3.2 gives one example in which (a) is a set of 2D cells in a 4 x 4 array (the ConVentional array representation of a 2D image); (b) is the explicit representation for relationship D44 among the cells in (a); (c) is that for relationship N8; and (d) is that for ML; based on only two rows of (a). The process of making a relation, say N", 60 explicit is to assign a set of indices to every cell x, representing relation N” between x and other indexed cells. For example, to make N4 explicit, cell (2,2) is assigned four indices pointing to cells (1,2), (2,3), (3,2), and (2,1). (0.0) (0.1) (0.2) (0,3) (1.0) (1.1) (1.2) (1.3) (2.0) (2.1) (2.2) (2.3) (3.0) (3.1) (3.2) (3.3) 0.1 (0.2 0,3) W 1.1 a (1.3) (2.0 2.1‘ (2;) (2’3) ( (3.1 3,; 3'3) (C) Figure 3.2. Implicit and explicit spatial relationships among cells. (a) Implicit spatial relationship representation implemented as an array. (b) Explicit representation for a 4-connectivity relation. (c) Explicit representation for an 8-connectivity relation. (d) Explicit representation for spatial relationship “left”. It can be easily shown that a point representation is contained in token represen— tation. An image f0 can be represented as a set of N tokens, each of which is 11cc” : {Cicell’EcelDchell}, 1 S 2 S N, 61 where Cicell = xi, Ficell : Picell U R56”, and its intrinsic and relational features are Pfe“ : {xi’f0(xi),..,fd(xi)}a Ric” 2 {NR9 {vaajllv Mne AfRal S] g 9(Mnla where N3 is a set of relation names that need to be made explicit for future use, x,- points to another token to which token T?" is related under N", a, is a set of attributes that describes the relation between Tfeu and T196” . The number of other tokens, g(l>c 2: can Ewmceeem mammmoooa ~32 085 2: ocean—ob 3:28:28 23. .m.m @568 74 Conditioning Examples of conditioning at low level include various image enhancement techniques, filtering, and morphological processing. At the intermediate level and high level, conditioning can be in the form of outlier removal according to some criterion. Labeling Labeling is the process of deciding which perceptual entities participate in what event according to some criteria. Assigned labels signal a classification of the perceptual entities in some given space. A typical example is thresholding in which the criterion can be an intensity threshold value. A labeling can also occur at higher levels of processing. For example, classifying detected perceptual entities into a set of known Object parts. The criterion used for labeling specifies some kind of homogeneous Property among the perceptual entities that are assigned the same label. From the perspective of token grouping, functional module O_Event(G”) in a grouping agent plays a similar role. Given criterion G” = G: /\ G2, it identifies a set Of tokens from grouping domain G"; that comprise an observed event E0. Grouping The grouping defined in[42] is simply an operation that mechanically collects all the p erCeptual entities that have the same labels. Such a defined grouping does not decide which entities participate in which spatial event, that is done in labeling. A token grouping agent, however, is an independent decision making mechanism. Therefore, the Simple grouping defined in [42] performs only part of what a token grouping agent is Capable of, Specifically, it performs C,” <2 E, where E, is identified in the labeling Step (see Section 2.2.5). 75 Extracting Features This operation computes a set of features of perceptual entities. 
Extracting Features

This operation computes a set of features of perceptual entities. In a token grouping agent, this task is performed by the functional module $\Pi$.

Matching

As discussed earlier, there are essentially four types of matching techniques: geometric matching, statistical matching, graph matching, and iconic matching. Each of them can be considered as a homomorphism finding problem, hence can be performed in the functional module H_Evaluate($E_o^v$, $E_e^v$) in a token grouping agent.

For a geometric matching, since the correlation is usually made between two parallel sets of object parameters, the set from the data needs to be estimated from the grouping candidate before the matching takes place. For example, if a tube-shaped object is modeled by a 3D cylindrical surface, the set of perceptual entities hypothesized to comprise a 3D cylindrical surface should first be characterized using six model parameters in order for it to be matched against the model. Having the parallel sets of features from both the data and the model, a geometric matching is to find a homomorphism between two single-node graphs, one from the model and the other from the data, where both nodes are attributed by their corresponding sets of parameter values. In this case, the homomorphism is defined based on the property features, hence a similarity of the two nodes is to be identified. Statistical matching can be viewed the same way. A graph matching lends itself to a homomorphism problem. With each individual point in an input image represented as a token, an iconic matching can be formulated as a graph matching.

3.2.3 Abstraction of Recognition Methodologies: Token Grouping

Note that, in effect, all five steps together perform some kind of unit transformation [42]. Such a unit transformation process repeats across levels of abstraction until the organized data is identified as what the recognition system is looking for. That is, at every level, the central task of all these five steps is common: to enable an appropriate unit transformation. The first three steps choose appropriate candidates for a unit transformation (grouping). The fourth step prepares some measurements that are needed in order to carry out the unit transformation. The last step is where the transform is completed, provided that there is a significant similarity between the set of chosen candidates and what the unit transformation operator is looking for.

The process of unit transformations is the process of token groupings because both organize data by aggregating a set of candidate perceptual entities into a new perceptual entity. These two viewpoints are based on the common observation of the homogeneity among different levels of processing. However, instead of considering the five steps as loosely coupled, each grouping agent unifies them and encapsulates both the functional behavior exhibited in these five steps and the data representation. Another difference is that, in the framework of hierarchical token grouping, all levels of grouping are put into the perspective of reaching an overall solution, so that the interactions, within GH and between GH and KH, are explicitly treated, all in a cohesive and consistent manner, to facilitate a more systematic integration scheme.

3.3 Interaction and Integration

Integration is a very important aspect in solving vision problems.
The first scale is neighborhood interaction which is local by nature. The second scale is interframe interaction which across multiple images. The third scale is intermodule interaction among individual modules across levels of abstraction. The following sections survey the interactions at these three scales and show that they can all be supported under the paradigm of Hierarchical Token Grouping. 3.3 - 1 Neighborhood Interaction The interaction at the scale of local neighborhood occurs among neighboring entities. Contextual information is integrated to describe the compatibility among neighbors. The goal of such an interaction is to maintain some predefined consistency within the neighborhood. A typical framework for achieving this type of interaction is relaxation [109, 71, 2, 85, 31, 75]. Neighborhood interaction is supported under the proposed HTG architecture. Since the smallest perceptual entities are represented as manipulatable tokens, in- tegration of the information from neighboring tokens can be easily implemented in a homomorphism evaluation function. Haralick has proved that the interaction and integration at this scale can be realized through homomorphism satisfaction [41]. 3°3-2 Interframe Interaction The next scale of interaction is across multiple image frames. There are two situa- tions where multiple frames exist, (1) frames from a hierarchy of multiple resolutions and (2) frames from a time sequence. In the first situation, information from dif- ferent resolutions gets integrated in the fashion of coarseto-fine, fine-to-coarse, or a corleination of both. The purpose of such interframe interaction is often to obtain 78 or refine some visual data through integrating the equivalent type of information derived from different image frames. During the interaction across temporal space, correspondence information is often integrated. A typical framework used for in- terframe interaction among different resolutions is the pyramid or processing cone [62, 100, 40, 96, 80, 59, 60, 105, 3]. Interframe interaction is explicitly facilitated by the groupings along conceptual dimension M (see Figure 2.1 in Chapter 2). 3.3. 3 Intermodule Interaction The third scale of interaction is among individual modules. Zucker has termed the interaction at this scale as a vertical process through which horizontal processes (mod- ules) that are responsible for different perceptual entities interact[108]. While the literature has provided fairly consistent frameworks for the interaction at the previous two scales, the interaction among individual modules have been so far essentially ad hoc and heterogeneous. Individual modules interact by exchanging a Wide range of heterogeneous types of information. For example, data from differ— ent sensors can be integrated in a module to estimate or to characterize 2D or 3D structures [56, 27, 1, 2, 63, 108]. Different types of knowledge can be utilized and integrated in a module to infer invariant perceptual regularities from input images [60, 86]. Cues from different processing modules can be combined in a higher level module in order to identify salient features of objects [52, 26, 15]. Due to the het- erogeneous nature of different visual tasks, there is so far no coherent mechanism developed for supporting the interactions at this scale. The framework of hierarchical token grouping is our attempt to provide an envi- I‘Onment in which intermodule interactions can be supported in a more coherent way. 
The token representation offers a consistent interface so that interaction models can be Specified by using a list of levels (22’s) and the feature selection functions. From the SPe ‘ . . . effication for grouping agents, these types of information can be eaSily extracted 79 and compiled. Therefore, while allowing the freedom of defining flexible intermodule interactions, the proposed framework supports them in a structured and manageable way. 3.4 Computational Paradigms For Vision Researchers working in the field of vision have been trying, over the years, to an- swer important questions concerning the nature of vision. Aloimonos summarized as follows: There are those who ask the empirical question (what is), i.e., they are trying to find out how existing visual systems are designed; those who ask the normative question (what should be), i.e., they are trying to find out what classes of animals or robots would be desirable (good, best, optimal) for a set of tasks; and finally there are those who address the theoretical question (what could be), i.e., what range of possible mechanisms could exist in intelligent visual systems. Various computational models for vision have been proposed in the literature. Several have important impact in the literature among which the three level paradigm, origi- nally proposed by Marr and later revised by Lowe, remains to be the most influential. In subsequent sections, we review some of them and, more importantly, we examine the relationships between them and the framework of hierarchical token grouping. 34:. 1 Marr’s Computational Model Marr’s computational model for vision answers theoretical questions of vision (what). 18 Work was the first attempt to make an analogy between human v1310n and machine vi ‘ 81011 [62]. Marr proposed the three level computational model for vision in 1982 (see 80 | Image Features I Stereo notion Shape from x 2 100 Sketch lntrlnelc Images l Recoqni t ion Object Models Figure 3.3. Marr’s computational model for vision. The picture is taken from “Per- ceptual Organization and Visual Recognition” by David Lowe. Figure 3.3). The model dictates that perception is achieved within a hierarchy of three levels of bottom—up computations and representations. The computation at the lower level provides image features that are directly avail- able or computable from input images. Marr terms these feature as “Raw Primal Sketch” (RPS) and “Full Primal Sketch” (FPS). The concept of place token was pro- posed to represent individual entities of RPS and grouping is suggested as a means of “aSSernbling” FPS from RPS. But both the concept of place token and the concept 0f grouping were neither pursued further nor made concrete in Marr’s works. Intrinsic properties are computed at the intermediate level and represented by a set of intrinsic images[6]. Shape information of visible surfaces comprises the rep- resentation of the 2%D sketch at this level which bridges what is available from the low level and what is t0 be derived at high level (3D structures). From the cognition po' , , int of view, however, the 2%D sketch at this level does not prov1de much more 81 abstract information than the original input image because the representation is still an array and data is not yet organized. Based on the 2%0 sketch, primitive surface patches and compact volumes can be extracted and assembled (grouped) to index to the instances of the objects that are similarly structured. 
How the derived 3D shapes are utilized toward cognition is not made specific in Marr’s model. As Witkin and Tenenbaum pointed out, in contrast to the previous two levels that are data-driven, iconic, and domain-independent, high level processing is basically goal-driven, symbolic, and controlled by the expectations of the perceiver[106]. Marr’s computational model describes a constructive approach to vision. From an original input image to the recognition is a recovery process. The information flow is essentially bottom-up. The model specifies what is to be recovered when. It emphasizes the role of 3D structure reconstruction and considers it as an indispensable step toward cognition. The high level is described as a semantically involved process and the explicit participation of knowledge in cognition occurs, in Marr’s model, only at this stage of processing. Some important issues are not addressed in this model, such as top-down influence, interprocess interaction, integration, and cooperation and Competition. While this model addresses many theoretical aspects of vision, to apply it effec- tively in practice, the normative questions such as “what kind of mechanism could it be in order to realize such a recovery process?” have to be seriously answered. Considering the mechanism of neural networks in the human brain where massively Connected networks consist of all homogeneous computational units which together can Perform complex and heterogeneous vision tasks, a homogeneous and coherent Inechzi—Ilism for machine vision should be possible. Marr has proposed grouping as a concrete mechanism fOr low level processing, essentially from RPS to FPS. To date, meg . . . . . . hahlsms for realizmg the recovery theories for various reconstruction problems at y 82 lmege Feeturee I I Perceptual orgeni set ion l l l : : -: E El El .9. I 8 3| 3| I a l I g ' Perceptual ‘8’ l I no I Groupinge :3 ii i L I} - - . :2 2 1’20 5km“ an inference 30 Groupinge l I Recognition l i —+ |_ Recognition . Object Modele I‘ Figure 3.4. Lowe’s computational model for vision. The picture is taken from “Per- ceptual Organization and Visual Recognition” by David Lowe. different levels have been local and heterogeneous. 3.4.2 Lowe’s Computational Model Believing that perceptual organization is an essential level of inference and that non- accidentalness plays an important role in human perception, Lowe presented a revised computational model for vision[58]. This is shown in Figure 3.4. In this revised model, Lowe emphasized the role of perceptual organization in Vigion and de—emphasized, but did not eliminate, the use of depth information. From the general vision point of view, Lowe defined the functional roles of perceptual organiZation to be for segmentation, three-space inference, and indexing to world knowledge. He presented a computational model for vision similar to Marr’s model but Contained the pathway that bypasses the intermediate 2%D representation in Mal-108 model along both bottom-up recognition and top-down verification directions, reflecti , , , . 11g the observation that 30 recognition can often be achieved adequately and 83 directly from 2D information. Nevertheless, the recovery of depth information is retained as an alternative pathway even though it is no longer treated as a necessary condition for perception. In Lowe’s theory, such a bypass pathway is possible due to the non-accidentalness derived from the assumption of viewpoint invariance. 
The major contribution of Lowe’s work was to identify a concrete set of non—accidental or causal relationships among 2D and 3D structures based on perceptual regularities and to utilize them in indexing object models (index 3D part or object models directly from 2D perceptual structures). Such an indexing scheme generates a reduced search space for high level processing where indexed objects are verified using the constraint of viewpoint consistency. The role of grouping was made explicit in Lowe’s work: to uncover the causal relationships in the visual field. Perceptual grouping served as the mechanism under which significant 2D structures are identified via grouping at the intermediate level- The organizational (grouping) principles used in Lowe’s work were developed based on the theories of the Gestalt psychology. The Gestalt psychologists demon- strated that people tend to reach the simplest possible interpretation for given visual data. The Gestalt psychology revealed that groupings seem to follow the principle of Simplicity. But the concept of simplicity was never made concrete. Another qual- itatiVe description of this general principle is the minimum principle which states that grouping is performed among those perceptual entities “which requires the least amount of information to specify” [44]. In their important paper on the role of struc- tur e in Vision [106], Witkin and Tenenbaum proposed a unified principle of grouping Called fuzzy identities or least-distortion across both space and time which is, concep- tually, Similar to the minimum principle. Although all intuitively appropriate, these prop Osed general grouping principles are essentially concepts which are difficult to ap- Pl ' . . . y In Computation. Lowe’s work has been Viewed as an effort to make these princ1ples 84 more concrete and was an important contribution to the literature in this respect. The set of causal relationships or perceptual regularities he developed serve, under the constraint of viewpoint invariance, as a generic basis of perceptual groupings that are “likely to survive intact through later stages of interpretation” [58]. Lowe’s model also represents a constructive approach to vision, in which what information is recovered when is explicitly defined. In Lowe’s model, processing is still mainly bottom-up except in verification where indexed objects serve as hypotheses or expectations. The issue of interprocess cooperation/ competition is not explicitly addressed in his computational model. Besides dealing with the theoretical questions about. vision, the issues related to the normative questions were also considered. Specifically, Lowe used grouping as a mechanism for perceptual organization at the intermediate level. But, by what overall mechanism the vision tasks can be achieved IS not further examined. 3.4.3 Model-Based Hierarchical Paradigm While both Marr’s model and Lowe’s model acknowledge the use of knowledge but essentially only at high level of processing, another viewpoint emerged in the litera- ture that makes the role of knowledge explicit at every level of information recovery. Rather than being a conflict with the previously discussed models, this view separates the information recovery process from the knowledge that is utilized by the recov- ery Process. We call this kind of computational scheme a model-based hierarchical paradigm. BrOOks first presented this vieWpoint in his work ACRONYM [20]. 
He designed a hierarchical representation for the object model used (knowledge about objects) and separated it from the information recovery process. The computational paradigm 0f ACRONYM consists of two parallel hierarchies, one for the bottom-up object 1‘ ec . Over y process and the other for the top-down knowledge representation. These 85 two hierarchies interact horizontally at multiple levels during which the decomposed object models can be used to assist in extracting the perceptual entities that are significantly similar to the model at those levels [20]. With the same view, Jain and Binford generalized the object model hierarchy and extended it by allowing other types of knowledge to also be stored in it in a hierarchical fashion. The observation that one can never experience or measure the physical world directly suggests that a vision system must build a model of the world (not just objects) using its past experience and, at any given time instance, must instantiate the model for the environment of that moment by combining the learned experience and the currently sensed information[53]. In this context, Jain and Binford conclude that knowledge must be applied at every level of information recovery. They further categorized the types of knowledge applied in visual information recovery[53]. The knowledge related to perception is divided into three different categories: knowledge about objects, knowledge about sensors, and knowledge about the domain in which the vision system is to function[53]. Here, knowledge is classified according to the nature of what it reveals about the world. Besides concerning what is to be recovered when, such a model-based hierarchical Paradigm also is concerned with how perception at different levels can be achieved, including how the two hierarchies interact. In this paradigm, the role of knowledge at eVery level is acknowledged and the integration of the top-down knowledge with the bOttom-up cues is explicitly addressed. For example, edge detection methods depend on the nature of the sensor data. With the knowledge about the type of Sensor Currently being used instantiated in the model for the world environment, an appropriate edge detection method can be invoked. The separation of knowledge from the computations that apply knowledge yields a. divide_and-C0nquer scheme at a even higher level. To solve vision problems, three Ca . tegol‘les of distinct problems have to be solved. The first category deals with the 86 issues of knowledge engineering, including knowledge generation, updating, represen- tation, and retrieval. The second category of problems concerns the theories and techniques for recovering perceptual information using knowledge. The third cate- gory is related to the problems of how to effectively establish necessary interactions between the information recovery hierarchy and the knowledge hierarchy. This leads to a more structured computational model for machine vision. 3.4.4 A Concrete Mechanism: Hierarchical Token Group- ing The framework of hierarchical token grouping is our attempt in answering the norma- tive question of “what kind of mechanism should it be?”, a fundamentally different question from the ones answered by the above surveyed computational models. The Proposed architecture presents a concrete mechanism under which various computa— tional theories for information recovery can be realized. 
The purpose of this architec- ture is not to provide solutions for vision problems but rather to (1) offer a globally COherent and consistent mechanism, through which integration of heterogeneous mod- ules, different sources of information, and heterogeneous types of knowledge can be achieved in a more systematic way, and (2) provide a formalism for organizing complex Systems in order to produce structured and manageable vision systems. The proposed architecture is capable of supporting all the computational models Surveyed in this chapter. This is because the proposed architecture indiscriminately realiZeS any information recovery technique by uniformly dealing with a grouping p r Oblem. The proposed framework establishes the principles of constructing a vision System but gives the vision system designers the freedom of deciding “what to build”. That is, the decision about “what to be recovered when” is not a concern of the at ‘ . . . . chltecture, Such decisions reflect which computational model IS used and how it IS 87 applied. In order for the architecture to be able to support a wide range of flexible decisions, hence to realize different computational models, it is necessary for the principles of constructing systems to be independent of the content of the systems to be constructed. With a flexible number of levels, the architecture is capable of supporting various vision systems with different purposes. One issue that is seriously addressed in the proposed architecture is homogeneity in vision problem solving and utilization of it in developing a homogeneous envi- ronment with uniform interface (representation) among levels in order to facilitate interactions and integrations. The homogeneous representation allows a consistent constraint propagation interface among all levels so that interaction, backtracking, or explanation can be systematically realized. The uniform token grouping operation naturally yields a hierarchical structure of the token representation for objects. The r016 of grouping as a general mechanism to organize or abstract data in a hierarchical CorDIDIJtation environment was speculated by Brady in 1981[17]: It is equally clear that grouping operations need to be defined at each level of resolution of each representation in the visual system, in order to impose hierarchical structure upon the representation. The advantages that should accrue from imposing such structure are likely to be precisely those which have inspired the development of data structures generally in computer science. Ver all, the proposed framework is a general engineering paradigm for machine Vision. I t; - l S Concrete, consistent, systematic, and structured. 3 ‘ 5 Summary Th 6 literature survey is presented in Chapter 3. Four aspects in computer vision th 6.1; are related to vision system integration are surveyed: representation, recognition file t 11Odologies, interaction and integration, and computational models and paradigms g 88 for vision. We demonstrated that (1) token representation is a generalization of a wide range of representation schemes proposed in the literature, (2) token grouping generalizes the recognition methodologies at different levels of abstraction, (3) the unified grouping principle of homomorphism is capable of describing the interaction of all three scales, and (4) the framework of Hierarchical Token Grouping can support flexible computational models for vision. 
As a concrete demonstration of the applicability of the proposed framework, it will be shown in Chapters 4 and 5 that the framework of HTG provides a realistic and useful problem solving architecture for the real-world problem of tubular object recognition.

CHAPTER 4

Tubular Object Recognition

In this chapter, a model-based approach is described for recognizing tubular objects from both 2D intensity and 3D volumetric data. The modeling includes (1) a generalized stochastic tube model characterizing the structural properties of tubular objects, and (2) imaging modeling, predicting the cross sectional sensor measurements. An automatic multi-level recognition strategy is proposed that exploits the power of the models at different levels of abstraction. Region-based and boundary-based recognition techniques are integrated in order to achieve reliable performance under difficult noise conditions.

4.1 Object Domain

Shape is an intrinsic property of three-dimensional objects. Even though often deformable, tubular objects possess distinct morphological properties. They are smooth, curvilinear, and often symmetric. The elongatedness can be characterized by a large aspect ratio. Figure 4.1 shows a few examples of biological tubular object networks. The geometric structure of tubular objects can be mathematically modeled so that their distinct shape properties can be quantitatively characterized.

In image understanding tasks, object shape needs to be inferred from observable features.
Define a Sweeping rule 5 of X along A (not necessarily containing A) where both X and 5 ar e functions of the length of A. A Generalized Cylinder, denoted as GC, can then be defined uniquely, up to a translation, by CC = 5(X,A). Figure 4.2 visualizes this definition. Note, there is essentially no constraint to any component involved in defining a GC. The axis is arbitrary. The cross section can be any form such as Sclllares, circles, symmetric, or asymmetric and it can deform along the sweeping. The sweeping rule specifies the relationship between cross section X and axis A. For example, the cross section can form an arbitrary angle with axis A at any point of SVveeping and it can even rotate while being swept so that a twisted object can be generated. Therefore, the GC model has tremendous descriptive power and it provides a general representation. Although the GC model was originally proposed as a volumetric representation scheme, its 2D projected form is also widely used. A Commonly used class of 2D GC is called ribbon. While CC offers a powerful modeling r11ethod, using it effectively for recognition purpose requires a mathematically more Dr . . gclse formulation of the representation. 92 Figure 4.2. A tubular object. The theory of Recognition-By-Components (RBC)[12] utilized the concept of Gen— era'lized Cylinders and defined a set of 36 components, called geons, that can be differentiated on the basis of perceptual properties in 2D images. Biederman demon- Str E1'ted that a wide range of objects can be modeled as assembly of this generic set of C()l—Ilponents[12]. While RBC offers a theory for constructive object recognition, since geons are qualitatively defined, it is difficult to apply the theory in practice for recog- I1it‘i‘3n purpose. To explore the power of this general representation for recognition purpose, it is necessary to directly parameterize the geons. That is, mathematically (2 11 Crete or quantitative definitions for geons need to be developed so that recognition be achieved by explic1tly assoc1ating a set of model parameters With observable 93 image features for different geons. Superquadrics is a parameterized model. It represents a subset of 3D geons that have symmetric cross sections and an eccentricity of 90 degree. With its simple parameterized form, superquadrics can capture various global shape deformations with a small number of five parameters. However, use of this elegant model for recognition demands the segmentation of data points before the model can be applied and the model itself represents a necessarily closed object instead of being able to grow in space (like what is modeled using sweeping in Generalized Cylinders). The Generalized Cylinders, superquadrics, and geons have close relationships. While the cross section function, the axis, and the sweeping rule in the GC model Can all be completely arbitrary, geons and superquadrics are both subsets of GC’s With specific constraints on cross sections, axis, and the sweeping rules. Both GC’s and geons are defined qualitatively. In recognizing the objects represented using these qualitative shape models, the relationship between the geometric shape and the observable image features of the objects have to be made concrete. One way of doi 11g so is to explicitly parameterize the models and to relate the model parameters to obServablesBS, 47, 48]. 
Some research has also been done in the literature that utilized the parameters of superquadrics to differentiate geons using statistical classification techniques [76].

GC-Based Object Recognition

GC-based reconstruction of 3D objects aims at capturing the relationship between the object shape and the invariant properties in a given space. If the space is a gradient space, geometric surface features of a GC reveal the orientation of the underlying object; in a shadow plane, the visible shading information is often analyzed to constrain or determine the surface orientations of the objects in the corresponding gradient space. A commonly used recognition strategy is to utilize the detected features, either surface features or shadow features, to constrain or hypothesize the pose of the objects and then verify [45, 35, 19, 12, 102, 55, 23]. A robust 3D object recovery requires that (1) the features used adequately characterize a 3D object and (2) they can be detected reliably.

In a gradient space, most GC-based recognition methods are essentially boundary-based approaches. Important issues in these methodologies are to segment surface points and to extract surface features reliably. The explicit relationship between surface features and the parameters that characterize, qualitatively or quantitatively, a GC object can lead to a correct recovery provided that the surface features are identified accurately. Practice has shown that both these issues often represent difficult problems [107, 85, 67, 66]. Obtaining precise surface features relies on the quality of segmentation, which often cannot be guaranteed. Noisy data can easily corrupt not only the segmentation of surface points but also, consequently, the computed surface features.

In a shadow plane, GC-based object reconstruction with 2D input relies on the constraints in gradient space estimated from shadows using shadow geometry. The two kinds of shading information most often exploited are the shading configuration and the contours of objects. The shading property along the cross sections of a GC has been used to search for blood vessels and coronary arteries in medical imaging [55, 23], and for plant roots in processing agricultural imagery [32]. The shading model is, however, either collapsed onto profile patterns to reduce the dimensionality of the computation, causing inflexibility in recognition and sensitivity to noise, or matched with data exhaustively in the image to identify object shadow regions, leading to an expensive computation. Due to the low level processing scheme, these approaches lack an effective method to localize objects; hence, they often demand either manual seeding from users or a brute force search.

Using contour information, a set of invariant 2D perceptual regularities such as collinearity, symmetry, or parallelism has been utilized to reveal the relationship between a 3D GC and its 2D image [58, 12, 69, 19]. Another invariant contour primitive, the ribbon, defined based on parallel symmetry, has also been used [19, 101, 94]. The theories of perceptual grouping and part-based recognition propose the reconstruction of objects directly from 2D contours. Although elegant in theory, recovery can be difficult in practice. Additionally, ambiguities exist when only 2D contours are used to recover 3D objects. Verification is thus needed. Most 2D GC-based recognition methods in the literature have applied heuristics for verification.
4.2.2 Modeling Local Shape

Use of a geometric model for objects provides a priori knowledge to organize sensor data. Based on the GC representation scheme, we develop a geometric model for the class of tubular objects. A special class of GC, called the Right Straight Homogeneous Generalized Cylinder, is parameterized to establish a generalized stochastic tube model that directly facilitates the recognition.

Generalized Tube Model

Define Right Straight Homogeneous Generalized Cylinders (RSHGC) to be the class of GC's that have a straight axis, a fixed cross section function, and whose cross sections and axis are perpendicular to each other during the process of sweeping [89]. Using an RSHGC of some length, called a local tube, as a primitive, a tubular object can be, without loss of generality, approximated as a set of smoothly connected RSHGC's. A tubular object consisting of k local tubes is defined as:

E_k = {T_1, T_2, ..., T_k}.

Each local tube T_i, 1 ≤ i ≤ k, is described by a 3-tuple:

T_i = {P_i, χ_i, l_i},    (4.2)

where P_i denotes its 3D pose, χ_i denotes its cross section function, and l_i is the length of the tube.

Figure 4.3. A rotated and translated tube model.

First, we model a local tube T_i in an object-centered coordinate system. Assume an elliptical function is used for cross sections. Let UVW be an object-centered coordinate system such that T_i is located at the origin of UVW and lies along the V axis (see Figures 4.2 and 4.3). In UVW, T_i is the set of 3D points that satisfy:

(u/a)² + (w/b)² ≤ 1,  ∀v ∈ [0, l_i],    (4.3)

where a and b are the major and minor axes of an ellipse.

We now express T_i in the world coordinate system XYZ. There are two ways to represent T_i in XYZ. One is based on the homogeneous transformation between UVW and XYZ, and the other is to express T_i directly in XYZ.

Assume an arbitrary homogeneous transformation from UVW to XYZ with translation c = [x_0 y_0 z_0]^T and three rotations: the rotation with respect to the U axis by α, R_u(α); the rotation with respect to the new V axis by θ, R_v(θ); and the rotation with respect to the new W axis by β, R_w(β). Figure 4.4 illustrates these rotations. The translation c = [x_0, y_0, z_0]^T represents the new location of T_i in XYZ.

Figure 4.4. Rotational relationship between UVW and XYZ.

The homogeneous transformation is:

H_i = | C_β C_θ                   C_β S_θ                   −S_β     0 |
      | S_α S_β C_θ − C_α S_θ     S_α S_β S_θ + C_α C_θ     S_α C_β  0 |
      | C_α S_β C_θ + S_α S_θ     C_α S_β S_θ − S_α C_θ     C_α C_β  0 |
      | x_0                       y_0                       z_0      1 |

where C_x stands for cos(x) and S_x stands for sin(x). The relationship between a 3D point p in UVW and its transformed image p' in XYZ is:

p' = [x y z 1] = p × H_i,  p = [u v w 1] = p' × H_i^{−1},    (4.4)

where H_i is a function of six parameters. Figures 4.2 and 4.3 illustrate the relationship between XYZ and UVW. An elliptical cross section in XYZ has eight degrees of freedom: {x_0, y_0, z_0, a, b, α, β, θ}. In our current research, a special case of the ellipse, the circle, is used for the cross sections. In this case, a = b = r and the rotation R_v(θ) vanishes. Combining (4.3) and (4.4), a tube T_i with circular cross sections is the set of points (x, y, z) such that:

[(y − y_i) cos(α) + (z − z_i) sin(α)]² + [(x − x_i) cos(β) + (y − y_i) sin(α) sin(β) − (z − z_i) cos(α) sin(β)]² ≤ r²,    (4.5, 4.6)

where the (x_i, y_i, z_i)'s are the points along the transformed new V axis. In this formulation, the pose and size of any local tube are uniquely described by six model parameters.
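Before the derivation continues, the membership test implied by (4.5)-(4.6) can be sketched directly in code. This is an illustrative implementation of the reconstructed inequality above, not the dissertation's code; names and the tuple layout are assumptions.

```python
import numpy as np

def inside_tube(p, axis_pt, alpha, beta, r):
    """Membership test of (4.5)-(4.6): is 3D point p inside the circular
    tube of radius r whose axis passes through axis_pt, with the axis
    orientation given by the rotations alpha (about U) and beta (about W)?"""
    x, y, z = p
    xi, yi, zi = axis_pt
    ca, sa = np.cos(alpha), np.sin(alpha)
    cb, sb = np.cos(beta), np.sin(beta)
    u = (y - yi) * ca + (z - zi) * sa
    w = (x - xi) * cb + (y - yi) * sa * sb - (z - zi) * ca * sb
    return u * u + w * w <= r * r

# A point on the axis itself is always inside.
assert inside_tube((1.0, 2.0, 3.0), (1.0, 2.0, 3.0), 0.3, -0.7, 0.5)
```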
Therefore,

T_i = {P_i, χ_i, l_i} = {H_i, r_i, l_i},    (4.7)

where H_i specifies the pose of T_i and r_i describes the circular cross section.

To represent T_i directly in XYZ, note that the pose of T_i can be determined by its location, with respect to the origin of XYZ, and its orientation vector, which coincides with T_i's straight axis. Therefore, T_i can be represented by a 4-tuple in XYZ:

T_i = {c_i, d_i, r_i, l_i},    (4.8)

where c_i is a chosen reference point on the axis of tube T_i that describes a translation, d_i is the orientation vector, r_i is the radius, and l_i is the length of the tube. Figure 4.3 visualizes this definition of a tube. In this representation, c_i and d_i specify the pose of T_i and, similarly, r_i adequately describes a circular cross section. The orientation vector d_i can be equivalently specified by two angles, γ and ζ, denoting latitude and longitude, respectively.

Relationships between (4.7) and (4.8) exist. Recall that, in coordinate system UVW, with circular cross sections, T_i is the set of all points (u, v, w) that satisfy:

u² + w² ≤ r_i²,  ∀v ∈ [0, l_i].    (4.9)

Here, the transformed V axis implied by H_i in (4.7) aligns with the orientation vector d_i in (4.8), and the origin of UVW in (4.7) coincides with the reference point c_i in (4.8). Figure 4.3 visualizes these relationships.

Even though both parameterizations (4.7) and (4.8) are adequate to characterize a local tube with an arbitrary orientation and size, (4.8) leads to a more direct linkage between object parameters and observable image features through an explicit set of relationships between invariant surface features (observables) and object parameters (to be discussed in the next two sections). Since a more direct linkage leads to a clearer recognition strategy, the parameterization defined in (4.8) is used from now on. With this parameterization, the geometric structure of a tubular object consisting of k local tubes is modeled as:

E_k = {T_1, T_2, ..., T_k} = {{c_1, d_1, r_1, l_1}, ..., {c_k, d_k, r_k, l_k}}.    (4.10)

Invariant Surface Features of Tubes

Shafer [89] and other researchers [73, 101] have analytically studied, in both gradient space and shadow plane, various invariant geometric features of the GC model. These fundamental properties provide constraints on object surface orientations that, ultimately, can lead to the localization of the objects of interest. In this study, some of the fundamental surface features of the tube model are fully and quantitatively exploited.

Several intrinsic surface features of this object model are invariant to rotations and translations. Examples are surface curvatures and radius. Curvature at a surface point p is defined as the second derivative (at this point) on a C²-continuous surface. For each surface point p, there is a tangent plane T_p and an infinite number of orthonormal basis vector pairs for T_p. Among them, there exists a unique pair of directions along which the surface curvatures reach a maximum value in one direction and a minimum value in the other. Denote the two extreme curvature values by κ_max and κ_min, called principal curvatures, and their corresponding directions by v_max and v_min, called principal directions. Figure 4.5 gives an example of the principal directions of a cylindrical surface. The surface normal at point p, denoted by n_p, can be obtained from the cross product of v_min and v_max:

n_p = v_min × v_max,

where we choose the surface normal pointing inward to the object.

Figure 4.5.
The images of a cylindrical surface (left), its v_min's (middle), and v_max's (right).

A cylindrical surface can be characterized by some distinct surface properties. For example, all the points on this surface have continuous surface normals, zero minimum curvature value, identical maximum curvature value, and parallel minimum curvature directions. The orientation of a straight tube is parallel to the direction of minimum curvature, while the cross sections of a tube lie on the tangent plane formed by the two principal directions. Therefore, a 3D cross section is bounded by a set of surface points that have similar κ_min, κ_max, and v_min and that are spatially configured in a shape similar to a circle in 3D space.

Relationship Between Surface Features and Model Parameters

There exist explicit relationships between the model parameters {c, d, r} of a tube and the above defined surface features {κ_min, κ_max, v_min, v_max, n_p} at a surface point p on the tube surface. The radius of a tube is the reciprocal of its maximum curvature value, and the orientation vector of a tube is parallel to the minimum curvature direction:

r = 1 / κ_max,    (4.11)

d = v_min + c.    (4.12)

The orientation vector d of a tube can be derived by simply translating a minimum curvature direction vector to a reference point c and normalizing it, where the reference point c can be a point on the axis of the tube, obtainable from the features of a surface point p on the same cross section; that is:

c = p + r · n_p = p + (1/κ_max) · (v_min × v_max),    (4.13)

where n_p is the inward surface normal at point p, computed from the cross product of the minimum and maximum curvature directions at p. These relationships are invariant. Since a local tube can be completely specified by d, r, and l, all with respect to a reference point c in the world coordinate system, the above relationships can be quantitatively exploited to constrain the poses of tubes, hence to localize object segments.

4.2.3 Modeling Global Shape Dynamics

In using a parameterized RSHGC as a primitive to model objects, the shape dynamics can be expressed in the shape of axis A and the shape of the cross sections, which is determined by both the type and the size of the cross section function χ. In this context, the shape deformation can be captured by considering the dynamics along two dimensions: the dynamics in the axis and the dynamics in the cross sections.

Having mathematically modeled each local tube, in order to model an entire tubular object, we now formally model the transitions between adjacent parameterized RSHGC's. Define a transition to be the changes or dynamics of object model parameters between two adjacent tubes. Consider a transition from tube T_{i−1} to tube T_i: the tube after the transition, T_i, can be described based on the tube before the transition, T_{i−1}, with some deformation in model parameters,

T_i = T_{i−1} × δ_{i−1,i},    (4.14)

where δ_{i−1,i} represents the transition. Specifically,

δ_{i−1,i} = {δ^c_{i−1,i}, δ^d_{i−1,i}, δ^r_{i−1,i}, δ^l_{i−1,i}}.    (4.15)

The individual terms in (4.15) correspond to the changes in different tube parameters, which are constrained in specific domains:

δ^c_{i−1,i} = c_i − c_{i−1} = [δx, δy, δz]^T,  δx, δy, δz ∈ [−C, +C],  C = max(M, N, O),
δ^d_{i−1,i} = cos^{−1}(d_{i−1} · d_i) ∈ [−Θ, +Θ],
δ^r_{i−1,i} = r_i − r_{i−1} ∈ [−R, R],
δ^l_{i−1,i} = l_i − l_{i−1} ∈ [−L, L],

where M × N × O is the size of a constrained 3D subspace in XYZ, and Θ, R, and L are constants.
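A minimal sketch of a constrained transition step in the sense of (4.14)-(4.15) follows; the dictionary layout of the deformation and the rejection-style bound check are assumptions for illustration, not the dissertation's implementation.

```python
import numpy as np

def apply_transition(tube, delta, bounds):
    """Grow tube T_{i-1} = (c, d, r, l) into T_i = T_{i-1} x delta_{i-1,i}
    as in (4.14)-(4.15), rejecting transitions outside the allowed domains."""
    c, d, r, l = tube
    theta_max, r_max, l_max = bounds                 # Theta, R, L of (4.15)
    d_new = np.asarray(delta["d"], dtype=float)
    d_new = d_new / np.linalg.norm(d_new)
    angle = np.arccos(np.clip(np.dot(d, d_new), -1.0, 1.0))
    if angle > theta_max or abs(delta["r"]) > r_max or abs(delta["l"]) > l_max:
        return None                                  # deformation too large
    c_new = c + l * d                                # translation shift l_{i-1} d_{i-1}
    return (c_new, d_new, r + delta["r"], l + delta["l"])
```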
Based on (4.14), a recursive definition for a tubular object can be derived:

E_k = (c_0, d_0, r_0, l_0) × {δ_{0,1}, ..., δ_{k−1,k}} = T_0 × {δ_{i−1,i}}_{i=1}^{k}    (4.16)
    = E_{k−1} × δ_{k−1,k},    (4.17)

where T_0 = {c_0, d_0, r_0, l_0} denotes a starting tube and {δ_{i−1,i}}_{i=1}^{k} represents a process of transitions that captures the dynamics of the geometric shape of object E_k. The above formulation conveys the view of considering an object to be the result of a sweeping: the overall structure of a tubular object is formed from a reference tube T_0 that travels in 3D space and undergoes a series of deformations. Each deformation is described in terms of T_0's trajectory and size, or in terms of the changes in tube model parameters as defined in (4.15). For recognition purposes, this continuous process of deformation is later modeled using several independent stochastic processes that are functions of the length of the underlying object, or simply time. Each process corresponds to one term in (4.15). Together with the stochastic process modeling (discussed in Section 4.4.5), we call (4.16) a generalized stochastic tube model. It captures the geometric shape of not only local tube-like object segments but also the global curvilinearity and elongatedness of the objects to be modeled.

4.3 Modeling Imaging Processes

Since recognition is an inverse mapping from observable image features to meaningful instances of objects, it has to apply knowledge not only about the objects to be recognized, such as shape, but also about the condition under which the objects are imaged. Therefore, the role of sensor information is essential in object recognition. In the context of model-based object recognition, while the Generalized Stochastic Tube Model describes the geometric structure or shape properties of tubular objects, to capture the shape information from observables, the sensing process also needs to be explicitly modeled. In this study, we deal with two types of sensing conditions: one is the photometric sensing condition and the other is the MRA imaging process. Together with the geometric shape model developed previously, these sensing models will be integrated in the process of mapping observable image features to instances of tubular objects.

4.3.1 Photometric Sensor Modeling

The shading of an object in a 2D image is determined by its shape and its position, as well as the lighting condition, specifically the angles between the normals of the visible surface of the object and the light source direction [46]. The surface normal at a surface point p = [x, y, z]^T on local tube T_j, denoted by n_p, is the cross product of two vectors v_x and v_y on the tangent plane at p. They are defined as:

v_x = (1, 0, p)^T = (1, 0, ∂z/∂x)^T,  v_y = (0, 1, q)^T = (0, 1, ∂z/∂y)^T.

Then n_p = v_x × v_y = (p, q, −1)^T. The unit surface normal is obtained by:

n̂_p = n_p / |n_p| = (p, q, −1)^T / √(1 + p² + q²).    (4.18)

Suppose the light source is located at [x_s, y_s, z_s]^T. Then the light source direction at surface point p = [x, y, z]^T is:

n̂_s = [p_s, q_s, r_s]^T / √(p_s² + q_s² + r_s²),    (4.19)

where

p_s = x − x_s,  q_s = y − y_s,  r_s = z − z_s.    (4.20)

The intensity value at the corresponding point [x', y', z']^T in the shadow plane (where [x', y', z']^T is the projected point of 3D point p on the image plane) is proportional to the dot product of n̂_p and n̂_s, i.e.,

I(x', y', z') ∝ n̂_p · n̂_s = cos γ,    (4.21)

where γ is the angle between the surface normal n̂_p and the light source direction n̂_s at point [x, y, z]^T. Considering that only part of the surface can be seen, I(x', y', z') is defined whenever cos γ ≥ 0.
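A short sketch of the photometric model of (4.18)-(4.21), written as a standalone function for clarity; the function name is hypothetical.

```python
import numpy as np

def lambertian_intensity(p, q, ps, qs, rs):
    """Shading of (4.18)-(4.21): cos(gamma) between the unit surface normal
    (p, q, -1)/|.| and the unit light direction (ps, qs, rs)/|.|; the image
    value is proportional to this and undefined where cos(gamma) < 0."""
    n_hat = np.array([p, q, -1.0]) / np.sqrt(1.0 + p * p + q * q)
    s_hat = np.array([ps, qs, rs]) / np.sqrt(ps * ps + qs * qs + rs * rs)
    cos_gamma = float(n_hat @ s_hat)
    return max(cos_gamma, 0.0)
```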
Figure 4.6 shows the intensity profiles extracted from real data of plant roots. It can be seen that the profiles have shapes close to a cosine function.

Figure 4.6. Intensity profiles along the cross sections of plant roots. (a) Unnormalized profiles over 15 cross sections, (b) normalized profiles over 15 cross sections.

4.3.2 MRA Sensor Modeling For Blood Vessels

Blood flow usually exists only within vessels. Consider a cross section of a blood vessel. Ideally, the maximum velocity of blood flow occurs near the center of the cross section and zero velocity occurs near the boundary of the cross section. This ideal flow pattern is depicted in Figure 4.7. An example of a blood flow pattern from a chosen cross section of a real blood vessel is shown in Figure 4.8(a). To utilize this information in identifying blood vessels, a bivariate Gaussian density function, erected from a cross section with an arbitrary size and orientation, is employed to model the ideal blood flow pattern. Formally, in UVW, on every cross section of a tube that is parallel to the UW plane, the following Gaussian density function models the speed distribution of the blood flow on the cross section:

b(u) = I × exp(−(1/2) u^T Σ^{−1} u),  0 ≤ v ≤ l_i,    (4.22)

where I is a scaling factor corresponding to the height between the maximum and minimum blood flow speed, u = [u, v, w] is a point on the cross section in the object-centered coordinate system, and Σ is a 2 × 2 symmetric covariance matrix with diagonal elements equal to the square of r/3.0 and off-diagonal elements equal to zero, where r is the radius of the cross section. In this model, on the plane of a cross section, the speed measurements form a flow field consisting of a set of concentric equal energy rings with respect to the center of the cross section, where the sensor measurement at each ring is determined by the Gaussian density function value, modeling the flow speed. If a 95% percentile is used to truncate the tails of the Gaussian, Figure 4.8(b) displays a theoretical model for the blood flow. With an appropriately estimated I, (4.22) specifies an expected configuration of the sensor measurements within a cross section. Since the model in (4.22) is expressed in UVW, which is determined by the orientation of the object, and the radius is used to constrain the distribution, (4.22) integrates the geometric shape model with the flow model.

Figure 4.7. Ideal blood flow.

4.4 Model-Based Tubular Object Extraction

The models developed in the previous two sections characterize different aspects of the objects to be recognized. In this section, we design a model-based system in which these models are utilized in an integrated fashion during the incremental process of recognition.

4.4.1 4-Stage Recognition Strategy

The vision system designed for recognizing tubular objects consists of four stages: initial segmentation, automatic seeding, local recognition, and global object recovery. Each stage aims at identifying certain types of perceptual events that are distinct in recognizing tubular objects. The overall recognition is achieved in an incremental fashion.

The purpose of the initial segmentation is to identify the points in an input which are (1) likely to represent object points as opposed to background and (2) likely to represent object boundary. Seeding is to extract meaningful perceptual events that are to be used to hypothesize the poses of segments of objects.
Such perceptual events are identified by fitting a parameterized shape model to the boundaries of the regions where the initial segmentation produces clusters of points with high intensity.

Figure 4.8. The sensor measurement configuration on a cross section within a blood vessel in (a) a real situation, (b) the theoretical model.

Such generated hypotheses are verified at the stage of local recognition, in which the geometric shape model is integrated with the sensing model under the framework of optimal filters. The goal of model-based seeding and local recognition is to extract a set of reliable object segments, serving as good initial guesses of the entire objects, from which the tubular objects can be recovered via a global sweeping process.

The system diagram of this 4-stage recognition strategy is shown in Figure 4.9. It is hierarchical and model-based, and the sensor information is explicitly used during the process of recognition. In the following sections, each stage of processing is introduced in detail.

4.4.2 Initial Segmentation

Due to the nature of the two types of sensing conditions dealt with in this study, the sensor measurements within object regions are, in general, higher than the sensor measurements from non-object regions.

Figure 4.9. The model-based 4-stage tubular object recognition system diagram.

Empirical study shows that the sensor measurements from both object and background regions often have statistical properties close to a Gaussian distribution. For the 2D case, Figure 4.10 (a) and (b) illustrate a 2D root image and its intensity histogram.

Figure 4.10. An example of initial segmentation on a 2D intensity image of plant roots. (a) An intensity image of plant roots, (b) intensity histogram of (a), (c) initial segmentation result on (a), (d) initial edge detection on (a).

Both background and root regions have histograms shaped like a Gaussian distribution, and together they form a histogram similar to a bimodal Gaussian distribution. For the 3D case, Figure 4.11(a) is a subvolume of 3D MRA input consisting of only non-vessel voxels and (b) is the corresponding intensity histogram. Because blood vessels occupy a much smaller volume than the background tissues, the second mode of the histogram, corresponding to the intensity distribution of vessel regions, is much smaller. Therefore, the histograms established from 3D inputs present a shape similar to a unimodal Gaussian distribution.

Figure 4.11. An example of a 3D subvolume of blood vessels from an MRA scan. (a) A 3D subvolume projected using MRI, (b) intensity histogram of (a).

At the initial stage of detection, the statistical properties of the sensor data can be estimated by fitting an appropriate Gaussian distribution function to the intensity histogram established from the input. In the 2D case, we use a bimodal Gaussian distribution function which simultaneously captures two distributions: G(I; u_o, σ_o²) representing the distribution of the sensor measurements of objects and G(I; u_b, σ_b²) representing the distribution of the sensor measurements from the background. Here, I is a random variable whose values represent sensed intensities. Denote the intensity histogram by h(i), where i is a specific intensity value and h(i) is the number of points in the input that have intensity gray level i.
A bimodal Gaussian distribution function

G(I) = Σ_{k=1}^{2} P_k G(I; u_k, σ_k²),  k = 1, 2,    (4.23)

is fitted to h(i), where the mode parameters are estimated from the histogram counts over the gray levels assigned to each mode (between the mode boundaries t_{k−1} and t_k):

u_k = (1/N_k) Σ_{i=t_{k−1}}^{t_k} h(i) × i,
σ_k² = (1/N_k) Σ_{i=t_{k−1}}^{t_k} h(i) × (i − u_k)²,
P_k = N_k / (N_1 + N_2),  N_k = Σ_{i=t_{k−1}}^{t_k} h(i).

If the bimodal test is passed, the optimization, with respect to minimizing classification error, is performed by determining an intensity threshold I_t using the Bayes decision rule. That is, I_t is determined such that:

G(I_t; u_o, σ_o²) = G(I_t; u_b, σ_b²).

This threshold value is used to initially classify all the pixels into classes of interior and exterior regions of tubular objects. Figure 4.10(c) and (d) show the initial segmentation result and the initially detected edges using a Canny edge detector [21] on the input image displayed in (a) of the same figure.

In the 3D case, the statistical properties of the sensor data from non-object regions can be estimated by fitting a unimodal Gaussian distribution function, G(I; u_b, σ_b²), to the left mode of the intensity histogram of the input. It is a fitting process similar to (4.23) except that here k = 1. The fitted parameters are used to automatically determine a threshold:

I_t = u_b + 2·σ_b,

that is to be used to initially classify all the voxels into classes of interior and exterior regions of tubular objects.

Due to the known pattern of blood flows, the gradient magnitudes of interior voxels are expected to be high compared to the gradient magnitudes of exterior voxels. The statistical properties of background gradient magnitudes can be estimated by fitting a unimodal Gaussian to the left mode of the histogram of gradient magnitudes established using the 3D edge (surface) operator proposed by Zucker and Hummel [107]. A threshold on magnitude, m_t, can then be determined in a similar manner:

m_t = u_m + 2·σ_m,

where u_m and σ_m are the parameter values of the fitted Gaussian distribution function, G(m; u_m, σ_m). By combining the two thresholds, I_t and m_t, blood vessel surface points can be initially identified. A 3D point p at (x, y, z) is initially classified as a surface point if and only if:

I(p) ≥ I_t, and m(p) ≥ m_t, and ∃p', p' ≠ p, such that I(p') < I_t and p' ~_6 p,

where I(.) and m(.) are the intensity and magnitude measurements, respectively, and ~_6 denotes a 6-connectivity spatial relationship in 3D space.

Figure 4.12 and Figure 4.13 show examples of initial processing results on 3D volumetric data. Figure 4.12 gives the initial separation of interior and exterior regions for the input shown in Figure 4.12(a). Figure 4.13 illustrates the initially detected vessel surfaces, rendered as range images from four different views (90 degrees apart horizontally). In Figure 4.13, the brighter the points are, the closer the surface points are to the viewer. Due to noise, the initial classification of interior, exterior, and surface points is not reliable and it can be used only as partial evidence.

Figure 4.13. The images of initially detected surfaces from an MRI subvolume, viewed from front, left, back, and right, respectively.

4.4.3 Automatic Seeding

A robust recovery of the invariant features of the tube surface is needed to enable a reliable estimation of local tubes. We call this process seeding or object localization. Although the initial segmentation discussed previously may not provide a good global separation of object interior, exterior, and boundary, it is useful in locating a small set of reliable seeds.
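The threshold selection just described can be sketched as follows, assuming the fitted mode parameters are already available; the brute-force scan over gray levels is an illustrative stand-in for solving the equality G(I_t; u_o, σ_o²) = G(I_t; u_b, σ_b²) analytically.

```python
import numpy as np

def gaussian(x, u, s):
    return np.exp(-0.5 * ((x - u) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

def bayes_threshold_2d(u_o, s_o, u_b, s_b):
    """2D case: pick the gray level I_t between the two fitted modes where
    the object and background Gaussians cross."""
    lo, hi = sorted((u_b, u_o))
    levels = np.arange(int(lo), int(hi) + 1)
    diff = np.abs(gaussian(levels, u_o, s_o) - gaussian(levels, u_b, s_b))
    return int(levels[int(np.argmin(diff))])

def one_sided_threshold(u, s):
    """3D case: I_t = u_b + 2 sigma_b (and likewise m_t = u_m + 2 sigma_m)."""
    return u + 2.0 * s
```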
Below, we describe how useful surface features in both the 2D and 3D cases are extracted and how they are used to localize segments of objects.

Seeding in 2D Images

A tubular object perceived in a 2D image plane can be necessarily (but not sufficiently!) located by identifying the corresponding parallel ribbon. We first define parallel ribbons and then show how they can provide well constrained situations for localizing 2D tubes.

Parallel ribbons are defined as pairs of parallel symmetric contours. Parallel symmetry was proposed in [102] and defined mathematically in [79]. Following their notation, in a continuous domain, assume two curves C_j and C_k parameterized by curve length s:

C_j(s) = (x_j(s), y_j(s)),  C_k(s) = (x_k(s), y_k(s)),

where θ_j(s) and θ_k(s) are the tangent orientations along the two curves. C_j and C_k are parallel symmetric if and only if there exists a continuous monotonic function f(s) such that

θ_j(s) = θ_k(f(s)).

Being identified as parallel symmetric, C_j and C_k form a parallel ribbon, denoted by RB_i. The symmetry axis A_i of RB_i is the collection of the middle points between C_j(s) and C_k(f(s)) for all s where f(s) exists [79]. Two spatial relations, namely overlap, denoted by ℜ_o, and enclose, denoted by ℜ_e, between a pair of parallel symmetric curves are proven to be sufficient conditions for finding a parallel ribbon [98]. These two relationships yield a region, called the overlapping region, that corresponds to a ribbon RB_i. Figure 4.14 illustrates these two relationships; the shaded areas in this figure are the overlapping regions.

A special case of a parallel ribbon is the straight parallel ribbon. In this case, both C_j and C_k are straight lines characterized by θ_j(s) = θ_k(s) = θ, where θ is a constant.

Figure 4.14. Examples of overlapping (a) and enclosed (b) relationships.

The features of RB_i provide constraints for localizing a tube. A 2D straight ribbon can be characterized by its center (x_0, y_0)_i, its orientation θ_i, its radius r_i, and its length l_i. As we can see, these features are directly related to the parameters of a 2D tube. Therefore, each ribbon yields a hypothesis, denoted by H_i, for a tube T_i:

H_i = {c_i^{hp}, θ_i^{hp}, r_i^{hp}, l_i^{hp}}    (4.24)
    = {c_i^{hp}, d_i^{hp}, r_i^{hp}, l_i^{hp}},    (4.25)

where d_i^{hp} is the hypothesized orientation vector, and the hypothesized tube model parameters can be computed as

c_i = (1 / area(RB_i)) Σ_{p' ∈ RB_i} p',
θ_i = (θ_j(s) + θ_k(s)) / 2,
r_i = (1/2) d(C_j, C_k),
l_i = min{l(C_j), l(C_k)},

where p' is a point within ribbon RB_i, d(C_j, C_k) evaluates the distance between the two lines, and l(C) measures the length of line C. Whether H_i corresponds to a 2D tube depends on the verification of the intensity configuration of the pixels within RB_i.

Seeding in 3D Volumes

For an initially classified surface voxel p_0 = (x_0, y_0, z_0), define a small spherical neighborhood O_n around p_0:

O_n = {p = (x, y, z) : ||p_0 − p|| ≤ ε},

where ε is the size of the spherical neighborhood. All the initial surface points within O_n are used to compute the surface features at p_0. A local coordinate system X'Y'Z' is established using the principal components method [35]. The origin of X'Y'Z' is located at p_0 and the Z' axis is perpendicular to the tangent plane at p_0.
The surface points collected from O_n are projected into this local coordinate system and a bicubic surface is fitted to this small surface patch using a least-squares solution [35]:

z' = f(x', y') = a_1 x'³ + a_2 x'² y' + a_3 x' y'² + a_4 y'³ + a_5 x'² + a_6 x' y' + a_7 y'² + a_8 x' + a_9 y' + a_10.

The Hessian matrix H is directly related to the Gaussian curvature, κ_G, and the mean curvature, κ_M:

κ_G = |H|,  κ_M = (1/2) · trace(H).

The principal curvatures can then be obtained from:

κ_min = min(|κ_1|, |κ_2|),  κ_max = max(|κ_1|, |κ_2|),

where κ_1 = κ_M + √(κ_M² − κ_G) and κ_2 = κ_M − √(κ_M² − κ_G). At the origin of X'Y'Z' (p_0), the two principal curvature directions are derived from a linear combination of the basis vectors v'_x and v'_y:

v_min = (a − c + √((a − c)² + 4b²)) v'_x + 2b v'_y,  if a ≥ c,
v_min = 2b v'_x + (a − c − √((a − c)² + 4b²)) v'_y,  if a < c,

where

a = f_{x'x'}(p_0) = 2a_5,  b = f_{x'y'}(p_0) = f_{y'x'}(p_0) = a_6,  c = f_{y'y'}(p_0) = 2a_7.

Figure 4.15 and Figure 4.16 display a few examples of such detected principal curvature pairs for the front and back views of the blood vessel shown in Figure 4.13. With the computed surface features, a 3D cross section can be inferred based on the relationships specified in equations (4.11), (4.12), and (4.13) in Section 4.2.2.

Figure 4.15. Examples of the detected principal curvatures from the front view of the vessel shown previously. Left: depth image of the surface; Middle: minimum curvature direction vectors; Right: maximum curvature direction vectors.

Figure 4.16. Examples of the detected principal curvatures from the back view. Left: depth image of the surface; Middle: minimum curvature direction vectors; Right: maximum curvature direction vectors.

To improve the reliability of the automatically generated hypotheses, our seeding method demands that a local tube be hypothesized only collectively, by a set of surface points that result in consistent estimates of the model parameters for the same cross section. Each such set of consistent surface points, gathered from a 3D thin disk representing a hypothesized cross section, is fitted with a 3D parametric circle to refine the hypothesis. Denote the hypothesis for a tube T_i by H_i:

H_i = {c_i^{hp}, d_i^{hp}, r_i^{hp}, l_i^{hp}},    (4.26)

which provides the estimated position, size, and orientation of a local tube along a blood vessel.

For a tube hypothesis H_i, initiated from either a 2D ribbon or a 3D cross section, the corresponding axis of the hypothesized tube,

A_i^{hp} = {A_{i,0}^{hp}, ..., A_{i,l_i}^{hp}},

can be conveniently derived from the hypothesized model parameters:

A_{i,j}^{hp} = c_i^{hp} + j · d_i^{hp},  0 ≤ j ≤ l_i,

where A_{i,j}^{hp} is a point on the hypothesized axis A_i^{hp}. Once hypothesis H_i is verified, it represents a local tube, serving as a seed to grow into a global recognition. Compared to the manual or semi-automatic heuristic-based seed finding methods frequently employed in the literature, our seeding method is model-based and automatic.

4.4.4 Local Recognition

Hypotheses of local tubes established based on invariant surface features need to be verified. Since verified seeds represent only segments of objects, this stage of processing is called local recognition. Based on the hypothesized geometric shape of an object segment, this stage of processing not only verifies the hypothesis but also optimizes the recognition of the corresponding local tube by integrating the region and boundary information using the sensing model developed in Section 4.3.
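As a recap of the 3D seeding computation of Section 4.4.3, the sketch below ties the bicubic fit to the relationships (4.11)-(4.13). It assumes a non-umbilic surface point (so v_min is well defined); v'_x and v'_y are the local tangent basis vectors, and the v_max construction via the tangent-plane normal is an illustrative shortcut.

```python
import numpy as np

def seed_from_patch(p0, a5, a6, a7, vx, vy):
    """Hessian entries at p0 from the quadratic coefficients of the bicubic
    fit (a = 2*a5, b = a6, c = 2*a7); derive k_max and v_min, then the local
    tube parameters via (4.11)-(4.13)."""
    a, b, c = 2.0 * a5, a6, 2.0 * a7
    k_G = a * c - b * b                     # |H|
    k_M = 0.5 * (a + c)                     # trace(H) / 2
    root = np.sqrt(max(k_M ** 2 - k_G, 0.0))
    k_max = max(abs(k_M + root), abs(k_M - root))
    disc = np.sqrt((a - c) ** 2 + 4.0 * b * b)
    if a >= c:
        v_min = (a - c + disc) * vx + 2.0 * b * vy
    else:
        v_min = 2.0 * b * vx + (a - c - disc) * vy
    v_min = v_min / np.linalg.norm(v_min)
    n_plane = np.cross(vx, vy)              # tangent-plane normal
    v_max = np.cross(n_plane, v_min)        # orthogonal direction in the plane
    r = 1.0 / k_max                         # (4.11): radius
    n = np.cross(v_min, v_max)
    n = n / np.linalg.norm(n)
    c_ref = p0 + r * n                      # (4.13): axis reference point
    return c_ref, v_min, r                  # reference point, axis direction, radius
```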
The goal of this combined model-based seeding and integrated local recognition is to locate a set of reliable object segments which serve as the initial guesses, or starting tubes, for the processes that recover the rest of the objects. The incremental model-based strategy is designed to be robust against noise in the data, with the aim of achieving an efficient and reliable recovery of objects.

Hypothesis Verification and Optimization

Each seed is represented by a set of tube model parameters that characterize a recognized tube. To recognize a local tube, an optimization is performed to obtain the best estimated model parameters. The hypothesis generated based on surface features serves as an initial guess, and the recognition of the underlying tube T_i is optimized by maximizing the following posterior probability or likelihood:

max(P(T_i)) = max{ P(H_i, f) / P(f) } = max{ P(f | H_i) P(H_i) / P(f) },

where f is an image function representing the input and H_i is a hypothesis. Because the denominator is not a function of H_i, this optimization is equivalent to maximizing the numerator. That is,

max{ P(H_i, f) / P(f) } ∝ max{ P(f | H_i) P(H_i) }.    (4.27)

The term P(H_i) in (4.27) reflects the prior knowledge about the shape of the objects of interest, while P(f | H_i) measures the likelihood of f having the expected distribution of the sensor data, conditioned on the given hypothesis H_i. In effect, H_i specifies a region on which P(f | H_i) is evaluated. Taking the logarithm of (4.27), we have:

max{ ln[P(f | H_i) P(H_i)] } = max{ ln[P(f | H_i)] + ln[P(H_i)] }.    (4.28)

In general, P(H_i) is conditioned on its neighboring tubes due to the causal relationship between adjacent tubes. In this context, hypothesis H_i is viewed as being constructed from two parts: one is the model parameter setting inherited from a neighboring tube, say T_{i−1}, and the other is the deformation δ_{i−1,i} occurring during the transition from T_{i−1} to T_i. Because these two parts are independent, we have:

P(H_i) = P(T_{i−1}, δ_{i−1,i}) = P(T_{i−1}) P(δ_{i−1,i}).

As P(T_{i−1}) is not a function of H_i, we further have

P(H_i) ∝ P(δ_{i−1,i}).    (4.29)

Substituting (4.29) into (4.28), we define

p*(i) = max{ ln[P(f | H_i)] + ln[P(δ_{i−1,i})] }    (4.30)

as the criterion for optimizing the recognition of a local tube. At the stage of local recognition (i = 0), since there is no previously recognized tube, the second term in (4.30) vanishes. In this case, local recognition optimizes only ln[P(f | H_0)].

The optimization in (4.30) is often achieved through geometrical surface fitting using a figure of merit designed based on the Euclidean distance. In this case, the absolute overall distance between a theoretical surface and the data surface directly affects the similarity measurement, even when they actually have identical shape but sit at different positions. The theoretical surface has to be gradually moved towards the data surface until an exact match. Such a process of fitting can be computationally costly. In this study, we employ the theory of optimal filters for matching. With the hypothesis initiated from observable image features, optimal filters can be dynamically generated, which avoids the expensive exhaustive search scheme often associated with the use of optimal filters in the literature [23]. In the next section, we briefly introduce the theory of optimal filters and then discuss how it is applied for our recognition purpose.
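In code form, the criterion (4.30) is simply a sum of two log terms maximized over candidate hypotheses. The sketch below is illustrative: the normalized-correlation proxy for ln P(f | H_i) anticipates, but does not reproduce, the matched-filter scheme developed next, and all names are hypothetical.

```python
import numpy as np

def log_fit(window, template):
    """Stand-in for ln P(f | H_i): normalized correlation between the data
    window selected by H_i and a rendered template, mapped into (0, 1]
    before taking the log."""
    k = template - template.mean()
    w = window - window.mean()
    ncc = float(np.sum(k * w)) / (np.linalg.norm(k) * np.linalg.norm(w) + 1e-12)
    return float(np.log(max(0.5 * (ncc + 1.0), 1e-12)))

def p_star(window, template, log_prior=0.0):
    """Criterion (4.30): ln P(f | H_i) + ln P(delta_{i-1,i}); the prior
    term vanishes at the local recognition stage (i = 0)."""
    return log_fit(window, template) + log_prior
```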
Theory of Optimal Filters and Its Use in Object Recognition

According to the theory of optimum receivers, to recognize a signal r = s + n at the receiver end, where s is the original signal transmitted from the sender and n is the noise, the Signal-to-Noise Ratio (SNR) is maximized when the filter used to match r equals the input signal s [43]. Such a filter is called a matched filter or optimal filter.

Let us consider the one-dimensional case. Denote a filter by f and its input signal by r(t) = s(t) + n(t). The filter output, r'(t), is given by

r'(t) = r(t) ∗ f(t),    (4.31)

where ∗ represents convolution. In the discrete case, for a causal input signal and a causal filter, (4.31) becomes

r'[N] = Σ_{i=0}^{N} r[i] f[N − i],    (4.32)

where r'[N] is the N-th output, r[i] is the i-th sample of r(t), and f[i] is the i-th sample of f(t). Let

k_j = f[N − j].    (4.33)

We then have

r'[N] = Σ_{j=0}^{N} k_j r[j],    (4.34)

which is a weighted sum of all the N input samples. Here, (4.34) gives the signal associated with a single pixel at N. The corresponding noise n associated with a discrete point in the filtering process is

n = √( Σ_{i=0}^{N} k_i² σ² ),    (4.35)

where σ is the standard deviation of the noise, which is uncorrelated from sample to sample. In order to ensure that the DC component is zero, the following condition is imposed:

Σ_{i=0}^{N} k_i = 0.    (4.36)

With condition (4.36), the signal (S) at a single point in filtering is

S = r'[N] = Σ_{j=0}^{N} k_j r[j] = Σ_{j=0}^{N} k_j (r[j] − c),    (4.37)

where c is an arbitrary constant. Therefore, the expression for the signal-to-noise ratio is:

S/n = ( Σ_{j=0}^{N} k_j (r[j] − c) ) / √( Σ_{j=0}^{N} k_j² σ² ).    (4.38)

It can be shown that, to maximize the signal-to-noise ratio S/n, the choice of k_i, i = 0, 1, ..., N, can be derived directly from the input signal r[i], i = 0, 1, ..., N. To optimize S/n, set

∂(S/n) / ∂k_i = 0,  i = 0, 1, ..., N.    (4.39)

Substituting equations (4.35) and (4.37) into (4.39) and solving for k_i, we obtain

k_i = ( Σ_{j=0}^{N} k_j² / Σ_{j=0}^{N} k_j (r[j] − c) ) × (r[i] − c),  i = 0, ..., N.    (4.40)

Let

g({k_j}) = Σ_{j=0}^{N} k_j² / Σ_{j=0}^{N} k_j (r[j] − c),    (4.41)

which is a function of {k_j}. Equation (4.40) becomes

k_i = g({k_j}) × (r[i] − c),  i = 0, 1, ..., N.    (4.42)

Notice that, from equation (4.41), we have

Σ_{j=0}^{N} k_j² = g({k_j}) × Σ_{j=0}^{N} k_j (r[j] − c).    (4.43)

Substituting these into equations (4.37) and (4.35), the signal-to-noise ratio S/n can be shown to be independent of g({k_j}), i.e.,

S/n = Σ_{i=0}^{N} g({k_j}) (r[i] − c)² / √( Σ_{i=0}^{N} [g({k_j}) (r[i] − c)]² σ² )    (4.44)
    = Σ_{i=0}^{N} (r[i] − c)² / √( Σ_{i=0}^{N} (r[i] − c)² σ² ).    (4.45)

Therefore, in choosing the k_i that maximize S/n, g({k_j}) can take an arbitrary value without affecting the evaluation of S/n. If we set g({k_j}) = 1, from (4.42) we obtain

k_i = r[i] − c,  i = 0, 1, ..., N.    (4.46)

Based on condition (4.36), we can derive the value of c,

c = (1/(N+1)) Σ_{i=0}^{N} r[i],    (4.47)

which is the mean of the signal samples {r[i]}. Therefore, the choice of optimal filter f equals the signal itself, up to a translation in the signal's magnitude. Such a translation does not affect the evaluation of S/n because of condition (4.36). This leads to a desirable property mentioned earlier: when the theoretical model has an identical shape to the data, the matching score or figure of merit remains the same even though the two may be positioned differently.

A tube model, either 2D or 3D, rendered according to a given hypothesis H_i, represented by a set of model parameters to be optimized, and the sensor model, can
The correlation between this filter and the sensor data indicates the degree of match in their intensity configurations. Therefore, max— imization of P( f | Hg) can be achieved through maximizing SN R. In matching the sensor data with a series of 3D rendered tube models, the derived SNR’s can be used to establish an empirical distribution simulating P( f | Hg) and the Hg that corre- sponds to the best SNR yields the optimized local recognition that we are seeking. Figure 4.17 shows an example of the range of optimization which looks like a cone. Figure 4.18(a) plots the values of SNR with respect to different radius of tubes. The peak value corresponds to the correct radius. Figure 4.18(b) shows the sensitivity of SNR to the direction of tubes. Similarly, the peak SN R value corresponds to correct Orientation. Combining both parameters (orientation and radius), Figure 4.19 shows an example of such an empirical distribution estimated based on some synthetic data. In Figure 4.19, radius and orientation vary along X and Y axes and the computed SNR is plotted as the dependent variable along Z axis (height). Figure 4.17. Range of optimization. 127 Semitivitynhdius SauitivityioOtiemnion '3 I I I I I I I I I I I I I U T fi I T l T T radius=3.0 — radim=3.0 —- 1.2 b name-4.0 ~ 1.2 - mums ~ ”$5.0 ----- ruins-15.0 ..... g H g 1.1 E i ‘5 I .......... 3' l a --------- a E 0-9 a: 0.9 0.8 0.8 0.7 l J J l l l l l l l l J 1 0.7 l152253354455556657758 8 Radius (a) (b) Figure 4.18. Sensitivity of SNR to (a) radius, (b) orientation. Matched Filter Generation We now discuss how a 2D and a 3D matched filter can be generated based on a tube hypothesis and sensor model. Let Mg be a theoretical 3D tube model and M; be the Corresponding matched filter to be generated. The process of dynamically generating a matched filter can be considered as a mapping, denoted by G), which transforms Mg into Mg, that is: Q: MtHMf. Specifically, C9 is defined as: . N o 52 o R(Mg) if 2D input M! = @(Mt) = N o S3 o R(Mg) if 3D input Vvhere R(.) is a 3D volumetric model generator, N is a normalization operator, S 3() is a 3D rendering (sensing) operator, and Sz(.) is a 2D rendering (shading) operator. lDenote a 3D volumetric model by M”, a rendered model by Mr, and a normalized, 128 Figure 4.19. An example of the empirical distribution of p. hence final, model by NI]. We define: RI Mt = Hi H Mu = {(11), 3172)}, Where Hg is a set of model parameter values defined in (4.26) and Mg, is a set of 30 points that satisfy inequality (4.9). To render the 3D volumetric model, the 3D rendering operator: 335 Mt; H Mr ={($,y,z,1($$y1z))}={($,y,z,b(u1v’w))}’ a-Ssigns each point (2:, y, z) in Mg, 3. value I (:c, y, 2) determined by the expected blood flow speed determined using (4.22). The relationship between (u, v, w) in the object— Qentered coordinate system and (2:, y, z) in the world coordinate system can be deter- 11'lined through hypothesized object parameters Cg and dg. A 2D matched filter Mr is rendered based on the theory of shading discussed in Section 4.3.1. The value of M f at a point p = [x,y, z]T in the shadow plane (where 129 z is a constant) is proportional to the dot product of the unit surface normal nip and the light source direction Ii, at p (see (4.21)). Assuming the number of gray levels is 256, we have $23 MvHMr = {($3y7z31($ay’z))}? where 1 + P3P + 93‘] \/1+P2+‘12\/1+P3+(1§. I(x,y,z) = 256 x The computations of p, q, p,, and q, are described in Section 4.3.1. [(27, y, z) is defined Whenever I + p,p + q,q Z 0. 
Finally, the normalization operator is defined as:

N: M_r ↦ M_f = {(x, y, z, Î(x, y, z))},

with

Î(x, y, z) = I(x, y, z) − ū,  ū = (1/|M_r|) Σ_{(x,y,z) ∈ M_r} I(x, y, z),

where ū is the mean of the predicted sensor measurements within tube T_i. Therefore, filter M_f is M_r normalized such that the expected value of M_f is zero.

Figure of Merit

Filter M_f is matched against the input data to determine the similarity between this dynamically generated model and the sensor data. An iconic matching is performed for this purpose. For any particular filter, the maximum SNR we can possibly obtain is the autocorrelation (a perfect match). On the other hand, with the estimated properties of the background intensity distribution, the matching between M_f and the background provides the lower bound of the SNR. The Gaussian function G(u_b, σ_b²) estimated in Section 4.4.2 can now be utilized to determine such a lower bound with respect to each hypothesis. Assume a background patch B_g(x, y) is created based on G(u_b, σ_b²). Set the range for the SNR to be [SNR_i^−, SNR_i^+], where

SNR_i^− = Σ M_f × B_g,  SNR_i^+ = Σ M_f × M_f.

Note that this range is adaptive to both the input data and individual hypotheses. With this adaptive range, we define a figure of merit to be:

ρ(i) = (SNR − SNR_i^−) / (SNR_i^+ − SNR_i^−),

where 0 ≤ ρ(i) ≤ 1. We use ρ(i) as an estimate of P(f | H_i). Therefore, at i = 0:

p*(0) = max{ ln[ρ(0)] }.    (4.48)

A system parameter, η, is chosen as the validity threshold for verification: H_i is accepted if and only if p*(0) ≥ η. Such a matching is local; hence its computational cost is reasonable. Since a matched filter is determined by the models of both the object shape and the sensor used, this local recognition method verifies, simultaneously, both the expected geometric shape of a tube and the expected configuration of the sensor measurements.

4.4.5 Global Recognition

Identified local tubes are isolated segments of objects, each of which is used as a starting point of a sweeping process at the stage of global recognition, whose goal is to achieve a recovery of the entire underlying objects. Initiated from each starting tube, global recognition sweeps the seed in space along the underlying object according to the criteria discussed below.

Recursive Optimization

The overall structure of a tubular object is considered to be formed by a starting tube traveling in 3D space and undergoing a series of deformations. In order to recover the trajectory and the deformation along the trajectory, we design an optimization scheme with respect to two criteria: one is the goodness-of-fit between the sensor data and the model for the sensed information, generated based on the deformed object parameters, and the other is the smoothness of the trajectory.

Consider a tubular object E_k = {T_k, ..., T_1, T_0}; its recovery is through an optimization:

max(P(T_k, ..., T_0)),    (4.49)

where T_0 is the seed tube. The task of global recognition is to find a series of transitions, starting from a seed, so that P(T_k, ..., T_0) is maximized. The model adopted for tubular objects defines a causal system in which the alphabet of T_i is the set of all combinations of object parameters in a discrete optimization range of a cone determined by a tube hypothesis. In general, the probability for a particular tube i to occur is related to the presence of the previously detected tubes at j < i, or P(T_i | T_{i−1}, ..., T_0). According to the Markov assumption,

P(T_i | T_{i−1}, ..., T_0) = P(T_i | T_{i−1}).    (4.50)

Applying the chain rule, we obtain:

P(T_k, ..., T_1) = Π_{i=1}^{k} P(T_i | T_{i−1}).
Here, P(T_i | T_{i−1}) is the probability of the transition from T_{i−1} to T_i. Using the logarithmic form, we have

max{ P(T_k, ..., T_0) } ∝ max{ Σ_{i=0}^{k} ln[P(T_i | T_{i−1})] }
                        = Σ_{i=0}^{k} max{ ln[P(T_i | T_{i−1})] }.    (4.51)

In Section 4.4.4, we defined p*(i) to be the criterion for achieving an optimal local recognition. Applying it here, we define the criterion of an optimal global recovery to be:

p*(k, ..., 0) = Σ_{i=0}^{k} max{ ln[P(T_i | T_{i−1})] } = Σ_{i=0}^{k} p*(i)
             = Σ_{i=1}^{k} max{ ln[P(f | H_i)] + ln[P(δ_{i−1,i})] } + max{ ln[P(f | H_0)] }.    (4.52)

Note that max{ ln[P(f | H_0)] } is accomplished in the local recognition stage. Hence, the task of global recognition is to maximize:

p*(k, ..., 1) = Σ_{i=1}^{k} max{ ln[P(f | H_i)] + ln[P(δ_{i−1,i})] }.    (4.53)

This can also be written in a recursive form:

p*(k, ..., 1) = p*(k−1, ..., 1) + max{ ln[P(T_k | T_{k−1})] }
             = p*(k−1, ..., 1) + max{ ln[P(f | H_k)] } + max{ ln[P(δ_{k−1,k})] }.    (4.54)

Therefore, sweeping is a recursive operation, and the global recognition problem is a recursive optimization problem, which leads to a constructive recovery scheme.

The two optimization terms in (4.53) correspond to the two criteria discussed earlier. The first reflects the goodness-of-fit, and the second is related to choosing a transition that agrees with what we know a priori (or have learned) about how object shape typically deforms in space. The overall optimization expressed by (4.53) reaches the best balance between the two criteria. The optimization of the first criterion is equivalent to the maximization of the figure of merit defined in Section 4.4.4, and the optimization scheme introduced there can be directly applied. That is,

p*(k, ..., 1) = p*(k−1, ..., 1) + max{ ln[ρ(k)] } + max{ ln[P(δ_{k−1,k})] }.    (4.55)

Next, we discuss the issues related to the optimization of the second term, ln[P(δ_{i−1,i})].

Sweeping: A Stochastic Process

Recall that the deformation of a tube is described in terms of several random variables (defined in (4.15)), each of which is a function of time and corresponds to one object parameter of the geometric tube model. Since the translation shift δ^c_{i−1,i} from T_{i−1} to T_i can be determined from the parameters of T_{i−1} by

δ^c_{i−1,i} = l_{i−1} · d_{i−1},

it does not affect the optimization, and we discard it in our further discussion. Lacking knowledge otherwise, we assume independent deformation behavior of the different model parameters; that is,

P(δ_{i−1,i}) = P(δ^d_{i−1,i}, δ^r_{i−1,i}, δ^l_{i−1,i}) = P(δ^d_{i−1,i}) P(δ^r_{i−1,i}) P(δ^l_{i−1,i}).    (4.56)

So we have

ln[P(δ_{i−1,i})] = ln[P(δ^d_{i−1,i})] + ln[P(δ^r_{i−1,i})] + ln[P(δ^l_{i−1,i})].

Here, δ^d_{i−1,i}, δ^r_{i−1,i}, and δ^l_{i−1,i} are random variables in independent stochastic processes. Their statistical properties are determined by the geometric characteristics of the objects and can be either set a priori or learned during the sweeping. Our previously reported work [47, 48] examined how the three distributions, P(δ^d_{i−1,i}), P(δ^r_{i−1,i}), and P(δ^l_{i−1,i}), can be empirically estimated based on an on-line maintained sweeping history. Define a weighted sweeping history, denoted by H^w, up to time i:

H^w = {(W_{i−1}, h_{i−1}), ..., (W_1, h_1)},

where h_j = {δ_{j−1,j}} is the j-th piece of history and W_j is the weight that specifies the impact of h_j on the current sweeping front. Weight W_j, for 1 ≤ j ≤ i−1, is defined as:

W_j = l_j / ( Σ_{h_k ∈ H^w} l_k × Σ_{k ≥ j} l_k ).

W_j is small whenever tube j is relatively insignificant (short in length) in the history, or tube j is relatively far away from the current sweeping front at i. Based on this history, the three distributions can be empirically estimated.
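The weighted-history bookkeeping can be sketched as follows. The form of W_j is reconstructed from the garbled source and the stated behavior (short or distant tubes get small weight), so this code is a plausible reading rather than the dissertation's exact formula.

```python
import numpy as np

def history_weights(lengths):
    """Weights W_j over the sweeping history: tube j counts for less when it
    is short or far behind the current sweeping front (reconstructed form)."""
    l = np.asarray(lengths, dtype=float)
    total = l.sum()
    dist_to_front = np.array([l[j:].sum() for j in range(len(l))])
    return l / (total * dist_to_front)

def weighted_stats(weights, deltas):
    """Weighted estimates E_i and sigma_i of one deformation term, as used
    in (4.57) and its companion variance estimate."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    d = np.asarray(deltas, dtype=float)
    e = float(np.sum(w * d))
    sigma = float(np.sqrt(np.sum(w * (d - e) ** 2)))
    return e, sigma
```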
For each independent term P(Δ) in (4.56), Δ ∈ {δ^d_{i−1,i}, δ^r_{i−1,i}, δ^l_{i−1,i}}, define E_i^Δ to be the expected value of Δ and σ_i^Δ to be the standard deviation of Δ estimated at moment i. Their unbiased estimates from the weighted history are:

E_i^Δ = Σ_{1 ≤ j < i} W_j × Δ_{j,j−1}.    (4.57)

If a 95% confidence interval is used, we can set:

E_i^Δ − λ_lo^Δ σ_i^Δ = E^Δ − 2σ^Δ,  E_i^Δ + λ_up^Δ σ_i^Δ = E^Δ + 2σ^Δ.

At each step of sweeping, we can adaptively compute the λ's as:

λ_lo^Δ = (E_i^Δ − E^Δ + 2σ^Δ) / σ_i^Δ,
λ_up^Δ = (E^Δ − E_i^Δ + 2σ^Δ) / σ_i^Δ,

where E^Δ and σ^Δ are the statistics learned from off-line training and E_i^Δ and σ_i^Δ are the statistics from the on-line sweeping history. Therefore, the derivation of the bounds λ_lo^Δ and λ_up^Δ is automatic, dynamic, and adaptive throughout the entire process of global object recovery.

4.5 Summary

A model-based recognition scheme for tubular objects was designed in this chapter. The explicit integration of the models that characterize different aspects of the objects is intended to improve the reliability of the recognition. Shape information is combined with the sensor measurement configuration, yielding the potential for a recovery of objects that is both more reliable and more precise than what is achievable using a single feature. The incremental reconstruction approach allows a continuous refinement of the recognition. By using a parametric form of the geometric model, when objects are recognized they are simultaneously described and quantified.

In the designed recognition scheme, various classical vision problems are encountered. Examples are initial simple segmentation, 2D line and ribbon structure detection, 3D surface fitting, 2D and 3D matching, and various feature extraction problems. In the next chapter, we show how these typical vision problems can be posed as grouping problems, and how the recognition strategy discussed in this chapter can be realized under the framework of Hierarchical Token Grouping.

CHAPTER 5

Tubular Object Recognition Via HTG

In Chapter 4, we developed a 4-stage recognition strategy for the problem of tubular object recognition. In this chapter, this recognition strategy is posed as a hierarchical token grouping problem, and the detailed system design under this paradigm is discussed. In model-based recognition of tubular objects, various conventional vision problems are encountered, such as thresholding, matching, and segmenting instances of objects. This chapter shows how the methodologies to solve these problems can be realized under the paradigm of hierarchical token grouping.

In posing a specific problem as a token grouping problem, the following issues are considered: representation problems, recognition methodologies, integration of information, and interaction among different modules. Specifically, a token representation for each type of perceptual entity needs to be designed. The different methodologies that extract various perceptual entities are designed as grouping agents with their input, output, and grouping criterion defined. Such defined grouping agents provide information about the interactions, so that the system communication network can be extracted from their specifications. These will be discussed in detail in the subsequent sections.

5.1 Representation Hierarchy

In a vision system built under the architecture of hierarchical token grouping, there are two hierarchies: one is top-down, representing knowledge, and the other is bottom-up, representing the organization of the objects to be recognized.
Although their content differs, the two are similarly structured. In the top-down hierarchy, each level holds the knowledge related to the perceptual entities at the corresponding level of the bottom-up hierarchy. In the bottom-up hierarchy, each level holds the instances of a particular type of perceptual entity, each characterized by a set of features. Below, we design these two hierarchies for the problem of tubular object recognition.

5.1.1 Object Model Decomposition

In a hierarchical paradigm, a model-based recognition strategy is to use the model information, decomposed at different levels, to assist the process of organizing visual data. To recognize tubular objects via hierarchical token grouping, the object model for this class of objects needs to be decomposed. The generalized stochastic tube model developed in Chapter 4 captures the geometric shape of a tubular object, approximating an object as a set of connected tubes, each of which is modeled by an RSHGC. Each RSHGC can be further decomposed into its boundary and interior region. Both object boundaries and interior regions consist of sets of image points. Such a hierarchical decomposition of the generalized tube model is shown in Figure 5.1. Since both 2D and 3D data will be processed, the decomposition can be made more specific for both cases, as shown in Figure 5.2.

Figure 5.1. Decomposed object model.

In the 3D case, a straight tube is a volume enclosed in a cylindrical surface which consists of a set of 3D cross sections, each of which is composed of a collection of connected surface voxels. In the 2D case, the decomposition is based on the perceptual regularities observed when projecting a 3D tube onto a 2D image plane. The visible surface of a 3D cylinder is projected onto a rectangular region in a 2D plane, and the shadow produces two straight lines that are parallel. Therefore, the 2D image of a 3D straight tube consists of a straight ribbon and its interior region, where the ribbon is composed of two parallel straight lines, individually formed by a set of edge points. During the process of recognition, which branch of the hierarchy in Figure 5.2 is used depends on the type of input (or sensor used).

5.1.2 Tokens and Token Hierarchy

Having decomposed the object model into different levels of abstraction, a hierarchy of tokens can be designed accordingly, because the correspondence between the decomposed model and tokens facilitates the recognition task at different levels. We classify tokens into three categories based on their geometric dimensionality. Within each category, there is more than one kind of token.

Figure 5.2. Hierarchy of decomposed object model.

Category I contains tokens of zero dimension, called point tokens, including original input point tokens (either pixels or voxels), boundary point tokens (either 2D edge or 3D surface points), background point tokens, as well as object point tokens. Category II contains four types of two-dimensional tokens, corresponding to the perceptual entities of connected component, straight line, curve, and ribbon. All three-dimensional tokens belong to Category III. They are cross sections, tubes, and tubular objects. These three categories of tokens are listed below.
• Category I: Point Tokens
  $T^{pt}$: the set of input point tokens.
  $T^{bd}$: the set of boundary point tokens.
  $T^{itr}$: the set of interior point tokens.
  $T^{bkg}$: the set of background point tokens.

• Category II: 2D Tokens
  $T^{cc}$: the set of 2D connected component tokens.
  $T^{l}$: the set of straight line tokens.
  $T^{c}$: the set of curvilinear structure tokens.
  $T^{rb}$: the set of ribbon tokens.

• Category III: 3D Tokens
  $T^{x}$: the set of cross section tokens.
  $T^{tb}$: the set of tube tokens.
  $T^{el}$: the set of tubular object tokens.

These tokens form a token space at different levels of abstraction, which can be expressed using a set of names for the above defined discrete levels:
$$V = \{pt, bkg, itr, cc, bd, l, c, rb, x, tb, el\}.$$

In Chapter 2, each token is characterized by its organization, its feature set, and its deformation characterization. We now introduce the design of each type of token listed above.

Point Tokens

In this category, a token represents a single cell, pixel or voxel. Each cell in an input image is represented as an input point token described by a set of property features and some spatial relationships such as n-connectivity. Due to the zero dimensionality of point tokens, the organization of any token in this category has a singleton component set.

Input point token: An input point token is designed as
$$T_i^{pt} = \{C_i^{pt}, F_i^{pt}, D_i^{pt}\},$$
with component set $C_i^{pt} = \{T_i^{pt}\}$ and feature set $F_i^{pt} = P_i^{pt} \cup R_i^{pt}$, where
$$P_i^{pt} = \{\mathbf{x}, I, m, g_d\}, \quad R_i^{pt} = (\bowtie^n, \{(j, pt, a_{i,j})\}), \quad a_{i,j} = d,$$
where $\mathbf{x}$ represents the coordinates of the cell to be represented, $I$ is the sensor measurement at $\mathbf{x}$, $m$ and $g_d$ are the gradient magnitude and the gradient direction at $\mathbf{x}$, and $\bowtie^n$ is a spatial n-connectivity relationship. In the 2D case, $n$ is 4 or 8; in the 3D case, $n$ is 6 or 26. For each $n$, token $T_i^{pt}$ is related to $n$ other point tokens; that is, for each $\bowtie^n$, the set $\{(j, pt, a_{i,j})\}$ contains $n$ elements, each of which is a 3-tuple $(j, pt, a_{i,j})$ that relates token $T_i^{pt}$ to another input point token $T_j^{pt}$ (its neighbor), with the distance $d$ between the two tokens as the attribute of this particular relation. For example, when $n = 4$, token $T_i^{pt}$ is related to four other input point tokens and all four attributes ($d$'s) equal one. This example is shown in Figure 5.3(a). Another case is $n = 8$, where four of the eight distance measures equal one and the other four equal $\sqrt{2}$. These features are useful for higher level token grouping. The deformation characterization for the tokens at this level is null.

Figure 5.3. Examples of spatial relationships among point tokens. (a) 4-connectivity relation among input point tokens, (b) 8-connectivity relation among 3 boundary point tokens.
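To make the point-token design concrete, the following sketch builds the property features $\{\mathbf{x}, I, m, g_d\}$ of each input point token together with its n-connectivity relations and their distance attributes. It is an illustration only, not the system implementation; the function and field names are hypothetical, and the gradient estimator (numpy's central differences) is an assumption.

import numpy as np

def input_point_tokens(image, n=8):
    """For each pixel x, collect (x, I, m, g_d) and the n-connected
    neighbor offsets with their distance attributes d."""
    gy, gx = np.gradient(image.astype(float))
    m = np.hypot(gx, gy)                 # gradient magnitude
    gd = np.arctan2(gy, gx)              # gradient direction
    # 4-connectivity: unit distances; 8-connectivity adds sqrt(2) diagonals.
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if n == 8:
        offsets += [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    relations = [(dy, dx, np.hypot(dy, dx)) for dy, dx in offsets]
    tokens = {}
    for (y, x), I in np.ndenumerate(image):
        tokens[(y, x)] = {"x": (y, x), "I": I, "m": m[y, x],
                          "gd": gd[y, x], "rel": relations}
    return tokens

toks = input_point_tokens(np.arange(9).reshape(3, 3), n=4)
print(toks[(1, 1)]["rel"])   # four neighbors, each with distance attribute 1.0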
Boundary point token: A boundary point token, denoted by $T_i^{bd}$, represents a single cell that is initially identified as a point on the object boundary. Such a token has a component set with a single input point token, $C_i^{bd} = \{T_j^{pt}\}$. Its property feature set is defined as
$$P_i^{bd} = \begin{cases} \{\mathbf{x}, I, m, g_d, \kappa\} & \text{if 2D case} \\ \{\mathbf{x}, I, m, g_d, k_M, k_G, k_{min}, k_{max}, \mathbf{v}_{min}, \mathbf{v}_{max}, \mathbf{n}\} & \text{if 3D case} \end{cases}$$
For both 2D and 3D input, the property feature set contains the coordinates $\mathbf{x}$ of the point, the sensor measurement $I$, the gradient magnitude $m$, and the gradient direction $g_d$; in the 2D case it also contains the curvature $\kappa$ at that point. In the 3D case, the property feature set instead includes a set of invariant surface features: the mean and Gaussian curvatures $(k_M, k_G)$, the minimum and maximum curvatures $(k_{min}, k_{max})$, the minimum and maximum curvature directions $(\mathbf{v}_{min}, \mathbf{v}_{max})$, and the inward surface normal $\mathbf{n}$.

The relational feature set for a boundary point token is designed as
$$R_i^{bd} = \{\bowtie^n, \{(j, bd, a_{i,j})\}\}.$$
The n-connectivity relationship is with respect to two other boundary point tokens (rather than to all other point tokens). Figure 5.3(b) shows an example in which the shaded cells are boundary point tokens: the token at the center is $T_i^{bd}$ and it is related to two other boundary point tokens, $T_j^{bd}$ and $T_k^{bd}$, under an 8-connectivity relation $\bowtie^8$. The value of $n$ is 8 for the 2D case and 26 for the 3D case. The attribute set for an n-connectivity relation is defined as
$$a_{i,j} = \{\mathbf{t}, d\},$$
where $\mathbf{t}$ is a tangent vector indicating the linking direction and $d$ is the distance between the two points. There are no deformation features defined for this class of tokens.

Interior point token: An interior point token, denoted by $T_i^{itr}$, represents a single cell that is initially identified as an interior point of an object. Similar to a boundary point token, its component set contains only one input point token, $C_i^{itr} = \{T_j^{pt}\}$. Its property feature set is defined as $P_i^{itr} = \{\mathbf{x}, I\}$. The relational feature set is
$$R_i^{itr} = \{\bowtie^n, \{(j, bd, a_{i,j}), (k, itr, a_{i,k}), (l, bkg, a_{i,l})\}\}.$$
That is, the relationship is established with all of its n-connected neighboring point tokens, which can be boundary, interior, or background point tokens. The attribute is the distance between the two points. Again, the value of $n$ depends on the type of input image.

Background point token: A background point token, denoted by $T_i^{bkg}$, represents a single cell that is initially identified as a background point. Its component set, feature set, and deformation characterization are all defined similarly to those designed for an interior point token.

2D Tokens

In this category, each type of token represents a 2D perceptual entity: a connected component token, a line token, a curve token, or a ribbon token.

Connected component token: A connected component token $T_i^{cc}$ represents a connected 2D region. Its component set contains a group of all 8-connected interior point tokens. Therefore,
$$C_i^{cc} = \{T_j^{itr} \mid G^{cc}(C_i^{cc})\},$$
where the grouping criterion $G^{cc}$ will be discussed in Section 5.3. Each such region is characterized by its feature set, designed as
$$F_i^{cc} = P_i^{cc} \cup R_i^{cc} = P_i^{cc} = \{\mathbf{c}, I_{avg}, Area\},$$
with $\mathbf{c}$ the centroid of the region, $I_{avg}$ the average sensor measurement, and $Area$ the area of the region. There are no relational or deformation features defined for this type of token.

Line token: A straight line is represented as a line token, denoted by $T_i^{l}$, with a component set containing a group of boundary point tokens, $C_i^{l} = \{T_j^{bd} \mid G^{l}(C_i^{l})\}$. A set of property features is used to describe a line token:
$$P_i^{l} = \{(\mathbf{x}_h, \mathbf{x}_t), l, d, m_{avg}, g_d\},$$
where $(\mathbf{x}_h, \mathbf{x}_t)$ gives the two end points of the line, $l$ is the length, $d$ is the orientation, $m_{avg}$ is the average gradient magnitude, and $g_d$ is the gradient direction of the line. The relational feature set of a line is
$$R_i^{l} = \{(\bowtie^8, \{(j, l, a_{i,j})\}), (\bowtie^{ovl}, \{(k, l, a_{i,k})\})\},$$
where $\bowtie^8$ denotes an 8-connectivity relation and $\bowtie^{ovl}$ an overlap relation. For each relation, say $T_i^{l} \bowtie^8 T_j^{l}$, a set of attributes is used to describe it.

Figure 5.4. Examples of overlap regions and the four corners of the region. (a) Region generated by an overlap relation. (b) Region generated by an enclose relation.

For example, if two lines are 8-connected when one end of one line touches one end of the other, the
attribute set for this relation is
$$a_{i,j} = \{\mathbf{x}, \gamma\},$$
where $\gamma$ is the angle formed by the two lines at the coordinate position $\mathbf{x}$ where the two lines touch. For the overlap relation $\bowtie^{ovl}$, the attribute set contains a set of four coordinate points,
$$a_{i,j} = \{\mathbf{x}_{11}, \mathbf{x}_{12}, \mathbf{x}_{21}, \mathbf{x}_{22}\},$$
which represents the four corners of the region generated by this relation. Figure 5.4 shows an example of such a generated region (shaded) and its four corner points. As discussed in Section 4.4.3, the sufficient condition for finding a ribbon is the existence of either an overlap or an enclose relation between two straight lines. The common property of these two relations is the overlap region yielded in between; the regions shown in Figure 5.4, specified by the four corner points $(\mathbf{x}_{11}, \mathbf{x}_{12}, \mathbf{x}_{21}, \mathbf{x}_{22})$, illustrate this.

Curve token: A curve token, denoted by $T_i^{c}$, represents a 2D curve structure. Such a structure can be approximated using a polyline formed by tracing all 8-connected straight lines. The component set of a curve token therefore contains a set of line tokens:
$$C_i^{c} = \{T_j^{l} \mid G^{c}(C_i^{c})\},$$
where the grouping criterion $G^{c}$ demands that all the components relate to one another by an 8-connectivity spatial relationship. Other conditions can also be incorporated in grouping criterion $G^{c}$ to make the formed curves possess certain properties, such as curvilinearity. The design of the grouping criterion for curve tokens will be examined in Section 5.3. The feature set of a curve token is
$$F_i^{c} = P_i^{c} \cup R_i^{c} = \{(\mathbf{x}_h, \mathbf{x}_t), l, \kappa_{avg}, m_{avg}\} \cup \{\bowtie^8, \{(j, l, a_{i,j})\}\}, \quad a_{i,j} = \{\mathbf{x}, \gamma\},$$
where $\kappa_{avg}$ denotes an average curvature and the others have the same meaning as defined for a line token.

Ribbon token: As defined in Chapter 4, a straight ribbon is a region formed by a pair of straight lines that either overlap or enclose one another. A ribbon token $T_i^{rb}$ represents such a region. Its component set consists of two straight lines:
$$C_i^{rb} = \{(T_j^{l}, T_k^{l}) \mid T_j^{l}, T_k^{l} \in T^{l}, G^{rb}(C_i^{rb})\},$$
where grouping criterion $G^{rb}$ specifies the condition that either an overlap or an enclose relationship must exist between line tokens $T_j^{l}$ and $T_k^{l}$. The property features for a ribbon are designed to be
$$P_i^{rb} = \{\mathbf{c}, d, l, w, I_{low}, I_{high}, I_{avg}, B_b\},$$
where $\mathbf{c}$ is the centroid, $d$ is the orientation, $l$ is the length, $w$ is the width, $I_{low}$ and $I_{high}$ are the minimum and maximum sensor measurements within the ribbon, $I_{avg}$ is the average sensor measurement, and $B_b$ is the bounding box for the ribbon. A bounding box is the smallest rectangle that contains the ribbon. This rectangle can be represented by the two extreme points of the box; see Figure 5.5(a), in which $\mathbf{x}_1$ and $\mathbf{x}_2$ are the two extreme points of the bounding box for the ribbon shown in the figure.

Figure 5.5. Bounding boxes for (a) a 2D ribbon, (b) a 3D cross section.

The relational feature set of a ribbon connects each ribbon to the two curves on which $T_j^{l}$ and $T_k^{l}$ reside:
$$R_i^{rb} = \{\bowtie^{p}, \{(j_1, c, a_{i,j_1}), (j_2, c, a_{i,j_2})\}\},$$
where $\bowtie^{p}$ is a "part of" relationship, $(j_q, c)$, $q = 1, 2$, points to curve token $T_{j_q}^{c}$, and $a_{i,j_q} = (\mathbf{x}_{q_1}, \mathbf{x}_{q_2})$, $q = 1, 2$, characterizes the relation between $T_i^{rb}$ and $T_{j_q}^{c}$, where $(\mathbf{x}_{q_1}, \mathbf{x}_{q_2})$ represents the two coordinates between which a straight line is used to form the ribbon.
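The four corner points of an overlap region can be computed by parameterizing both segments along a common direction and intersecting the parameter intervals. The sketch below assumes two roughly parallel segments and illustrative names; the dissertation does not prescribe this particular construction.

import numpy as np

def overlap_corners(seg1, seg2):
    """Return (x11, x12, x21, x22) for the overlap region of two nearly
    parallel segments, or None if they share no extent along the axis."""
    p1, q1 = map(np.asarray, seg1)
    p2, q2 = map(np.asarray, seg2)
    u = (q1 - p1) / np.linalg.norm(q1 - p1)          # direction of line 1
    t = lambda p: float(np.dot(p - p1, u))           # parameter along u
    lo = max(min(t(p1), t(q1)), min(t(p2), t(q2)))   # shared interval
    hi = min(max(t(p1), t(q1)), max(t(p2), t(q2)))
    if lo > hi:
        return None
    on1 = lambda s: p1 + s * u                       # point on line 1
    foot2 = lambda p: p2 + np.dot(p - p2, u) * u     # foot on line 2
    return on1(lo), on1(hi), foot2(on1(lo)), foot2(on1(hi))

print(overlap_corners([(0, 0), (10, 0)], [(4, 2), (14, 2)]))
# overlap spans x in [4, 10]: corners (4,0), (10,0), (4,2), (10,2)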
3D Tokens

Cross section token: A 3D cross section is represented by a token $T_i^{x}$. It is composed of a set of 3D boundary point tokens that possess consistent cylindrical surface properties:
$$C_i^{x} = \{T_j^{bd} \mid G^{x}(C_i^{x})\}.$$
A cross section token is characterized by the following property features:
$$P_i^{x} = \{\mathbf{c}, r, \mathbf{d}, I_{low}, I_{high}, B_b\},$$
where $\mathbf{c}$ is the center of the cross section, $r$ is the radius of the cross section, $\mathbf{d}$ is the normal vector of the cross section plane, $I_{low}$ and $I_{high}$ are the minimum and maximum sensor measurements within the cross section, and $B_b$ is the bounding box that encloses the cross section. A bounding box for a 3D cross section is represented by two extreme points of the box, as illustrated in Figure 5.5(b). There is no relational feature designed for cross sections.

Tube token: A token $T_i^{tb}$ represents a 3D straight tube, which is composed of a set of input point tokens,
$$C_i^{tb} = \{T_j^{pt} \mid G^{tb}(C_i^{tb})\},$$
and is characterized by a set of property features,
$$P_i^{tb} = \{\mathbf{c}, r, \mathbf{d}, l, A, \rho^*, B_b\},$$
where $A$ is the straight axis of the tube and $\rho^*$ is the optimized figure of merit (discussed in Section 4.3.4), serving as a confidence measure of the tube. The relational feature set of a tube token describes an adjacency relation between tube token $T_i^{tb}$ and other tube tokens:
$$R_i^{tb} = \{\bowtie^{adj}, \{(j, tb, a_{i,j})\}\}, \quad a_{i,j} = \mathbf{x},$$
where attribute $\mathbf{x}$ is the coordinate where the two tube axes meet.

Object token: Finally, a tubular object is represented by a token denoted by $T_i^{el}$. The component set of $T_i^{el}$ is specified by a set of connected local tubes:
$$C_i^{el} = \{T_j^{tb} \mid G^{el}(C_i^{el})\}.$$
Each tubular object is described by its feature set $F_i^{el} = P_i^{el} \cup R_i^{el}$, with
$$P_i^{el} = \{A\}, \quad R_i^{el} = \{\bowtie^{int}, \{(j, el, a_{i,j})\}\},$$
where $A$ is the attributed axis of the object, with each point on $A$ carrying the radius information about the corresponding cross section; the relational feature set $R_i^{el}$ describes the intersection relationships among tubular objects. Each particular spatial relationship between two object tokens can be described by the position where the two intersect: $a_{i,j} = \mathbf{x}$.

Since a tubular object is modeled as the outcome of sweeping a local seed tube through space with a series of deformations along the sweeping trajectory, the deformation is explicitly characterized in the token representation for a tubular object. Define a set $D^1$ of first-order deformation parameters,
$$D^1 = \{\mathbf{d}, r, l, \rho^*\},$$
and an overall deformation characterization,
$$D_i^{el} = \bigcup_{\lambda \in D^1} \{(E_\lambda, \sigma_\lambda)\},$$
where $D_i^{el}$ characterizes the deformation using the statistical properties of the first-order deformation features defined in $D^1$. The statistical properties of deformation parameter $\lambda \in D^1$ are captured by the mean ($E_\lambda$) and the variance ($\sigma_\lambda$) of $\lambda$ measured across the entire tubular object. The four deformation features in $D^1$ reflect the deformation in orientation, radius, length, and the degree of homomorphism to a tube model, so that together they capture the shape variation of the underlying object.
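As a concrete illustration of $D_i^{el}$, the sketch below computes the mean and variance of each first-order deformation feature across the local tubes of one object. The tube fields and the use of the inter-axis angle as the orientation deformation are assumptions for illustration only.

import numpy as np

def deformation_characterization(tubes):
    """tubes: list of dicts with keys 'd' (unit axis), 'r', 'l', 'rho'.
    Returns (mean, variance) per first-order deformation feature."""
    axes = np.array([t["d"] for t in tubes], dtype=float)
    # Orientation deformation between consecutive tubes: angle between axes.
    cosines = np.clip(np.sum(axes[:-1] * axes[1:], axis=1), -1.0, 1.0)
    feats = {
        "orientation": np.degrees(np.arccos(cosines)),
        "radius": np.array([t["r"] for t in tubes]),
        "length": np.array([t["l"] for t in tubes]),
        "rho*": np.array([t["rho"] for t in tubes]),
    }
    return {k: (v.mean(), v.var()) for k, v in feats.items()}

tubes = [{"d": (1, 0, 0), "r": 2.0, "l": 5.0, "rho": 0.90},
         {"d": (0.98, 0.2, 0), "r": 2.1, "l": 5.0, "rho": 0.88}]
print(deformation_characterization(tubes))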
For example, the statistical characterization of the deformation feature in orientation indicates how sharply the underlying object curves in space. For tubular objects from different domains, the corresponding deformation characterizations may exhibit statistically significant differences in object shape. Such information can be important in applications; for instance, the roots from one type of corn may curve much more sharply, in a statistical sense, than the roots from another type of corn. Through the deformation characterization, knowledge about the shape of tubular objects from different classes or different domains can be learned.

Token Hierarchy

With all categories of tokens defined, Figure 5.6 shows the organization of these tokens. Compared with the decomposed object model shown in Figure 5.2, the two are identically structured except that the arrows are reversed.

Figure 5.6. Bottom-up hierarchy of tokens.

Along the hierarchy, objects are incrementally identified. Tokens across different levels of abstraction are related by the component relationship. All levels together describe a recognized object at different levels of detail. By tracing down the component relationship from a top level token, the organization of an object can be recovered.

5.2 Knowledge Hierarchy

Knowledge hierarchy KH is a hierarchy in which each level provides the knowledge that is useful in recognizing the perceptual entities at the corresponding level of the grouping hierarchy. At each level of KH, a decomposed object model is instantiated as an expected event ($E_e^v$). Its associated homomorphism validity conditions ($H_P^v$) are also stored at that level. In Section 5.1, we have shown how the object tube model is decomposed at various levels of abstraction. In this section, we discuss how these expected events are represented in KH and how the homomorphism validity conditions can be established by dynamically acquiring information about the current recognition environment.

5.2.1 Acquiring Dynamic Environment Information

There are two types of information that are acquired in our system to establish a necessary description of the current recognition environment. One is the type of input data, which, in our case, is either a 2D photometric image or a 3D MRI volume. The second type of knowledge about a dynamic recognition environment is the characteristics of the current input data.

The first type is associated with an environment variable whose values encode the type of sensor used. This variable can be stored in IM. At certain levels of abstraction, such information is needed. For example, to match a 3D tube model against data, a different approach is used to render the model before matching, depending on the type of sensor used. Chapter 4 discusses the different methods to generate 2D and 3D matched filters. In this case, the environment variable stored in IM can be used to decide the appropriate method to render a model.

Acquiring the second type of knowledge is necessary in order to dynamically establish some homomorphism validity conditions. Currently, this type of environmental knowledge is acquired by (1) obtaining the maximum and minimum sensor measurements and gradient magnitudes from an input, $(I_{max}, I_{min}, m_{max}, m_{min})$, (2) establishing the histograms of both the sensor measurements and the gradient magnitudes, (3) characterizing the histograms using a unimodal or bimodal Gaussian function, depending on the type of sensor used, and (4) determining the dynamic thresholds on sensor measurement and gradient magnitude, $\{I_t, m_t\}$, from the parameters of the estimated Gaussian functions. The process of acquiring these thresholds is described in Section 4.4.2. The obtained characterization is represented by a set of environmental measurements $\{I_{max}, I_{min}, m_{max}, m_{min}, I_t, m_t\}$. It instantiates the knowledge representation at the lowest level of KH (see Figure 5.7).
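The exact threshold derivation of Section 4.4.2 is not reproduced here. As one plausible realization of steps (3) and (4), the sketch below fits a two-component (bimodal) Gaussian mixture to the measurement samples with a tiny EM loop and takes the crossing point of the two fitted components as the dynamic threshold $I_t$; the fitting routine and all names are assumptions.

import numpy as np

def bimodal_threshold(samples, iters=50):
    x = np.asarray(samples, dtype=float)
    mu = np.percentile(x, [10, 90])                  # crude initial means
    var = np.array([x.var(), x.var()]); w = np.array([0.5, 0.5])
    for _ in range(iters):
        # E-step: responsibilities of each Gaussian for each sample.
        p = w * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        r = p / p.sum(axis=1, keepdims=True)
        # M-step: update weights, means, variances.
        n = r.sum(axis=0)
        w, mu = n / len(x), (r * x[:, None]).sum(axis=0) / n
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / n + 1e-9
    # Threshold: densities of the two fitted components cross between the means.
    lo, hi = sorted(mu)
    grid = np.linspace(lo, hi, 256)
    dens = w * np.exp(-(grid[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    return grid[np.argmin(np.abs(dens[:, 0] - dens[:, 1]))]

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(40, 8, 4000), rng.normal(160, 15, 1000)])
print(round(bimodal_threshold(x), 1))   # I_t falls between the two modes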
5.2.2 Knowledge Representation

Knowledge hierarchy KH is structured identically to the token hierarchy. A model at any level of KH is described by a set of features, each of which has a corresponding homomorphism validity condition, or tolerance, associated with it. The homomorphism validity conditions specify how much deviation is allowed between an observed event and the expected event. Object models at different levels often need to be instantiated as expected events before they can be utilized by the grouping agents at the parallel levels. We now examine, starting from the lowest level, how the decomposed models are represented in KH and what their associated homomorphism validity conditions are. How such stored knowledge is utilized by the grouping agents will be discussed in the next section.

Interior Point Level

At this level, a decomposed model is a cell. Due to the nature of the sensor types used in this study, a cell belonging to an object should, ideally, have a substantially higher sensor measurement than a non-object cell. The acquired knowledge $I_t$ specifies "how high is high". That is,
$$E_e^{itr} = I_t, \quad H_P^{itr} = [0, I_{max}],$$
where $E_e^{itr}$ describes the expected event for an interior point, $H_P^{itr}$ is the homomorphism validity condition associated with $E_e^{itr}$, $I_t$ is the expected lowest sensor measurement for object cells, $I_{max}$ is the maximum sensor measurement, and $[0, I_{max}]$ gives a range of allowed deviation of a real sensor measurement from the expected lower bound $I_t$. Together, this means that if
$$I(\mathbf{x}) - I_t \in H_P^{itr}, \quad \text{or} \quad I - I_t \geq 0,$$
then the cell at $\mathbf{x}$ (an observed event) is considered to be similar to expected event $E_e^{itr}$; that is, $\mathbf{x}$ is identified as an interior point token. Here, $I - I_t$ is used as a homomorphism evaluation function based on feature $I$ from an observed event and feature $I_t$ from an expected event.

Exterior Point Level

An exterior point is a cell that does not satisfy the condition defined for an interior token. That is,
$$E_e^{bkg} = I_t, \quad H_P^{bkg} = [-I_{max}, -1],$$
where $H_P^{bkg}$ denotes a range opposite to that for interior points. That is, if a point has a sensor measurement that deviates from $I_t$ by some value within $[-I_{max}, -1]$, or $I - I_t < 0$, then this point is identified as an exterior, or background, point.

Boundary Point Level

A boundary cell is modeled as a point token whose gradient magnitude is high with respect to $m_t$. Such a model is represented as
$$E_e^{bd} = m_t, \quad H_P^{bd} = [0, m_{max}].$$
That is, if a point has $m - m_t \in H_P^{bd}$,
or $m - m_t \geq 0$, then it is considered to be a boundary point.

Connected Component Level

For a 2D connected component token, its model can be represented by a set of all 8-connected interior points, that is,
$$E_e^{cc} = \bowtie^8(\{E_e^{itr}\}), \quad H_P^{cc} = True.$$
This model denotes a graph whose nodes are interior points and whose arcs are 8-connectivity relationships. Therefore, the identification of a connected component token is to find an isomorphism between this model graph and a graph from the data. Note that, since the number of nodes (the size of $\{E_e^{itr}\}$) is not specified, a connected component of any non-zero size can be identified, as long as the homomorphism validity condition is satisfied by all the components that are related under $\bowtie^8$.

Line Level

A 2D straight line is a set of 8-connected boundary points. One distinct property of a straight line is its zero curvature everywhere along the line. Therefore, a model for a straight line can be represented by two features:
$$E_e^{l} = \{\kappa(\{E_e^{bd}\}), \bowtie^8(\{E_e^{bd}\})\},$$
where the value of $\kappa(\{E_e^{bd}\})$ is the curvature measured on its components, which should be zero. Accordingly, the homomorphism validity condition is designed as
$$H_P^{l} = \{[-\epsilon_\kappa, +\epsilon_\kappa], True\}.$$
That is, a line is ideally composed of a set of 8-connected boundary points with zero curvature along the line. The requirement of zero curvature is relaxed within the range specified in $H_P^{l}$ when a line is grouped from a set of quantized points.

Curve Level

A model for a curve is approximated by a polyline which consists of a group of all 8-connected lines. It is represented by two features:
$$E_e^{c} = \{\gamma(E_{e_i}^{l}, E_{e_{i+1}}^{l}), \bowtie^8(\{E_e^{l}\})\},$$
where $E_{e_i}^{l}$ and $E_{e_{i+1}}^{l}$ are two adjacent lines along the polyline and $\gamma$ is the angle formed by these two lines. The homomorphism validity condition is given as
$$H_P^{c} = \{\gamma_t, True\},$$
where $\gamma_t$ is a threshold on the angle $\gamma$ through which the shape of the desired curves can be controlled. When $\gamma_t$ is a smaller value, the curve structures that are allowed to be grouped are curvilinear. Due to the shape property of a tubular object, $\gamma_t$ is set to 25 degrees in this study.

Ribbon Level

A 2D ribbon is modeled using two features, one measuring the absolute difference of the gradient directions of the two component lines and the other describing the existence of either an overlap or an enclose relationship between the two lines. That is,
$$E_e^{rb} = \{|g_d(E_{e_1}^{l}) - g_d(E_{e_2}^{l})|, \bowtie^{ovl}(E_{e_1}^{l}, E_{e_2}^{l}) \vee \bowtie^{enc}(E_{e_1}^{l}, E_{e_2}^{l})\}.$$
The homomorphism validity conditions describe the constrained ranges on both feature values:
$$H_P^{rb} = \{[180 - \epsilon_{g_d}, 180 + \epsilon_{g_d}], True\}.$$
Ideally, the two lines that form a ribbon have opposite gradient directions. Therefore, the value of $|g_d(E_{e_1}^{l}) - g_d(E_{e_2}^{l})|$ in KH should be 180. This requirement is relaxed in $H_P^{rb}$ by $\epsilon_{g_d}$, which means that the two lines may not have exactly opposite gradient directions, but the allowed deviation is within $\epsilon_{g_d}$.
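The three 2D validity tests above can be sketched as follows. The estimators are assumptions: the dissertation leaves the curvature estimator unspecified (a least-squares line fit is used here), and $\gamma$ is interpreted as the turning angle between the directions of two adjacent lines.

import numpy as np

def line_ok(points, eps_kappa=0.05):
    """E_e^l: curvature of an 8-connected boundary chain within [-eps, +eps]."""
    pts = np.asarray(points, float); pts = pts - pts.mean(axis=0)
    _, _, vt = np.linalg.svd(pts, full_matrices=False)
    rms_off = np.sqrt(np.mean((pts @ vt[-1]) ** 2))   # offsets from fitted line
    extent = max(np.ptp(pts @ vt[0]), 1e-9)           # length along the line
    return rms_off / extent <= eps_kappa

def curve_link_ok(d1, d2, gamma_t=25.0):
    """E_e^c: turning angle gamma between adjacent lines at most gamma_t."""
    d1, d2 = (np.asarray(v, float) for v in (d1, d2))
    cos = np.dot(d1, d2) / (np.linalg.norm(d1) * np.linalg.norm(d2))
    return np.degrees(np.arccos(np.clip(cos, -1, 1))) <= gamma_t

def ribbon_pair_ok(gd1, gd2, eps_gd=20.0):
    """E_e^rb: gradient directions opposite within [180-eps, 180+eps]."""
    return abs(abs(gd1 - gd2) % 360.0 - 180.0) <= eps_gd

print(line_ok([(x, round(0.3 * x)) for x in range(12)]))  # quantized line: True
print(curve_link_ok((1, 0), (0.95, 0.3)))                 # ~17.5 degrees: True
print(ribbon_pair_ok(95.0, 270.0))                        # 175 degrees apart: True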
Cross Section Level

A 3D cross section is a thin disk in three-dimensional space with an arbitrary location, orientation, and size. Such a disk can be uniquely characterized by its center point $\mathbf{c}$, the radius $r$, and the normal vector $\mathbf{n}$. Using a parameterized model for this disk, denoted by $\mathcal{P}(\mathbf{c}, r, \mathbf{n})$, a set of points $\{\mathbf{x}_j\}$ from an ideal cross section satisfies
$$\mathcal{P}(\{\mathbf{x}_j\}; \mathbf{c}, r, \mathbf{n}) = 0.$$
Therefore, a model cross section is represented using two features, one being the above evaluated disk function value and the other a connectivity relation among all cross section points:
$$E_e^{x} = \{\mathcal{P}(\{E_e^{bd}\}; \mathbf{c}, r, \mathbf{n}), \bowtie^{26}(\{E_e^{bd}\})\}.$$
In a real situation, the evaluated function value may not be precisely zero, so a tolerance range can be specified. Hence, the homomorphism validity condition is
$$H_P^{x} = \{[-\epsilon_x, +\epsilon_x], True\}.$$
That is, whenever model $\mathcal{P}(\mathbf{c}, r, \mathbf{n})$ is used to match against a set of 26-connected real data points $\{\mathbf{x}_j\}$, a cross section is identified if $\mathcal{P}(\{\mathbf{x}_j\}; \mathbf{c}, r, \mathbf{n}) \in [-\epsilon_x, +\epsilon_x]$.

A grouping agent bases its decisions on its bottom-up input $[(S_P^v \times P^{L^v}) \cup (S_R^v \times R^{L^v})]$ and its top-down input $[(S_P^v \times E_e^v) \cup (S_R^v \times E_e^v)]$, where the feature selection functions for property and relational features are
$$S_P^v = (s_{P_1}, s_{P_2}, \ldots, s_{P_{N_P}}), \quad s_{P_i} \in \{0, 1\},$$
and
$$S_R^v = (s_{R_1}, s_{R_2}, \ldots, s_{R_{N_R}}), \quad s_{R_i} \in \{0, 1\}.$$
In this definition, both the bottom-up input (tokens from grouping domain $G^v$) and the top-down input $(E_e^v, H_P^v)$ participate in grouping decisions. The former is specified by $[(S_P^v \times P^{L^v}) \cup (S_R^v \times R^{L^v})]$, while the latter by $[(S_P^v \times E_e^v) \cup (S_R^v \times E_e^v)]$.

Each grouping agent can be viewed as an encapsulated object (in the object-oriented design sense), combining data structure and behavior. The data structure of each grouping agent is $T^v$ and the behavior is determined by grouping criteria $G_C^v$. Next, we define all the grouping agents at $v \in V$, where $V = \{bkg, itr, cc, bd, l, c, rb, x, tb, el\}$.

5.3.2 Initial Classification

As discussed in Chapter 4, in the initial stage of tubular object recognition, each cell, pixel or voxel, is classified into one of three classes: interior, exterior, and boundary. Three grouping agents are designed to perform the task: (1) GA(itr), which is responsible for interior point tokens, (2) GA(bkg), which is responsible for background point tokens, and (3) GA(bd), which is responsible for boundary point tokens. We now define these grouping agents separately.

Initial Segmentation

Initial segmentation means classifying all the cells into the classes of object interior and object exterior. Grouping agents GA(itr) and GA(bkg) perform this task, each of which extracts one type of token, either interior point tokens or background point tokens.

Interior Point Tokens

The grouping agent for interior point tokens is defined as $GA(itr) = \{I_{in}^{itr}, G_C^{itr}, O_{out}^{itr}\}$ with
$$I_{in}^{itr} = \{G^{itr}, (E_e^{itr}, H_P^{itr})\}, \quad O_{out}^{itr} = T^{itr}.$$
The input consists of a grouping domain
$$G^{itr} = \bigcup_{v' \in L^{itr}} T^{v'}, \quad L^{itr} = \{pt\},$$
of all the input point tokens, and the knowledge from KH: $E_e^{itr} = I_t$, $H_P^{itr} = [0, I_{max}]$. The criterion for forming an interior point token depends merely on the sensor-measurement property features of an input point token. Therefore, the feature selection functions are designed as
$$S_P^{itr} = \{1, 1, 0, 0\}^{pt}, \quad S_R^{itr} = \{0\}^{pt},$$
where $S_P^{itr}$ chooses property features of an input point token. With respect to the property feature set $P_i^{pt} = \{\mathbf{x}, I, m, g_d\}$ of an input point token, setting the selection function to $\{1, 1, 0, 0\}$ means that only the first two property features are selected, while $S_R^{itr}$, defined with respect to the relational feature set of input point tokens, is $\{0\}$, meaning that no relational feature is used by grouping agent GA(itr).
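Before completing the definition of GA(itr), here is a sketch of how such binary selection vectors act on a token's features; the helper and field names are illustrative, not part of the system.

def select(features, mask):
    """Keep the features whose mask bit is 1 (order follows the feature set)."""
    return {k: v for (k, v), bit in zip(features.items(), mask) if bit}

P_pt = {"x": (3, 4), "I": 182.0, "m": 12.5, "gd": 1.1}   # P^pt = {x, I, m, g_d}
print(select(P_pt, (1, 1, 0, 0)))    # S_P^itr: GA(itr) sees only x and I
print(select(P_pt, (1, 0, 1, 1)))    # S_P^bd: GA(bd) drops the measurement I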
Using these selected features, the homomorphism evaluation function is designed as $H = I(\mathbf{x}) - I_t$, and the corresponding homomorphism criterion is
$$G_C^{itr}: H(E_e, E_o) \in H_P^{itr} \Rightarrow I(\mathbf{x}) - I_t \in [0, I_{max}],$$
which says: an input point token is an interior point token if its sensor measurement exceeds (is homomorphic to) the expected value $I_t$. The output of grouping agent GA(itr) is a set of interior point tokens.

Exterior Point Tokens

The grouping agent for exterior point tokens is defined as $GA(bkg) = \{I_{in}^{bkg}, G_C^{bkg}, O_{out}^{bkg}\}$ with
$$I_{in}^{bkg} = \{G^{bkg}, (E_e^{bkg}, H_P^{bkg})\}, \quad O_{out}^{bkg} = T^{bkg}.$$
The input consists of a grouping domain
$$G^{bkg} = \bigcup_{v' \in L^{bkg}} T^{v'}, \quad L^{bkg} = \{pt\},$$
of all the input point tokens, and the knowledge from KH: $E_e^{bkg} = I_t$, $H_P^{bkg} = [-I_{max}, -1]$. The condition for forming a background point token also depends only on the sensor measurement of an input point token. As in the design of GA(itr), the feature selection functions are
$$S_P^{bkg} = \{1, 1, 0, 0\}^{pt}, \quad S_R^{bkg} = \{0\}^{pt}.$$
The homomorphism evaluation function for GA(bkg) is also designed similarly to GA(itr): $H = I(\mathbf{x}) - I_t$. Its homomorphism criterion is
$$G_C^{bkg}: I(\mathbf{x}) - I_t \in H_P^{bkg} = [-I_{max}, -1],$$
which says: an input point token is used to form a background point token if its sensor measurement is below (is homomorphic to) the expected value $I_t$. The output of grouping agent GA(bkg) is a set of background point tokens.

Because relational features are not involved in these grouping agents, a non-structural homomorphism is sought in both GA(itr) and GA(bkg) with respect to the conditions specified by $H_P^{itr}$ and $H_P^{bkg}$. Since such a non-structural homomorphism is allowed, a conventional thresholding operation is here formulated as a grouping operation.

Initial Boundary Detection

Initial boundary detection means identifying all the cells that may be points on the boundary of an object. We now define a grouping agent, GA(bd), whose responsibility is to identify all boundary point tokens.

Boundary Point Tokens

The grouping agent for boundary point tokens is defined as $GA(bd) = \{I_{in}^{bd}, G_C^{bd}, O_{out}^{bd}\}$ with
$$I_{in}^{bd} = \{G^{bd}, (E_e^{bd}, H_P^{bd})\}, \quad O_{out}^{bd} = T^{bd}.$$
Its grouping domain is the set of all input point tokens:
$$G^{bd} = \bigcup_{v' \in L^{bd}} T^{v'}, \quad L^{bd} = \{pt\}.$$
The knowledge $E_e^{bd}$ and $H_P^{bd}$ is retrieved from KH. Grouping agent GA(bd) chooses all the property features except the sensor measurement of an input point token. Some of the chosen features are used in making grouping decisions and some are used to compute features of boundary point tokens. Therefore,
$$S_P^{bd} = \{1, 0, 1, 1\}^{pt}, \quad S_R^{bd} = \{0\}^{pt}.$$
Boundary point tokens are formed in a way similar to interior point tokens, except that the property feature participating in grouping decisions is the gradient magnitude; the chosen feature $g_d$ is used for computing the curvature feature of a boundary point token. The homomorphism evaluation function is $H = m(\mathbf{x}) - m_t$. The homomorphism criterion for GA(bd) is
$$G_C^{bd}: m(\mathbf{x}) - m_t \in [0, m_{max}].$$
The output of grouping agent GA(bd) is a set of boundary point tokens.
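The three initial grouping agents amount to the following thresholding tests over the selected features; this sketch assumes numpy's gradient estimator and illustrative names.

import numpy as np

def initial_classification(image, I_t, m_t):
    img = image.astype(float)
    gy, gx = np.gradient(img)
    m = np.hypot(gx, gy)
    interior = img - I_t >= 0          # GA(itr): I(x) - I_t in [0, I_max]
    background = img - I_t < 0         # GA(bkg): I(x) - I_t in [-I_max, -1]
    boundary = m - m_t >= 0            # GA(bd): m(x) - m_t in [0, m_max]
    return interior, background, boundary

img = np.zeros((8, 8)); img[2:6, 2:6] = 200.0
itr, bkg, bd = initial_classification(img, I_t=100.0, m_t=50.0)
print(itr.sum(), bkg.sum(), bd.sum())   # interior, background, boundary counts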
5.3.3 2D Seeding

The tasks at this stage are to identify various types of intermediate perceptual entities that may evidence the objects of interest. From 2D boundary point tokens, larger perceptual entities such as lines and ribbons can be formed. A set of grouping agents, GA(l), GA(c), and GA(rb), is designed to extract these different types of tokens. They are defined separately below.

Straight Lines

Grouping agent GA(l) is responsible for line tokens and is defined as $GA(l) = \{I_{in}^{l}, G_C^{l}, O_{out}^{l}\}$ with
$$I_{in}^{l} = \{G^{l}, (E_e^{l}, H_P^{l})\}, \quad O_{out}^{l} = T^{l}.$$
The grouping domain of GA(l) consists of all the boundary point tokens:
$$G^{l} = \bigcup_{v' \in L^{l}} T^{v'}, \quad L^{l} = \{bd\}.$$
The top-down knowledge $E_e^{l}$ and $H_P^{l}$ from KH is used in forming line tokens. The property feature of curvature and the relational feature of 8-connectivity of boundary point tokens are used in deciding which sets of boundary point tokens should be grouped. Hence, the feature selection functions are defined as
$$S_P^{l} = \{1, 0, 0, 0, 1\}^{bd}, \quad S_R^{l} = \{1\}^{bd}.$$
Because an ideal straight line has zero curvature everywhere along the line, the homomorphism criterion for grouping agent GA(l) is defined as
$$G_C^{l}: \{(\kappa(E_e^{l}) - \kappa(E_o^{l})), \bowtie^8(C_i^{l})\} \in H_P^{l} = \{[-\epsilon_\kappa, +\epsilon_\kappa], True\},$$
where $\kappa(E_e^{l})$ is set to zero in KH and $C_i^{l}$ is the component set of line token $T_i^{l}$. The output of grouping agent GA(l) is a set of line tokens.

Curves

Grouping agent GA(c) is responsible for curve tokens and is defined as $GA(c) = \{I_{in}^{c}, G_C^{c}, O_{out}^{c}\}$ with
$$I_{in}^{c} = \{T^{l}, (E_e^{c}, H_P^{c})\}, \quad O_{out}^{c} = T^{c}.$$
Since a curve is approximated by a polyline which can be formed from a set of adjacent lines, the grouping domain of GA(c) is the set of all line tokens:
$$G^{c} = \bigcup_{v' \in L^{c}} T^{v'}, \quad L^{c} = \{l\}.$$
To group a set of lines, the adjacency relationship among lines is used to identify the candidates. That is,
$$S_P^{c} = \{0, \ldots, 0\}^{l}, \quad S_R^{c} = \{1, 0\}^{l},$$
where the chosen spatial relation is $\bowtie^8$, which is characterized by an attribute set $a_{i,j} = \{\gamma, \mathbf{x}\}$, with $\gamma$ the angle formed by two adjacent lines at coordinate position $\mathbf{x}$ (see Section 5.2.2). With the attribute values of this relational feature, the homomorphism criterion for grouping agent GA(c) is
$$G_C^{c}: \{\gamma(T_{j_i}^{l}, T_{j_{i+1}}^{l}), \bowtie^8(C_i^{c})\} \in H_P^{c} = \{\gamma_t, True\},$$
where $C_i^{c}$ is the component set of curve token $T_i^{c}$. The output of GA(c) is a set of curve tokens.

Ribbons

Grouping agent GA(rb) extracts ribbons and constructs their token representations. It is defined as $GA(rb) = \{I_{in}^{rb}, G_C^{rb}, O_{out}^{rb}\}$ with
$$I_{in}^{rb} = \{T^{l}, (E_e^{rb}, H_P^{rb})\}, \quad O_{out}^{rb} = T^{rb} \cup \{E_e^{tb}\}.$$
Since a straight ribbon is a pair of lines, the grouping domain of GA(rb) is the set of all line tokens:
$$G^{rb} = \bigcup_{v' \in L^{rb}} T^{v'}, \quad L^{rb} = \{l\}.$$
Two lines form a ribbon as long as they have opposite gradient directions and are related by either an overlap or an enclose spatial relationship. To test these two conditions, the property feature of gradient direction and the relational feature $\bowtie^{ovl}$ of line tokens need to be propagated from level $l$ to level $rb$. Hence,
$$S_P^{rb} = \{0, 0, 0, 0, 1\}^{l}, \quad S_R^{rb} = \{0, 1\}^{l}.$$
Identifying ribbons is a process of finding all pairs of lines that satisfy the above stated conditions. Therefore, the homomorphism criterion is
$$G_C^{rb}: \{|g_d(T_j^{l}) - g_d(T_k^{l})|, \bowtie^{ovl}(T_j^{l}, T_k^{l})\} \in H_P^{rb} = \{[180 - \epsilon_{g_d}, 180 + \epsilon_{g_d}], True\}.$$
This criterion describes both structural and property homomorphisms simultaneously. The output of GA(rb) is more than just a set of ribbon tokens; it also includes a set of expected tubes instantiated in knowledge hierarchy KH:
$$O_{out}^{rb} = T^{rb} \cup \{E_e^{tb}\}.$$
Since a ribbon provides evidence of a tube, the grouping agent at this level instantiates an expected event $E_e^{tb}$ at level $tb$ of KH whenever a ribbon token is formed. Such an expected event will be used by grouping agent GA(tb). The instantiation is performed by the functional module K_Instantiate($T_i^{rb}$) of grouping agent GA(rb), which maps ribbon features to a set of tube parameters.

5.3.4 3D Seeding

The purpose of 3D seeding is to identify 3D cross sections that reveal the distinct properties of a cylindrical surface. Such identified cross sections are used to hypothesize 3D tubes, that is, to instantiate expected events of 3D tubes. Grouping agent GA(x) is designed for this task.
Cross Sections

The grouping agent at this level is $GA(x) = \{I_{in}^{x}, G_C^{x}, O_{out}^{x}\}$ with
$$I_{in}^{x} = \{T^{bd}, (E_e^{x}, H_P^{x})\}, \quad O_{out}^{x} = T^{x} \cup \{E_e^{tb}\}.$$
A cross section is formed by grouping a set of 3D boundary point tokens as well as a set of input point tokens. So the grouping domain of GA(x) is defined as
$$G^{x} = \bigcup_{v' \in L^{x}} T^{v'}, \quad L^{x} = \{pt, bd\}.$$
A boundary token is described by a set of invariant surface features, as defined in Section 5.2.2, and every input point token carries its 3D coordinates. The boundary point tokens on a cross section have surface features that consistently predict the same cylinder, and all points on a 3D cross section should form a disk in 3D space. Therefore, to identify 3D cross sections, the set of invariant surface features from boundary point tokens and the coordinates of all the point tokens involved need to be propagated into this level. That is, the following set of property features is chosen from levels $pt$ and $bd$:
$$S_P^{x} = \{1, 0, \ldots, 0\}^{pt} \cup \{1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1\}^{bd}.$$
Since all the point tokens on a 3D cross section should be connected, the relational features of 26-connectivity from both domain levels are also chosen:
$$S_R^{x} = \{1\}^{pt} \cup \{1\}^{bd}.$$
The homomorphism criterion for grouping agent GA(x) describes the condition under which a set of connected domain tokens can be grouped to form a 3D cross section token:
$$G_C^{x}: \{\mathcal{P}(C_i^{x}; \hat{\mathbf{c}}, \hat{r}, \hat{\mathbf{n}}), \bowtie^{26}(C_i^{x})\} \in H_P^{x},$$
where $H_P^{x} = \{[-\epsilon_x, +\epsilon_x], True\}$. In the homomorphism criterion, $C_i^{x}$ is a set of component tokens consisting of both boundary point tokens and input point tokens; $\hat{\mathbf{c}}$, $\hat{r}$, and $\hat{\mathbf{n}}$ are the cross section parameters estimated from the surface features of the boundary point tokens. The output of GA(x) includes a set of cross section tokens as well as a set of expected tubes to be instantiated in knowledge hierarchy KH. Such an expected tube is instantiated at level $tb$ of KH using a set of tube model parameters estimated from a cross section token formed by grouping agent GA(x), and will be used by grouping agent GA(tb).
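The exact functional form of the disk model $\mathcal{P}$ is defined earlier in the dissertation; as one plausible residual, the sketch below combines each point's out-of-plane offset and radial deviation from the estimated disk, so that an ideal 26-connected ring evaluates to approximately zero.

import numpy as np

def disk_residual(points, c, r, n):
    """Mean-squared deviation of points from the disk (c, r, n); an
    assumed stand-in for P({x_j}; c, r, n)."""
    X = np.asarray(points, float) - np.asarray(c, float)
    n = np.asarray(n, float) / np.linalg.norm(n)
    plane_off = X @ n                                   # out-of-plane offsets
    radial_off = np.linalg.norm(X - np.outer(plane_off, n), axis=1) - r
    return float(np.mean(plane_off ** 2 + radial_off ** 2))

# A radius-3 ring of boundary points in the z = 0 plane, centered at origin.
theta = np.linspace(0, 2 * np.pi, 24, endpoint=False)
ring = np.stack([3 * np.cos(theta), 3 * np.sin(theta), np.zeros_like(theta)], 1)
print(disk_residual(ring, c=(0, 0, 0), r=3.0, n=(0, 0, 1)))
# ~0.0: accepted whenever the residual falls within [-eps_x, +eps_x]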
5.3.5 3D Tubes

The grouping agent for tube tokens is $GA(tb) = \{I_{in}^{tb}, G_C^{tb}, O_{out}^{tb}\}$. Its input and output are defined as
$$I_{in}^{tb} = \{T^{pt}, E_{env}, (E_e^{tb}, H_P^{tb})\}, \quad O_{out}^{tb} = T^{tb} \cup \{E_e^{el}\}.$$
Note that GA(tb) uses the knowledge about the type of sensor used ($E_{env}$) as part of its input. This information is needed in deciding how to render a tube model (an expected tube $E_e^{tb}$) in order to match it against an observed event. The grouping domain consists of a set of input point tokens:
$$G^{tb} = \bigcup_{v' \in L^{tb}} T^{v'}, \quad L^{tb} = \{pt\}.$$
A tube token is formed by a set of connected point tokens whose position configuration exhibits the shape of a cylinder and whose sensor measurement configuration exhibits a certain distribution. To test whether these conditions are satisfied, two property features, coordinates and sensor measurement, and one relational feature of input point tokens are selected:
$$S_P^{tb} = \{1, 1, 0, \ldots, 0\}^{pt}, \quad S_R^{tb} = \{1\}^{pt}.$$
An expected tube $E_e^{tb}$ is a parameterized 3D cylinder. To test whether a set of input point tokens is homomorphic to an expected tube event, the following homomorphism criterion is designed:
$$G_C^{tb}: \{\rho^*(C_i^{tb}; \mathbf{c}, \mathbf{d}, r, l), \bowtie^n(C_i^{tb})\} \in H_P^{tb} = \{\eta, True\},$$
where $\rho^*$ is the measure developed in Chapter 4 that quantitatively evaluates the degree of match between a set of points ($C_i^{tb}$) and a parameterized tube model ($\{\mathbf{c}, \mathbf{d}, r, l\}$). Therefore, $\rho^*$ serves as a confidence measure which has to exceed a certain expectation level, specified by $H_P^{tb}$, in order for $C_i^{tb}$ to be considered homomorphic to the expected tube. The homomorphism validity condition thus specifies both a structural homomorphism ($\bowtie^n$) and a property homomorphism ($\rho^* \geq \eta$).

Each communication channel is labeled source --> destination, meaning that the information being delivered flows from the indicated source level to the indicated destination level. For example, the communication channel from source level $pt$ to destination level $tb$ is labeled $pt --> tb$.

Figure 5.10. Communication channels within GH.

Combining the $I_{in}^v$ and $O_{out}^v$ specifications at all $v \in V$, an overall interaction model can be formed. Figure 5.11 visualizes the overall information flow in the system. It can be seen that all the arrows of the interaction channels within GH point in the same direction, signaling a bottom-up information flow, or recovery. Most channels in KH point downward, signaling a top-down information flow. Some in KH are also bottom-up, indicating how the acquired knowledge about the current recognition environment is used to instantiate the knowledge hierarchy. The parallel communication channels between GH and KH facilitate the integration of top-down and bottom-up information at almost every level of grouping within GH. At some levels of GH, expectations are generated dynamically and used to update KH.

Figure 5.11. All the communication channels in the system.

Table 5.1. Bottom-up information flow on the interaction channels within GH.

  Channel   Information flowing on the channel
  pt-itr    $\{\mathbf{x}, I\}^{pt}$
  pt-bkg    $\{\mathbf{x}, I\}^{pt}$
  pt-bd     $\{\mathbf{x}, m\}^{pt}$
  itr-cc    $\{\mathbf{x}, \bowtie^8\}^{itr}$
  bd-l      $\{\mathbf{x}, \kappa, \bowtie^n\}^{bd}$
  bd-x      $\{\mathbf{x}\}^{pt} \cup \{\mathbf{x}, I, k_M, k_G, k_{min}, k_{max}, \mathbf{v}_{min}, \mathbf{v}_{max}, \mathbf{n}\}^{bd}$
  l-c       $\{\bowtie^8\}^{l}$
  l-rb      $\{g_d, \bowtie^{ovl}\}^{l}$
  pt-tb     $\{\mathbf{x}, I, \bowtie^8\}^{pt}$
  tb-el     $\{\mathbf{c}, r, \mathbf{d}, l, A, \rho^*, B_b, \bowtie^{adj}\}^{tb}$

5.4.2 Data Flow Models

The information about exactly what kind of data, including both bottom-up cues and top-down knowledge, flows in which channel can be further extracted from the feature selection functions ($S_P^v$ and $S_R^v$) defined for all $v \in V$. Table 5.1 lists the bottom-up information flow on all the communication channels among grouping agents. The left column indicates the specific channel and the right column gives the set of features that are propagated from the source to the destination through that channel. Table 5.2 describes the specific information that flows between the grouping hierarchy GH and the knowledge hierarchy KH, along either direction. The top part of the table lists the data that flows from KH to GH on each pair of parallel levels of the hierarchies; the lower part describes the data that flows from GH to KH, which is not necessarily along parallel levels of the two hierarchies.

Table 5.2. Information flowing on interaction channels between GH and KH. In the KH-GH direction, the channels connect parallel levels (el-el, tb-tb, rb-rb, x-x, c-c, l-l, bd-bd, cc-cc, itr-itr, bkg-bkg); in the GH-KH direction, the channels include rb-tb and x-tb, which instantiate expected tubes, and the environment channels that instantiate the lowest levels of KH.

5.5 HTG Design For Recognizing Tubular Objects

5.5.1 HTG System Architecture

Using all the designs presented so far, including the overall designs of the hierarchies and the detailed designs of the tokens and grouping agents involved, the architecture of the vision system for recognizing tubular objects is shown in Figure 5.12.
In this figure, the grouping agents at all levels of abstraction (the levels within GH, on the right of the figure) are homogeneously designed, and the entire architecture is uniformly structured. The communication channels shown in Figure 5.11 can be superimposed on Figure 5.12 to form a complete graphical description of the system.

Figure 5.12. Tubular object recognition system diagram under HTG.

Different types of information can be extracted to describe various aspects of the system. For example, it is conceivable that, with appropriately developed tools, the data flow activities on any communication channel could be monitored or even visualized. Such dynamic data flow information can be used to describe or visualize the dynamic behavior of the entire system.

5.6 Summary

In this chapter, the problem of tubular object recognition presented in Chapter 4 was posed as grouping problems at different levels of abstraction. Specifically, the generalized stochastic tube model developed for tubular objects was decomposed into different levels, which determines (1) the number of levels of both the knowledge hierarchy KH and the grouping hierarchy GH, and (2) the tokens that are to be formed during the recognition process. The knowledge hierarchy was constructed based on the representations of the knowledge decomposed at different levels. Token representations for the various types of perceptual entities, and their associated features, were designed. Individual grouping agents, each responsible for one type of perceptual entity, were formally defined. Their input, output, and grouping criteria were given, all in the format of formal specifications. Based on these specifications, the interaction models, within GH, within KH, and between GH and KH, were extracted. The data flow model that describes which data, including bottom-up cues and top-down knowledge, flows in which channel was also extracted from the formal specifications of the grouping agents. The interaction models of the entire system were visualized. Finally, the architecture of the HTG system design for recognizing tubular objects was constructed.

Since the problem of tubular object recognition involves many classical computer vision tasks, this chapter serves as a demonstration of how a specific computer vision system can be developed under the paradigm of hierarchical token grouping. Results of using this system are given in the next chapter.

CHAPTER 6

Experiments and Results

In this chapter, experiments and results are reported. First, we examine the issues related to performance accuracy and robustness evaluation. Three performance accuracy evaluation methods are proposed. Robustness is assessed in terms of the accuracy degradation with respect to varying degrees of noise added to the input data. In order to control the quality of the testing data, synthetic data is generated using parametric methods.
A visualization scheme for 3D tubular objects is presented that allows one to view a "cross section movie". The system has been tested on both 2D images and 3D volumes, on synthetic and real data in both cases. Performance and robustness are evaluated on the test data for which ground truth is available. Experimental results from many real data sets are also visually evaluated at every stage of processing, and some recognition results are shown using the proposed 3D visualization method.

Experimental results from noiseless synthetic data show that accurate object recovery is achieved. This is concluded from the observations that (1) the object parameters, such as radius and orientation, estimated by the system from the test data are very close to the ground truth parameters, and (2) the system recovers the shape dynamics of the underlying objects well, because the estimated parameters change with the change of the true object parameters. The integrated model-based approach and the incremental recovery of tubular objects lead to reliable and robust system behavior. When different degrees of Gaussian noise are added to the synthetic test data, the system performance degrades gracefully with increasing noise, because the deviations between the estimated object parameters and the ground truth parameters increase gradually, in small steps. These conclusions also apply to real data. The system makes mistakes whenever the data does not agree with the object model used. In the following sections, experimental issues and results are presented in detail.

6.1 Performance Evaluation

To assess the performance of a vision system, quantitative evaluation methods are needed. In this section, we propose three types of evaluation schemes, namely a parameter-based method, a region-based method, and a boundary-based method. The first scheme is suitable only when the objects of interest can be modeled in a parametric form; the evaluation is conducted in terms of the precision of the model parameters estimated by a vision system with respect to the ground truth model parameters, and the deviations between the two sets of model parameters are indications of the system performance. The region-based approach evaluates performance in terms of the accuracy in classifying object regions. The boundary-based approach measures the precision in localizing objects and the correctness of the extracted object shape to quantitatively assess the system performance. All three evaluation schemes can be used only when ground truth is available.

The measures developed in both the region-based and boundary-based evaluation schemes may change drastically in some extreme cases, depending on the number of points involved in the ground truth. For example, if there is only one labeled pixel in some ground truth, the evaluation measure will take a value of either 100% or 0%. Nevertheless, the proposed evaluation methods serve the purpose in most situations.

6.1.1 Performance Accuracy and Robustness

In evaluating a machine vision system, two aspects of performance need to be considered: one is its accuracy and the other is its robustness. The accuracy ought to be evaluated in terms of the precision of recognition with respect to ground truth. The robustness should be assessed in terms of the corresponding accuracy degradation with respect to the degradation of the quality of the test data. In either aspect, the key issue in evaluation is ground truth.
Currently, synthetic images generated with predetermined model parameters are used not only to test the system but also to serve as the ground truth in evaluating the accuracy and the robustness of the system. Ground truth for some plant root images was established for evaluation purposes based on discussions with several panels of domain experts.

6.1.2 Two Types of Errors

There are two types of errors which must be considered in evaluating performance accuracy. Conceptually, they are associated with hypothesis testing. In a hypothesis-testing problem, one is presented with two claims about the true value of a variable, of which exactly one must be true. Based on experimental evidence, one wishes to decide which of the two contrasting claims is correct. One of the two claims is called the null hypothesis, denoted by $H_0$, and the other is called the alternative hypothesis, denoted by $H_a$. For example, within the context of image understanding, these two claims can be established with respect to the object or non-object classification of a pixel (voxel). Assume the variable that labels the classification of a pixel (voxel) is $L$, and let the value of this labeling variable be one for the object class and zero for the non-object class. In this case, the two hypotheses are
$$H_0: L = 1, \quad H_a: L = 0.$$
A type I error consists of rejecting the null hypothesis when it is true. A type II error arises if $H_0$ is not rejected when it is false. With the two hypotheses in the previous example, a type I error is committed if a pixel (voxel) has been classified as a non-object point when it is indeed an object point. On the other hand, a type II error occurs if a pixel has been mistakenly classified as an object point. We will use these concepts to establish performance measurements in the next few sections.

6.1.3 Parameter-Based Evaluation

Assume an object to be recognized can be precisely and uniquely described by a set of known parameters. This set of parameters forms the ground truth about the object. Let the corresponding object recognized by a vision system be represented by a set of parameters estimated from the input image. The discrepancy between these two sets of parameters specifies the performance of the vision system to be evaluated. We now formally define this evaluation scheme. Let $\mathcal{G}^P$ be the set of ground truth parameters,
$$\mathcal{G}^P = \{g^{p_1}, g^{p_2}, \ldots, g^{p_k}\},$$
and $P$ be the set of estimated parameters,
$$P = \{p_1, p_2, \ldots, p_k\}.$$
Then the difference between $\mathcal{G}^P$ and $P$, denoted by $\delta(\mathcal{G}^P, P)$, is defined as
$$\delta(\mathcal{G}^P, P) = \{\delta(g^{p_i}, p_i) \mid 1 \leq i \leq k\},$$
where $\delta(g^{p_i}, p_i) = |g^{p_i} - p_i|$. That is, $\delta(\mathcal{G}^P, P)$ is a set of deviations measured as the absolute differences between pairs of corresponding parameters, one from the ground truth set and the other from the estimated set. A smaller deviation indicates a better performance. The amount of increase in these deviations with respect to the degree of noise added to the input image reflects the sensitivity of the system to noise; that is, the performance degradation with respect to the quality degradation of the input data measures the robustness of a vision system. The parameters used for evaluating the tubular object recognition system developed in Chapters 4 and 5 are the object model parameters.
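The parameter-based scheme reduces to element-wise absolute deviations; a minimal sketch, with illustrative parameter names:

def parameter_deviations(ground_truth, estimated):
    """delta(G^P, P): absolute deviation per corresponding parameter pair."""
    return {k: abs(ground_truth[k] - estimated[k]) for k in ground_truth}

gp = {"radius": 3.0, "orientation_deg": 45.0, "length": 20.0}
p = {"radius": 3.2, "orientation_deg": 43.5, "length": 19.0}
print(parameter_deviations(gp, p))   # smaller deviations = better accuracy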
6.1.4 Boundary-Based Evaluation

This evaluation scheme is designed to assess the quality of an object recognition system in extracting the boundaries of the objects of interest. Let $f(\mathbf{x})$ denote an input image and $B$ be the boundary point set derived from $f$ by the machine vision system to be evaluated. Assume the boundary ground truth for $f$, $\mathcal{G}^B$, is available. The goal of this evaluation method is to describe the discrepancy between $\mathcal{G}^B$ and $B$ using certain criteria. We propose three measures to form this description, namely the distance distribution index $\mathcal{D}_B$, the type I error $e_B^1$, and the type II error $e_B^2$.

Overall Performance Index

We define the distance distribution index, denoted by $\mathcal{D}_B$, as a discrete function whose distribution characterizes the discrepancy, measured in distance, between the boundary ground truth and the boundary derived by the underlying recognition system. The distance from an arbitrary point $\mathbf{x}$ in set $B$ to ground truth $\mathcal{G}^B$ is defined as the minimum absolute distance from $\mathbf{x}$ to $\mathcal{G}^B$:
$$d(\mathbf{x}, \mathcal{G}^B) = \min\{d_E(\mathbf{x}, \mathbf{y})\}, \quad \forall \mathbf{y} \in \mathcal{G}^B,$$
where $d_E(\mathbf{x}, \mathbf{y})$ denotes the Euclidean distance between point $\mathbf{x}$ and point $\mathbf{y}$. The distribution of $\mathcal{D}_B$ can be constructed from the histogram of the distances from individual points in $B$ to the ground truth boundary $\mathcal{G}^B$. Distribution $\mathcal{D}_B$, its mean $\bar{\mathcal{D}}_B$, and its standard deviation $\sigma_{\mathcal{D}_B}$ together characterize the degree of match (or mismatch) between $B$ and $\mathcal{G}^B$. Figure 6.1 shows several examples of this distribution. Since means are known to be sensitive to outliers, the median of distribution $\mathcal{D}_B$, denoted by $\tilde{\mathcal{D}}_B$, can also be used to describe the degree of match.

Figure 6.1. Examples of the distance distribution $\mathcal{D}_B$ for image R2 at (a) initial stage, (b) intermediate stage, and (c) final stage.

In this research, to implement this evaluation scheme, the technique of Chamfering distance transformation [7] is adopted and extended to 3D volumetric space to compute a distance map with respect to $\mathcal{G}^B$, denoted by $d(\mathbf{x}, \mathcal{G}^B)$, based on which $\mathcal{D}_B$ can be constructed. Assume both $\mathcal{G}^B$ and $B$ are defined on a grid, 2D or 3D, $\{\mathbf{x}_i\}$, $1 \leq i \leq M \times N \times D$ ($M \times N \times D$ is the size of the grid and $D = 1$ in the 2D case). Derive $d(\mathbf{x}, \mathcal{G}^B)$ by performing a distance transformation based on $\mathcal{G}^B$ using the Chamfering algorithm. Both the original and extended Chamfering algorithms are given in Appendix A.

From the resultant distance map $d(\mathbf{x}, \mathcal{G}^B)$, whose value at $\mathbf{x}$ represents the minimum distance from this point to ground truth $\mathcal{G}^B$, $\mathcal{D}_B$ is derived as the histogram of $d(\mathbf{x}, \mathcal{G}^B)$ over $\mathbf{x} \in B$. The mean and standard deviation of $\mathcal{D}_B$ can be computed as
$$\bar{\mathcal{D}}_B = \frac{\sum_{\mathbf{x} \in B} d(\mathbf{x}, \mathcal{G}^B)}{|B|}, \qquad \sigma_{\mathcal{D}_B} = \sqrt{\frac{\sum_{\mathbf{x} \in B} d(\mathbf{x}, \mathcal{G}^B)^2 - |B| \, \bar{\mathcal{D}}_B^2}{|B| - 1}}.$$
A perfect match between $\mathcal{G}^B$ and $B$ will yield $\bar{\mathcal{D}}_B = 0$ and $\sigma_{\mathcal{D}_B} = 0$. Generally, a $\mathcal{D}_B$ with a zero mean and a small standard deviation indicates a good performance of the machine vision system under evaluation. A large standard deviation may signal the existence of outliers; in that case, the median $\tilde{\mathcal{D}}_B$ of distribution $\mathcal{D}_B$ may provide a better indication of the performance accuracy of the system.

Type I Error

The type I error in boundary detection specifies the percentage of the points on $\mathcal{G}^B$ that have been mistakenly classified as non-boundary points. It is defined by
$$e_B^1 = \frac{|B_1|}{|\mathcal{G}^B|}, \quad \text{where } B_1 = \{\mathbf{x} \mid (\mathbf{x} \in \mathcal{G}^B) \wedge (\mathbf{x} \notin B)\}.$$

Type II Error

The type II error indicates the percentage of the boundary points in $B$ that have been mistakenly classified as boundary points. It is calculated from
$$e_B^2 = \frac{|B_2|}{|B|}, \quad \text{where } B_2 = \{\mathbf{x} \mid (\mathbf{x} \in B) \wedge (\mathbf{x} \notin \mathcal{G}^B)\}.$$
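The sketch below computes all three boundary measures with a classical two-pass 3-4 Chamfer transform over a 2D grid; the weights, helper names, and loop structure are standard for Chamfering but are not copied from Appendix A. Note how a detection that is everywhere one pixel off yields both error rates of 1.0 while $\bar{\mathcal{D}}_B$ stays near one, which is why the distance index is the more informative measure.

import numpy as np

def chamfer_2d(truth_mask):
    big = 10 ** 6
    d = np.where(truth_mask, 0, big).astype(np.int64)
    H, W = d.shape
    for y in range(H):                  # forward pass
        for x in range(W):
            for dy, dx, w in ((-1, 0, 3), (0, -1, 3), (-1, -1, 4), (-1, 1, 4)):
                yy, xx = y + dy, x + dx
                if 0 <= yy < H and 0 <= xx < W:
                    d[y, x] = min(d[y, x], d[yy, xx] + w)
    for y in range(H - 1, -1, -1):      # backward pass
        for x in range(W - 1, -1, -1):
            for dy, dx, w in ((1, 0, 3), (0, 1, 3), (1, 1, 4), (1, -1, 4)):
                yy, xx = y + dy, x + dx
                if 0 <= yy < H and 0 <= xx < W:
                    d[y, x] = min(d[y, x], d[yy, xx] + w)
    return d / 3.0                      # divide out the unit weight

def boundary_scores(truth_mask, detected_mask):
    dist = chamfer_2d(truth_mask)[detected_mask]       # samples of D_B
    e1 = (truth_mask & ~detected_mask).sum() / truth_mask.sum()     # type I
    e2 = (detected_mask & ~truth_mask).sum() / detected_mask.sum()  # type II
    return dist.mean(), dist.std(ddof=1), np.median(dist), e1, e2

t = np.zeros((16, 16), bool); t[8, 2:14] = True   # ground-truth boundary
b = np.zeros((16, 16), bool); b[9, 2:14] = True   # detection, one pixel off
print(boundary_scores(t, b))                       # mean ~1.0, e1 = e2 = 1.0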
6.1.5 Region-Based Evaluation

This approach evaluates the performance of a vision system in terms of the accuracy in extracting object regions. Let the recognition result labeled by the vision system be $R(\mathbf{x})$ and the corresponding ground truth labeling be $\mathcal{G}^R(\mathbf{x})$. Both $R(\mathbf{x})$ and $\mathcal{G}^R(\mathbf{x})$ are binary functions of $\mathbf{x}$, with function value $H$ as the label for the object region and $L$ as the label for the background region. The goal is to quantitatively describe the degree of mismatch between $R(\mathbf{x})$ and $\mathcal{G}^R(\mathbf{x})$: the smaller the degree of mismatch, the better the system performance. Similar to the boundary-based approach, three indices are designed to serve this purpose: the performance index $p$, the type I error $e_R^1$, and the type II error $e_R^2$.

Overall Performance Index

We define a performance index, denoted by $p$, to measure the degree of agreement between $\mathcal{G}^R(\mathbf{x})$ and $R(\mathbf{x})$:
$$p = \frac{\sum_{\mathbf{x} \in f} [1 - |R(\mathbf{x}) - \mathcal{G}^R(\mathbf{x})|]}{|f|},$$
where $f$ is the image function and the labels $H$ and $L$ are taken as 1 and 0. The measure $p = 1.0$ when a perfect match occurs and $p = 0.0$ when $R(\mathbf{x})$ and $\mathcal{G}^R(\mathbf{x})$ completely mismatch. This index actually gives the percentage of the matched points.

Type I Error

The type I error in the region-based evaluation method indicates the percentage of the object points in $\mathcal{G}^R(\mathbf{x})$ that have been mistakenly classified as non-object points in $R(\mathbf{x})$. That is,
$$e_R^1 = \frac{|R_1|}{|\mathcal{G}^R(\mathbf{x})|}, \quad \text{where } R_1 = \{\mathbf{x} \mid (\mathbf{x} \in \mathcal{G}^R(\mathbf{x})) \wedge (\mathbf{x} \notin R(\mathbf{x}))\}.$$

Type II Error

This index describes the percentage of the object points in $R(\mathbf{x})$ that should be background points according to ground truth $\mathcal{G}^R(\mathbf{x})$. Therefore,
$$e_R^2 = \frac{|R_2|}{|R(\mathbf{x})|}, \quad \text{where } R_2 = \{\mathbf{x} \mid (\mathbf{x} \in R(\mathbf{x})) \wedge (\mathbf{x} \notin \mathcal{G}^R(\mathbf{x}))\}.$$
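With the object regions given as binary masks (labels $H$ and $L$ taken as True and False), the three indices reduce to a few lines; names are illustrative.

import numpy as np

def region_scores(truth, detected):
    p = 1.0 - np.mean(truth ^ detected)              # fraction of matched points
    e1 = (truth & ~detected).sum() / truth.sum()     # object missed (type I)
    e2 = (detected & ~truth).sum() / detected.sum()  # background kept (type II)
    return p, e1, e2

g = np.zeros((10, 10), bool); g[2:8, 2:8] = True
r = np.zeros((10, 10), bool); r[3:8, 2:8] = True
print(region_scores(g, r))   # p = 0.94, e1 ~ 0.167, e2 = 0.0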
Through different combinations of the coefficients, curves with various shapes can be derived. For example, the bigger the coefficient A is, the more curved the axis becomes. If A = 0.0 and B ≠ 0.0, the axis generated is symmetric with respect to x0. Another special case is a straight line, obtained when A = B = 0.0; in this case the slope of the curve is a constant, which is the orientation of the line. The tangent at every point on the curve can be computed from the first derivative of the curve at that point:

y′ = 3A(x − x0)² + 2B(x − x0) + C,

from which the orientation of the object at that point can be computed and used as ground truth.

This parametric cubic form was chosen for generating an axis because (1) the shape of the curve it represents can be easily controlled by changing the coefficients, and (2) the cubic is the lowest-order form that guarantees a twice-differentiable (C²) curve, hence one that is smooth everywhere.

6.2.2 3D Synthetic Data Generation

To generate a 3D axis, two approaches are used. One is to apply a 3D homogeneous transformation to a 2D curve generated in the cubic form discussed above. That is,

A_3 = A_2 × H,

where H represents a homogeneous transformation with translation T = [X0, Y0, Z0] and rotations R_x(α) (rotate α degrees about the X axis), R_y(θ) (rotate θ degrees about the Y axis), and R_z(β) (rotate β degrees about the Z axis). That is,

H = R_x(α) × R_y(θ) × R_z(β) + T.

The other approach is to use directly some 3D parameterized curve. One such example is the mathematical helix. A helix, denoted by h, is parameterized by θ, for θ ∈ R:

x(θ) = w × cos(θ), y(θ) = b × θ, z(θ) = −w × sin(θ).

There are several parameters that can be used to control the shape of a helix. Parameter w controls the radius of the helix and parameter b controls how rapidly the helix rises. In this particular application, since a synthetic object is generated within a normalized 3D cube, parameter b can be computed from a specified number of cycles of the helix, denoted by c, that are to be seen within the cube. So, b can be derived by:

b = 1 / (360 × c).

Figure 6.2 shows two discrete helixes generated in 3D volumes using different sets of parameters.

Figure 6.2. Examples of helixes generated using different parameters: (a) w = 36, c = 1.0, b = 0.1778, (b) w = 20, c = 3.5, b = 0.0508.

6.3 3D Tubular Object Visualization

One of the advantages of a concise axial representation for tubular objects is that it supports an effective visualization scheme for 3D blood vessels. This visualization scheme allows a viewer to "travel" along the axes of chosen vessels. The sensed blood flow pattern for every cross section can be directly visualized by projecting the sensor measurements in the cross section onto a plane that is perpendicular to the viewing direction. This can be achieved by aligning the V axis of the object-centered coordinate system with the viewing direction and then transforming all the points (x, y, z) on a cross section into (u, v, w) in UVW. Since each cross section has a known intrinsic dimensionality of two, the proposed visualization scheme can be realized through an eigenvector transformation. In this way, there is no need to explicitly compute the rotation matrix needed to transform the (x, y, z)'s to the (u, v, w)'s.
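A minimal sketch of this eigenvector-based projection (numpy assumed; `points` is a hypothetical (k, 3) array of the voxel coordinates sampled from one cross section):

```python
import numpy as np

def project_cross_section(points: np.ndarray) -> np.ndarray:
    """Project 3D cross-section points onto their own best-fit plane.

    Because a cross section is intrinsically 2D, the two leading
    eigenvectors of its covariance matrix span the section plane; the
    coordinates along them serve as the (u, v) display coordinates."""
    centered = points - points.mean(axis=0)
    cov = np.cov(centered.T)                  # 3x3 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues ascending
    plane = eigvecs[:, -2:]                   # two largest -> plane basis
    return centered @ plane                   # (k, 2) in-plane coords
```

The smallest-eigenvalue direction approximates the viewing (axis tangent) direction, which is why no explicit rotation matrix has to be computed.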
Let A be the axial representation for a blood vessel:

A = { (A_i, r_i) }, 1 ≤ i ≤ l, A_i = [x_i, y_i, z_i]^T,

where l is the total length of axis A, A_i is a 3D point on A, and r_i is the radius information that A_i carries. To achieve the proposed visualization scheme, every cross section X_i, represented as (A_i, r_i), 1 ≤ i ≤ l, needs to be visualized individually. A cross section X_i = (A_i, r_i) resides on a 3D plane P_i, which can be described by

P_i : a_i x + b_i y + c_i z = d_i.

Table 6.6. Summary of the boundary-based performance measures for the plant root test images at successive processing stages.

Figure 6.6. Estimated local tube centers superimposed on input images (a) S23, (b) S26, (c) S27.

Figure 6.7. Examples of man-made tubular objects: pipes.

Spurious boundary information continuously gets discarded. Type I error actually increases slightly as the processing proceeds. This may be partially a side effect of discarding spurious information. In general, the absolute values of the measured Type I and Type II errors are not as significant as the trend of their changes, for the following reason. Recall that both types of errors are measured from the number of pixels that are mistakenly classified either as boundary points or as non-boundary points. Because of the quantization of points in an image, many pixels that are classified wrongly may be only one pixel away from the correct positions. Therefore, the absolute percentage of incorrectly classified points is a very conservative measure. The most significant measure in Table 6.6 is the averaged D̃_B, which indicates the median distance between the ground truth boundary and the estimated boundary.

Figure 6.8. Examples of man-made tubular objects: wires.

Figure 6.9. Examples of organic tubular objects: bacteria.

Figure 6.13 visualizes the improvement of recognition accuracy in terms of D̃_B and ε_B^II based on the averaged measures in Table 6.6. For D_B, the desired distribution is:

D_B(d) = |B(f)| if d = 0, and D_B(d) = 0 if d ≠ 0.

That is, the closer D̄_B and σ_DB are to zero, the better. Figure 6.1 and Figure 6.14 show how D_B changes as processing continues for the plant root images R2 and R5. The center columns of the distributions in these two figures correspond to zero distance. It is therefore clear that the distance distributions for both test images gradually change toward the desired distribution.

Figure 6.11. Detected tubes for test images (a) R1, (b) R2, (c) R3, (d) R4, (e) R5.

We now present some of the results obtained from objects of wires. Due to the lack of ground truth for this type of data, performance is currently assessed through visual evaluation. The results for this type of data were obtained using system parameter η = 0.65. The precision of the estimated center points of local tubes can be seen from the superimposed images for inputs MM7 and MM8 in Figure 6.15. One example of the effectiveness of the system in disambiguating 2D ribbons through the use of 3D model information is shown in Figure 6.16. There are ribbons in Figure 6.16(b) that do not correspond to objects; only two of the ribbons were verified as tubes and labeled as local tubes in Figure 6.16(c).

Figure 6.12. Detected tubes for test images (a) R6, (b) R7, (c) B1, (d) B2, (e) B3.

6.5 Experimental Results From 3D Volumes

The model-based recognition system using hierarchical token grouping is tested on 3D synthetic data and 3D real blood vessels from MRA scans.
To evaluate the system, synthetic tubular objects with different structural properties are generated using a parametric approach. The object parameters used in generating the synthetic data are also used as ground truth in evaluating the system performance and robustness. The robustness is assessed at different levels of noise: different degrees of Gaussian noise are added, with n = 0, 10, 15, 20, where n is the standard deviation of a Gaussian white noise distribution. For synthetic data, η = 0.5 is used; for real data, η = 0.3 is used. In all experiments, the λ's are set to 2.0. A total of more than 50 synthetic volumes and 30 volumes of real data with arterial blood vessels from MRA brain scans are tested.

6.5.1 Results and Evaluation on Synthetic Data

Test Data

As introduced before, two methods can be used to generate the axis of a 3D synthetic tubular object. One is to obtain a 3D curve by applying a homogeneous transformation to a 2D curve whose shape can be controlled through a set of parameters. The other method is to use directly the parameterized 3D helix formulation. Different axes with different shapes can be generated by controlling the two helix parameters (w and c).

Figure 6.13. Average improvement in D̃_B and ε_B^II.

Table 6.7 lists the parameter settings for the 3D synthetic test data generated using the first method. Three of the synthetic test volumes are shown in Figure 6.17, displayed using Maximum Intensity Projection. Table 6.8 gives the parameter settings for generating 3D helixes using the second parametric method. Figure 6.18 shows three of the helical synthetic objects generated. Similar to the 2D case, all 3D test volumes were corrupted by different degrees of Gaussian noise and then used in assessing the robustness of the system.

Table 6.7. Ground truth for the 3D synthetic volumes.

Figure 6.14. Examples of the distance distribution D_B for image R5 at (a) initial stage, (b) intermediate stage, (c) final stage.

Figure 6.15. Superimposed center points of local tubes for (a) MM7, (b) MM8.

Performance and Robustness

The results for the synthetic volumes were obtained by setting η = 0.50 and the λ's to 2.0. The experimental results for the first set of 3D synthetic objects (in Table 6.7) are shown in Table 6.9. All the ε_θ's and ε_r's are small, and all but one value of ρ̄* are above 0.92. The values of the ε's and ρ̄*'s reported in Table 6.9 are weighted averages taken over the entire length of an object; the weight put on the measures from each local tube is the ratio of the tube length to the total length of the object (a sketch of this weighting follows below).

Figure 6.16. The recognition result for input image MM4: (a) input image, (b) detected boundary, (c) the tubes found.

Figure 6.17. Examples of 3D synthetic volumes: (a) volume S32, (b) volume S34, (c) volume S35.

Figure 6.19(a) and (b) plot the individual values of ρ* for local tubes along three synthetic objects.
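The length-weighted averaging described above can be sketched as follows (numpy assumed; `lengths` and `values` are hypothetical per-local-tube arrays):

```python
import numpy as np

def length_weighted_average(lengths, values):
    """Average a per-local-tube measure (e.g., an error or rho*) over an
    object, weighting each tube by its share of the total object length."""
    lengths = np.asarray(lengths, dtype=float)
    values = np.asarray(values, dtype=float)
    weights = lengths / lengths.sum()     # tube length / total length
    return float(np.sum(weights * values))
```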
It is clear that the values of ρ* from individual connected local tubes are stable and smoothly distributed. A similar phenomenon has been observed for ε_r and for ε_θ (whenever the latter applies). Table 6.10 gives the experimental results from the 3D helixes. For each test volume, different degrees of noise were added (indicated in column two of Table 6.10). At every noise level, three measures are listed: the average error in estimating the radius, ε̄_r; the average estimated length of the local tubes, l_avg; and the average ρ̄*. All the averages are taken over the entire length of the objects. The degradation of these values with increasing noise indicates how sensitive the system is to the noise.

Table 6.8. Ground truth for the 3D synthetic helixes.

Figure 6.18. Three synthetic helical objects: (a) helix-1, (b) helix-3, (c) helix-6.

The error in estimating the object radius does not change appreciably with increasing noise, but the corresponding ρ̄* values do decrease, signaling less confidence in the estimated radius. The average lengths of the local tubes decrease slightly with increasing noise. With increasing object curvature (the more cycles a helix has, the higher its curvature), the average local tube lengths decrease, meaning that the sweeping operation adaptively adjusts itself to an appropriate scale in order to track the objects. The optimized figure of merit ρ* shows an obvious trend of degradation as the noise level increases, but the amount of degradation is fairly small, indicating that the system performance is fairly robust. Figure 6.20 describes in more detail how the estimated radii along object helix-2 fluctuate under different noise situations.

Table 6.9. Results from 3D curved synthetic objects.

Obj.    θ̄      ε_θ   r̄     ε_r   ρ̄*
S31-1   45.0   0.0   11.7  0.3   0.9314
S32-1   na     na    6.0   0.0   0.9284
S32-2   na     na    8.3   0.3   0.9877
S33-1   45.0   0.0   6.1   0.1   0.9735
S33-2   135.0  0.0   4.3   0.3   0.9593
S34-1   na     na    9.1   0.1   0.9458
S35-1   na     na    4.2   0.2   0.9819
S35-2   na     na    4.8   0.2   0.8982
S35-3   na     na    6.0   0.0   0.9522

The higher the noise degree, the more often the estimates deviate from the ground truth. Significantly higher fluctuation occurs at only one point, at noise degree n = 20. Since the step length used in optimizing the estimated radius is 0.5, the minimum fluctuation is 0.5.

To see the detailed degradation in ρ*, the individual values of ρ* along two helical objects under different degrees of noise are plotted in Figure 6.21. It is demonstrated that the figure of merit degrades gracefully with increasing noise. This can also be seen in Figure 6.22, which visualizes the estimated object axis for helix-6 under different degrees of noise. Figure 6.22(a) and (c) are the input volumes of helix-6 with noise degrees n = 0 and n = 20; Figure 6.22(b) and (d) are the estimated object axes for the inputs in (a) and (c), respectively. From the figure, we can visually see that, even with corrupted data, the object axis was estimated fairly precisely, with only slight deviation in position.

The experimental results suggest that it is more difficult to track more curved objects than less curved ones. We compare the estimated radius values of helix-2 and helix-6 (more curved) because these two objects have the same radius of 3.0.
Figure 6.19. Values of the figure of merit along objects (a) S34-1, (b) S35-1 and S35-3.

The estimated radius values along objects helix-2 and helix-6 when n = 0 are shown in Figure 6.23 (the values of ρ* for helix-6 when n = 0 are also plotted in Figure 6.23(b)). Comparing the variation of the estimated radius values for helix-2 with that for helix-6, the latter varies more. However, examining the plotted figure of merit values along helix-6 in Figure 6.23(b), whenever the estimated radius values deviate from the ground truth, the figure of merit drops, signaling less confidence in the estimated object parameters, which is a desirable property.

The estimated local tube lengths should vary with the curvature along an object. Since a helical object has a constant curvature (theoretically, though it may vary in quantized data), we use two helixes with obviously different curvatures to show that the optimization scheme adapts to the changing curvature, so that the estimated tube lengths are adjusted and the object is correctly tracked. Figure 6.24 plots the estimated lengths of the local tubes along synthetic objects helix-1 and helix-6. The plot is based on the first 30 local tubes of both objects.

Table 6.10. Results from 3D synthetic helixes.

Helix    Noise  ε̄_r     l_avg   ρ̄*
helix-1  0      0.0500  3.8300  0.9635
         10     0.1333  3.1667  0.9209
         15     0.0833  3.1667  0.8634
         20     0.1167  3.0667  0.7956
helix-2  0      0.1067  2.9600  0.9613
         10     0.0773  2.7667  0.9047
         15     0.1311  2.7333  0.8615
         20     0.1133  2.7943  0.7491
helix-3  0      0.0667  3.2600  0.9788
         10     0.1500  3.1000  0.9321
         15     0.1333  3.1000  0.8801
         20     0.1333  2.9800  0.7865
helix-4  0      0.1636  3.2500  0.9652
         10     0.2734  2.9444  0.9224
         15     0.1944  2.7647  0.8731
         20     0.1833  2.8333  0.7633
helix-5  0      0.0500  2.8000  0.9626
         10     0.0833  2.7843  0.9187
         15     0.0167  2.7667  0.8696
         20     0.0333  2.7333  0.7868
helix-6  0      0.2381  2.5800  0.9648
         10     0.2000  2.4500  0.9238
         15     0.2167  2.3054  0.8714
         20     0.2333  2.5667  0.8012

The difference between the two distributions of the local tube lengths from helix-1 and helix-6 can be clearly seen. Almost all the tube lengths of helix-6 are shorter than those of helix-1, even though fluctuations exist in both.

Figure 6.20. The estimated radius for helix-2 under different noise situations.

6.5.2 Results and Evaluation on Real Data

Test Data

Thirty blood vessel subvolumes from MRA brain scans have been processed. Some of the subvolumes are visualized in Figure 6.25 using the Maximum Intensity Projection (MIP) technique. Lacking ground truth for the data in this category, system performance is evaluated visually based on the results from the different stages of processing.

Performance of Initial Stage

The noisy nature of MR volumes is a well-known problem, and results obtained from the initial stage of processing are often not reliable. This can be seen in Figure 6.26. All the images shown at the left in this figure are MIPs of the original 3D volumes.
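MIP renders a volume by keeping, for each ray along the chosen viewing axis, the maximum voxel value; a one-function sketch (numpy assumed):

```python
import numpy as np

def mip(volume: np.ndarray, axis: int = 0) -> np.ndarray:
    """Maximum Intensity Projection: collapse a 3D volume to a 2D image
    by keeping the brightest voxel along the given axis. Bright, blood-
    filled vessels in MRA therefore stand out against darker tissue."""
    return volume.max(axis=axis)
```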
The second column shows the initial global segmentation results, in which each voxel is an interior point token. The third column shows all the initial boundary points of the four input volumes. No set of initial classification results is satisfactory; nevertheless, they provide partial evidence. For example, both the segmentation and boundary detection results from sub21 represent fairly good classification. Some type II errors (background points misclassified as interior points) are obvious in Figure 6.26(k). In the boundary detection result (Figure 6.26(l)), some type I errors are made (boundary points classified as non-boundary points). As we will see later, these missing boundary points eventually cause the loss of the entire thin object seen clearly in the top half of the volume.

Figure 6.21. Sensitivity of the figure of merit to noise (n): (a) helix-2, (b) helix-6.

Figure 6.22. Experimental results on a helix: (a) the helix without noise, (b) the estimated helix axis from (a), (c) the helix with noise degree n = 20.0, (d) the estimated helix axis from (c).

Figure 6.23. Estimated local tube radii with n = 0 along (a) helix-2, (b) helix-6.

Another demonstration of reasonable results from initial processing is the third set of results, from input sub5. Even though the initial segmentation is very noisy, the initial boundary detection provides good evidence for the object surface. To see this more clearly, the initially detected object surface for sub5 is displayed from four different viewpoints and rendered as range images in Figure 6.27. Later we will see that the initial surface detection on sub5 yields good seeds, one of which (the best one) activates a global sweeping that recovers the entire object.

Performance of Automatic Seeding

To examine its effectiveness, the results from the automatic seeding stage are evaluated in terms of two aspects. First, the generated seeds should be located on objects instead of in background noise. Second, there should be as few false alarms as possible, such that when there is no object, no seed is generated even when the input data is noisy.

Figure 6.24. The estimated tube lengths for synthetic objects helix-1 and helix-6.
Figure 6.28 displays some of the seeding results. The seeds detected are represented by the dark dots superimposed on the original volumes. The dark dots are the center points c of the cross sections that the seeds represent. Ideally, the superimposed seeds should lie on the axis of an object. From these three sets of seeding results, we can see that most seeds are located fairly precisely near objects, even though many spurious boundary points are sometimes detected. This is due to the use of the model-based seeding process, which relies on the presence of both distinct surface features and a circular group of boundary points. Due to noise, the location of the seeds may not be precise. There are two possibilities. One is the size of the underlying objects: when an object is very thin, due to quantization, a position one voxel off the real axis may seem to be a big deviation. For example, some objects in these volumes have radii of about 1 to 2 voxels. In that case, it is impossible to center the seeds.

Figure 6.25. Volumetric blood vessel data displayed using MIP: (a) subvolume sub2, (b) subvolume sub14, (c) subvolume sub16, (d) subvolume sub33.

However, there are indeed some bad seeds that fall either on the boundary of objects or in the background region, i.e., false alarms. For these seeds, it is expected that they will be invalidated at the stage of local recognition because of the explicit use of both the object geometric model and the sensing model (this is discussed in the next section).

Because seeds are generated based on the distinct features of cylindrical surfaces, they are generally located on essentially straight segments of objects. One example is shown in Figure 6.28(l), where seeds are not found near the relatively curved segment of the object even though there is good evidence. Figure 6.30 gives several examples that illustrate the effectiveness of the seeding process when no seed should be detected, even though many spurious surface points are present. Figure 6.31 shows two other examples in which seeds are generated only at positions near objects even though the initially detected surface points are noisy. These two sets of seeding results demonstrate the effectiveness of the model-based seeding approach we adopted.

Generating seeds from boundary points only causes some problems. Whenever there is no boundary point, there is no seed to be found. This can be seen in Figure 6.29. Even though some boundary points are missing, the initial segmentation actually provides good evidence of object boundaries as well. Since the initial segmentation result does not participate in the seeding process, seeds cannot be recovered near those regions. As a consequence, no object will be recovered, as will be seen in the section where the effectiveness of global recognition is discussed.

Performance of Local Recognition

Given a set of seeds (hypotheses), the task of local recognition is to verify and optimize the seeds. Each seed may either be verified or invalidated. A verified seed may or may not activate a global sweeping, because a seed becomes unnecessary once the object containing the tube that the seed represents has been recovered by a sweeping activated by another seed. Therefore, a seed can be in one of four states: valid, invalid, used (to activate a global sweeping), or unused. False alarm seeds should be classified as invalid at this stage because of the use of the blood flow model (a sketch of this seed bookkeeping follows).
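A minimal sketch of this seed lifecycle (Python; the names are illustrative, `rho_star` is presumed to come from the local model-based optimization, `eta` is the validity threshold, and `already_recovered` is a hypothetical predicate testing whether a previous sweeping has covered a position):

```python
from dataclasses import dataclass

@dataclass
class Seed:
    center: tuple          # estimated cross-section center c
    rho_star: float        # optimized figure of merit rho*
    status: str = "valid"  # valid | invalid | used | unused

def triage_seeds(seeds, eta, already_recovered):
    """Invalidate low-confidence seeds, rank the rest by rho*, and mark
    as 'used' only those that must start a new global sweeping."""
    for s in seeds:
        if s.rho_star < eta:
            s.status = "invalid"      # e.g., false-alarm seeds
    valid = sorted((s for s in seeds if s.status == "valid"),
                   key=lambda s: s.rho_star, reverse=True)
    for s in valid:
        # A seed becomes unnecessary once a higher-ranked seed's
        # sweeping has already recovered the object at its position.
        s.status = "unused" if already_recovered(s.center) else "used"
    return valid
```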
In order to recover objects as reliably as possible, seeds (or local tubes) are ranked 227 by their optimized figure of merit p“ and global sweepings are activated in the de— creasing order of p'. A valid seed is one that has p“ Z 17. Figure 6.32 shows the results of local recognition for input volume subl4. Figure 6.32(a) visualizes the set of seeds that are detected from the previous stage (dark dots superimposed on the original volume). Figure 6.32(b) shows the set of valid seeds as bright dots, ranked by their p*’s (the brighter a dot is, the higher its p“ is). The ranks of the seeds in Figure 6.32(b) agree with what we can see from the input volume. A better seed position (on the object axis) yield higher rank. The more the surrounding data of a seed looks like a vessel, the higher the p“ is. The set of invalidated seeds are shown in Figure 6.32(c) as superimposed dark dots. It can be seen that the seeds that are invalidated are either not positioned right or the surrounding data does not look like a blood vessel. There is one seed centered at the middle of a long thin vessel. It gets invalidated because its ,0" is just below 1]. Lastly, Figure 6.32(d) gives the set of seeds that are actually used in initiating global sweeping operations. Visually, they seem all to be good seeds. The unused seeds are the ones that are seen in Figure 6.32(b) but not in Figure 6.32(d). A similar set of results for input volume sub34 is shown in Figure 6.33. From this set of results, we can see that some objects (the big one in Figure 6.33(a)) can be completely recovered from one good seed (the recovered object axis is given in the next section). Performance of Global Recognition The performance of the system at the stage of global recognition is evaluated based on the quality of recovered objects, including how many objects are recovered and the accuracy of the recovered axes. Some of the experimental results from blood vessel VOlumes are shown in Figure 6.34 to Figure 6.36. It can be seen that the stochastic 228 model allows the flexibility for each seed to grow in space and the optimization scheme developed in Chapter 4 makes the estimated object axes follow the correct object trajectories. Although the blood vessels in these volumes have varying shapes and sizes, the sweeping is effective in all cases as far as there is a seed to start with. The smallest radius that are detected so far is one voxel. Some vessels in Figure 6.34(d) and (g), Figure 6.35(d), Figure 6.36(a) and (c) have segments of such small size. They are all recovered. Some vessel segments have very low contrast and surrounded by noisy data. One example is the left branch of vessel in Figure 6.34(g). Because of the correctly located seeds, it is recovered. The estimated axis for volume sub5 (shown in Figure 6.34(j)) is superimposed on the original data which visually shows how accurate the estimated axis is. The right column of the first three sets of results in Figure 6.34 illustrate that sim- ple branch points between blood vessels can be correctly identified. Axes for different vessels are visualized using different intensities. In tubular object token representa- tion, the relational feature Min" describes such detected intersection relationships. Assume we have recovered two tubular objects that are represented as tokens Tf’ and T 2", respectively. If they intersect at position x, this spatial relationship will be recorded in their relational feature sets as follows. 
For token Tf', the relational feature Mm" is: R1121 : {Mintr, (2,CI,X)}, Ineaning that token T,“ has a intersection relationship with token T? at position x. Similarly, for token T 2“ , its relational feature set contains: 123': {Mintr,(1,el,x)}. 229 With the explicitly intersection relationship description, other types of spatial rela- tionship analysis among vessels may also be performed. There are some cases that reveal the weakness of our current seeding approach and show its effect on the overall object recovery. Figure 6.35 gives two such cases. The second column in this figure shows the seeding results from the two volumes in (a) and (d) of the same figure. As we pointed out in evaluating the effectiveness of the seeding method, lacking of seeds on some vessels is because there is no surface points detected from these vessels (see Figure 6.29). Therefore, even though the initial segmentation results from these volumes actually provide better evidence, no seed is generated because a seed has to be constructed from surface points. Without any seed, the global recovery completely missed underlying objects. The thin vessel that is on the top of a large vessel in Figure 6.29(a) did not get recovered which can be seen from Figure 6.29(c). One possible reason for not being able to detect the needed surface points on this small object can be the magnitude threshold mt used. It is computed as mtzum+2-am, where um and cm are estimated from a unimodal Gaussian fitting. When there is a big vessel present (as in the volume in Figure 6.29(a)), such computed threshold will likely cut off small objects. Similar situation occurred to the right branch vessel in Figure 6.29(d). The reason for not being able to detect the surface points for this object is likely due to the low contrast. More sophisticated integration is needed in order to make the system more robust in such situations. Figure 6.36(a) and (b) illustrate that the sweeping history can constrain the pos- sible extension of object. The small object on the right of the big object has a sharp turn near the edge of the volume. The sweeping operation stops there and reported reason for such a stop is “due to a large curvature”. The recovered object axes for this 230 volume shows another point. When two objects intersect, if one object is much bigger than the other, it is almost impossible to recover the intersection relation through sweeping. The sweeping along the small object will never be able to get into the re- gion where the big object is because the verification will fail. Therefore, the two axes are apart, hence no intersection can be identified. To detect this type of intersections, more analysis is necessary. Figure 6.36(c) and (d) gives another situation in which the tube model-based verification may lead to wrong recognition. On the top right corner of input volume sub33 (shown in Figure 6.36(c)), there are two separate vessels. Because the gap between the two objects is very small and there is a seed generated near that region, the system sweep through the gap and connect the two objects. Therefore, even though many objects are recovered in our test volumes, the results should still be considered as a rough segmentation of instances of vessels. Table 6.11 gives a summarized subjective evaluation on the system performance on 30 real volumes. The evaluation is with respect to the quality of the input volumes and the system performance on these volumes. 
Table 6.11 gives a summarized subjective evaluation of the system performance on the 30 real volumes. The evaluation is with respect to both the quality of the input volumes and the system performance on them. The system performance is assessed with respect to the results from initial surface detection, automatic seeding, local recognition, global recognition, and intersection relationship detection. The scale of the evaluation is "very good", "good", "fairly good", "not very good", "not good", "bad", and "undecided". The reason we include "undecided" is that, for some volumes, it is impossible even for the human eye to make a judgement about the blood vessels and their spatial relationships. As can be seen from Table 6.11, a number of input volumes do not have good quality, mostly due to the noise present in the input. As the processing progresses, more and more of the results show a trend toward promising recognition. This can be seen from the shift of the larger counts toward the left of the table, especially at the automatic seeding and local recognition stages. The summarized information about intersection detection includes only those volumes for which the positions of the intersection points can be roughly identified by the human eye via some visualization tools. Due to the complex structure of the blood vessels in many test volumes, it is impossible to even roughly determine their ground truth; the detection results from those volumes are therefore not evaluated here. Some of the intersection detection results were visualized previously.

Table 6.11. Subjective evaluation of the system performance on the 30 real volumes.

6.5.3 3D Visualization

We choose the blood vessel volume sub5 to illustrate the newly proposed visualization method. Due to space limits, Figure 6.38 displays only 25 of the cross sections on the segment of the object marked in Figure 6.37. Nine of these 25 cross sections are further displayed in the form of 3D plots to visualize their real blood flow patterns. The results are shown in Figure 6.39. For each of the nine cross sections, a window of radius r_i + 2.0 is used so that the viewer can visually judge the location of the blood vessel boundaries. It is clear that the detected radii are very close to what the data reveals. Such 3D visualization of blood flow can be helpful for diagnostic purposes. For example, the cross section plotted in Figure 6.39(h) has a blood flow pattern that seems to deviate from the expected flow pattern, which, if extended, might signal some abnormality. A radiologist can view this "cross section movie" along with the subvolume from which the vessel was extracted in order to make diagnoses in cases of aneurysm or stenosis.

6.6 Summary

A model-based recognition scheme is used to recognize tubular objects. The four-stage recognition system is developed under the framework of hierarchical token grouping. A set of grouping agents performs various vision tasks at different levels of abstraction. This recognition system is tested on synthetic and real data. Experimental results are presented and assessed in this chapter using the proposed three evaluation schemes. The explicit integration of the models that characterize different aspects of the tubular objects yields both more reliable and more precise recognition results than what is achievable using a single technique. The incremental reconstruction paradigm allows a continuous refinement of recognition, and it has so far produced improved recognition results.
By using a parametric form of the geometric model, objects are simultaneously quantified when they are recognized. An attractive visualization method can be realized based on the recognition result, which may benefit clinical diagnosis. The experimental results have demonstrated the effectiveness of both the recognition system developed in Chapter 5 under the framework of hierarchical token grouping and the model-based recognition strategy proposed in Chapter 4.

Figure 6.26. Initial classification results from four subvolumes (sub30, sub26, sub5, and sub21). The first column ((a),(d),(g),(j)) shows the original volumes. The second column ((b),(e),(h),(k)) shows the initial segmentation. The third column ((c),(f),(i),(l)) displays all the initially detected surface points using the MIP technique.

Figure 6.27. Result of initially detected surface points from subvolume sub5 (rendered as range images), viewed from front, left, back, and right, respectively.

Figure 6.28. Seeding results for volumes sub16 ((a)-(c)), sub7 ((d)-(f)), sub29 ((g)-(i)), and sub5 ((j)-(l)). The first column: original input volumes. The second column: initially detected surface points. The third column: seeds detected and superimposed on input.

Figure 6.29. Seeding results for volumes sub14 ((a)-(d)) and sub21 ((e)-(i)). The first column: original input volumes. The second column: initially detected surface points. The third column: initially detected interior points. The fourth column: seeds detected and superimposed on input.

Figure 6.30. Effectiveness of the model-based automatic seeding approach. For all three input volumes, even with very noisy surface detection, no seed is detected. The first row: original input volumes. The second row: initially detected surface points.

Figure 6.31. Effectiveness of the model-based automatic seeding. The initial surface detection results ((b) and (f)) for input volumes sub17 and sub13 ((a) and (e)) are very noisy. Seeds are generated only near the objects in these volumes ((c) and (g)).

Figure 6.32. Effectiveness of the model-based verification of seeds from data sub14: (a) the detected seeds, (b) the seeds that are verified (the brighter the seed, the higher the confidence in it), (c) the seeds that are invalidated using the tube model, (d) the seeds that actually activate global sweepings.

Figure 6.33. Effectiveness of the model-based verification of seeds from data sub34: (a) the detected seeds, (b) the seeds that are verified (the brighter the seed, the higher the confidence in it), (c) the seeds that are invalidated using the tube model, (d) the seeds that actually activate global sweepings.

Figure 6.34. Sweeping results for volumes sub26 ((a)-(c)), sub7 ((d)-(f)), sub29 ((g)-(i)), and sub5 ((j)-(l)). The first column: original input volumes. The second column: seeds detected. The third column: recovered object axes.

Figure 6.35. Sweeping results for volumes sub21 ((a)-(c)) and sub14 ((d)-(f)). The first column: original input volumes. The second column: seeds detected. The third column: recovered object axes.

Figure 6.36. Sweeping results for volumes sub34 and sub33; (a),(c): original input volumes; (b),(d): recovered object axes.

Figure 6.37. A real volume with the portion of the visualized segment marked.

Figure 6.38. Twenty-five visualized cross sections (frames of a "movie").

Figure 6.39. 3D visualization of nine of the cross sections, starting from the 14th in the previous figure.
CHAPTER 7

Conclusions, Contributions, and Future Work

Our study makes contributions to two different yet related areas of computer vision research. The first area is the study of vision problem solving formalisms. The contribution is a homogeneous problem solving architecture for vision, called Hierarchical Token Grouping. The second area is the study of integrated model-based object recognition. Our contributions in this area are (1) a parameterized Generalized Stochastic Tube Model that describes the geometric shape of the class of tubular objects, (2) an integrated, model-based, and constructive approach for recognizing tubular objects, and (3) a visualization method that can produce a "movie" of the cross sections of a recognized 3D object. The work advances the capability of extracting and quantifying plant roots from images taken underground and blood vessels from magnetic resonance images. The contributions to these two areas are linked by applying the proposed formalism of Hierarchical Token Grouping to the development of an integrated model-based vision system for recognizing tubular objects.

In the following sections, research conclusions are presented first. Then the contributions are discussed.

7.1 Conclusions

7.1.1 Hierarchical Token Grouping

Both modularity and integration are indispensable aspects in vision problem solving. Identification of individual modules in the human visual system has led to the isolated study of heterogeneous modules, producing a bag of tools for a bag of problems. However, this has not resulted in an adequate overall solution to computer vision. The issues of cooperation and competition among individual modules are crucial to building computer systems capable of vision. This is directly related to an important topic in computer vision research: integration formalism. A major obstacle in achieving integrated vision system behavior is the heterogeneity in modules, in data representations, and in knowledge. Consistent and systematic formalisms for integrating complex machine vision systems need to be established.

Motivated by the above observations, we asked the following questions:

• What homogeneous characteristics exist?
• What do they offer and why are they useful?
• How do we utilize them in the context of integration?

Hierarchical Token Grouping resulted from our attempt to answer these questions. First, we explored the homogeneous characteristics behind seemingly heterogeneous machine vision techniques. By identifying grouping to be the only operation performed in solving the two computational problems of vision, the process of vision problem solving can be viewed as a process of continuous grouping operations. Such a perspective establishes the foundation for a homogeneous architecture for vision problem solving because heterogeneous individual modules are ultimately considered as homogeneous operational units that systematically perform aggregation operations.
The heterogeneity in representation is not inherent. In the context of hierarchical vision problem solving, the functional role of data representations is to provide the interface among individual modules and to bridge the processes below and above in the hierarchy. Heterogeneous representations at different levels hinder flexible interactions among modules. For vision problem solving, a homogeneous representation scheme is not only possible but also beneficial. We proposed the token representation for perceptual data at any level of abstraction. This generally designed representation offers a consistent interface for the interacting modules.

The heterogeneity in modules is due to the inherently heterogeneous nature of different vision tasks. Grouping operations are governed by rules or grouping criteria. Ultimately, grouping criteria determine the behavior, or the functionality, of grouping processes. Due to the inherent heterogeneity among different visual tasks, grouping criteria must also be heterogeneous. But grouping criteria affect only the internal behavior of grouping processes. In order to identify a general syntax for specifying flexible grouping criteria, we employed the more abstract grouping principle of homomorphism. The concept of homomorphism has been used in the literature to define structural similarity [41, 90]. In this thesis, we extended this concept so that both structural and non-structural similarity can be described. Such a generalization of grouping principles makes it possible to use a unified specification language to instantiate flexible grouping criteria. Using formally defined data representation and operational principles, a grouping agent can also be formally specified by its domain, its internal behavior, and its range. From such formal specifications for grouping agents, both interaction models and data flow models can be extracted.

Utilizing the homogeneity in operation, in representation, and in grouping principle, a homogeneous problem solving architecture for vision, called Hierarchical Token Grouping, was presented. In this architecture, both the role of knowledge and the role of the sensor are made explicit. Organized grouping agents systematically perform groupings that naturally impose a hierarchical structure on the token representation. They act both individually and cooperatively by interacting with each other. Information can flow bottom-up, top-down, and across.

The proposed architecture has several desirable properties. It lends itself to a constructive paradigm. It is homogeneous and it possesses systematic behavior. Since each grouping agent combines a data structure (token) with behavior (grouping), it is a conceptually encapsulated object; the architecture is therefore object-oriented. The token representation provides a consistent interface that facilitates the interaction and integration among different modules. The general syntax for defining grouping criteria offers a cohesive and concise way to specify interaction models and data flow models. Since grouping agents are activated whenever domain data is ready, the proposed architecture supports both opportunistic problem solving and distributed computing. From a practical point of view, Hierarchical Token Grouping presents a formal method to build complex vision systems. Since grouping agents are encapsulated, they can be used to assemble different vision systems. Such software reusability leads to quick prototyping. Since interaction models and data flow models can be extracted from the concise specifications for grouping agents, it is conceivable that large parts of the process of developing vision systems can be automated. From a theoretical point of view, because a grouping agent can be modeled as an algebraic system, an entire vision system built from a hierarchy of grouping agents can also be modeled as a (bigger) algebraic system.
Therefore, the process of vision problem solving is formally modeled, which makes it possible to utilize many existing mathematical formalisms to describe the dynamic behavior of vision systems.

We demonstrated that (1) the token representation generalizes a wide range of representation schemes, (2) token grouping abstracts most recognition methodologies, (3) the grouping principle of homomorphism unifies organizational principles across levels of abstraction, and (4) the framework of Hierarchical Token Grouping has the potential to be used as a general paradigm for vision problem solving.

While the proposed architecture offers a more systematic and more formal way of constructing vision systems, there are serious issues in this proposed architecture that have not been attacked in this thesis. One issue concerns the trade-off in computation and space complexity between using a homogeneous representation scheme and using heterogeneous representations at different levels. While a homogeneous interface benefits the interaction among modules and the data flow control within the system, it may not be most appropriate for certain types of search strategy adopted within some grouping agents. Yet another issue is the possibility of quantitative analysis of the effectiveness of the proposed architecture. Further research is needed to study these issues. We further address such needs below.

7.1.2 Integrated Model-Based Object Recognition

At the initial stage of this research, our interest was to extract plant roots from images taken underground. To solve this practical computer vision problem, a model-based approach was taken with the intention of deriving a generalized approach for recognizing the class of tubular objects so that different application domains could also benefit from it. The development of such a generalized model-based approach enabled us to transfer the same technique to tubular objects from different application domains such as plant roots, blood vessels, bacteria, and wires.

In our model-based approach, parameterized models were developed for characterizing different aspects of the objects and were integrated at multiple levels of abstraction. The distinct shape of tubular objects was explicitly modeled. Realizing that the difficulty in using conventional Generalized Cylinders for recognition is due to their qualitative definition, we made this powerful representation scheme mathematically concrete. A parameterized GC-based model, called the Generalized Stochastic Tube Model, was proposed to capture both the salient and the dynamic geometric shape properties of a special class of GC. With this model, the recognition problem was rendered as an object parameter estimation problem.

Acknowledging the indispensable role of the sensor in recognition, we established sensor models for different imaging conditions. Specifically, both photometric imaging and Magnetic Resonance imaging conditions were explicitly modeled. These models predict the expected configuration of sensor measurements within object regions. With a pose estimated by utilizing the geometric model of the object, the sensor models render the recognition problem as a matching problem between an expected configuration and the observed configuration of sensor measurements. Such an integration of the geometric model with the sensing models is realized under the framework of matched filters.
Based on the models developed, an automatic multi-level recognition strategy was designed which exploits and integrates the power of the models at different levels of abstraction. Shape information is combined with the sensed information, yielding the potential for a recovery of objects that is both more reliable and more precise than what is achievable using a single feature. The incremental reconstruction paradigm allows a continuous refinement of recognition. Because of the parametric form of the geometric model, when objects are recognized they are simultaneously described and quantified. Region-based and boundary-based recognition techniques are integrated in order to achieve reliable performance under difficult noise conditions. A visualization method for 3D tubular objects is presented that allows a sweeping along an object trajectory to be visualized. This has strong potential for clinical diagnosis.

The proposed recognition system was developed under the paradigm of Hierarchical Token Grouping. We demonstrated how a computer vision problem could be posed as a token grouping problem and how a vision system could be built under this paradigm. To assess the system, three performance evaluation schemes were proposed that are applicable to vision systems with different recognition tasks.

The implemented system was tested and evaluated using both synthetic and real data. Real data from different application domains was used, including 2D plant root images, wire images, bacteria images, and 3D blood vessels from human MR brain scans. In the 2D case, a total of 33 synthetic and 20 real images were tested. In the 3D case, a total of 29 synthetic and 30 real volumes were processed. System performance was evaluated by applying the three evaluation schemes to the test data, and the robustness of the system was assessed based on the performance degradation under different controlled noise situations. System performance was robust when evaluated using synthetic data with different levels of white noise. Recognition results with both 2D and 3D real data showed that the Generalized Stochastic Tube Model is appropriate and applicable to the object domain we dealt with. The sensor models for both the photometric sensing and the MRA sensing conditions worked properly. The model we developed for the MRA sensing condition is more effective than the step function model used by other researchers for recognizing blood vessels. The model-based automatic seeding method is effective in terms of both false alarm invalidation and accurate seeding positions. With noisy data such as MR volumes, having a reliable automatic seeding procedure is crucial. Some problems in seeding remain, and they are mainly caused by poor initial segmentation results; further improvement is needed. The system is capable of tracking objects with different shapes and sizes. For blood vessel data, the system can identify vessels in MRA without human intervention and can present visual displays of their cross sections while traversing them. The recognition results are encouraging and are shown to enable useful visualization of recognized objects. Experimental results have demonstrated the effectiveness of both the proposed recognition strategy and its hierarchical token grouping implementation. Analyzed structure is still short of what is seen by the trained human eye, however.
7.2 Contributions

The contributions this thesis makes to each of the two areas of computer vision research are emphasized in separate sections below.

7.2.1 A Formalism for Vision Problem Solving

The contributions of this thesis toward vision problem solving result from our research effort in answering normative questions about vision.

• We explored, using the central theme of grouping, the homogeneous characteristics behind seemingly heterogeneous machine vision techniques. Specifically, with our philosophical view toward vision problem solving, we identified the homogeneity in operation, in representation, and in grouping principles. This established the foundation of a homogeneous vision problem solving paradigm.

• We exploited such identified homogeneity and proposed a homogeneous problem solving architecture for vision, called Hierarchical Token Grouping. Through this proposal, we established a formalism for developing complex computer vision systems. We showed that this architecture possesses a set of desirable properties that impact both the practical and theoretical potentials of the architecture.

• We claim that the proposed architecture generalizes a wide range of computer vision techniques in representation, recognition methodologies, and interaction frameworks. Therefore, it has the potential of being used as a general paradigm for vision problem solving.

• We predict that, using the proposed architecture, computer vision systems can be developed more efficiently and systematically, because a large portion of the system implementation can conceivably be automated due to the homogeneous structure of the framework and the formal method of specifying vision systems.

• We demonstrated that the proposed vision problem solving architecture can be applied to real-world problems.

7.2.2 Model-Based Tubular Object Recognition Via Hierarchical Token Grouping

The contributions of this thesis toward model-based object recognition methodologies are listed below.

• We developed a Generalized Stochastic Tube Model that describes both the salient and dynamic shape properties of tubular objects. This model can be used for a wide range of recognition problems because it is a parameterized representation for a class of Generalized Cylinders.

• We acknowledged the role of the sensor in recognition and modeled the sensing conditions within our research context.

• We integrated the models that describe different aspects of objects using the framework of matched filters.

• We proposed a new visualization method that may benefit a number of vision application domains.

• We proposed three quantitative methods to evaluate computer vision system performance.

• We designed an automatic multi-level information recovery strategy for tubular object recognition.

• We posed the problem of recognizing tubular objects as a hierarchical token grouping problem and developed a vision system that performs the task using a hierarchy of grouping agents.
The estimated object pa- rameters were accurate and the system performance degraded gracefully when noise is increased. 7 .3 Future Research 7 .3.1 Formalisms in Vision Problem Solving In the future, we would like to further exploit the potential of the HTG architecture presented in Chapter 2. Theoretically, formal analysis is needed to better understand the full potential of the HTG, including various trade-offs between a homogeneous design and a heterogeneous design of vision systems. Practically, an environment can be built that facilitates vision system development under the HTG architecture. This environment should include interactive tools, in— cluding a specification tool that supports efforts by system designers to define and 256 specify related system components, an architecture generator tool that accepts a user’s specification as input and produces a HTG architecture defined by the specification, including all of its communication channels, a visualization tool that graphically dia- grams the interaction models, data flow models, and vision system simulation model such as Petri nets, a behavior modeling tool that describes the dynamic information flow within the architecture. Another objective will be to develop a collection of grouping agents that perform a set of generic vision tasks and to reuse these standard agents to assemble different vision systems. 7 .3.2 Model-Based Object Recognition While the system developed in this thesis produced fairly robust recognition perfor- mance, some problems remain. For example, because the automatic seeding procedure relies on initially detected surfaces, poor surface detection may result in missing seeds which can have serious impact on recognition results. There are several possible ways of improving the seeding process. One is to integrate the region information with the surface information during the seeding process. Another possible strategy is to iteratively perform recognition. Before each iteration, recognized objects are removed from the scene and surrounding regions of the removed objects are processed in the next iteration. The effectiveness of such possibilities need to be explored and tested. The current recognition results provide a rough separation of objects from back- ground. To precisely describe the shape of the recognized objects, further processing is needed. In future research we will study the use of deformable models to accurately describe the deviations of objects from the tube model in order to be able to represent abnormalities such as aneurysms and stenoses. 257 Another issue that needs to be addressed in future research is analyzing the spa- tial relationships among different objects. The current approach to detecting in- tersections among objects during sweeping may be inadequate for some situations. Methods which analyze more than one type of spatial relationship based on the axial representation of objects are possible and need to be developed and tested. Finally, we like to parameterize more flexible GC-based models, including the geons. Combining with some common sensor models such as photometric sensing or range sensing, integrated model-based methods can be employed to recognize a generic set of geons through estimating object parameters from input images. The long term goal is to realize a Recognition-By-Component (RBC) recognition scheme via Hierarchical Token Grouping. 
Individual modules responsible for recognizing the geons can be developed as a set of grouping agents under the HTG paradigm and reused to assemble many different recognition systems. Geons recognized by these grouping agents can then be used to assemble many different objects, or be further grouped to form larger objects, realizing the RBC theory within the HTG framework.

APPENDICES

APPENDIX A

Distance Transformation: Chamfering Technique

The technique of Chamfering was proposed to perform efficient distance transformation [7]. Using this technique, only two scans (a raster scan and a reverse raster scan) are needed to compute a 2D distance map in which every discrete point is assigned a value representing the minimum distance between this point and a given set of points.

In the boundary-based evaluation method proposed in Chapter 6, we need to assess the match between the ground truth boundary G_B, represented as a set of discrete points, and the corresponding boundary B detected by the vision system to be evaluated. Such a match is described by the distance distribution index D_B, which is established based on a distance map d(x, G_B), where x is a discrete point. In order to compute d(x, G_B) efficiently, we employed the original Chamfering technique for the 2D case and extended it to the 3D case.

A.1 2D Chamfering

Assume both G_B and B are defined on a 2D grid {x} = {(x,y)}, 1 ≤ x ≤ M, 1 ≤ y ≤ N. Derive d(x, G_B) by performing a distance transformation using the following Chamfering algorithm:

2D Chamfering Algorithm [7]

STEP 1: Initialization: Set d(x, G_B) to zero iff x ∈ G_B and to infinity iff x ∉ G_B.

STEP 2: Forward pass: ∀x = (x,y) ∈ [(1,1),..,(M,N)], update d((x,y), G_B) in a raster scan by:

d((x,y), G_B) = min( d((x,y), G_B),
                     d((x-1,y+1), G_B) + d2, d((x,y+1), G_B) + d1,
                     d((x+1,y), G_B) + d1, d((x+1,y+1), G_B) + d2 ).

STEP 3: Backward pass: ∀x = (x,y) ∈ [(M,N),..,(1,1)], update d((x,y), G_B) in a reverse raster scan by:

d((x,y), G_B) = min( d((x,y), G_B),
                     d((x-1,y), G_B) + d1, d((x-1,y-1), G_B) + d2,
                     d((x,y-1), G_B) + d1, d((x+1,y-1), G_B) + d2 ).

Here, d1 = 1.0 and d2 = √2 ≈ 1.4142 approximate the distance to a 4-connected neighboring point and to a diagonal neighboring point on a discretized grid, respectively.

A.2 Extended 3D Chamfering

In order to compute a distance map in a 3D discrete space, we extended the 2D Chamfering technique to the 3D case. The goal of the extension is to conserve the merit of the original 2D Chamfering technique: only two scans are needed.

Extended 3D Chamfering Algorithm

Assume both G_B and B are defined on a 3D grid {x} = {(x,y,z)}, 1 ≤ x ≤ M, 1 ≤ y ≤ N, 1 ≤ z ≤ D. Derive the 3D distance map d(x, G_B) by performing a distance transformation using the following Chamfering algorithm:

STEP 1: Initialization: Set d(x, G_B) to zero iff x ∈ G_B and to infinity iff x ∉ G_B.

STEP 2: Forward pass: ∀x = (x,y,z) ∈ [(1,1,1),..,(M,N,D)], update d((x,y,z), G_B) in a raster scan by:

d((x,y,z), G_B) = min( d((x,y,z), G_B),
                       d((x-1,y+1,z+1), G_B) + d3, d((x-1,y+1,z), G_B) + d2,
                       d((x-1,y,z+1), G_B) + d2, d((x-1,y-1,z+1), G_B) + d3,
                       d((x,y-1,z+1), G_B) + d2, d((x,y,z+1), G_B) + d1,
                       d((x,y+1,z+1), G_B) + d2, d((x,y+1,z), G_B) + d1,
                       d((x+1,y,z+1), G_B) + d2, d((x+1,y+1,z+1), G_B) + d3,
                       d((x+1,y,z), G_B) + d1, d((x+1,y+1,z), G_B) + d2,
                       d((x+1,y-1,z+1), G_B) + d3 ).

STEP 3: Backward pass: ∀x = (x,y,z) ∈ [(M,N,D),..,(1,1,1)], update d((x,y,z), G_B) in a reverse raster scan by:

d((x,y,z), G_B) = min( d((x,y,z), G_B),
                       d((x-1,y+1,z-1), G_B) + d3, d((x-1,y,z-1), G_B) + d2,
                       d((x-1,y-1,z-1), G_B) + d3, d((x-1,y,z), G_B) + d1,
                       d((x-1,y-1,z), G_B) + d2, d((x,y+1,z-1), G_B) + d2,
                       d((x,y,z-1), G_B) + d1, d((x,y-1,z-1), G_B) + d2,
                       d((x,y-1,z), G_B) + d1, d((x+1,y,z-1), G_B) + d2,
                       d((x+1,y+1,z-1), G_B) + d3, d((x+1,y-1,z-1), G_B) + d3,
                       d((x+1,y-1,z), G_B) + d2 ).

Here, d1 = 1.0, d2 = √2 ≈ 1.4142, and d3 = √3 ≈ 1.7321 approximate the distance to a face-connected neighboring point, to a 2D diagonal neighboring point, and to a 3D diagonal neighboring point on a discretized grid, respectively.
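To make the two-pass structure concrete, the sketch below implements the 2D algorithm in Python with NumPy. It is a minimal illustration rather than the implementation used in this thesis: the function name chamfer_2d, the boolean-array input, and the row-major scan convention for the neighbor masks are assumptions introduced here. The 3D extension follows the same pattern, swapping in the 13-offset masks listed above together with d3.

# A minimal sketch of the two-pass Chamfering distance transform;
# chamfer_2d and boundary are illustrative names, not from the dissertation.
import numpy as np

D1, D2 = 1.0, 1.4142   # local distances d1 (face neighbor) and d2 (diagonal)

def chamfer_2d(boundary):
    """Return d(x, G_B) for every point of a 2D grid.

    boundary: boolean array, True exactly at the points of G_B.
    """
    rows, cols = boundary.shape
    d = np.where(boundary, 0.0, np.inf)          # STEP 1: initialization

    # STEP 2: forward raster scan; the mask touches only already-updated points.
    fwd = [((-1, -1), D2), ((-1, 0), D1), ((-1, 1), D2), ((0, -1), D1)]
    for r in range(rows):
        for c in range(cols):
            for (dr, dc), w in fwd:
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    d[r, c] = min(d[r, c], d[rr, cc] + w)

    # STEP 3: backward (reverse raster) scan with the mirrored mask.
    bwd = [((1, 1), D2), ((1, 0), D1), ((1, -1), D2), ((0, 1), D1)]
    for r in range(rows - 1, -1, -1):
        for c in range(cols - 1, -1, -1):
            for (dr, dc), w in bwd:
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    d[r, c] = min(d[r, c], d[rr, cc] + w)
    return d

For example, with a single boundary point set at grid position (2, 3) on a 5 x 7 grid, chamfer_2d returns 2.0 at (2, 5) (two d1 steps) and 1.4142 at (3, 4) (one d2 step). The 3D version differs only in looping over (x, y, z) and adding D3 = 1.7321 for the 3D diagonal offsets.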
BIBLIOGRAPHY

[1] N. Ahuja and A. L. Abbott. Active stereo: Integrating disparity, vergence, focus, aperture, and calibration for surface estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(10):1007–1029, October 1993.

[2] Narendra Ahuja and Mihran Tuceryan. Extraction of early perceptual structure in dot patterns: Integrating region, boundary, and component gestalt. Computer Vision, Graphics, and Image Processing, 48(3):304–356, December 1989.

[3] M. Atiquzzaman. Multiresolution Hough transform: An efficient method of detecting patterns in images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(11):1090–1095, November 1992.

[4] Dana H. Ballard and Christopher Brown. Computer Vision. Prentice Hall, 1982.

[5] H. G. Barrow and R. J. Popplestone. Relational descriptions in picture processing. Machine Intelligence, 6, 1971.

[6] H. G. Barrow and J. M. Tenenbaum. Recovering intrinsic scene characteristics from images. In Computer Vision Systems, pages 3–26. Academic Press, 1978. Edited by Allen R. Hanson and Edward M. Riseman.

[7] H. G. Barrow, J. M. Tenenbaum, R. C. Bolles, and H. C. Wolf. Parametric correspondence and chamfer matching: Two new techniques for image matching. In Proceedings of the 5th International Joint Conference on Artificial Intelligence, volume 2, 1977.

[8] Harry G. Barrow and J. M. Tenenbaum. Retrospective on "Interpreting line drawings as three-dimensional surfaces". Artificial Intelligence, 59:71–80, 1993.

[9] Paul Besl. Machine Vision for Three-Dimensional Scenes, chapter The Free-Form Surface Matching Problem, pages 25–71. Academic Press Inc., 1990.

[10] Paul J. Besl and Ramesh C. Jain. Three-dimensional object recognition. Computing Surveys, 17(1):75–145, March 1985.

[11] Irving Biederman. Human image understanding: Recent research and a theory. CVGIP, pages 29–73, 1985.

[12] Irving Biederman. Recognition-by-components: A theory of human image understanding. Psychological Review, 94(2):115–147, 1987.

[13] T. O. Binford. Visual perception by computer. In IEEE Conference on Systems and Control, Miami, 1971.

[14] Harry Blum. Symposium on Models for the Perception of Speech and Visual Form. Cambridge: MIT Press, 1964.

[15] R. M. Bolle, A. Califano, and R. Kjeldsen. A complete and extendable approach to visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(5):534–548, May 1992.

[16] K. Bowyer, D. Eggert, J. Stewman, and L. Stark. Developing the aspect graph representation for use in image understanding. In Proceedings of Image Understanding Workshop, pages 831–849, 1989.

[17] Michael Brady. Preface: The changing shape of computer vision. Artificial Intelligence, 17:1–15, 1981.

[18] Michael Brady and Haruo Asada. Smoothed local symmetries and their implementation. The International Journal of Robotics Research, 3(3):36–61, Fall 1984.

[19] Rodney A. Brooks. Model-based three-dimensional interpretations of two-dimensional images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 5(2):140–150, March 1983.

[20] Rodney Allen Brooks. Model-Based Computer Vision. UMI Research Press, 1984.

[21] John Canny. A computational approach to edge detection.
IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-8(6):679–698, November 1986.

[22] J. Y. Catros and D. Mischler. An artificial intelligence approach for medical picture analysis. Pattern Recognition Letters, 8:123–130, September 1988.

[23] Subhasis Chaudhuri, Shankar Chatterjee, Norman Katz, Mark Nelson, and Michael Goldbaum. Detection of blood vessels in retinal images using two-dimensional matched filters. IEEE Transactions on Medical Imaging, 8(3):263–269, September 1989.

[24] S. Chen and H. Freeman. Computing characteristic views of quadric-surfaced solids. In Proceedings of the 10th International Conference on Pattern Recognition, 1990.

[25] Roland T. Chin and Charles R. Dyer. Model-based recognition in robot vision. Computing Surveys, 18(1):67–108, March 1986.

[26] Chen-Chau Chu and J. K. Aggarwal. Image interpretation using multiple sensing modalities. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(8):840–847, August 1992.

[27] Steven D. Cochran and Gerard Medioni. 3-D surface description from binocular stereo. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(10):981–994, October 1992.

[28] Alan Doerr and Kenneth Levasseur. Applied Discrete Structures for Computer Science. SRA Pergamon, 1988.

[29] J. Dolan and E. Riseman. Computing curvilinear structure by token-based grouping. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pages 264–270, 1992.

[30] J. Dolan and R. Weiss. Perceptual grouping of curved lines. In Proceedings of DARPA Image Understanding Workshop, pages 1135–1145, Palo Alto, CA, 1989.

[31] J. H. Duncan and T. Birkholzer. Reinforcement of linear structure using parametrized labeling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(5):502–515, May 1992.

[32] J. C. Ferguson, A. J. M. Smucker, and Qian Huang. Segmentation of roots from their soil background in minirhizotron video images by adaptive thresholding and ridge detection algorithms. In Proceedings, ASA-CSSA-SSSA 1991 Annual Conference, Denver, November 1991.

[33] F. J. Fierens, Van Cleynenbreugel, P. Suetens, and A. Oosterlinck. Iconic representation of visual data and models. Pattern Recognition Letters, 12:781–792, December 1991.

[34] Bruce E. Flinchbaugh and B. Chandrasekaran. A theory of spatio-temporal aggregation for vision. Artificial Intelligence, 17:387–407, 1981.

[35] Patrick J. Flynn. CAD-Based Computer Vision: Modeling and Recognition Strategies. PhD thesis, Michigan State University, 1990.

[36] H. Freeman. Computer processing of line drawing images. Computing Surveys, 6(1):57–98, March 1974.

[37] Stuart A. Friedberg. Finding axes of skewed symmetry. Computer Vision, Graphics, and Image Processing, 34:138–155, 1986.

[38] Z. Gigus and J. Malik. Computing the aspect graph for line drawings of polyhedral objects. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pages 654–661, 1988.

[39] Ari David Gross. Shape from a symmetric universe. Technical Report CUCS-065-90, Department of Computer Science, Columbia University, New York, NY 10027, 1990.

[40] Allen R. Hanson and Edward Riseman. Segmentation of natural scenes. In Computer Vision Systems, pages 129–164. Academic Press, 1978. Edited by Allen R. Hanson and Edward M. Riseman.

[41] Robert M. Haralick. Scene analysis, arrangements, and homomorphisms. In Computer Vision Systems, pages 199–212. Academic Press, 1978. Edited by Allen R. Hanson and Edward M. Riseman.

[42] Robert M. Haralick and Linda G. Shapiro. Computer and Robot Vision.
Addison-Wesley Publishing Company, 1992.

[43] Simon Haykin. Communication Systems. John Wiley and Sons, New York, 1978.

[44] Julian E. Hochberg. Effects of the Gestalt revolution: The Cornell symposium on perception. Psychological Review, 64(2):73–84, 1957.

[45] Richard L. Hoffman. Object Recognition From Range Images. PhD thesis, Michigan State University, 1986.

[46] Berthold Klaus Paul Horn. Robot Vision. The MIT Press, McGraw-Hill Book Company, London, New York, 1986.

[47] Qian Huang and G. C. Stockman. Generalized tube model: 3D elongated object recognition from 2D intensity images. In Proceedings of International Conference on Computer Vision and Pattern Recognition, New York, USA, June 1993.

[48] Qian Huang and G. C. Stockman. Model-based elongated object recognition using invariant surface features and matched filters. In Proceedings of SPIE: Symposium on Mathematical Methods in Medical Imaging, San Diego, USA, July 1993.

[49] D. H. Hubel. Eye, Brain, and Vision. Scientific American Library, New York, 1988.

[50] C. L. Jackins and S. L. Tanimoto. Oct-trees and their use in representing 3-D objects. Computer Graphics and Image Processing, 14(3):249–270, 1980.

[51] Anil K. Jain. Fundamentals of Digital Image Processing. Prentice Hall, 1989.

[52] Anil K. Jain and Richard Hoffman. Evidence-based recognition of 3-D objects. IEEE Transactions on Pattern Analysis and Machine Intelligence, 10(6):783–802, November 1988.

[53] Ramesh C. Jain and Thomas O. Binford. Dialogue: Ignorance, myopia, and naivete in computer vision systems. CVGIP: Image Understanding, 53(1):112–117, January 1991.

[54] Laveen N. Kanal. On pattern, categories, and alternate realities. Pattern Recognition Letters, 14:241–255, March 1993.

[55] Koichi Kitamura, Jonathan M. Tobis, and Jack Sklansky. Estimating 3D skeletons and transverse areas of coronary arteries from biplane angiograms. IEEE Transactions on Medical Imaging, 7(3):173–187, 1988.

[56] Greg C. Lee. Reconstruction of Line Drawing Graphs From Fused Range and Intensity Imagery. PhD thesis, Michigan State University, 1992.

[57] Martin D. Levine. Vision in Man and Machine. McGraw-Hill Book Company, 1985.

[58] David G. Lowe. Perceptual Organization and Visual Recognition. Kluwer Academic Publishers, 1985.

[59] David G. Lowe. Organization of smooth image curves at multiple scales. International Journal of Computer Vision, 3:119–130, 1989.

[60] Y. Lu and R. C. Jain. Reasoning about edges in scale space. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(4):450–468, April 1992.

[61] D. Marr and S. Ullman. Directional selectivity and its use in early visual processing. Technical Report A.I. Memo No. 524, M.I.T. A.I. Lab., Cambridge, MA, 1979.

[62] David Marr. Vision. W. H. Freeman and Company, 1982.

[63] James D. McCafferty. Human and Machine Vision. Digital and Signal Processing. Ellis Horwood, 1990.

[64] R. D. Merrill. Representations of contours and regions for efficient computer search. Communications of the ACM, 16(2):69–82, February 1973.

[65] M. Minsky. Logical versus analogical or symbolic versus connectionist or neat versus scruffy. AI Magazine, 12(2):34–51, 1991.

[66] Olivier Monga, Nicholas Ayache, and Peter T. Sander. From voxel to intrinsic surface features. Image and Vision Computing, 10(6):403–417, July/August 1992.

[67] Olivier Monga and Serge Benayoun. Using partial derivatives of 3D images to extract typical surface features. Technical report, INRIA-Rocquencourt, France, February 1992.

[68] Ahmed M. Nazif and Martin D. Levine.
Low level image segmentation: An expert system. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6(5):555–577, September 1984.

[69] R. Nevatia and T. O. Binford. Description and recognition of curved objects. Artificial Intelligence, 8(1):77–98, 1977.

[70] Ramakant Nevatia and Thomas O. Binford. Description and recognition of curved objects. Artificial Intelligence, 8:77–98, 1977.

[71] P. Parent and S. Zucker. Trace inference, curvature consistency, and curve detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(8):823–839, 1989.

[72] Alex Pentland. Automatic extraction of deformable part models. International Journal of Computer Vision, 4:107–126, 1990.

[73] J. Ponce, D. Chelberg, and W. B. Mann. Invariant properties of straight homogeneous generalized cylinders and their contours. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(9):951–966, 1989.

[74] J. Ponce and D. J. Kriegman. Computing exact aspect graphs of curved objects: Parametric surfaces. In Proceedings of the 8th National Conference on Artificial Intelligence, pages 1074–1079, 1990.

[75] John M. Prager. Extracting and labeling boundary segments in natural scenes. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-2(1):16–27, January 1980.

[76] Narayan S. Raja and Anil K. Jain. Recognizing geons from superquadrics fitted to range data. Image and Vision Computing, 10(3):179–190, 1992.

[77] Kashipati Rao. Shape Description From Sparse and Imperfect Data. PhD thesis, University of Southern California, 1988.

[78] L. G. Roberts. Optical and Electro-Optical Information Processing, chapter Machine Perception of Three-Dimensional Solids, pages 159–197. Cambridge: MIT Press, 1965.

[79] Hille Rom and Gerard Medioni. Hierarchical decomposition and axial shape description. In Proceedings of International Conference on Computer Vision and Pattern Recognition, pages 49–55, Champaign, Illinois, June 1992.

[80] A. Rosenfeld. Pyramid algorithms for perceptual organization. Behavior Research Methods, Instruments, & Computers, 18(6):595–600, 1986.

[81] Azriel Rosenfeld. Axial representations of shape. Computer Vision, Graphics, and Image Processing, 33:156–173, 1986.

[82] Azriel Rosenfeld and Avinash C. Kak. Digital Picture Processing. Academic Press, 1982.

[83] James Rumbaugh, Michael Blaha, William Premerlani, Frederick Eddy, and William Lorensen. Object-Oriented Modeling and Design. Prentice Hall, 1991.

[84] H. Samet. Region representation: Quadtrees from boundary codes. Communications of the ACM, 23:163–170, 1980.

[85] P. T. Sander and S. W. Zucker. Inferring surface trace and differential structure from 3-D images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(9):833–854, September 1990.

[86] S. Sarkar and K. L. Boyer. Integration, inference, and management of spatial information using Bayesian networks: Perceptual organization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(3):256–274, March 1993.

[87] E. Saund. Symbolic construction of a 2D scale space image. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(8):817–830, 1990.

[88] E. Saund. Labeling of curvilinear structure across scales by token grouping. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pages 257–263, 1992.

[89] Steven A. Shafer. Shadows and Silhouettes in Computer Vision. Robotics and Vision. Kluwer Academic Publishers, 1985.

[90] Linda G. Shapiro and Robert M. Haralick. Structural descriptions and inexact matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 3(5):504–519, 1981.

[91] J. Sklansky. Measuring concavity on a rectangular mosaic. IEEE Transactions on Computers, 21(12), December 1972.

[92] T. Sripradisvarakul and R. Jain.
Generating aspect graphs for curved objects. In Proceedings of IEEE Workshop: Interpretation of 3D Scenes, pages 109–115, 1989.

[93] K. Stevens and A. Brookes. Detecting structure by symbolic constructions on tokens. CVGIP, 37:238–260, 1987.

[94] P. Suetens, C. Smet, F. Van De Werf, and A. Oosterlinck. Recognition of the coronary blood vessels in angiograms using hierarchical model-based iconic search. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pages 576–579, 1989.

[95] Paul Suetens, Pascal Fua, and Andrew J. Hanson. Computational strategies for object recognition. Computing Surveys, 24(1):5–61, March 1992.

[96] Steven L. Tanimoto. Regular hierarchical image and processing structures in machine vision. In Computer Vision Systems, pages 165–174. Academic Press, 1978. Edited by Allen R. Hanson and Edward M. Riseman.

[97] Gabriel Taubin, Ruud M. Bolle, and David B. Cooper. Representing and comparing shapes using shape polynomials. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pages 510–516, 1989.

[98] Saeid Tehrani. Knowledge-Guided Boundary Determination in Low-Contrast Imagery: An Application to Medical Images. PhD thesis, University of Michigan, 1991.

[99] A. Treisman. Properties, parts, and objects. In Handbook of Perception and Human Performance. Wiley-Interscience, 1986. Edited by K. R. Boff, L. Kaufman, and J. P. Thomas.

[100] Leonard Uhr. "Recognition cones," and some test results; the imminent arrival of well-structured parallel-serial computers; positions, and positions on positions. In Computer Vision Systems, pages 363–378. Academic Press, 1978. Edited by Allen R. Hanson and Edward M. Riseman.

[101] F. Ulupinar and R. Nevatia. Shape from contour: Straight homogeneous generalized cones. In Proceedings of ICCV '90, pages 582–586, 1990.

[102] Fatih Ulupinar and Ramakant Nevatia. Inferring shape from contour for curved surfaces. In Proceedings of the 10th International Conference on Pattern Recognition, pages 147–154, 1990.

[103] Fatih Ulupinar and Ramakant Nevatia. Recovery of 3-D objects with multiple curved surfaces from 2-D contours. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pages 730–733, 1992.

[104] H. B. Voelcker and A. A. G. Requicha. Geometric modelling of mechanical parts and processes. Computer, 10:48–57, December 1977.

[105] Lars Westberg. Hierarchical contour-based segmentation of dynamic scenes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(9):946–952, September 1992.

[106] Andrew P. Witkin and Jay M. Tenenbaum. On the role of structure in vision. In Human and Machine Vision, pages 481–544. Academic Press, 1983. Edited by Jacob Beck, Barbara Hope, and Azriel Rosenfeld.

[107] S. W. Zucker and R. A. Hummel. A three-dimensional edge operator. IEEE Transactions on Pattern Analysis and Machine Intelligence, 3(3):324–331, May 1981.

[108] Steven W. Zucker. Vertical and horizontal processes in low level vision. In Computer Vision Systems, pages 187–195. Academic Press, 1978. Edited by Allen R. Hanson and Edward M. Riseman.

[109] Steven W. Zucker, R. A. Hummel, and A. Rosenfeld. An application of relaxation labeling to line and curve enhancement. IEEE Transactions on Computers, 26, 1977.