… the dot product is largest when ν ∥ ∇I. Therefore, a weight w > 0 is needed in this case. Reversing the orientation on γ would require a weight w < 0.

6.4 Interpolation snake. Orientation along the snake is from top to bottom. Blue and red: control points. Green: arbitrary points on the snake. The weight of each control point is indicated by the shape (and color) of its marker: a blue circle means w = 0 (hard constraint), whereas a red marker means w = ±1. Despite starting out on an edge, the particular weights guide the snake towards other edges, as follows: w < 0 selects a dark-to-light gradient, whereas w > 0 selects a light-to-dark gradient, as we move along the curve with the chosen orientation. The points with w = 0 (blue circles) remain fixed during the fitting process, acting as hard constraints.

6.5 Computation of the image gradient.

6.6 Integral map. The integral of I over the rectangle ABCD can be computed by Equation (6.3), knowing the integrals of I over the four rectangles with the lower left corner at the origin O and the opposite corner at A, B, C and D.

6.7 Multi-scale, multiple-component interpolation snake. (a) 20% salt-and-pepper image; (b) initial snake; (c) coarse-scale fit: 30 x 60 rectangles for the computation of ∇I; (d) fine-scale fit: 10 x 20 rectangles; (e) large-scale gradient used in (c) (x direction only); (f) small-scale gradient used in (d) (x direction only).

6.8 The need for arclength parametrization. Green: interpolation snake (control points omitted). Yellow: ground truth. (a) Initial snake (arclength parametrization). (b) Manually drawn border of a human kidney. (c) Snake fitted using image energy only. The points do not remain equidistant, and tend to accumulate along strong boundary segments, producing an artificially good score (E = E_img = -44.7159), better even than the ground-truth score (E = E_img = -16.4465), but a poor fit. Average Hausdorff distance from ground truth: 7.68 pixels. (d) Same as (c), overlaid on the ground truth. (e) Snake fitted using both the image energy E_img and the arclength energy E_arc. Average Hausdorff distance from ground truth: 3.60 pixels. (f) Same as (e), overlaid on the ground truth.

6.9 Squashed average chamfer transform, used as the shape model with interpolation snakes. Compare to Figure 4.3.

6.10 Effect of the shape energy E_shape. (a) Clear kidney image. (b) Occluded kidney, and initial snake (green). (c) Final snake with no shape energy. (d) Final snake with shape energy.

6.11 Limitations of the shape energy. (a) Initial snake. (b) Ground truth. (c) Final snake without shape energy. (d) Comparison of (c) with the ground truth. (e) Final snake with shape energy. (f) Comparison of (e) with the ground truth.

7.1 Boundary detection by finding a break point in an intensity profile across the boundary.

7.2 Break point detection on a randomly generated signal. Blue: noisy data y = y_{H,T} + η with 1000 points and 4 segments. Red: piecewise constant function y_{H,T} used to generate y (ground truth). Green: denoised signal. The noise η was Gaussian with zero mean and standard deviation σ = 20.
The true break points were T = (0, 169, 877, 924, 1000) and the true weights vector was H = (7.17897, 96.6553, -66.3123, -24.9187). The denoised signal has breaks identical to the true breaks, and estimated weights H* = (9.41748, 97.0672, -68.4484, -27.1419).

7.3 Break point detection along an intensity profile in an ultrasound image. (a) We move along the yellow segment from the top green end towards the red end. The detected interior points are marked with cyan (high to low) and magenta (low to high). (b) Blue: 1D intensity profile along the yellow segment from (a). Green: segmented signal.

8.1 (a) Kidney ultrasound. (b) True boundaries (GT) manually traced by a human expert.

8.2 Synthetic images. (a) Clusters of control points. There are 30 points at each site, drawn from a Gaussian distribution of a user-defined standard deviation σ. (b) Interpolation splines using the control points from (a). (c)-(f) Binary images obtained as the interior of the first four splines in (b).

8.3 Various types of noise. (a) Clear image. (b) Gaussian noise with standard deviation σ = 30. (c) Uniform noise with probability p = 0.3. (d) Salt-and-pepper noise with probability p = 0.3. (e) Spread noise with standard deviation σ = 30.

8.4 Different types of initialization. Yellow: ground truth (GT). Red: initial snake. (a) Clear image. The four points marked with a cross were defined by the sonographer when the image was acquired. (b) Scaling initialization by a factor of 0.7. (c) Gaussian randomization with a standard deviation σ = 10 pixels. (d) Gaussian randomization with standard deviation σ = 20 pixels. (e) Distance initialization at distance d = 20 pixels inside the ground truth. (f) Model initialization. The red curve is the image of a template kidney (not shown) via a TPS transform (Section 3.2.1) that maps four pre-defined anchors on the model (not shown) onto the four cyan anchors in the image (user-defined).

8.5 Segmentation of kidney ultrasound using the Chan-Vese model. Left column: binary segmentation of the region of interest defined around the ground truth. Right column: ground truth (yellow) and traced boundary between the segmented regions on the left (red). Top row: smallest normalized symmetric difference (NSD) error (14.9%). Middle row: NSD = 38.4%. Bottom row: NSD = 88.6%. Average NSD = 40.8%.

8.6 Chan-Vese model: twenty random samples. The samples are sorted in increasing order of the normalized symmetric difference (NSD).

8.7 Chan-Vese model: the same images as in Figure 8.6 with the ground truth overlaid onto the image (yellow).

8.8 Segmentation of kidney ultrasound using the GAC model. Left column: output snake. Right column: ground truth (yellow) and output snake (red). Top row: smallest normalized symmetric difference (NSD) error (10.7%). Middle row: NSD = 22.98%. Bottom row: NSD = 49.28%. Average NSD = 25.32%.

8.9 GAC model: twenty random samples. The samples are sorted in increasing order of the normalized symmetric difference (NSD).

8.10 GAC model: the same images as in Figure 8.9 with the ground truth overlaid onto the image (yellow).

8.11 Segmentation of kidney ultrasound using the KWT model. Left column: output snake. Right column: ground truth (yellow) and output snake (red).
Top row: smallest normalized symmetric difference (NSD) error (4.36%). Middle row: NSD = 9.46%. Bottom row: NSD = 27.22%. Average NSD = 10.44%.

8.12 KWT model: twenty random samples. The samples are sorted in increasing order of the normalized symmetric difference (NSD).

8.13 KWT model: the same images as in Figure 8.12 with the ground truth overlaid onto the image (yellow).

8.14 Segmentation of kidney ultrasound using the interpolation snakes model. Left column: output snake. Right column: ground truth (yellow) and output snake (red). Top row: smallest normalized symmetric difference (NSD) error (3.71%). Middle row: NSD = 7.72%. Bottom row: NSD = 13.57%. Average NSD = 7.60%.

8.15 Detected contours. 3.71 ≤ NSD ≤ 5.52 (%).

8.16 Red: detected contours (same as in Figure 8.15). Yellow: GT. 3.71 ≤ NSD ≤ 5.52 (%).

8.17 Detected contours. 5.52 < NSD ≤ 6.65 (%).

8.18 Red: detected contours (same as in Figure 8.17). Yellow: GT. 5.52 < NSD ≤ 6.65 (%).

8.19 Detected contours. 6.65 < NSD ≤ 7.87 (%).

8.20 Red: detected contours (same as in Figure 8.19). Yellow: GT. 6.65 < NSD ≤ 7.87 (%).

8.21 Detected contours. 7.87 < NSD ≤ 8.79 (%).

8.22 Red: detected contours (same as in Figure 8.21). Yellow: GT. 7.87 < NSD ≤ 8.79 (%).

8.23 Detected contours. 8.79 < NSD ≤ 9.91 (%).

8.24 Red: detected contours (same as in Figure 8.23). Yellow: GT. 8.79 < NSD ≤ 9.91 (%).

8.25 Detected contours. 9.91 < NSD ≤ 13.57 (%).

8.26 Red: detected contours (same as in Figure 8.25). Yellow: GT. 9.91 < NSD ≤ 13.57 (%).

8.27 Graphical user interface for the blind study comparison of three types of kidney boundaries (I-SNAKE, KWT and GT). A human expert chose between the outputs of three algorithms, displayed two at a time.

8.28 Chan-Vese model on synthetic images. Left column: initial snake (scaling to 0.7 · GT). Middle column: final snake. Right: final snake (red) and GT (yellow). On rows, top to bottom: Gaussian noise, salt-and-pepper noise, uniform noise, spread noise. Original images were cropped and scaled down for display purposes, so some noise was lost in the process.

8.29 Geodesic active contours model (GAC) on synthetic images. Left column: initial snake (scaling to 0.7 · GT). Middle column: final snake. Right: final snake (red) and GT (yellow). On rows, top to bottom: Gaussian noise, salt-and-pepper noise, uniform noise, spread noise. Original images were cropped and scaled down for display purposes, so some noise was lost in the process.

8.30 Kass-Witkin-Terzopoulos model (KWT) on synthetic images. Left column: initial snake (scaling to 0.7 · GT). Middle column: final snake. Right: final snake (red) and GT (yellow). On rows, top to bottom: Gaussian noise, salt-and-pepper noise, uniform noise, spread noise. Original images were cropped and scaled down for display purposes, so some noise was lost in the process.

8.31 Kass-Witkin-Terzopoulos model (KWT) on synthetic images. Left column: initial snake (Gaussian randomization with standard deviation σ = 10 pixels). Middle column: final snake.
Right: final snake (red) and GT (yellow). On rows, top to bottom: Gaussian noise, salt-and-pepper noise, uniform noise, spread noise. Original images were cropped and scaled down for display purposes, so some noise was lost in the process.

8.32 Interpolation snakes model (I-SNAKE) on synthetic images. Left column: initial snake (scaling to 0.7 · GT). Middle column: final snake. Right: final snake (red) and GT (yellow). On rows, top to bottom: Gaussian noise, salt-and-pepper noise, uniform noise, spread noise. Original images were cropped and scaled down for display purposes, so some noise was lost in the process.

8.33 2D border detection using radial 1D break-point detection. (a) Break points along 72 rays starting from a user-defined center. The actual intensity profiles are not drawn. Red: dark-to-light break point. Blue: light-to-dark break point. (b) Shape model (green) fit to the red break points. (c) Tube (cyan) across the fitted model (green). (d) Break points on the tube intensity profiles. (e) Smoothing spline on the red points from (d). (f) Smoothing spline (beige) and ground truth (yellow).

8.34 Interpolation snakes on truck images. Initialization of the interpolation snakes.

8.35 Interpolation snakes on truck images. Output of the interpolation snakes.

9.1 Four major classes of fingerprints.

9.2 Extraction of the flow field from a fingerprint impression. (a) Original (scaled down to 0.3 of its original size, to fit the page). (b) Flow field.

9.3 Definitions of fingerprint kernels using splines. Yellow: control points. Red: kernel. The kernels model the shape of the ridge pattern around the center of the fingerprint.

9.4 Procrustes model for learning fingerprint kernels [78]. (a)-(d) Manually traced left loops in fingerprint images. The yellow dots are the landmarks, homologous across all images. Only four out of 20 fingerprint impressions are shown here. (e) All 20 training contours, extracted and unregistered. (f) Registered training contours (green) and their mean (purple).

9.5 The angles α and β defining the measure of fit (Eq. (9.1)).

9.6 Elliptic kernel fit to two impressions of the same finger.

9.7 Classification error due to a wrong best-fit kernel.

9.8 Sources of classification errors. (a) Right loop fingerprint with an arch as the kernel of best fit. (b) The same kernel (dark segments) overlaid on top of the flow field of the impression in (a). The fit is good, but the flow field alone does not show a right loop pattern.

Chapter 1

Introduction

1.1 To model, or not to model: A personal perspective

In order to find their way in the world, most - if not all - organisms must sense the environment around them. Most species try to "make sense"¹ of the world in their own way, enough to ensure their survival, and limited (sometimes severely) by the complexity of their nervous system. We humans, probably like many other species, parse the world into physically and functionally distinct entities. We mindlessly refer to these entities as objects. We use nouns like "apple", "tree", "heart", etc. to label these objects. But what is an object? Of course, in our daily life we do just fine without a definition of the notion of "object", or even without a definition of most natural objects: we see one, we can name it effortlessly and instantly, and we can do so from a very young age too. However, the moment we try to build intelligent artificial systems that are
able to act in the world autonomously, in a human-like fashion, we must define or represent objects in some precise, mathematical way, so that the computer can act on them.

¹We use this term here without reference to controversial concepts like "mind", "consciousness", or "cognition".

Sometimes a human refers to things generically ("car", "house", etc.), and other times specifically ("this car", "that house"). Although largely unconscious, this distinction is crucial in trying to formalize in any useful way any concrete concept in the "mind" of the computer. "Car" is an abstraction, a generalization; "this car" is a particular instance which has roughly all the features that describe a car, and possibly certain attributes or details that distinguish it from "another car". Although in daily life we loosely categorize both the abstraction and any of its instances as "objects", in pattern recognition the term class is used to refer to a generic category of objects.

Let us also note that we can readily sketch on paper (from memory) just about any of the categories of objects that we normally perceive. But, since a category of objects is a collection of many particular exemplars, what exactly is it that we are drawing? However little artistic talent we may possess, a drawing of an apple would never be mistaken for a boat, say. It appears, then, that our brain carries some sort of representation, some sort of mental image (acquired from experience) of the various categories of (simple) objects. This representation somehow summarizes the essential characteristics of each of these categories. Whatever is being drawn on paper (or just imagined) is some sort of average, something representative for the entire class - in other words, a model.

So far, we have argued that humans parse the world into (classes of) objects, and build models of them. To be sure, we are far from claiming that the human brain builds a model of the entire world (at once), nor do we claim that the brain builds an internal panoramic image of the field of view and acts upon that, as was proposed in David Marr's theory of vision [72]. Studies on visual cognition, eye movement and visual attention [18, 67, 106, 91, 3, 105, 48, 75, 76, 56, 55] (to name a few) have unambiguously dismantled these ideas, but do not exclude the possibility that the human brain does utilize models of individual objects.

We need to understand, then, how to define (models of) classes and/or objects, for the purpose of using them on an intelligent machine. With few notable exceptions (such as some geometric objects like circles, lines, etc. that can be described by equations), we humans commonly use different attributes or features to describe objects. Naturally, we can try to port these features to the computer world, which entails that each of them must be quantifiable mathematically, so it can be measured and represented as a number. Thus, for instance, size attributes (such as length, width, volume), or color, curvature, etc. make for good feature candidates, while "beauty", "grace" or "malice" do not. Precisely which features should be selected for describing one concept or another is the problem of feature selection. In traditional pattern recognition [33, 98, 57, 41], the programmer selects, often subjectively, whatever features seem to make more sense for the problem at hand, in effect trying to make the computer "perceive" the world the way we do. A system built in this manner is often termed knowledge-based.
This is but one, heavily criticized, but still very popular way of building artificial intelligence (AI). It is by no means the only, or the best, way. Most notably, one could mention genetic algorithms [50, 44], reinforcement learning [61, 101] or developmental learning [108, 109].

The point we are trying to make here is that we see the need for building computational models for AI systems. Based on our personal experience, we think that while the knowledge-based approach may not be powerful enough (as many people argue) to solve "the vision problem"², it is, at least for the time being, the most straightforward and the most reliable way of building an expert system, such as a face recognizer or an object detector.

²Legend has it that some 50 years ago, Marvin Minsky assigned "the vision problem" to an undergraduate student as a summer project.

1.2 Contributions and Overview of the Thesis

Throughout the thesis, our research is confined exclusively to 2D shapes, so unless otherwise specified, by "shape" we mean "2D shape". Our main problem is border detection in ultrasound images. We do so using a new type of active contour. In the process, some important aspects regarding shape representation and modeling are also addressed. Concretely, we propose:

• a curvature-based representation of shape, invariant to similarity transforms (scale, rotation and translation) (Chapter 2, Section 2.5);

• a new algorithm for fast shape alignment without correspondences (Chapter 3, Section 3.4);

• a shape learning method without point correspondences (Chapter 4, Section 4.2);

• a new type of snake - interpolation snakes - based on cubic splines, which, unlike other snakes,
  - have smoothness built in,
  - allow for spatial hard constraints,
  - incorporate a shape model, which makes them suitable for very difficult image domains such as ultrasound.
  These snakes are defined in Chapter 6, the centerpiece of the entire thesis;

• a powerful algorithm for parsing 1D signals as a piecewise constant function, supported by mathematical proof (Chapter 7). We apply it to extract feature points for border detection in Chapter 8, Section 8.5;

• an objective, comprehensive and systematic comparison of several active contour algorithms on ultrasound images (Chapter 8);

• a subjective blind study including our interpolation snakes, the traditional snakes [63] and manually traced ground truth boundaries, in order to demonstrate clinical validity for our methods.

The thesis is organized in three parts. The first is dedicated to shape learning (representation and modeling). The second covers border detection using active contours. The last part consists of applications, experiments and results.

A note on the presentation. It seems customary that a literature review be done at the beginning of a thesis, separate from, and followed by, the author's research contributions. Since this particular thesis deals with related but nevertheless distinct topics - representation, modeling and detection of shape - we find it more natural to split the background literature and seamlessly incorporate the relevant pieces into the appropriate chapters, interleaving the background with the new ideas. This is because we believe it is important to see the entire train of thought. In this sense, the style of this exposition is closer to a book. It goes without saying, of course, that we provide references for everything published elsewhere, so it should be clear when we refer to existing concepts.
This thesis started out as a concrete project on border detection in ultrasound images, at Siemens Corporate Research, in the summer of 2002. The approach was to define and use an active contour with smoothness built in. Splines come with smoothness built in, but it was soon apparent that smoothness was not sufficient for a commercial-grade application: the snake would have undefined, uncontrollable behavior when passing through gaps in the boundary of the object (which were not few). It became apparent that, in order to bring the snake under control at those gaps, we had to introduce shape-prior information into the snake evolution. That is, a shape model was in order. We have explored (and implemented!) a number of registration methods, shape learning techniques, and many different "species" of snakes. All the shape representation, shape alignment and model learning methods that we develop in Part I of this thesis are a build-up towards creating a smooth snake with knowledge of shape. In the process, we have come up with new algorithms. Sometimes, out of frustration, we looked for distant alternative methods for border detection, suitable for ultrasound. Such is the case, for instance, with the 1D segmentation algorithm in Chapter 7.

Part I

Shape Learning

Chapter 2

Shape Representation

Shape is arguably the most important feature in vision. This part of the thesis is about building mathematical representations of shape, and shape models, that can be used for object detection and recognition. We begin this chapter by defining the concept of 2D shape (Section 2.1). We then discuss various mathematical representations of it, and some of their properties. With the exception of the curvature representation of shape (Section 2.5), the current chapter constitutes mostly foundational material, and is included in the thesis for completeness. The representations detailed here will be used at one point or another in later chapters.

2.1 Shape

The foundation block of building a model for a class of shapes is the problem of curve registration. The crux of the matter is captured in the following very general and apparently simple question: how can we compare two (or more) shapes? The goal is, of course, to tell whether or not the two shapes represent the same object. If the two shapes do represent the same object, they are similar, but perhaps not identical. This is the case, for example, if the images were taken at different times, or from (slightly) different viewing angles (Figure 2.1).

Figure 2.1: Different views of the same object produce similar, but not identical, shapes.

Before a similarity measure is built between two shapes, one must understand the nature of the differences between shapes. On the one hand, we can capture different contours of the very same object due, e.g., to different viewing angles, or due to the motion of the object. Thus, we may have apparently different contours arising from the same object. These contours will differ, e.g., by a rigid or affine transformation, if the object is rigid. Things are more complicated with non-rigid objects (e.g., heart, lungs, etc.), which can deform by non-linear transformations. These differences in the apparent shape of an object are not due to the intrinsic geometry of the object itself, so we term them extrinsic differences. It is therefore apparent that,
when comparing shapes, we must factor out a suitable group of 2D transformations (most commonly rigid or scaling), in order to rule out non-essential differences in shape (the extrinsic differences).

On the other hand, there are intrinsic differences between shapes. Intuitively, these are the shape differences that distinguish, say, an apple from a banana, or an airplane from a car. It is the intrinsic differences that a similarity measure between shapes must capture.

Mathematically, then, we compare shapes by means of a similarity metric d(·,·) which is invariant under the appropriate group of transformations (Section 2.1.1). That is, for any two curves γ and δ, the distance between them does not change under the action of arbitrary transforms T and S in the group:

    d(T(\gamma), S(\delta)) = d(\gamma, \delta)

If the metric d is not invariant under the desired group of transformations, then our only chance of comparing curves is to first align them (effectively factoring out the extrinsic differences, usually scale, translation and rotation), and then compute the distance between the aligned curves. Specifically, if γ and δ are the curves, and d is the non-invariant metric, then we seek a transformation T (from the appropriate group of transformations) which minimizes the quantity d(T(γ), δ). Then, with this T, we can define an invariant metric:

    d_{inv}(\gamma, \delta) = d(T(\gamma), \delta)    (2.1)

Finding the transformation T that factors out the extrinsic differences is usually termed curve registration.

Various metrics have been used in the literature for shape comparison: the Hausdorff distance was used by Huttenlocher et al. [54], while Dubuisson and Jain used a modified version of the Hausdorff distance [32]. Ton and Jain [104] use support functions; Sclaroff and Pentland [94] use strain energy. Least-squares (squared error) and Procrustes metrics are popular among statisticians (Goodall [45], Dryden and Mardia [30]); they were also used by Horn [52, 53], Besl and McKay [7], Gold et al. [43], Cootes [21], and Duta et al. [35].

2.1.1 Classes of 2D transformations

The choice of the appropriate group of transformations that describes the extrinsic differences is crucial, and depends on the problem at hand. We describe next some of the classes of transformations most frequently used in computing a model from a set of training samples. Figure 2.2 shows these transforms. Unless otherwise stated, T : R² → R² represents a 2D transformation, X is an arbitrary point in R², and τ is a translation vector.

Linear transformations are, by definition, transformations that preserve colinearity, i.e., map lines onto lines. These are usually the most appropriate transformations when building a model from a set of training samples, but they can also be used in comparing two standalone shapes.

• Rigid transformations. These are transformations that consist of a rotation followed by a translation:

    T(X) = R(\phi) \cdot X + \tau

  where R(φ) is a rotation of angle φ. They preserve colinearity, parallelism, lengths, angles and areas.

• Scaling transformations. These are also called similarity transformations. First a rotation is performed, followed by an isotropic scaling, and last a translation:

    T(X) = \rho \cdot R(\phi) \cdot X + \tau

  where ρ > 0 is the scale factor. The rotation and the scaling can be interchanged with each other, but not with the translation. As with the rigid transformations, scaling preserves colinearity, parallelism and angles, but the lengths are multiplied by the scale ρ, and areas differ by a factor of ρ².
  It is worthwhile to notice that if we identify the real plane R² with the complex plane ℂ in the usual way (X = (a, b) ↔ z = a + i·b), then the scaling transformation can be seen as a map T : ℂ → ℂ given by

    T(z) = w \cdot z + \tau

  with w = ρ·(cos φ + i·sin φ) = ρ·e^{iφ}.

• Affine transformations. These transformations are described by a 2 x 2 matrix A and the translation vector τ, for a total of 6 parameters. The matrix A accounts for rotation, non-isotropic scale (2 parameters), and shear:

    T(X) = A \cdot X + \tau

  The affine transforms still preserve colinearity and parallelism.

• Projective transformations. These are linear transformations of the projective plane. By definition, the projective plane of dimension k is the set of lines in R^{k+1} going through the origin. Explicitly, a 2D projective transformation is of the form

    x' = \frac{a x + b y + c}{g x + h y + 1}, \qquad y' = \frac{d x + e y + f}{g x + h y + 1}

  where a, b, c, d, e, f, g and h are the 8 parameters that define the projective transformation.

In addition to linear transforms, there is obviously the much larger class of non-linear transforms. In general, the non-linear transforms are not used to compute models, but rather for image registration, under the paradigm of deformable templates ([58, 70] and references therein).

Figure 2.2: Linear transforms. Blue: original shape. Red: linear transformations of the blue shape: rigid (rotation + translation), scaling (preserves shape up to scale), affine (introduces shear and non-isotropic scale, but still preserves parallelism) and projective (only preserves colinearity).

2.2 Explicit representation of shape

Mathematically, a 2D shape is represented as a parameterized curve in the plane, i.e., a (smooth) function of one real variable defined on some interval [a, b]:

    \gamma : [a, b] \to \mathbb{R}^2, \qquad \gamma(t) = (x(t), y(t))

For numerical computations, the curve is sampled at discrete time intervals. This yields an array of points (γ₀, …, γ_{n-1}) = (x₀, y₀, …, x_{n-1}, y_{n-1}), which, for convenience, will be denoted by the same symbol γ as the continuous representation. In our computations it will be clear from the context whether we refer to the continuous version or to the discrete version of the curve. Occasionally, some computations can be carried out in a more concise form in complex notation, by viewing γ as a curve in the complex plane ℂ, via the canonical isomorphism R² ≅ ℂ given by (x, y) ↔ z = x + iy. We shall denote complex curves by z(t), w(s), etc., and use the Greek letters γ, δ, etc. for curves in R². With these notations, the discrete version of a (continuous) complex curve z(t) is simply a vector z = (z₀, …, z_{n-1}) ∈ ℂⁿ.

2.2.1 Parametrization

Let t = (t₀, …, t_{n-1}) be the time instants at which γ is sampled, so that γ_i = γ(t_i). In general, the points γ_i are not equally spaced along γ even if the t_i's are equally spaced. It all depends on the particular parametrization of γ.

Example 2.2.1 Two parametrizations of the unit circle. Consider first

    x(t) = \frac{t^2 - 1}{t^2 + 1}, \qquad y(t) = \frac{2t}{t^2 + 1}

Sampling from t = -20 to t = 20 at constant increments of Δt = 0.2, we get 200 points, shown in Figure 2.3(a), which are obviously non-uniform. On the other hand, if we parameterize the circle by

    \gamma(s) = (\cos s, \sin s)

then we get a nice discretization, in which the points are equally spaced, if we sample from s = 0 to s = 2π at constant increments of Δs = 2π/200. See Figure 2.3(b).

Figure 2.3: Two parametrizations of the unit circle. (a) x = (t² - 1)/(t² + 1), y = 2t/(t² + 1). (b) Re-parametrization by arclength: x = cos s, y = sin s.
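To make the contrast in Example 2.2.1 concrete, the short script below (a minimal sketch in Python/NumPy; the function names are ours, not part of the thesis) samples the unit circle with both parametrizations and reports the spread of the distances between consecutive samples. The rational parametrization produces highly uneven spacing; the angle parametrization an essentially constant one.

    import numpy as np

    def rational_circle(n=200, t_min=-20.0, t_max=20.0):
        """Rational parametrization x = (t^2-1)/(t^2+1), y = 2t/(t^2+1)."""
        t = np.linspace(t_min, t_max, n)
        return np.stack([(t**2 - 1) / (t**2 + 1), 2 * t / (t**2 + 1)], axis=1)

    def angle_circle(n=200):
        """Arclength (angle) parametrization x = cos s, y = sin s."""
        s = np.linspace(0.0, 2 * np.pi, n, endpoint=False)
        return np.stack([np.cos(s), np.sin(s)], axis=1)

    def spacing_stats(curve):
        """Smallest and largest distance between consecutive samples."""
        d = np.linalg.norm(np.diff(curve, axis=0), axis=1)
        return d.min(), d.max()

    print("rational:", spacing_stats(rational_circle()))  # very uneven spacing
    print("angle   :", spacing_stats(angle_circle()))     # nearly constant spacing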
Of course, choosing a parametrization with equidistant points is not a matter of aesthetics; as it will turn out, it is essential in border detection. While it is trivial to parametrize the circle in a way that produces a good discretization, things may not be so obvious for arbitrary curves.

2.2.2 Arclength parametrization

Along any smooth curve γ, there are two important vector fields that can be used to describe the local properties of γ: the first and the second derivative. If we imagine a particle moving along the curve, then the first derivative represents the velocity of that particle: it is a vector tangent to γ (at each point) and its magnitude is equal to the speed of the particle. The second derivative measures the rate of change of the velocity, so it represents the acceleration along the curve. It is still a vector defined at all points on γ, but it is no longer tangent to the curve. We shall use the classical notation γ′ and γ″ for the tangent and for the acceleration of γ, respectively. The length of γ is defined as the integral of the magnitude¹ of the velocity:

    \mathrm{length}(\gamma) = \int_a^b |\gamma'(t)| \, dt    (2.2)

¹As a matter of notation and terminology, we denote the magnitude of a vector v by |v|. We also use interchangeably the terms "length", "norm" and "magnitude".

Just as the discretization depends on the parametrization, so do the vector fields γ′ and γ″. Indeed, if the time parameter t is itself a function of a new parameter s, t = t(s), then, although γ(t) and γ(t(s)) have the same geometric image, they have different tangent vectors dγ/dt and dγ/ds, related, obviously, by the chain rule:

    \frac{d\gamma}{ds} = \frac{d\gamma}{dt} \cdot \frac{dt}{ds}

Consequently, the computations that involve the tangent and/or the acceleration depend (a priori) on the parametrization. The length of the curve (Eq. (2.2)) is one exception.

Fortunately, there exists a special parametrization - a "canonical" parametrization, in some sense - that has many desirable properties and which facilitates the computations. Let γ = γ(t) be any parametrization. One defines the arclength parameter on γ as the distance along γ traveled up to time t:

    s(t) = \int_a^t |\gamma'(u)| \, du

A priori, s = s(t) is a function of the original parametrization t. Obviously, s is differentiable and s′(t) = |γ′(t)| ≥ 0. If the velocity γ′ is nowhere zero, then s is strictly increasing, and one can compute the inverse of the function s = s(t) and express t as a function of s: t = t(s). We obtain this way a new parametrized curve γ̃(s) = γ(t(s)) with the following properties:

1. γ and γ̃ have the same geometric image.

2. By the chain rule, the velocity γ̃′ along the re-parameterized curve has norm 1:

    \left| \frac{d\tilde\gamma}{ds} \right| = 1

3. The acceleration γ̃″ is perpendicular to the velocity:

    \tilde\gamma'' \perp \tilde\gamma'    (2.4)

   As γ̃ changes convexity, the normal vector γ̃″ changes sides, always pointing towards the "interior" of the turn.

4. Last, but not least, the discretization of a curve parameterized by arclength is equally spaced, because the arclength parameter s represents distance on the curve itself. Indeed, if the curve is discretized at s₀, …, s_{n-1} with Δs = s_{i+1} - s_i = const and γ_i = γ(s_i), then

    |\gamma_i \gamma_{i+1}| \approx |s_{i+1} - s_i| = \Delta s = \text{const}
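Property 4 is what we rely on in practice: a discrete curve can be re-sampled so that consecutive points are (approximately) equidistant along the curve. The sketch below (Python/NumPy; an illustration under our own naming, not code from the thesis) resamples an arbitrary polyline at n points that are equally spaced in cumulative chord length, a standard discrete stand-in for the arclength parameter.

    import numpy as np

    def resample_by_arclength(curve, n):
        """Resample a polyline (m x 2 array) at n points equally spaced
        in cumulative chord length (discrete arclength)."""
        seg = np.linalg.norm(np.diff(curve, axis=0), axis=1)   # segment lengths
        s = np.concatenate([[0.0], np.cumsum(seg)])            # cumulative arclength
        s_new = np.linspace(0.0, s[-1], n)                     # equally spaced arclengths
        x = np.interp(s_new, s, curve[:, 0])                   # linear interpolation in s
        y = np.interp(s_new, s, curve[:, 1])
        return np.stack([x, y], axis=1)

    # Example: the unevenly sampled circle from Example 2.2.1 becomes evenly spaced.
    t = np.linspace(-20, 20, 200)
    circle = np.stack([(t**2 - 1) / (t**2 + 1), 2 * t / (t**2 + 1)], axis=1)
    even = resample_by_arclength(circle, 200)
    print(np.linalg.norm(np.diff(even, axis=0), axis=1).std())  # close to 0: near-constant spacing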
Remark 2.2.1 A word on notation: because γ and γ̃ have the same geometric image, it is generally accepted to drop the ˜ symbol and use the same letter γ for both curves, indicating explicitly whether one is working with the arclength parameter (γ = γ(s)) or with the time parameter (γ = γ(t)). Differentiation with respect to the time parameter is indicated using the ′ ("prime") symbol, whereas derivatives with respect to the arclength parameter s are indicated using the ˙ ("dot") symbol. Thus, γ′ and γ″ are the first and second derivatives along the curve, in an arbitrary parametrization, while γ̇ and γ̈ are the first and second derivatives with respect to the arclength s.

In Example 2.2.1, without getting into details, the (cos s, sin s) parametrization is in fact the arclength parametrization of the circle. This is the reason the points are equally spaced in Figure 2.3(b). Under the arclength parametrization, integrals along the curve are approximated numerically as finite sums:

    \int_\gamma f \, ds \;\approx\; \sum_i f(\gamma_i) \, \Delta s

2.2.3 Curvature

Let γ be a curve parameterized by arclength, and γ̇ its tangent. Let ν be the unit normal to γ, chosen so that the frame (γ̇, ν) has positive orientation. Clearly, since γ̈ ⊥ γ̇, at each point on γ the acceleration γ̈ is a scalar multiple of the normal ν:

    \ddot\gamma = k_\gamma \, \nu    (2.5)

Definition 2.2.1 The curvature of a curve γ is the scalar k_γ in (2.5).

Trivially, by taking the dot product against ν, we get from (2.5) that

    k_\gamma = \langle \ddot\gamma, \nu \rangle    (2.6)

and

    |k_\gamma| = |\ddot\gamma|    (2.7)

The normal ν does not change from one side to another as γ changes concavity, but the acceleration γ̈ does. Consequently, the sign of the curvature k_γ changes as well, the same way γ̈ changes. The next proposition summarizes some properties that we shall use in this thesis:

Proposition 2.2.1

1. Under an arbitrary parameter t,

    k_\gamma = \frac{x' y'' - x'' y'}{\left( x'^2 + y'^2 \right)^{3/2}}    (2.8)

2. If γ is given implicitly (Section 2.4), as the zero level curve of a surface f(x, y) = 0, then the curvature can be computed by

    k = \frac{f_{xx} f_y^2 + f_{yy} f_x^2 - 2 f_{xy} f_x f_y}{|\nabla f|^3}    (2.9)

3. If aγ is a scaled version of γ, and s is the arclength on γ, then s·a is the arclength on aγ, and

    \mathrm{length}(a\gamma) = a \cdot \mathrm{length}(\gamma)    (2.10)

    k_{a\gamma}(s a) = \frac{1}{a} \, k_\gamma(s)    (2.11)

Figure 2.4: 3D view of a 2D contour (blue) with its curvature (green) drawn on top of the contour.
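As a numerical companion to Proposition 2.2.1, the following sketch (Python/NumPy; our own illustrative code, with finite differences standing in for the derivatives) estimates the signed curvature of a sampled curve via Eq. (2.8). It assumes the curve is reasonably densely and evenly sampled, e.g. after the arclength resampling shown earlier.

    import numpy as np

    def signed_curvature(curve):
        """Estimate k = (x'y'' - x''y') / (x'^2 + y'^2)^(3/2) for an (n x 2) sampled
        curve, using central finite differences as stand-ins for the derivatives."""
        x, y = curve[:, 0], curve[:, 1]
        dx, dy = np.gradient(x), np.gradient(y)        # first derivatives
        ddx, ddy = np.gradient(dx), np.gradient(dy)    # second derivatives
        return (dx * ddy - ddx * dy) / (dx**2 + dy**2) ** 1.5

    # Sanity check: a circle of radius r has curvature 1/r everywhere.
    s = np.linspace(0, 2 * np.pi, 400, endpoint=False)
    r = 5.0
    circle = np.stack([r * np.cos(s), r * np.sin(s)], axis=1)
    print(signed_curvature(circle)[5:-5].mean())   # approximately 1/r = 0.2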
2.3 Cubic spline approximation

Although the presentation of the material in this thesis is top-down (i.e., first the theory and then the applications), the new concepts and algorithms were derived in a bottom-up manner, starting from a concrete practical problem: border detection in ultrasound images (Section 8.2). In general, serious algorithms in medical imaging require clinical validity, and that, in turn, requires that the automatically detected boundaries be compared against boundaries manually traced by a human medical expert, for training and evaluation purposes. Invariably, the expert does not draw the entire contour as a continuous curve, as one might imagine, but rather clicks on some 20-30 points on the contour. The traced contours will most likely have to be matched across patients, so if the organ being traced happens to have distinct anatomical landmarks (usually rigid tissue like bones, teeth, vertebrae, but also some brain regions), then the human expert will mark those, in equal number in each image. This way, point correspondences are established a priori, i.e., before any pre-processing and shape learning, in the most natural way.

If, on the other hand, there are no, or very few, clearly defined anatomical landmarks (e.g., kidney, bladder), then there is no choice but to mark more or less arbitrary points on the contour, in different numbers across patients, leaving it to the computer to recover or establish the point correspondences. Under this inherent non-uniformity, we have determined that the best way to represent the hand-drawn contours is to interpolate the markers using a cubic spline, parameterized by arclength, using a judicious choice of the spline parameters. See Figures 2.5, 2.6 and 2.7. It is central to our applications, so we provide all the necessary details next.

Figure 2.5: Contours marked by a human expert on kidney ultrasound. Top row: markers (two different patients). Observe the sparseness, lack of anatomical landmarks, non-uniformity and the different number of markers across patients. Bottom row: corresponding kidney contour represented as a dense, cubic interpolation spline determined by the markers, parameterized by arclength (equally spaced points). Both splines are discretized using the same number of points (150).

Figure 2.6: Detail from Figure 2.5: patient one (left column).

Figure 2.7: Detail from Figure 2.5: patient two (right column).

Following [28] (Chapter IV) and [29], an interpolation spline is defined in 2D by the following elements:

• break points: a strictly increasing set of real numbers a = ξ₀ < ξ₁ < ⋯ < ξ_l = b;

• control points: a set of data vectors y₀, …, y_l ∈ R², one y_i for each ξ_i;

• degree: a positive integer k.

Given this data, the interpolation spline of degree k is defined as a curve γ : [a, b] → R², t ↦ γ(t) = (x(t), y(t)), with the following properties:

• γ passes through the control points y_i at ξ_i, that is, γ(ξ_i) = y_i for all i;

• componentwise (i.e., for each of x and y), γ is a degree-k polynomial on each interval [ξ_i, ξ_{i+1}];

• the pieces fit smoothly at the break points: γ is of class C^{k-1}, i.e., it has k - 1 continuous derivatives.

Of particular importance are the cubic interpolation splines (k = 3), due to the so-called smoothest interpolation property ([28], Corollary 7, Chapter V):

Proposition 2.3.1 Given breaks a = ξ₀ < ⋯ < ξ_l = b and corresponding control points y₀, …, y_l, consider the following minimization problem with constraints:

    \min_f \; E(f) = \int_a^b |f''(t)|^2 \, dt, \qquad \text{subject to } f(\xi_i) = y_i, \; i = 0, \dots, l    (2.12)

Then, over the space C² of twice continuously differentiable functions on [a, b], the minimum (a) exists, (b) is unique, and (c) is achieved by the (necessarily unique) cubic interpolation spline on the given breaks and control points.

Minimization of the total second derivative (as opposed to other derivatives) is important because of its geometric significance: for any curve (i.e., not just splines), |γ″| measures the amount of bending of the curve. For this reason, the energy (2.12) is called the bending energy. In fact, under the canonical parametrization of γ, the curvature of γ is defined precisely as

    k_\gamma = |\ddot\gamma|    (2.13)

where, following the convention in Remark 2.2.1, γ̈ is the second derivative of γ with respect to the arclength parameter s.

The computation of a cubic interpolation spline can be carried out in linear time in the number l of control points. That is, given t ∈ [ξ_i, ξ_{i+1}], the complexity of computing γ(t) is O(l) (where l is the number of control points), because, as we shall see, the first derivatives at all break points must be determined.
Since these derivatives do not depend on t (but only on the break points), they can be computed once and for all, making the computation of γ(t) an O(1) process. We present in detail the numerics of cubic interpolation splines in Appendix B. In the end, we end up with a piecewise-polynomial function

    \gamma : [a, b] \to \mathbb{R}^2, \qquad t \mapsto \gamma(t) = (x(t), y(t))

    x(t) = \begin{cases} P_1(t) & \xi_0 \le t < \xi_1 \\ \;\;\vdots & \\ P_l(t) & \xi_{l-1} \le t \le \xi_l \end{cases}
    \qquad
    y(t) = \begin{cases} Q_1(t) & \xi_0 \le t < \xi_1 \\ \;\;\vdots & \\ Q_l(t) & \xi_{l-1} \le t \le \xi_l \end{cases}    (2.14)

where P_i and Q_i are cubic polynomials defining the x and the y components, respectively, on the interval [ξ_{i-1}, ξ_i).

Numerically, the spline is discretized by applying Equations (2.14) on an array t = (t₀, …, t_{n-1}), where n is the user-defined number of points on the spline. Most commonly, the t_i's are equally spaced, i.e., the spline is generated by increasing the time parameter uniformly. The resulting points γ(t_i) on the spline are, of course, not uniformly spaced, as t is not the arclength parameter (Figure 2.8, left). We explain how this can be corrected next.

2.3.1 Arclength parametrization of an interpolation spline

We can compute an (approximately) arclength parameterized spline by manipulating the break points ξ_i. If y_i are the user-defined control points, then we choose ξ₀ = 0, and ξ_i - ξ_{i-1} = |y_i y_{i-1}| for i > 0, so that the length along the curve between two consecutive control points y_{i-1} and y_i is approximated (to the extent that the chord y_{i-1}y_i approximates the corresponding arc along the spline) by Δ_i = ξ_i - ξ_{i-1}. By choosing the breaks this way, the usual (uniform) parameter t along the real axis becomes - approximately - the arclength parameter along the resulting spline γ = γ(t).

Figure 2.8: Cubic interpolation spline (blue). Red: control points. Black: break points. The spline on the left has equidistant breaks, and is not parametrized by arclength (the density is higher near points of high curvature). The spline on the right is built on the same control points as the one on the left. Choosing the breaks by ξ_i = ξ_{i-1} + |y_{i-1} y_i| produces an arclength parameterized spline.

Remark 2.3.1 This choice of the break points, which seamlessly produces a curve parametrized approximately by arclength, is not possible with types of splines that do not pass through their control points (e.g., B-splines, Bézier); it is only possible with interpolation splines, because of the particular significance of the control parameters as points on the curve.
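To illustrate Section 2.3.1, the sketch below (Python with NumPy/SciPy; illustrative code of ours, not the thesis implementation of Appendix B) builds a cubic interpolating spline through a handful of control points, using chord-length breaks ξ_i = ξ_{i-1} + |y_{i-1} y_i| so that uniform steps in t give approximately equidistant points on the curve.

    import numpy as np
    from scipy.interpolate import CubicSpline

    def interpolation_snake(control_pts, n=150, closed=True):
        """Cubic interpolation spline through control_pts (m x 2), with chord-length
        break points so that the uniform parameter t is approximately arclength."""
        pts = np.vstack([control_pts, control_pts[:1]]) if closed else np.asarray(control_pts)
        chords = np.linalg.norm(np.diff(pts, axis=0), axis=1)
        breaks = np.concatenate([[0.0], np.cumsum(chords)])     # xi_i = xi_{i-1} + |y_{i-1} y_i|
        bc = 'periodic' if closed else 'natural'
        spline = CubicSpline(breaks, pts, axis=0, bc_type=bc)
        t = np.linspace(0.0, breaks[-1], n, endpoint=not closed)
        return spline(t)                                         # n x 2 points, roughly equidistant

    # Example: a rough kidney-like outline from a few clicked markers (hypothetical data).
    markers = np.array([[0, 0], [4, 1], [6, 4], [4, 7], [0, 6], [-2, 3]], float)
    curve = interpolation_snake(markers, n=150)
    print(np.linalg.norm(np.diff(curve, axis=0), axis=1).round(3))  # nearly constant spacing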
The traditional snakes, as defined by Kass, Witkin and Terzopoulos [63], have several drawbacks, many of which have been addressed in subsequent studies [2, 20, 42, 111], but two of which are that (a) the algorithms are numerically unstable, and (b) the snake cannot change the topology (e.g., if the snake is initialized as a closed simple curve, then it cannot break to converge to a configuration with several connected components). A level-set snake is the zero level set of a moving surface 2 = f (:r, y). Clearly, if the surface has more than one hump, the snake will change its topology (Figure 2.9). The problem, of course, is to control the movement of the surface, so that its zero level set converges to the salient. features in an image. We consider this revolutionary approach to curve evolution so important that we describe it in some detail in Section 5.2. 28 2.4.1 Distance map representation of shape We have seen that. cutting a surface :: = .r; with a )lanc z = const )roduces n y a curve. Conversely, given a parametric representation of a plane curve 7(t), it is possible to find a function : = f(.1:. y) whose zero level set is precisely 7. Specifically, for any point X = (.r. y) in the plane, define f(.z:, y) to be the distance from X to 7: f(\) 2 “11m [X — 7(t)] (2.15) Clearly, a point X is at distance 0 from 7 if and only if X is on 7, so 7 is precisely the zero level set. of f. The function f is called the distance map representation (or the chamfer transform) of 7 [4. 9] (Figure 2.10). If the curve 7 is simple and closed, then 7 is usually represented by the signed distance map, which is the same as the usual distance map, except. that the points in the exterior have negative sign (but the same magnitude as before). Remark 2.4.1 A word on the sign convention. If 95 is any surface, then the gradient Vd) is orthogonal to the level sets, and points in the direction in which to increases most rapidly. As such, if 7 is the zero-level set ofd) and if 7 is a. simple, closed curve, then the vector V := Vcb/[Vcb] is the inward unit normal to 7 ifo < 0 outside 7 and d) > 0 inside. With this sign convention, the unit circle, with the positive orientation (counterclockwise) has positive curvature ( as it should). Regarding the complexity of computing the distance map, if I is an m x n image, and 7 is a curve of length L in the image, then a straightforward, exact computation of the distance map takes 0(m - n - L) because for each image pixel one must compute the distance to each curve pixel. If L is of the order of a few hundred points, then the algorithm is too slow. Borgefors [9, 10] introduced an algorithm which requires only two passes through the image (hence with complexity ()(m - n), which computes a. good approximation 29 (C) Figure 2.10: Distance map representation of a plane curve. (a) Explicit representation as a (discrete) parametrized curve. (b) Distance map representation. (c) Signed distance map representation. 30 of the distance map. Her work was directed toward image matching using the dis— tance map (termed chamfer mutt-lung). Chamfer matching pre-dates Borgefors. being introduced in 1977 by Barrow et al. [4]. Yet another way of computing the distance map of a given curve. is to have a. moving front evolve along its own normal. with constant speed of 1. This method is most readily implemented using level-sets, and, naturally, it is the method of choice in the level-set. community. 
2.5 Curvature-based representation of shape The metrics we have encountered so far were not invariant to plane transformations (Section 2.1.1), for the very reason that. - implicitly or explicitly - the curves were represented in terms of their cartesian coordinates (:L', y). This gives rise to the entire problem of registration (Chapter 3): bring the curves together, to some standard pose, and then compare them. We develop in this section a measure of similarity between curves which is invari- ant to scale, rotations, and translations, i.e., invariant to similarity transforms. To show the intuition behind this new representation, we reason in the discrete case first. Let, as before, 7 = (70,...,7n_1) be a curve. Assume 7 is parameter- ized by arclength, which translates mathematically that the length between any two consecutive points on 7 is 1: [Vii/1+1] = 1 Since all the lengths of the constituent segments of 7 are known, then, all it takes to move along 7 is to know how much to steer at each of the 7is. See Figure 2.11. This suggests a representation of a curve using the angles between the vectors 7.,-_17,- and 72-7241, adjacent to 7,. Let A6, = 6,1 —— 6i_1 be this angle. As shown 31 7. 1+1, /<~\ Mk??? \ / \ Y. ”39;" _______ l _’_ -___\_e_.l—.1_ Figure 2.11: Moving along a. discrete, equally spaced curve 7 only requires to know the angle between consecutive segments. in Figure 2.11, for each i, 0,- is the angle between the ar-axis and the vector 7,i7,j+1. All these angles can obviously be computed from the cartesian representation of 7. Conversely, knowing an array A6 = (A01, . . . ,A6n_1) permits the recovery of the cartesian representation of 7, modulo a translation and a rotation (uniquely deter- mined once a starting point 70 and an initial direction 60 are specified). lntuitively, the “amount of steering”, must have something to do with curvature. It is indeed the case, and we shall explain this in the continuous setting. Let now 7 : [0, L] —-> R2 be a smooth curve (e.g., of class C2). At an arbitrary point 7(3), let 0(8) be the angle between the tangent. 7 and the r-axis, that is 7 = (cos 0,sind) (2.16) Proposition 2.5.1 Ifk is the curvature of 7, then i=9 (2N) Proof: Take the derivative in (2.16), then the dot. product against the normal V = 7—L = (— sin 0,cos 0). Apply Equation (2.5). C] From this proposition, it is clear that the curve 7 is detern'iined by its curvature 32 alone, modulo translations and rotations. Indeed. knowing 1:, then by (2.17) and (2.16) we have 9(s) : 90 _+_ ,2: f v n q: .- + A (\D p-i 00 V l.‘(o)d0 cos 6(o)do sin 9(o)(lo Remark 2.5.1 A representation of a curve 7 in terms of its curvature k would be invariant to translations (as k. is defined in terms of derivatives), and to rotations (as they preserve dot products). However, the curvature is inversely proportional to scale. For instance, the curvature of a circle of radius r is I/r. The larger the circle, the smaller the curvature. All these facts are well known to geometers. What we bring new to computer vision here is an invariant metric between curves, based on curvature. To this end, let 7 and 6 be two curves parameterized by arclength, and let L7 and L5 be their respective lengths, and k7 and 196 their respective curvatures. We would like to define the similarity between 7 and 6 by comparing their curvatures. e.g., by their L:2 distance: fur, - k6|2 Of course, since k7 is defined on [0, L7] and ho- is defined on [0,1,5], this integral does not make sense unless the curves have the same length. 
This problem is solved if we first normalize 7 and 6 to have length 1. For any arclength parameterized curve a : [0, L] ———> R2 we define (11 : [0,1] —> R2 (i(lL) L (11(t) = (2-19) The next lemma expresses the derivatives, arclength, length. and curvature of 01 in terms of those. of (1. Lemma 2.5.1 If (r and (11 are as above, then: dal dt (l) = (tar). (x) = L - sat) (2.20) 2. ifs is the arclength on (r. then t = s/L is the arclength on (11. Also, L01 = 1. 3. kal(t) = L ~ ka(tL). Proof: Trivial. [:1 Using the length-normalized curves, we can make the following Definition 2.5.1 (Curvature distance) 2 lea/1U) — k61(t)] dt (1(745) ==/()1 1 2 =1) [Ly-h‘n/(ILAI) — lid-lideLo‘) (ll Since curvature is invariant under rotations and trz-inslations, so is d. Moreover, due to the unit length normalization of the curves, d is also scale-invariant. Clearly d is symmetric, d 2 0 (positive), and also satisfies the triangle inequality. However, precisely because of its invariance, it is not positive-('lefinite. This means that. (1(7,(5) = 0 does not imply 7 = 6 (this latter equality does not even makes sense, as 7 and 6 have 34 a priori different lengths). For instance. scale invariance means that (1(7. (1. - ’7) = 0, i.e., this distance does not distinguish between a. curve and any scaled version of it. The f(')llowing proposition shows what l'ia.[‘>pens when (l = 0. Proposition 2.5.2 (“of-.6) : () if and only if 71(1) 2 (51(1‘). Vt E [0.1]. That is, the curvature ’Illfll‘flf (‘2.21) is a positiee-definite metric on. the space of curves of unit twat/i. Q Proof: Since by definition (1(7. (5) = (“71.01) we have to prove the equivalence for curves of length 1. Assume therefore that L7 = [‘6 = 1. Of course, (K736) = 0 means the curves have the. same curvature. The proof follows now from (2.18) C] There is an additional complication in the case of closed curves, arising from the possibility of a phase shift. We want the metric d to be invariant to shifts in the arclength parameter along the curve. That is, if 7 is a closed curve, and (3(8) 2 7(s+a) is a phase-shifted copy of '7', we want. (1(7, 6) = 0. This requirement is natural, since 7 and (5 have the same geometric image. However, their curvatures are also shifted by a: l.‘(5(s) = lea/(s + a) so, if we let L be the length (of both 7 and 6) 0 = (“7.6) 1 2 2/0 [Mam—idem (It 1 2/0 M's/UL)—Ara,(tL+a)|2dt which implies A'~(s) = A.'~’,A(s + (1.) VS 6 [0, L] I That is, a is a. )eriod of the curvature of course, both I" and its curvature kn, are . . I , periodic, and L is a period because ’7' is closed, of length L, but a is a smaller period than L). Let T > 0 be the smallest period of k7. Then. necessarily a = nT for some integer n. This is a necessary condition for the distance to be 0 on phase—shifted curves. Most commonly though, tlns condition 18 not satisfied , so curves w1th identical geometric image will, have non-zero distance due to shift. in the parameter along the curve. The remedy is, of course, to apply all possible phase shifts to one of the curves, and take the minimum. Definition 2.5.2 (Curvature distance for closed curves) ( ("/50) 3: min d("/1,51(' + 0)) new,” 1 = min / a.€[0,1] 0 The quantity defined by Equation (2.22) measures shape similarity between two closed curves, and is invariant to similarity transforms and to phase shifts. Example 2.5.1 We consider in this example two classes of shapes: a circular shape, and a “bow tie” shape. 
Example 2.5.1 We consider in this example two classes of shapes: a circular shape and a "bow tie" shape. Each of the two classes consists of 30 synthetically generated samples, drawn from a Gaussian distribution around their respective mean. The mean and the samples were defined as arclength-parameterized, cubic interpolation splines. We then computed the intra-class distances (d(γ_i, γ_j) for all γ_i, γ_j within the same class) and the inter-class distances (d(γ_i, δ_j) with γ_i and δ_j in different classes). We obtain this way three distributions of distances, shown in Figure 2.12(c). Curves from within the same class tend to be closer to each other than curves from different classes.

Figure 2.12: Two classes of shapes, and the distributions of inter- and intra-class curvature distances. (a) Circular class: 30 samples (blue) normally distributed around a user-defined mean (red). (b) "Bow-tie" class: 30 samples (green) normally distributed around a user-defined mean (red). (c) Distributions of curvature distances d(γ_i, δ_j). Blue: both γ_i and δ_j are from the circular class. Green: both γ_i and δ_j are from the bow-tie class. Red: γ_i is circular, and δ_j is a bow-tie shape. Note the separability of each of the within-class distributions (blue and green) from the inter-class distribution of distances (red).

Chapter 3

Shape Alignment

Curve alignment (or registration) and shape similarity are two ingredients towards building a shape model. We define the problem of registration in Section 3.1 and provide explicit formulas for the transform that aligns two curves, in the case of similarity, affine and thin-plate-spline (TPS) transforms, under the assumption of known point correspondences (Section 3.2). We review some of the representative existing registration algorithms for the case of unknown point correspondences (Section 3.2.2), and discuss distance map registration as a special case (Section 3.3). In the last section, we propose a fast and practical registration algorithm, still for the case of unknown correspondences, which works best for elongated contours in 2D.

3.1 Curve registration

Suppose now that we have two curves γ and δ which we want to register (for instance, for the purpose of defining an invariant metric on the curves, i.e., a shape distance). That is, we want to find a transformation T : R² → R² (e.g., from one of the groups of linear transformations in Section 2.1.1) such that

    T(\gamma) = \delta    (3.1)

Clearly, there is little chance that Equation (3.1) has an exact solution in T. For one, in the discrete setting, (3.1) entails that γ and δ have the same number of points, which may not be the case. The next best thing, then, is to solve this equation approximately, by specifying an optimality criterion E = E_{γ,δ}(T) and by finding the transform T optimal with respect to E:

    T = \arg\min_T E_{\gamma,\delta}(T)    (3.2)

Computation of the optimality metric E may, or may not, require known point correspondences between the curves to be registered (Section 3.2). Most notably, the most straightforward metric between two parametric curves - the L² norm - involves pointwise subtraction of the two curves, so one must know, for each point on one curve, what its counterpart on the other curve is.
On the other hand, there are metrics that do not require known point correspondences. These are metrics that compare each point on one curve against all points on the other curve. We present some of these below. To begin with, the distance between a point x and a set Y (in any Euclidean space) can be naturally defined as

    d(x, Y) = inf_{y∈Y} |x − y|.   (3.3)

Next, the distance between two sets X and Y (in our case these will be curves in the plane) can be defined as the largest distance from any point in X to Y:

    d(X, Y) = sup_{x∈X} d(x, Y).   (3.4)

This notion of distance between two sets (Equation (3.4)) was introduced for the very first time in 1905 by Dimitrie Pompeiu, a distinguished Romanian mathematician [87]. It was originally termed the écart (literally: pit, chasm) of X and Y, and it is sometimes referred to today as the directed Hausdorff distance between X and Y. The distance between a point and a set (3.3) is known today in the computer vision community as the chamfer distance (transform).

Certainly, the distance (3.4) is not symmetric. In the same work [87], Pompeiu defined the mutual distance (l'écart mutuel) of the two sets as the sum

    d(X, Y) = d(X, Y) + d(Y, X).   (3.5)

The Pompeiu distance (3.5) was modified by Felix Hausdorff in his classical 1914 book Grundzüge der Mengenlehre (Basics of Set Theory) [46] (republished and translated from German multiple times; for an English translation of the third edition see, for instance, [47]). Hausdorff defined what is now the very popular Hausdorff distance by replacing the sum in (3.5) with the supremum:

    H(X, Y) = sup{ d(X, Y), d(Y, X) }.   (3.6)

The supremum in Equation (3.4) makes the écart distance sensitive to outliers. This can be alleviated by replacing the supremum with the average [32]:

    d_av(X, Y) = (1/|X|) ∫_X d(x, Y) dx.   (3.7)

Accordingly, the average Hausdorff distance can now be defined in a way similar to the regular Hausdorff distance (3.6):

    H_av(X, Y) = sup{ d_av(X, Y), d_av(Y, X) }.   (3.8)

We shall use the term Hausdorff-type metrics for any of the distances (3.4), (3.5), (3.6), (3.7) and (3.8). If D(X, Y) is any of them, then an optimality criterion for registering two curves γ and δ can be defined by

    E_{γ,δ}(T) = D(T(γ), δ).   (3.9)

One issue with Hausdorff-type metrics is their complexity. For discrete curves of about the same size, the complexity is quadratic in the number of points. This complexity is prohibitive for an iterative minimization of (3.9), but there are ways to reduce it to linear (at some cost), discussed in some detail in Section 3.3. We present some known registration techniques in Sections 3.2 and 3.3.
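As a small illustration of the quadratic-cost baseline, the Hausdorff-type measures (3.6) and (3.8) can be evaluated directly from the point samples; this is a minimal sketch with illustrative names, not the implementation used for the experiments.

import numpy as np

def directed_distances(X, Y):
    """d(x, Y) of Eq. (3.3) for every sample x in X; X, Y are (n, 2) arrays of curve points.
    The all-pairs computation below is quadratic in the number of points."""
    diff = X[:, None, :] - Y[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2)).min(axis=1)

def hausdorff(X, Y):
    """Equation (3.6): symmetric Hausdorff distance."""
    return max(directed_distances(X, Y).max(), directed_distances(Y, X).max())

def average_hausdorff(X, Y):
    """Equation (3.8): the supremum in each direction replaced by the average of Eq. (3.7)."""
    return max(directed_distances(X, Y).mean(), directed_distances(Y, X).mean())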
3.2 Registration of parameterized curves

Mathematically, if γ(s) and δ(t) are two curves, with a priori different parameters s and t, then the point correspondence problem consists of finding a re-parametrization s = h(t) in some optimal way, i.e., establishing a meaningful correspondence between the points of the two curves. There are several proposed solutions for finding the correspondences (Section 3.2.2), each with its strengths and weaknesses and with its own optimality metric, but there is no dominant method, nor a demonstrably best method over the others. Only after introducing the re-parametrization h does it make sense to define the squared error energy:

    E(T, h) = ∫ |T(γ(h(t))) − δ(t)|² dt.   (3.10)

The discrete version of this is

    E(T, h) = Σ_i |T(γ_{h(i)}) − δ_i|².   (3.11)

An invariant similarity measure (not a metric per se, because it lacks symmetry, but it can be symmetrized) between the two curves γ and δ can now be defined by

    d²(γ, δ) = min_{T, h} E(T, h).   (3.12)

3.2.1 Known correspondences

Landmark registration is one of the simplest cases of registration of parametric curves, because it takes place under the following assumptions:
• All curves to be registered have the same number of points.
• The point correspondences are known: all curves are labeled so that, for each index i, the i-th point on one curve is homologous with the i-th point on any other curve, that is, h(i) = i.
The squared error energy (Equation (3.11)) now depends only on T. In the case of scaling and affine transforms, it is possible to find the parameters of the minimizing T algebraically (Propositions 3.2.1 and 3.2.2 below).

Scaling registration and the Procrustes distance

This problem was first solved by Horn [52, 53] for points in 3D, and then revisited by other authors (e.g., Cootes [21], Dryden and Mardia [30]) in the context of plane curve registration and model learning. We present here a 2D version using complex numbers, which is one instance where working over the field C of complex numbers simplifies computations.

Let z = (z₁, ..., z_n) and w = (w₁, ..., w_n) be two (discrete, complex) curves with the same number of points. These are vectors in Cⁿ, so the usual L² norm in Cⁿ can be used to define a distance between the shapes z and w which is non-invariant under scaling transforms (and a fortiori non-invariant under more general transforms):

    d(z, w) = |z − w|².   (3.13)

Equation (2.1) and the arguments leading to it show how one can turn E into an invariant metric between curves, i.e., into a distance on shapes. We have seen in Section 2.1.1 that a similarity transformation is of the form T(w) = λ·w + τ for all w ∈ C, for some unknown complex parameters λ and τ. We must find these parameters by minimizing the squared error energy

    E_sqr(λ, τ) = Σ_j |T(z_j) − w_j|² = Σ_j (λ z_j + τ − w_j) · conj(λ z_j + τ − w_j).   (3.14)

Proposition 3.2.1 If T(z) = λz + τ minimizes the squared error energy (Equation (3.14)), then its parameters are given by

    λ = ⟨w', z'⟩ / |z'|²   (3.15)
    τ = −λ z₀ + w₀   (3.16)

where ⟨·,·⟩ denotes the complex inner product on Cⁿ, z₀ and w₀ are the centroids of z and w, respectively, and z' = (z₁ − z₀, ..., z_n − z₀) and w' = (w₁ − w₀, ..., w_n − w₀) are the zero-mean curves. The minimum value of the energy is

    E_min = |w'|² − |⟨w', z'⟩|² / |z'|².   (3.17)

Proof: Straightforward. See Appendix A.1. □

So far we have a non-invariant metric E between curves (Equation (3.13)), which by Equation (2.1) and Equation (3.17) can be turned into an invariant metric by defining

    d²_inv(z, w) = |w'|² − |⟨w', z'⟩|² / |z'|².

Obviously this is not symmetric, but it becomes symmetric if we normalize the vectors z and w to have norm one, in addition to zero mean.

Definition 3.2.1 The Procrustes distance between two curves z, w ∈ Cⁿ is

    d(z, w) = min_T | T( (z − z₀)/|z − z₀| ) − (w − w₀)/|w − w₀| |².   (3.18)

By Proposition 3.2.1, the Procrustes distance between curves z and w is the squared error between the registered curves, each normalized to have zero mean and size one prior to registration.
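The closed form of Proposition 3.2.1 translates directly into a few lines of code; the sketch below (illustrative names, complex NumPy arrays assumed) computes the optimal similarity transform and the Procrustes distance of Definition 3.2.1.

import numpy as np

def scaling_registration(z, w):
    """Proposition 3.2.1: least-squares fit of T(z) = lam*z + tau to w, with known
    correspondences; z and w are complex arrays of the same length."""
    z0, w0 = z.mean(), w.mean()
    zp, wp = z - z0, w - w0                      # zero-mean curves z', w'
    lam = np.vdot(zp, wp) / np.vdot(zp, zp)      # <w', z'> / |z'|^2, Eq. (3.15)
    tau = w0 - lam * z0                          # Eq. (3.16)
    residual = np.sum(np.abs(lam * z + tau - w) ** 2)
    return lam, tau, residual

def procrustes_distance(z, w):
    """Definition 3.2.1: register after normalizing each curve to zero mean and unit norm."""
    zn = (z - z.mean()) / np.linalg.norm(z - z.mean())
    wn = (w - w.mean()) / np.linalg.norm(w - w.mean())
    _, _, residual = scaling_registration(zn, wn)
    return residual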
Affine registration

If we try to minimize the squared error (Equation (3.14)) with the more general affine transforms T(X) = A·X + τ, it is still possible to avoid iterative minimization methods and find a closed-form solution. However, since the affine transformations no longer have a representation in terms of complex numbers, the formulas are somewhat more complicated than in the case of scaling transforms. Let us first introduce some notation. Suppose the curves are now γ = (γ₁, ..., γ_n) and δ = (δ₁, ..., δ_n). Let γ₀ and δ₀ be their respective means, and let γ' = (γ₁ − γ₀, ..., γ_n − γ₀) and δ' = (δ₁ − δ₀, ..., δ_n − δ₀) be the zero-mean curves. Denote the coordinates of each point γ'_i by (u_i, v_i), and let u' = (u₁, ..., u_n) and v' = (v₁, ..., v_n). Let z' and w' be the analogues of u' and v', respectively, for δ'. Last, let a, b, c and d be the coefficients of the matrix A.

Proposition 3.2.2 With the above notations, the coefficients a, b, c, d and τ of the affine transform T(X) = A·X + τ that minimizes the squared error energy (3.11) are given by

    [ a  b ]   [ ⟨z', u'⟩  ⟨z', v'⟩ ] [ |u'|²      ⟨u', v'⟩ ]⁻¹
    [ c  d ] = [ ⟨w', u'⟩  ⟨w', v'⟩ ] [ ⟨u', v'⟩   |v'|²    ]

and τ = δ₀ − A·γ₀.

Proof: See Appendix A.2. □

Non-linear registration: Thin Plate Splines

Landmark registration under the assumption of known correspondences can be viewed as a 2D interpolation problem: we have two sets of matched landmarks γ₁, ..., γ_n ∈ R² and δ₁, ..., δ_n ∈ R², and we would like to find a transformation T : R² → R² which satisfies the constraints T(γ_i) = δ_i. Without additional constraints, there is always a trivial solution:

    T(X) = δ_i if X = γ_i, and T(X) = X otherwise.

This solution maps each γ_i onto the corresponding δ_i, but leaves all the other points unchanged. This solution, however, is not smooth. We must find a transformation that satisfies additional smoothness constraints. A thin plate spline (TPS) solution [8] satisfies such a requirement.

The intuition behind TPS is as follows. Imagine the plane as a rubber sheet, with a square grid drawn on it. If a point x is pinned onto another location x' in the plane, an elastic deformation occurs: the lines of the grid bend in some optimal way, trying to follow the translation x → x'. From physical considerations, if T(X) = (u(X), v(X)) is a (non-linear) deformation of the plane, one defines the bending energy of T by

    E(T) = ∫∫_{R²} |Δ T|² dX.   (3.19)

The natural mathematical formulation of the 2D interpolation problem is then to find a transformation T that minimizes the bending energy (Equation (3.19)), subject to the constraints T(γ_i) = δ_i. To minimize the bending energy, one must compute the gradient of (3.19) and solve the partial differential equation that results from setting the gradient equal to zero. The fundamental solution of this equation is the radial basis function

    B(X) = −|X|² log |X|²   (3.20)

which happens to satisfy B(0) = 0. Clearly, the translated functions B_i(X) = B(X − γ_i) still minimize the bending energy, and satisfy B_i(γ_i) = 0, i = 1, ..., n. To find the solution T of the interpolation problem, we seek T as a linear combination of the basis functions B_i, plus an affine part:

    T(X) = Σ_j w_j · B_j(X) + A·X + τ.   (3.21)

The unknown weights w_j, together with the columns a = (a_x, a_y)ᵀ and b = (b_x, b_y)ᵀ of A and the translation τ = (τ_x, τ_y)ᵀ of the affine component, can be found by solving the linear system obtained by imposing the constraints T(γ_i) = δ_i in Equation (3.21). In matrix form, this system can be written as

    [ B₁(γ₁) ... B_n(γ₁)  γ₁ᵀ  1 ] [ w₁ ]   [ δ₁ᵀ ]
    [ B₁(γ₂) ... B_n(γ₂)  γ₂ᵀ  1 ] [ w₂ ]   [ δ₂ᵀ ]
    [   ...                      ] [ ...] = [ ... ]
    [ B₁(γ_n) ... B_n(γ_n) γ_nᵀ 1 ] [ w_n ]   [ δ_nᵀ ]
    [ γ₁      ...  γ_n     0    0 ] [ Aᵀ ]   [  0  ]
    [ 1       ...  1       0    0 ] [ τᵀ ]   [  0  ]
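A compact way to assemble and solve this system numerically is sketched below; names and the handling of the zero-distance diagonal are illustrative choices under the kernel (3.20).

import numpy as np

def tps_fit(src, dst):
    """Solve the TPS interpolation system of Eq. (3.21) for matched landmarks.
    src, dst: (n, 2) arrays with known correspondences src[i] -> dst[i]."""
    n = len(src)
    d2 = ((src[:, None, :] - src[None, :, :]) ** 2).sum(axis=2)
    K = -d2 * np.log(np.where(d2 > 0, d2, 1.0))          # B(X) = -|X|^2 log |X|^2, B(0) = 0
    P = np.hstack([src, np.ones((n, 1))])
    L = np.zeros((n + 3, n + 3))
    L[:n, :n] = K
    L[:n, n:] = P
    L[n:, :n] = P.T
    rhs = np.zeros((n + 3, 2))
    rhs[:n] = dst
    sol = np.linalg.solve(L, rhs)
    return sol[:n], sol[n:]                              # kernel weights, affine part

def tps_apply(points, src, weights, affine):
    """Evaluate the fitted transform T at arbitrary points."""
    d2 = ((points[:, None, :] - src[None, :, :]) ** 2).sum(axis=2)
    U = -d2 * np.log(np.where(d2 > 0, d2, 1.0))
    P = np.hstack([points, np.ones((len(points), 1))])
    return U @ weights + P @ affine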
Figure 3.1 shows an example of TPS registration. In Figure 3.1 (a), the red points are the source points, and the blue points are the target points. We must deform the entire plane smoothly, so that the source points overlap exactly with the target points. The resulting deformation is shown in Figure 3.1 (b).

Figure 3.1: (a) Red: source points. Blue: intended target. (b) Red: the source points from (a), transformed by a TPS. They overlap exactly with the target points (blue points in (a)).

3.2.2 Unknown correspondences

Registration of parametric curves with known correspondences occurs, e.g., in medical imaging, where a human expert manually annotates certain anatomical structures which should be matched across several images of the same organ (e.g., the brain). These homologous structures are called landmarks (hence the term "landmark registration").

However, if the contours are extracted automatically (e.g., edge detection followed by morphological transforms such as erosion or dilation [98]), the point correspondences across images are not known. In this case, the real effort in shape registration is in solving the point correspondence problem, either separately or in conjunction with the registration proper. The existing proposed solutions for registration and model computation in the case of unknown correspondences are inherently more difficult, and more elaborate, than the algorithms for landmark registration [85, 79, 69, 102, 26, 19, 113, 103, 6, 89, 34, 95, 7, 114]. It is therefore not feasible to present here all the mathematics behind these methods. Nevertheless, we shall summarize some of the main ideas.

One class of algorithms that register curves under unknown correspondences makes use of a match matrix M = (M_ij) [89, 34]. In the ideal case, M_ij = 1 if γ_i corresponds to δ_j and M_ij = 0 otherwise. However, to establish correspondences, M can be allowed to have continuous coefficients between 0 and 1. Rangarajan et al. [89] introduced the softassign algorithm to register two curves γ₁, ..., γ_{n_γ} and δ₁, ..., δ_{n_δ}, which have different numbers of points, n_γ ≠ n_δ, and in which the correspondences are not known. They generalize the Procrustes metric (Section 4.1), which makes use of known correspondences, to a metric which handles the unknown correspondences by means of the match matrix M: each pair (γ_i, δ_j) contributes to a weighted squared-error energy in proportion to M_ij, with the curves first centered and scaled by their means and variances. When correspondences are known, μ_γ, σ_γ and μ_δ, σ_δ are the respective means and covariances of γ and δ. In the unknown correspondences case, these are computed by making use of the match matrix, e.g.,

    μ_γ = ( Σ_{i=1}^{n_γ} Σ_{j=1}^{n_δ} M_ij γ_i ) / ( Σ_{i=1}^{n_γ} Σ_{j=1}^{n_δ} M_ij ),

and similarly for σ_γ and for δ. The curves are then registered iteratively, in a way similar to the EM algorithm: given an estimate of the parameters, one registers γ and δ (using the Procrustes method, Section 4.1). This changes μ_γ, σ_γ, μ_δ and σ_δ. The parameters are re-estimated using the new values of the means and variances, and the procedure is repeated until convergence.

Another class of algorithms tries to establish the correspondences based on the local geometric properties of the two curves [19, 95, 26, 6]. The idea is that homologous points on the two curves must have similar shape contexts. To find the points with similar shape contexts, one defines an energy functional based on the local descriptors of the two curves.
For instance, when the curvature is used to describe the local geometry of two parameterized curves γ(s) and δ(t), one can define and minimize the curvature energy

    E(h) = ∫ |κ_γ(s) − κ_δ(h(s))|² ds

where κ_γ(s) and κ_δ(t) are the respective curvatures of the two curves, and h is the (unknown) diffeomorphism which provides the correspondences, t = h(s). Alternative local shape descriptors are used in [6, 102].

Yet another class of techniques for establishing correspondences is the class of iterative closest point (ICP) algorithms [7, 114]. The idea is to match each point on the source curve γ to the closest point on the target curve δ. This establishes (non-optimal) correspondences, and a registration (with known correspondences) can be performed. Then the closest points are computed again to re-establish correspondences, and a new registration is performed. This two-step iteration is repeated until convergence.

3.3 Distance map registration

We have seen that with parametric curves and the squared error measure, the registration is complicated by the point correspondence problem: we must find an optimal way of linking the different parametrizations along the curves to be registered. In the discrete case, this means finding the homologous points on the two curves.

Using the distance map representation of a curve (Section 2.4.1, and also Equation (3.3)) and the directed average Hausdorff similarity measure (3.7), the point correspondence problem can be avoided entirely (at some cost, as we shall see). To fix ideas, suppose γ and δ are two plane curves that we want to register. Let T(·) = T(Θ, ·) be a 2D transformation defined by a set of parameters Θ = (θ₁, ..., θ_k). The set of parameters can be any one of the sets described in Section 2.1.1, or more general. We define an energy function (the chamfer energy)

    E(T) = d_av(T(γ), δ) = (1/|T(γ)|) ∫_{T(γ)} d(T(γ(s)), δ) ds.   (3.22)

In this equation, d(·, δ) is the distance map representation of the target curve δ, and |T(γ)| is the length of T(γ). Division by the length |T(γ)| penalizes transforms that tend to map everything onto a single point. It should be noted that Borgefors [9] reports better results with the root mean squared error,

    E(T) = sqrt( (1/|T(γ)|) ∫_{T(γ)} d(T(γ(s)), δ)² ds ).

Since we only apply T to the source curve γ, keeping δ fixed, the chamfer transform of δ can be computed once, before the minimization proper, using for instance Borgefors' algorithm [9] or the level set method (Section 2.4.1). This reduces the time complexity for computing E to linear (in the number of points on γ). This trick works precisely because we do not have to compute the distance transform of the moving curve T(γ), because we are using the directed version of the chamfer metric (3.7). This would not be possible with any of the symmetric versions (3.5), (3.6) or (3.8) of the Hausdorff-type measures, as these entail computation of both d(·, δ) and d(·, T(γ)) (and T(γ) varies).

Observe also that the distance map energy (Equation (3.22)) depends on the transform T, but does not depend on any unknown mapping h : γ → δ that establishes point correspondences (compare with Equation (3.10)). The important consequence of this is that distance map registration does not require a solution to the nontrivial problem of point correspondences. All that needs to be done now is to compute the transformation T that registers γ and δ, as

    T_opt = arg min_T E(T).
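A minimal sketch of the chamfer energy (3.22) is given below, assuming the target curve has been rasterized into a binary mask and that the source samples are roughly uniformly spaced, so that the mean over the samples approximates the arclength average; the transform interface is illustrative.

import numpy as np
from scipy.ndimage import distance_transform_edt

def chamfer_map(target_mask):
    """Distance map d(., delta): distance from every pixel to the target curve pixels.
    target_mask is a boolean image that is True exactly on the curve."""
    return distance_transform_edt(~target_mask)

def chamfer_energy(params, source, dist_map, transform):
    """Equation (3.22), discretized: mean of the target's distance map sampled along T(gamma).
    `transform(params, points)` applies the parametric 2D transform to an (n, 2) array."""
    moved = transform(params, source)
    rows = np.clip(np.round(moved[:, 1]).astype(int), 0, dist_map.shape[0] - 1)
    cols = np.clip(np.round(moved[:, 0]).astype(int), 0, dist_map.shape[1] - 1)
    return dist_map[rows, cols].mean()

Because the energy is an ordinary function of the parameter vector, it can be handed to a generic optimizer, e.g. scipy.optimize.minimize(chamfer_energy, theta0, args=(source, dist_map, transform), method="Powell").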
Since T is uniquely identified with its parameter vector Θ, the energy (3.22) is a function of finitely many variables, and so it can be minimized using well-established, off-the-shelf algorithms such as the Nelder-Mead simplex method [66, 81, 88] or, our favorite, Powell's conjugate directions method [88].

Figure 3.2 shows the results of distance map registration. In part (a), the unregistered source curve γ (blue) and the target curve δ (red) are shown. The initial value of the energy (Equation (3.22)) is E = 2584. Minimization using scaling transforms yields an energy of E = 1563. The results of the registration are shown to the right, in Figure 3.2 (b).

It is interesting to see what happens if we use, e.g., affine transforms instead of scaling transforms. Since the affine transforms allow for more variation in shape (e.g., shear), the two curves can come closer to each other, so it is expected that the minimum in Equation (3.22) be smaller in the case of affine transforms. This is indeed the case in practice. If we register the same two curves from Figure 3.2 using affine transforms, we get a value E = 989, significantly lower than E = 1563 in the case of scaling transforms. The affine counterpart of Figures 3.2 (a) and (b) is shown in Figures 3.2 (c) and (d). The two curves come much closer to each other (Figure 3.2 (d)) than in the case of scaling transforms (Figure 3.2 (b)), but the results are hardly what was intended. This shows the importance of two factors:
• the choice of the appropriate class of transformations;
• the non-symmetry of the chamfer metric (3.7) and (3.22): we have to use the non-symmetric version of the distance between T(γ) and δ to reduce the time complexity, but the price we pay for this is the possibility of artificially good, but disproportionate, alignments of the source onto the target, like the one in Figure 3.2 (d).

Figure 3.2: (a), (b): Distance map registration with scaling transforms. The measure of fit is E = 1563. (c), (d): Distance map registration with affine transforms. Measure of fit: E = 989. Left column: unregistered. Right: registered. Blue: source. Red: target.

3.4 Semiaxes registration

We describe in this section a novel registration algorithm with unknown correspondences, most suitable for objects with one axis longer than the other. We call this semiaxes registration.

Let γ = (γ₁, ..., γ_n) and δ = (δ₁, ..., δ_m) be two sets of points (possibly, but not necessarily, discretized continuous curves). If each of γ and δ has roughly the shape of an ellipse, then it makes sense to try to align the two sets of points by aligning the underlying ellipses. Here is the algorithm:

Normalization. First, we normalize the two sets to have zero mean and size one. The normalized sets of points are

    γ̄ = (γ − γ₀)/|γ − γ₀|,   δ̄ = (δ − δ₀)/|δ − δ₀|.

Rotation. Compute the unit eigenvectors (e₁, e₂) of the 2 × 2 covariance matrix of the set γ, with corresponding eigenvalues λ₁ > λ₂. Similarly for δ, let (f₁, f₂) be the unit eigenvectors, with corresponding eigenvalues μ₁ > μ₂. We choose the signs of the eigenvectors so that both the (e₁, e₂) and the (f₁, f₂) frames have positive orientation. In general, computation of eigenvalues and eigenvectors is non-trivial, but powerful iterative algorithms exist [88]. However, in dimension 2 it is trivial to derive closed-form formulas for both the eigenvalues and the eigenvectors, because the characteristic polynomial of the 2 × 2 covariance matrix is quadratic.
Thus, computation of the eigenvectors and the eigenvalues is done in constant time if the covariance matrix is given. Computation of the covariance matrix is obviously linear in the number of points in the set.

If the eigenvectors play the role of semiaxes, then we must align them: we must compute the rotation (around the origin, as γ and δ now have zero mean) which brings e₁ over f₁ and e₂ over f₂. This rotation exists because the two frames are orthonormal and have the same (positive) orientation. Of course, when computing the eigenvectors of γ, say, there are two choices for the positively oriented eigenvector frame: ±(e₁, e₂). Each of these frames must be rotated over each of ±(f₁, f₂), resulting in four rotations R₁, ..., R₄. We have to choose the best one. To define "best", we can use any metric between sets of points in R² that does not assume point correspondences (e.g., the Hausdorff distance (3.6)).

The transform T that aligns the source set γ onto the target set δ is a composition of translation, scaling and rotation, possibly also composed with a symmetry. It is, therefore, a scaling transform (possibly with negative scale). A short computational sketch of this procedure is given after Figure 3.3.

1. Advantages:
• it does not assume correspondences, nor does it try to find them;
• speed and simplicity: no minimization is required;
• it can be combined with distance map registration (Section 3.3), by providing a very good initialization to the distance map minimization procedure.
2. Limitations:
• ambiguity (ill-posedness) increases when registering circular sets (i.e., when the eigenvalues are approximately equal). This would be the case, for instance, when trying to align two circular saw blades: even if they were identical, the teeth might be offset by a random amount;
• there is no obvious metric with respect to which the aligning transform T is optimal.

Figure 3.3: Semiaxes alignment of two contours. (a) and (b): Two contours and their corresponding eigenvectors drawn at their respective centers. Each eigenvector is proportional to its own eigenvalue. (c) Alignment. The blue contour (source) was brought over the red contour (target) by aligning the eigenvectors.
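The sketch below implements the normalization and rotation steps in the normalized frame; it reuses the `hausdorff` helper sketched in Section 3.1 to pick the best of the candidate rotations, and the function names are illustrative.

import numpy as np

def principal_frame(pts):
    """Positively oriented orthonormal eigenvector frame of the 2x2 covariance matrix."""
    vals, vecs = np.linalg.eigh(np.cov(pts.T))
    e1, e2 = vecs[:, 1], vecs[:, 0]                    # major axis first (ascending eigenvalues)
    if e1[0] * e2[1] - e1[1] * e2[0] < 0:              # enforce positive orientation
        e2 = -e2
    return e1, e2

def semiaxes_align(src, dst):
    """Align src onto dst by normalizing (zero mean, unit size) and matching principal axes.
    The four sign choices +/-(e1, e2) reduce to two distinct rotations, both tried here."""
    def normalize(p):
        q = p - p.mean(axis=0)
        return q / np.linalg.norm(q)
    s, d = normalize(src), normalize(dst)
    e1, e2 = principal_frame(s)
    f1, f2 = principal_frame(d)
    F = np.column_stack([f1, f2])
    best, best_pts = np.inf, None
    for sign in (1.0, -1.0):
        E = np.column_stack([sign * e1, sign * e2])
        R = F @ E.T                                    # rotation taking (e1, e2) onto (f1, f2)
        cand = s @ R.T
        err = hausdorff(cand, d)                       # correspondence-free metric, Eq. (3.6)
        if err < best:
            best, best_pts = err, cand
    return best_pts, best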
3.5 Summary

Here are the main points of this chapter:
• Given two curves γ and δ, we would like to measure their shape similarity. To this end, we need a metric d_inv = d_inv(γ, δ) invariant under, e.g., scaling transforms. It is often easier to define a non-invariant metric d = d(γ, δ) first, and turn it into an invariant metric by minimizing the energy functional E_{γ,δ}(T), where T ranges over a set of parametric transformations (Section 2.1.1) and E is an optimality criterion (such as the squared error energy (Equation (3.10)) or the distance map energy (Equation (3.22))). Minimizing the energy functional E(T) is called curve registration. The invariant shape metric can then be defined by

    d_inv(γ, δ) = min_T E_{γ,δ}(T).

• Curves extracted from digital images can be represented
  – explicitly, as parametric curves γ = γ(t) (continuous case), or as n-dimensional complex vectors z = (z₁, ..., z_n) ∈ Cⁿ. Registration of parametric curves depends on the point correspondences between the curves. These can be
    * known (Section 3.2.1). Closed-form formulas exist in the case of scaling and affine registration (Propositions 3.2.1 and 3.2.2), as well as in the case of TPS registration (Equation (3.21));
    * unknown (Section 3.2.2). Registration is much more difficult than in the case of known point correspondences;
  – implicitly, as the zero set of their distance map:
    * distance map registration avoids establishing point correspondences.

Let γ : [0, 1] → R² be an arbitrary snake, and let γ_t be a 1-parameter deformation of γ. That is,

    γ̃ : [0, 1] × [0, 1] → R²,   (t, s) ↦ γ̃(t, s) =: γ_t(s).

(This technique of adding an extra term for constraining the solutions is called Tykhonov regularization; see for instance http://en.wikipedia.org/wiki/Ridge_regression and references therein.) The derivative of E at γ is then defined as

    dE(γ_t)/dt |_{t=0} = ∂/∂t E(γ̃(t, s)) |_{t=0}.   (5.5)

In general, computation of the derivative (5.5) is not always straightforward, depending on the definition of E. The resulting system of equations dE/dt = 0 is called the Euler-Lagrange equations of E and, in our case, it is a system of two ordinary differential equations (ODEs), because the variable γ = (x, y) is a function of only one real variable s. (In general, the Euler-Lagrange equations are partial differential equations.) For the explicit derivation of the Euler-Lagrange equations in the case of snakes, and a numerical implementation of a solution, see the original paper [63].

This model is known to have a number of shortcomings, some of which are:
• The snake energy (5.4) depends on the parametrization of γ. One could change the energy without changing the shape of the snake, simply by changing the parametrization.
• Numerical stability: as the snake evolves, the points on it can slide along the snake, coming too close together in some regions (e.g., being "attracted" by an energy pit) while leaving the rest of the snake sparse. For numerical stability, the points on the snake must be kept relatively equidistant, which requires meticulous point insertions and deletions at each snake iteration.
• In homogeneous regions (i.e., where ∇I ≈ 0), the snake does not move, as the forces acting upon it are (close to) zero.
• The snake does not change topology. Although in general this is viewed as a negative feature, it can become useful when detecting objects whose topology is known a priori, such as in medical imaging. See Section 5.3 for an example.
The active contour model underwent major improvements (themselves variational models), e.g., in [20, 2, 110, 112, 73], trying to address the above and other issues.

5.2 Geometric Models

The geometric models were introduced independently in [14] and [71], based on front propagation theory [82, 83], and were a revolutionary way of viewing active contours. These authors proposed a non-variational approach, namely mean curvature motion along the normal direction. More precisely, let γ_t(s) = γ(t, s) be the evolving curve at moment t. Here s is the parameter along the curve and t shows the dependence of the curve on time. Let ν be the inward unit normal along the curve (see Remark 2.4.1 for this choice of ν). The requirement that any point on γ_t move along the normal is described mathematically by the following differential equation:

    ∂_t γ = F · ν   (5.6)

where F : R² → R is a real-valued function (the speed). Note that Equation (5.6) does not necessarily arise as an Euler-Lagrange equation of some energy functional; it is simply a geometric requirement. The evolution of γ is controlled by a judicious choice of the speed function F, based on the following observation: when F is small, γ changes very slowly, and when F is large, γ changes rapidly.
Though not mandatory, a good (and well studied) choice for the speed is to take F to be a function of the curvature κ of γ, i.e., F = F(κ). The first and simplest example turns out to be surprisingly important and well understood.

Example 5.2.1 F = c + κ, where c > 0 is a real constant. The corresponding flow is

    ∂_t γ = (c + κ) · ν.

The c·ν component moves the curve inward (c > 0) or outward (c < 0), whereas the κ·ν component has been shown to shorten as well as smooth out γ, and to decrease the total curvature of γ [97].

Let us now discuss the implementation of geometric models. As explained in Section 2.2, there are two ways to represent the curve γ_t: explicitly, as γ_t = γ_t(s) = (x_t(s), y_t(s)), or implicitly, as the level set of some surface. In the first case, the implementation is straightforward: discretize γ, compute the (unit) normal ν at each point and move the point in the direction of ν. However, this naive implementation is known to be very unstable numerically. Fortunately, a far more elegant, powerful and stable method arises if, at any time t during the evolution, we view the active contour as the zero level set of an (also evolving) surface φ_t(x, y) = φ(x, y, t). That is, γ(s, t) = (x(s, t), y(s, t)) satisfies the equation

    φ(x(s, t), y(s, t), t) = 0,   ∀t ≥ 0.   (5.7)

We now have a surface φ_t(x, y) whose zero level set γ_t must evolve according to (5.6). Naturally, the question is: how must φ_t evolve in order that γ_t satisfy (5.6)? By differentiating (5.7) with respect to t, we get

    0 = d/dt φ(x, y, t) = ∂_x φ · ∂_t x + ∂_y φ · ∂_t y + ∂_t φ
      = ⟨∇φ, ∂_t γ⟩ + ∂_t φ
      = F · ⟨∇φ, ν⟩ + ∂_t φ   (by (5.6))
      = F · |∇φ| + ∂_t φ.   (5.8)

In this last equality we have used the fact that, since γ_t is a level curve of φ_t, its unit normal ν is given by ν = ∇φ / |∇φ|. Consequently, as with any geodesic model, the evolution of γ can be carried out numerically using a level set implementation. The surface evolution equation corresponding to (5.14) is ∂_t φ = g · …

… > 0, and the dot product is largest when ∇I ⊥ γ. Therefore, a weight w > 0 is needed in this case. Reversing the orientation on γ would require a weight w < 0. An illustrative example of the general case is shown in Figure 6.4. The green curves represent the initial (a) and final (b) interpolation snake. A blue circle represents a control point with w = 0, which will be kept fixed during the minimization. The variable points are marked in red, either with a plus sign (w = +1) or with a minus sign (w = −1). The energy (6.1) is a function of the red control points only. The weights at any other point on the spline (green points) are computed by interpolation of the known weights. In this example, the orientation of the spline is from top to bottom. Although initialized on an edge, the snake moves away from it, picking the edges with the polarity dictated by the weights: as we walk downward along the final snake (b), the w < 0 points are attracted by a dark → light edge, whereas points with w > 0 are attracted by a light → dark edge. This edge polarity selection by the snake is due to the fact that, unlike the image energy E_img in (5.4), which only depends on the magnitude of the gradient, the E_img in (6.1) depends on the magnitude of the gradient and on the angle between the gradient ∇I and the (unit) tangent to γ. We have found that this latter image energy performs better than the original magnitude-only image energy. A final but important detail about E_img is the computation of the image gradient ∇I.
Computation of the intensity gradient requires smoothing of the initial image, and the amount of smoothing determines the scale at which the snake evolution takes place. We compute the gradient in a way that allows us to seamlessly vary the scale, from coarse to fine, during the fitting process. Instead of the usual Gaussian smoothing, or the more sophisticated anisotropic smoothing methods [86], we compute the directional derivatives at each pixel as

    ∂_x I = Ī_right − Ī_left,   ∂_y I = Ī_bottom − Ī_top,

where each average intensity is calculated over a small rectangle, as shown in Figure 6.5.

Figure 6.4: Interpolation snake. Orientation along the snake is from top to bottom. Blue and red: control points. Green: arbitrary points on the snake. The weights of each of the control points are indicated by the shape (and color) of the marker: a blue circle means w = 0 (hard constraint), whereas a red ± means w = ±1. Despite starting out on an edge, the particular weights guide the snake towards other edges, as follows: w < 0 selects a dark → light gradient, whereas w > 0 selects a light → dark gradient, as we move along the curve with the chosen orientation. The points with w = 0 (blue circles) remain fixed during the fitting process, acting as hard constraints.

Figure 6.5: Computation of the image gradient.

Computation of the average intensity over a rectangle of any size can be carried out in constant time, regardless of the size of the rectangle, using only four look-up operations, by using a clever representation of the image I in the form of the integral map [107]:

    Ĩ(x, y) = ∫₀^x ∫₀^y I(u, v) du dv.

Clearly (Figure 6.6), for any rectangle [x_l, x_r] × [y_l, y_r] we have

    ∫_{x_l}^{x_r} ∫_{y_l}^{y_r} I(x, y) dx dy = Ĩ(x_r, y_r) − Ĩ(x_r, y_l) − Ĩ(x_l, y_r) + Ĩ(x_l, y_l),   (6.3)

which makes possible the computation in constant time of the average intensity over any rectangle.

Figure 6.6: Integral map. The integral of I over the rectangle ABCD can be computed by Equation (6.3), knowing the integrals of I over the four rectangles with the lower left corner at the origin O and the upper right corner at A, B, C and D.

In the discrete setting, the integrals are replaced by finite sums. The integral map itself can be computed in quadratic time using a straightforward, bottom-up dynamic programming algorithm, based on the following recurrence, obtained by taking x_r = x_l + 1 and y_r = y_l + 1 in (6.3):

    Ĩ(x+1, y+1) = Ĩ(x, y+1) + Ĩ(x+1, y) − Ĩ(x, y) + I(x+1, y+1).

Note that this last relation also gives I(x+1, y+1) in terms of the values of Ĩ, so knowing the integral map of I we can recover exactly the image itself. In other words, the integral map is an invertible operator.

The size of the rectangular patches on which the gradient is computed determines the amount of smoothing, and hence the amount of detail that will be used during the fitting: a larger scale (≈ 30 × 60 rectangles) makes the snake sensitive to more distant edges but produces a rather coarse fit, whereas a small scale (e.g., smaller than 10 × 20) makes the snake sensitive only to nearby edges and produces a closer fit.
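The following is a minimal sketch of the integral map and of a box-averaged directional derivative at an interior pixel; the placement of the left/right boxes relative to the pixel is an illustrative choice in the spirit of Figure 6.5, not the exact layout used in the experiments.

import numpy as np

def integral_map(img):
    """Integral map of Eq. (6.3): cumulative sums along both axes."""
    return img.cumsum(axis=0).cumsum(axis=1)

def box_sum(imap, r0, c0, r1, c1):
    """Sum of the image over rows [r0, r1) and columns [c0, c1), using four look-ups."""
    total = imap[r1 - 1, c1 - 1]
    if r0 > 0:
        total -= imap[r0 - 1, c1 - 1]
    if c0 > 0:
        total -= imap[r1 - 1, c0 - 1]
    if r0 > 0 and c0 > 0:
        total += imap[r0 - 1, c0 - 1]
    return total

def box_gradient_x(imap, r, c, h, w):
    """Directional derivative at pixel (r, c): mean over a right box minus a left box of
    size h x w; indices are assumed to stay inside the image."""
    area = float(h * w)
    top = r - h // 2
    right = box_sum(imap, top, c + 1, top + h, c + 1 + w) / area
    left = box_sum(imap, top, c - w, top + h, c) / area
    return right - left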
We illustrate multi-scale fitting of a multiple-component interpolation snake in Figure 6.7. The original image is noisy, with 20% salt-and-pepper noise. Figure 6.7 (b) shows the initial multi-component snake. There is a single (closed) component around the circle, and four open components around the square, concatenated. As before, the red "+" signs represent variable control points with positive weight, and the blue circles represent fixed control points. Figure 6.7 (c) shows the detected boundary at a coarse scale (30 × 60 rectangles used in the computation of ∇I). The fitting continues at scales 20 × 40 and 10 × 20. The final result is shown in (d). Figures 6.7 (e) and (f) show the actual gradient (x-component only) used in (c) and (d), respectively. Since computation of the integral map need only be done once, in one pass through the image, and since afterwards computation of the gradient ∇I can be done in constant time, regardless of scale, we can and do vary the scale during the fitting process, gradually, from coarse to fine.

Figure 6.7: Multi-scale, multiple-component interpolation snake. (a) 20% salt-and-pepper image; (b) initial snake; (c) coarse scale fit: 30 × 60 rectangles for computation of ∇I; (d) fine scale fit: 10 × 20 rectangles; (e) large scale gradient used in (c) (x direction only); (f) small scale gradient used in (d) (x direction only).

6.2.2 Arclength energy

The arclength energy E_arc is designed to keep the points of the snake equidistant throughout the minimization process, by penalizing non-arclength parametrizations:

    E_arc(γ) = ∫₀^L | |γ'(t)| − 1 | dt.   (6.4)

Since under the canonical parameter s the speed along the curve is 1 (i.e., |γ'(s)| = 1), the amount | |γ'(t)| − 1 | indicates to what extent t fails to be the canonical parameter.

The necessity of the arclength term is due to the following argument. Without exception, a curve γ is discretized by sampling it at equally spaced grid points t_i = i·Δt. However, as we have seen in Section 2.2.1, the points γ_i = γ(t_i) need not be equidistant, so this procedure does not correspond to the arclength parametrization. Of course, if a non-uniform distribution of the points along the snake is allowed, then the image energy will indicate a misleading measure of fit if, e.g., a densely populated segment along the snake happens to fall into an energy pit (strong edge). It is therefore crucial to start out with, and maintain, equally spaced points on the snake.

This situation is illustrated in Figure 6.8, which shows a typical case. We fit an interpolation snake twice, starting from the same initial configuration: once using E_img only, hence without the arclength energy (6.4) (Figure 6.8 (c)), and one more time using both E_img and E_arc (Figure 6.8 (e)). For visual comparison against the ground truth (Figure 6.8 (b)) we overlay each result on top of the ground truth (Figures 6.8 (d) and (f), respectively). In the absence of the arclength energy, the goodness of fit is measured by the image energy alone, and has a value of −44.7159, "better" even than the image energy along the ground truth itself (−16.4465), erroneously indicating a very good fit. When the arclength energy is combined with the image energy, the fit is visibly better than in the former case. Quantitatively, in each case we measure the difference between the detected boundary and the ground truth using, e.g., the modified (average) Hausdorff distance [32]. Without the arclength energy, this measure is 7.68 pixels, whereas with the arclength energy the average Hausdorff distance is only 3.60 pixels.

Somewhat surprisingly, the problem of preserving the arclength parametrization along the snake during the evolution, although noticeable both theoretically and practically, has been silently overlooked and left as an implementation detail.
Different active contour implementations deal with this problem in an ad-hoc manner, e.g., by removing points from regions along the snake where the points have come too close together, and inserting points in regions where the snake has become too sparse. The first authors to address this problem were, to our knowledge, Williams and Shah [110], who replace the first-order term in the internal energy E_int (5.3) by a term which penalizes deviations of the consecutive-point distances |γ_{i+1} − γ_i| from their mean d̄ = (1/n) Σ_i |γ_{i+1} − γ_i| along the contour.

In the discrete setting, we take an equidistant subdivision of the interval [0, L], namely τ_i = i·L/n, and let γ_i = γ_C(τ_i). We approximate the derivative at τ_i as γ'(τ_i) ≈ (γ_{i+1} − γ_i)/Δτ, and approximate the arclength energy E_arc by a Riemann sum:

    E_arc ≈ Σ_i | |γ_{i+1} − γ_i| / Δτ − 1 | · Δτ   (6.5)
          = Σ_i | |γ_{i+1} − γ_i| − Δτ |.   (6.6)

Figure 6.8: The need for arclength parametrization. Green: interpolation snake (control points omitted). Yellow: ground truth. (a) Initial snake (arclength parametrization). (b) Manually drawn border of a human kidney. (c) Snake fitted using image energy only. The points do not remain equidistant and tend to accumulate along strong boundary segments, producing an artificially good score (E = E_img = −44.7159), better even than the ground truth score (E = E_img = −16.4465), but a poor fit. Average Hausdorff distance from ground truth: 7.68 pixels. (d) Same as (c), overlaid over the ground truth. (e) Snake fitted using both the image energy E_img and the arclength energy E_arc. Average Hausdorff distance from ground truth: 3.60 pixels. (f) Same as (e), overlaid over the ground truth.

If L is chosen to approximate the length of the (initial) contour, and if this length remains roughly the same during the evolution, then

    Σ_{i=0}^{n−1} |γ_{i+1} − γ_i| ≈ L = n·Δτ,

so Δτ is the mean of all distances between consecutive points along the snake. We arrive this way at the elastic energy proposed by Williams and Shah [110]. The initial snake is also constructed so that the sampling at uniform time intervals corresponds to (an approximate) arclength parametrization. This was explained in Section 2.3.
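The discrete arclength energy (6.5)-(6.6) amounts to a few lines; this sketch assumes an open snake given as an (n, 2) array (for a closed snake one would append the closing segment).

import numpy as np

def arclength_energy(snake, L=None):
    """Equations (6.5)-(6.6): sum of | |gamma_{i+1} - gamma_i| - dtau |, where
    dtau = L / (number of segments); L defaults to the current snake length."""
    seg = np.linalg.norm(np.diff(snake, axis=0), axis=1)
    if L is None:
        L = seg.sum()
    dtau = L / len(seg)
    return np.abs(seg - dtau).sum()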
6.2.3 Shape energy

The shape energy E_shape represents the shape prior knowledge and can be imposed, e.g., as a distribution on the vector of coefficients. Any of a number of techniques for distribution estimation can be used, such as Principal Component Analysis (PCA) on a number of training contours, Parzen windows, or more general kernel methods [24, 25]. In our experiments we use the average chamfer transform model, discussed in detail in Chapter 4. Specifically, let γ₁, ..., γ_k be registered training samples, represented respectively by their chamfer transforms C₁, ..., C_k. Let

    C̄ = (1/k) Σ_i C_i

be the average chamfer transform, and let μ be the curve that minimizes it:

    μ = arg min_γ C̄(γ).   (6.7)

If γ_C is an interpolation snake, then we let T_C be the similarity transform that does semiaxes alignment (Section 3.4) of γ_C onto μ, and define the shape energy as

    E_shape(C) = s( C̄( T_C(γ_C) ) )   (6.8)

where the function s = s(x) has the form

    s(x) = a · x² / (x² + b²)

for some constants a and b. The reason for "squashing" the average chamfer transform by composing with s is to bring C̄ within a fixed, controlled range (Figure 6.9).

Figure 6.9: Squashed average chamfer transform, used as the shape model with interpolation snakes. Compare to Figure 4.3.

Note that even though the shape energy is defined as a function of the control vector C, at each iteration we register the entire spline γ_C to μ (not just the control points!). The actual similarity to the training samples is computed again on the entire γ_C. Both the registration of γ_C to μ and the evaluation of C̄(T_C(γ_C)) have time complexity linear in the number of points on γ_C, the same as the complexity of the other two energy terms, E_img and E_arc.

The benefits and limitations of the shape energy should be well understood and not overestimated: the shape energy, in general, can correct for significant deviations from the general shape of the training samples, but cannot account for details in any particular test case. The example in Figure 6.10 illustrates the benefits of using the shape energy. A clear kidney ultrasound image (Figure 6.10 (a)) is occluded by applying a white stripe with dark islands (Figure 6.10 (b)), in an attempt to "confuse" the snake. Figures 6.10 (c) and (d) show the detected boundary without the shape energy term (c), and with E_shape (d). The "distractor" stripe tends to deform the snake toward a shape that is not part of the model, and the shape energy term manages to correct for that deviation.

What the shape energy cannot do is to correct the shape of a snake (towards the desired boundaries) if the snake already happens to be similar to the training shapes. We illustrate this point in Figure 6.11. Starting from the same initial contour (Figure 6.11 (a)), we fit the snake with shape energy (Figures 6.11 (e) and (f)) and without it (Figures 6.11 (c) and (d)). The results in the two cases are similar, and similar to the training shapes (on average), yet different from the ground truth (yellow). The reason is that, unlike in the previous example, there is no distractor to alter the shape of the snake significantly from the learned model. The shape energy has nothing to correct for, and has a minimal effect, because near convergence the snake is already within the distribution of the training samples. In fact, without the shape energy, the average Hausdorff distance between the detected boundary and the ground truth was 5.13 pixels, whereas with the shape energy the average Hausdorff distance was slightly higher: 5.85 pixels.

Figure 6.10: Effect of the shape energy E_shape. (a) Clear kidney image. (b) Occluded kidney, and initial snake (green). (c) Final snake with no shape energy. (d) Final snake with shape energy.

Figure 6.11: Limitations of the shape energy. (a) Initial snake. (b) Ground truth. (c) Final snake without shape energy. (d) Comparison of (c) with the ground truth. (e) Final snake with shape energy. (f) Comparison of (e) with the ground truth.
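A hedged sketch of (6.8) is given below: it assumes the squashed average chamfer map is read off along the registered snake and averaged over its points, in the spirit of the chamfer energy (3.22); the constants a and b and the alignment interface are illustrative.

import numpy as np

def squash(x, a, b):
    """The squashing function s(x) = a * x**2 / (x**2 + b**2)."""
    return a * x**2 / (x**2 + b**2)

def shape_energy(snake, mean_chamfer, align, a=1.0, b=10.0):
    """Sketch of E_shape, Eq. (6.8): semiaxes-align the snake to the mean shape (`align`,
    Section 3.4) and average the squashed mean chamfer transform along the result."""
    moved = align(snake)
    rows = np.clip(np.round(moved[:, 1]).astype(int), 0, mean_chamfer.shape[0] - 1)
    cols = np.clip(np.round(moved[:, 0]).astype(int), 0, mean_chamfer.shape[1] - 1)
    return squash(mean_chamfer[rows, cols], a, b).mean()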
6.3 Conclusion to Chapter 6

We have defined here a type of active contour, the interpolation snake, with shape memory, smooth by construction, and allowing hard spatial constraints. This is achieved by means of an interpolation spline, preferred over other splines because the spline passes through all of its control points. The snake can have multiple connected components (Figure 6.7), and each component preserves its topology during the boundary finding process. In applications such as medical imaging, where the topology of the object of interest is generally known a priori, this is not a drawback; it is almost a "must-have" feature.

A three-term energy functional is minimized (Equation (6.1)). The main term (E_img, Equation (6.2)) is the boundary-seeking component. Unlike the image term in the original snakes algorithm [63], which only seeks pixels with a large intensity gradient, our image energy term guides the snake based on the angle between the tangent to the snake and the image gradient (Figure 6.3). In particular, this makes the snake sensitive to the edge orientation (Figure 6.4); this feature can be turned off, however, by taking the absolute value in the definition of E_img (Equation (6.2)). The second term (E_arc, Equation (6.4)) helps the snake remain parametrized by arclength, thus preventing the constituent points of the snake from accumulating near very strong edges in the case of objects with non-uniform boundaries (Figure 6.8). Last but not least, there is a shape prior term (E_shape, Equation (6.8)) that prevents significant deviations of the snake shape from a known, predefined shape model. The three terms must be commensurate, so some normalization is in order.

The choice of splines for the active contour reduces the search space from an infinite-dimensional space of C² curves in R² to a low-dimensional R^{2k}, where k is the number of control points. As such, the energy E is a function of 2k independent variables and can be minimized by standard routines, some of which do not require computation of the gradient, thus avoiding computation of the Euler-Lagrange equations.

Chapter 7
A Different Direction: Break Point Detection

We devote this chapter to a non-variational approach to boundary detection. The new approach finds the boundary by detecting jumps in the intensity profile along segments across the boundary of the object of interest. The core of this approach is a general, mathematically sound algorithm for detection of jumps in a 1D signal. We expect this algorithm to have applications well beyond border detection. It is one of the contributions of the thesis.

7.1 One-Dimensional Break Detection

We arrived at the problem of break (or jump) detection in 1D signals while trying to find the kidney boundaries in ultrasound images. By and large, the kidney has overall lower intensity than its surroundings. Thus, if AB is a segment that goes across the boundary of the kidney, for any point P ∈ AB we can compute the average intensity on AP and on PB. Intuitively, the border point we seek should be the one for which one average is as low as possible while the other is as high as possible. See Figure 7.1 and also Example 7.1.2.

Figure 7.1: Boundary detection by finding a break point in an intensity profile across the boundary.

It is a trivial matter to detect a single break point: simply move along the profile and choose as the break point the point which maximizes the difference between the left and right average intensities. From the very beginning, it is clear that the problem of break point detection should be formulated in the more general case, that of multiple break points.
It should also be clear that we cannot expect a break point detection algorithm to work well for arbitrary 1D signals: in this motivational example, the intensity on either side of the break point is relatively constant. After some thought, we have come up with the right mathematical model of the situation we just described. We present this model next.

For any interval [a, b] ⊂ R, let the set of n-segment partitions of [a, b] be

    P_n = { T = (a = t₀ < t₁ < ... < t_n = b) }.

If we fix some λ > 0 and only require that η integrate to zero on subintervals of length no less than λ, then we can find many such functions. Divide the interval [a, b] into a large number of equal-length subintervals, and call their common, nonzero length λ. On each such subinterval consider arbitrary Haar-like functions, i.e., functions antisymmetric (odd) with respect to the midpoint of the interval. Because of this antisymmetry, any such function integrates to zero on its respective subinterval. If η is now any combination of such functions, then η integrates to zero on any union of length-λ subintervals. Accordingly, we can no longer work with arbitrary partitions of the whole interval [a, b], but only with partitions whose break points are multiples of λ. With these modifications, the proof of Proposition 7.1.1 remains unchanged. This modification amounts precisely to the discretization of the problem. Thus, in summary, Proposition 7.1.1 is true in the continuous case, but not interesting, as the noise η must be zero to begin with. Discretization allows for non-trivial noise and, with the proper modifications, the result remains true. For clarity and simplicity, however, we state the results and present the proofs in the continuous framework.

In segmentation problems or 2D boundary detection problems, one defines an energy functional heuristically, in such a way that, e.g., the boundaries of interest are expected to be among the local minima of that energy functional. Usually there is no proof that the minima of the energy agree with human perception, and it is often difficult to control which minima the minimization procedure is going to find. By contrast, Proposition 7.1.1 asserts that if y = y_{H,T} + η is a noisy signal, then its defining piecewise constant core y_{H,T}, which we would like to recover, is the absolute minimum of the L² energy (7.3); so by minimizing the energy (7.3) over all G ∈ Rⁿ and S ∈ P_n, we recover the weights H and the partition T.

We can do even better, though. If we assume the noise η has zero mean on any subinterval of [a, b] (again, in the sense of Remark 7.1.1), then the weight h_i is precisely the mean of the signal y on [t_{i−1}, t_i). In other words, the weights H are not independent of the partition T; rather, they are uniquely determined by T (and by the signal y itself). This suggests that we can consider the energy E as a function of the partition only (rather than of the partition and weights). To prove this fact, let us first introduce some notation. Let S = (a = s₀ < s₁ < ... < s_n = b) be a partition of [a, b]. Let Y_S = (ȳ₁, ..., ȳ_n) be the weight vector whose components ȳ_i are the mean of y on [s_{i−1}, s_i).

Proposition 7.1.2 E(G, S) ≥ E(Y_S, S) for all G ∈ Rⁿ.
Proof:

    E(G, S) = ∫_a^b |y − y_{G,S}|² dt
            = Σ_j ∫_{s_{j−1}}^{s_j} (y − ȳ_j + ȳ_j − g_j)² dt
            = Σ_j ( ∫_{s_{j−1}}^{s_j} (y − ȳ_j)² dt + 2(ȳ_j − g_j) ∫_{s_{j−1}}^{s_j} (y − ȳ_j) dt + ∫_{s_{j−1}}^{s_j} (ȳ_j − g_j)² dt )
            = E(Y_S, S) + ‖y_{Y_S,S} − y_{G,S}‖²_{L²}
            ≥ E(Y_S, S),

where the middle term vanishes because ȳ_j = ∫_{s_{j−1}}^{s_j} y(t) dt / (s_j − s_{j−1}) is the mean of y over [s_{j−1}, s_j).
□

It follows from this proposition that, in order to segment the signal y, it suffices to minimize the energy S ↦ E(Y_S, S) rather than (G, S) ↦ E(G, S). To summarize:

Corollary 7.1.1 Suppose y = y_{H,T} + η, with ∫_u^v η = 0 for any subinterval [u, v] ⊂ [a, b]. For any partition S ∈ P_n, let ȳ_j be the mean of y on [s_{j−1}, s_j), and let Y_S = (ȳ₁, ..., ȳ_n) (recall that y_{Y_S,S} is the piecewise constant function defined by Y_S and S). Define the energy

    E(S) = ∫_a^b (y − y_{Y_S,S})².   (7.4)

Then

    T = arg min_{S ∈ P_n} E(S),   H = Y_T.

That is, the partition T is the absolute minimum of E, and the weights vector H consists of the means of y on each of the segments of T.

Certainly, the problem now is to minimize the energy (7.4) with respect to S ∈ P_n. Equivalently, we must minimize E over S ∈ Rⁿ, subject to the constraints a = s₀ < s₁ < ... < s_n = b. In the discrete case, if y is given as a sequence (y₀, ..., y_N), then for large N and n an exhaustive search for the optimal T would have complexity O(Nⁿ) ("N choose n"). Fortunately, a very elegant dynamic programming solution exists, based on the following observation (stated here in the continuous case):

Proposition 7.1.3 Suppose T = (a = t₀ < t₁ < ... < t_{n−1} = b < t_n = c) is the optimal partition on [a, c] (i.e., T minimizes E (Equation (7.4)) over [a, c]). Let T' = (a = t₀ < t₁ < ... < t_{n−1} = b) be the truncation of T up to the last interior break point b = t_{n−1}. Then T' is the optimal partition over the subinterval [a, b] ⊂ [a, c].

Proof: Since we are working with different intervals, we must specify the dependence of the energy E on the interval and on the number of segments. Thus E_{a,c,n} is the energy for partitions over [a, c] with n segments, and E_{a,b,n−1} is the energy for partitions over [a, b] with n − 1 segments. Here b = t_{n−1}. As a matter of notation, for any interval [u, v] let ȳ_{[u,v]} denote the mean of y over [u, v]. Obviously, by the very definition of E (Equation (7.4)), we have

    E_{a,c,n}(T) = E_{a,b,n−1}(T') + ∫_b^c (y − ȳ_{[b,c]})² dt.

Let S' = (a = s₀ < s₁ < ... < s_{n−1} = b) ∈ P_{n−1} be the optimal partition over [a, b]. Then the partition S = (S', c) has n segments and spans [a, c]. Since among such partitions T is optimal, we have

    E_{a,c,n}(T) ≤ E_{a,c,n}(S),

and by subtracting ∫_b^c (y − ȳ_{[b,c]})² dt on both sides we get

    E_{a,b,n−1}(T') ≤ E_{a,b,n−1}(S').

But since S' is optimal on [a, b], we also have the opposite inequality:

    E_{a,b,n−1}(T') ≥ E_{a,b,n−1}(S'). □

Corollary 7.1.2 Let E*_{a,c,n} be the minimum of E_{a,c,n}, and E*_{a,b,n−1} be the minimum of E_{a,b,n−1}, for any a < b < c. Then

    E*_{a,c,n} = min_{b ∈ [a,c]} ( E*_{a,b,n−1} + ∫_b^c (y − ȳ_{[b,c]})² dt ).   (7.5)

Once we have established the recurrence relation (7.5), the bottom-up dynamic programming algorithm is straightforward: given a one-dimensional signal y = (y₀, ..., y_N) and a desired number of segments n, we fill incrementally a matrix E*(b, k), for integers 0 ≤ b ≤ N and 0 ≤ k ≤ n, using Equation (7.5). The value E*(N, n) is the minimum energy for partitions over the entire interval [0, N] with n segments. The actual optimal partition T is then obtained by backtracking the indices b which achieve the minimum on the right-hand side of (7.5). A small computational sketch of this procedure is given below.
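The sketch below fills the table E*(·, ·) using the recurrence (7.5) and backtracks the break points; per-segment squared errors are obtained in constant time from cumulative sums, and the names are illustrative. For the signal of Example 7.1.1 below, segment_signal(y, 4) returns the break indices and the per-segment means.

import numpy as np

def segment_signal(y, n):
    """Dynamic programming for Eq. (7.5). Returns the optimal break indices
    (including both endpoints) and the per-segment means."""
    y = np.asarray(y, dtype=float)
    N = len(y)
    csum = np.concatenate([[0.0], np.cumsum(y)])
    csum2 = np.concatenate([[0.0], np.cumsum(y * y)])

    def sse(i, j):
        """Sum of squared deviations of y[i:j] from its mean (the integral term in (7.5))."""
        s, s2, m = csum[j] - csum[i], csum2[j] - csum2[i], j - i
        return s2 - s * s / m

    E = np.full((n + 1, N + 1), np.inf)      # E[k, j]: best energy for k segments on y[0:j]
    back = np.zeros((n + 1, N + 1), dtype=int)
    E[0, 0] = 0.0
    for k in range(1, n + 1):
        for j in range(k, N + 1):
            costs = [E[k - 1, b] + sse(b, j) for b in range(k - 1, j)]
            best = int(np.argmin(costs))
            E[k, j] = costs[best]
            back[k, j] = best + (k - 1)      # start index of the last segment
    breaks = [N]
    for k in range(n, 0, -1):                # backtrack the arg-min indices
        breaks.append(back[k, breaks[-1]])
    breaks = breaks[::-1]
    means = [y[breaks[i]:breaks[i + 1]].mean() for i in range(n)]
    return breaks, means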
Example 7.1.1 For this example, refer to Figure 7.2. We generate a random 4-segment integer partition of the interval [0, 1000],

    T = (0, 169, 877, 924, 1000),

and a random weight vector

    H = (7.17897, 96.6553, −66.3123, −24.9187).

The piecewise constant function y_{H,T} determined by H and T is shown in red. This is the signal before adding noise, and can be seen as ground truth. Next, we construct the noisy signal y = y_{H,T} + η, where η is Gaussian noise with zero mean and standard deviation σ = 20. The resulting noisy signal is shown in blue. The result of the dynamic programming segmentation algorithm is shown in green.

Figure 7.2: Break point detection of a randomly generated signal. Blue: noisy data y = y_{H,T} + η with 1000 points and 4 segments. Red: piecewise constant function y_{H,T} used to generate y (ground truth). Green: denoised signal. The noise η was Gaussian with zero mean and standard deviation σ = 20. The true break points were T = (0, 169, 877, 924, 1000) and the true weights vector was H = (7.17897, 96.6553, −66.3123, −24.9187). The denoised signal has breaks identical to the true breaks, and weights H' = (9.41748, 97.0672, −68.4484, −27.1419).

The detected partition T' and weights vector H' are
That could not be by chance: there had to be a mathematical explanation for it. We were able to provide a rigorous proof to support the experimental findings. Moreover, the mathematics showed that there was nothing special about two-segment partitions, as we originally started out with, and that the results were in fact valid for partitions with any number of segments. Last, but not least, we were able to find a dynamic programming algorithm that would detect these partitions.

The only limitation of this work is that the number of segments has to be specified a priori. It would be highly desirable to be able to choose the number of segments automatically, but we currently do not have a solution for that. We contend, however, that an MDL³ solution to this problem is very likely.

³Minimum Description Length.

Part III

Applications

Chapter 8

Experimental Results

In this chapter we present some experiments designed to test the performance of the interpolation snakes introduced in Chapter 6. We do a systematic and objective study of four active contour models, including the interpolation snakes, on synthetic noisy images (Section 8.4) and on kidney ultrasound images (Sections 8.2.1, 8.2.2, 8.2.3 and 8.2.4), by measuring the shape similarity of the detected boundaries against manually traced ground truth. For clinical validity of our algorithm we also perform a subjective blind study (Section 8.3), in which a human expert chooses the better of two algorithms' outputs, displayed simultaneously in separate copies of the same image. In addition to the results on ultrasound, we demonstrate in Section 8.6 good performance of interpolation snakes for shape detection in non-medical images. We follow up the 1D break point detection algorithm from Chapter 7 with some partial results in Section 8.5.

8.1 Introduction

We have developed, so far, a framework for boundary detection, specifically targeting ultrasound images. In this chapter of the thesis we assess the performance of our algorithms. We do this objectively and systematically, by comparing the boundaries detected using our methods against the true boundaries (ground truth). In the case of real images (such as ultrasound), the ground truth usually consists of contours manually traced by human experts, while for synthetic images, the ground truth is the curve used to generate the synthetic image, i.e., the curve whose interior is the object of interest. We describe the types of images used in Section 8.1.3.

8.1.1 Performance Measures

One way to evaluate the performance of active contours is to compare them against ground truth curves. We use three measures to compare the similarity of two curves. Let X and Y be two simple closed curves in the plane.

Hausdorff Distance. This has been defined in Section 3.1 (3.6) as

    H(X, Y) = sup{d(X, Y), d(Y, X)},

where d(X, Y) = sup_{x ∈ X} d(x, Y) is the directed distance between the sets X and Y. Here, X and Y can be arbitrary sets in the plane, not necessarily curves. The measure satisfies the mathematical definition of a distance, i.e., it has the following properties [47]:

1. H(X, Y) ≥ 0, with equality if and only if X = Y.
2. H(X, Y) = H(Y, X) (symmetry property).
3. H(X, Z) ≤ H(X, Y) + H(Y, Z) (triangle inequality).

This measure is known to be highly sensitive to outliers. To exemplify, suppose the two sets X and Y are identical, except for one point p. That is, Y = X ∪ {p}.
Then obviously d(X, Y) = 0, since X ⊂ Y, but d(Y, X) = d(p, X), so the Hausdorff distance is H(X, Y) = d(p, X). As such, the Hausdorff distance can be made arbitrarily large by moving a single point towards infinity, even though the two sets may otherwise be identical. In image processing, when comparing two figures (e.g., each the result of an edge detector) using the Hausdorff measure, the presence of outliers may deem the two figures different, even though they may in fact represent the same object.

One way to make the Hausdorff distance less sensitive to outliers is to replace the supremum with the average when computing the directed distance d(X, Y). This leads to the following similarity measure (introduced in [32] as the "modified Hausdorff distance").

Average Hausdorff Distance. This too has been defined in Section 3.1 (3.8):

    H_av(X, Y) = sup{d_av(X, Y), d_av(Y, X)},

where d_av(X, Y) = ∫_X d(x, Y) dx / |X| is the directed average distance. As in the case of the Hausdorff distance, X and Y can be arbitrary sets. Here |X| denotes the length of the curve X. It has been observed [32] that this measure does not satisfy the triangle inequality, so it is not a distance in the mathematical sense (properties 1, 2 and 3 defined above).

Normalized Symmetric Difference.

    NSD(X, Y) = |(X \ Y) ∪ (Y \ X)| / |X ∪ Y|    (8.1)

This is a measure that has not been mentioned in this thesis so far. It measures the amount of overlap between the sets X and Y. Strictly speaking, if X and Y are curves, on the right-hand side of (8.1) we have the interiors of X and Y, respectively, but for simplicity we denote these by the same letters as their boundaries. |X| stands for the area of the set (enclosed by) X. The numerator is the usual symmetric difference, and the denominator is a normalization factor. Obviously, 0 ≤ NSD(X, Y) ≤ 1 for any sets X and Y; NSD(X, Y) = 0 if and only if X = Y, and NSD(X, Y) = 1 if and only if X and Y are disjoint.

Claim: The normalized symmetric difference does not satisfy the triangle inequality.

Proof: To prove the claim it suffices to find a counterexample. Let X and Y be arbitrary sets with non-empty intersection, X ∩ Y ≠ ∅, and such that neither set is included in the other. The triangle inequality for X, X \ Y and Y reads

    NSD(X, Y) ≤ NSD(X, X \ Y) + NSD(X \ Y, Y),

that is,

    (|X \ Y| + |Y \ X|) / (|X| + |Y \ X|) ≤ |X ∩ Y| / |X| + NSD(X \ Y, Y).

Now if we keep X unchanged and expand Y so that |Y \ X| → ∞ while maintaining X ∩ Y constant, then the left-hand side in the last inequality approaches 1 asymptotically, so it will exceed the right-hand side, which remains fixed, strictly less than 1. □

Observe that none of these metrics measures shape similarity per se. For instance, suppose the two sets X and Y are two circles of the same radius. These two figures definitely have the same shape, so a true shape similarity measure should reflect that. However, all of the above measures are 0 on X and Y if and only if the two circles are concentric. If the two circles are disjoint, say, like the wheels of a bicycle, then NSD(X, Y) = 1 (largest possible), whereas the two Hausdorff measures can become arbitrarily large. The point is, therefore, that to use these distances as measures of shape similarity, the sets X and Y must first be brought into the same region of space, e.g., by registration. In the case of our experiments, however, this registration is not necessary, as we compare the proximity of the snake to the ground truth, rather than shape similarity.
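For readers who want to reproduce the evaluation, the following sketch computes the three measures on discretized curves and binary region masks. It is an illustration only, not the implementation used in the thesis: the function names are chosen for the example, curve distances are evaluated by brute force on sampled points, and the NSD follows Equation (8.1) as reconstructed above, normalizing by the area of the union.

```python
import numpy as np

def directed_distance(X, Y, average=False):
    """Directed distance from curve X to curve Y, each an (m, 2) array of sampled points.
    With average=False this is sup_{x in X} d(x, Y); with average=True it is the mean."""
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    d = np.sqrt(((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)).min(axis=1)
    return d.mean() if average else d.max()

def hausdorff(X, Y):
    return max(directed_distance(X, Y), directed_distance(Y, X))

def average_hausdorff(X, Y):
    return max(directed_distance(X, Y, average=True),
               directed_distance(Y, X, average=True))

def nsd(A, B):
    """Normalized symmetric difference of two binary masks (interiors of the curves)."""
    A, B = A.astype(bool), B.astype(bool)
    return np.logical_xor(A, B).sum() / np.logical_or(A, B).sum()
```

In practice the two curves are rasterized (or their polygon interiors filled) to obtain the masks passed to nsd, while the point samples of the contours are passed directly to the Hausdorff routines.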
8.1.2 Algorithms

We measure the performance of the interpolation snakes model (Chapter 6) by measuring the proximity of the output snake to the target boundaries (ground truth), using the measures described in the previous section. The significance of the results can be better understood relative to other models. As such, we compare head-to-head the following four algorithms:

CV: the Chan-Vese model, described in Section 5.2.2.
GAC: the Geodesic Active Contours model, described in Section 5.2.1.
KWT: the classical Kass-Witkin-Terzopoulos model, described in Section 5.1.
I-SNAKE: the Interpolation Snakes model, introduced in Chapter 6.

8.1.3 Data Sets

We test all four algorithms on two types of data: ultrasound images of human kidneys, and synthetic images with different amounts and different types of noise.

Ultrasound. Our kidney database consists of 117 images, each of size 733 x 471 pixels. Each kidney in this database was manually traced by a human expert¹. These manually traced boundaries represent the ground truth (GT) for the ultrasound images. It has to be noted that, reliable as it may be, the kidney GT is not 100% objective, because in several images the boundaries are so weak that the true boundaries are not distinguishable, even to the trained eye of a kidney surgeon.

¹Dr. Dorry Segev, Division of Transplantation, Department of Surgery, Johns Hopkins Institutions.

Figure 8.1: (a) Kidney ultrasound. (b) True boundaries (GT) manually traced by a human expert.

Synthetic images. We test our algorithms on synthetic images (in addition to ultrasound), in which the ground truth is known and is entirely objective. To this end, we create a family of binary images representing smooth, similar shapes as follows. We generate an i.i.d. family of vectors C_j = (c_1^j, ..., c_n^j) ∈ R^{2n}, j = 1, ..., K, from a Gaussian distribution of mean C_0 and user-defined standard deviation σ. Each component c_i^j is a point in R² (Figure 8.2 (a)). The mean C_0 is either manually defined by the user, or randomly generated. We interpolate each vector C_j by a closed spline S_j (Figure 8.2 (b)). The only condition we impose on S_j is that it does not self-intersect, so that the interior of the spline is well defined. A binary image I_j is then constructed for each contour S_j, in which the foreground is the interior of S_j, and represents the object of interest (Figures 8.2 (c)-(f)). Each of the clear binary images I_j is contaminated with several types of noise, shown in Figure 8.3. We describe each of them below. In the sequel, the index j in I_j is irrelevant, so we drop it.

Figure 8.2: Synthetic images. (a) Clusters of control points. There are 30 points at each site, drawn from a Gaussian distribution of a user-defined standard deviation σ. (b) Interpolation splines using the control points from (a). (c)-(f) Binary images obtained as the interior of the first four splines in (b).

Gaussian noise. I_gauss = I + N_σ, where at each pixel (x, y) the value η(x, y) is drawn from a normal distribution of zero mean and standard deviation σ. See Figure 8.3 (b).

Uniform noise. We create the noisy image I_uniform (Figure 8.3 (c)) by changing with probability p each pixel in the clear image I to a random gray-level value. That is, for every pixel z = (x, y) in I:

  - draw a number ρ ∈ [0, 1] from a uniform distribution;
  - define I_uniform(z) to be a random integer in [0, 255] if ρ < p, and I(z) otherwise.
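As an illustration of the corruption models just described, the sketch below generates noisy copies of a clear 8-bit image. It is not the thesis's code: the function names, the clipping to the 8-bit range, and the salt-and-pepper variant (pixels flipped to pure black or white, as in Figure 8.3 (d)) are assumptions made for the example rather than definitions quoted from the text.

```python
import numpy as np

def add_gaussian_noise(I, sigma):
    """I_gauss = I + N_sigma: add zero-mean Gaussian noise of std sigma to each pixel."""
    noisy = I.astype(float) + np.random.normal(0.0, sigma, size=I.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)   # clipping is an assumption

def add_uniform_noise(I, p):
    """With probability p, replace a pixel by a random gray level in [0, 255]."""
    noisy = I.copy()
    mask = np.random.rand(*I.shape) < p
    noisy[mask] = np.random.randint(0, 256, size=mask.sum())
    return noisy

def add_salt_and_pepper(I, p):
    """With probability p, replace a pixel by 0 or 255 (equally likely) -- standard convention."""
    noisy = I.copy()
    mask = np.random.rand(*I.shape) < p
    noisy[mask] = np.random.choice([0, 255], size=mask.sum())
    return noisy
```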
|E_img| ≤ ∫_γ |∇I| ds ≤ |∇I|_max · N_γ, where N_γ is the number of points on γ. In all our experiments, N_γ = 100 points. Dividing by this a priori upper bound, E_img can be rescaled to the interval [0, 1].

The internal energy

    E_arc = ∫_0^L | |γ'(t)| - 1 | dt

is supposed to maintain the arclength parametrization of the snake. As such, it stays close to 0 throughout the minimization, so it can be assumed to already have range in [0, 1], after dividing by N_γ.

Finally, the shape energy E_shape is given by the average chamfer transform (Equation (8.6)) squashed by the function s(x) = a²x²/(x² + b²). With a = 1, the range of s becomes precisely [0, 1). The parameter b determines the width of the trough in the squashed average chamfer transform (Figure 6.9).

With all energy terms normalized to [0, 1], we can take the weights all in one range, say [0, 100]. We have determined by trial and error (on a small number of test images) that, in the absence of the shape energy (w_shape = 0), the choice of w_img = w_arc produces promising results. As long as w_shape remains zero, it does not matter what the common value for w_img and w_arc is. This value does matter, though, when we have a non-zero shape term. In our experiments we set w_img = w_arc = 50. We determine the value of w_shape semi-exhaustively, by trying w_shape = 0, 10, 20, 30. Larger values produce a snake too similar to the model, hence a poor fit to the image.

The outcomes of the interpolation snakes experiments are summarized in Tables 8.2, 8.3 and 8.4, one for each of the three performance measures. Included in these tables are only the values for the interpolation snakes (I-SNAKE) and the classical snakes (KWT), as the two level set algorithms (GAC and CV) were far behind due to border leakage (see Table 8.1).

• Initialization (columns 1-4):
  - Scaling. Controlling parameter: scaling factor.
  - Gaussian randomization of the control points. Controlling parameter: standard deviation.
  - Distance initialization. Controlling parameter: distance from the ground truth. The sign of the distance parameter indicates either interior initialization (negative sign) or exterior initialization (positive sign).
  - Thin-plate-spline model initialization using four pairs of anchor points. The shape prior may or may not be used further during the minimization, according to whether w_shape > 0 or w_shape = 0, respectively.
• w_shape (column 5): weight of the shape prior term E_shape.
• I-SNAKE (columns 6-7):
  - shape model a priori warped onto the image using a thin-plate spline and four anchor points (shape prior contingent upon the image data);
  - un-warped shape model (shape prior independent of the image data).
• KWT (column 8).

Normalized Symmetric Difference (%)
Scaling  Stddev  Distance  Model      w_shape  Warped Shape Model  Absolute Shape Model  KWT
0.8      -       -         -          0        14.80               14.80                 46.10
0.8      -       -         -          10       10.37               15.05
0.8      -       -         -          20        9.63               15.72
0.8      -       -         -          30        9.19               16.62
-        10      -         -          0         9.38                9.38                 10.44
-        10      -         -          10        7.95                9.42
-        10      -         -          20        7.80               10.65
-        10      -         -          30        8.29               12.71
-        20      -         -          0        14.21               14.21                 22.01
-        20      -         -          10       10.64               13.42
-        20      -         -          20        9.57               14.08
-        20      -         -          30        9.37               14.78
-        -       -20       -          0        13.22               13.22                 49.35
-        -       -20       -          10        9.58               12.73
-        -       -20       -          20        9.07               14.06
-        -       -20       -          30        8.96               15.31
-        -       -         4 anchors  0         9.52               10.18
-        -       -         4 anchors  10        7.60
-        -       -         4 anchors  20        8.08
-        -       -         4 anchors  30        8.31

Table 8.2: Interpolation snakes vs. KWT: normalized symmetric difference (NSD). Smaller NSD means better performance.
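To make the normalization described before Table 8.2 concrete, here is a small sketch of the squashing function and of a weighted combination of the three normalized terms. It is illustrative only: the helper names, the value of b and the default w_shape are assumptions for the example; the text fixes only w_img = w_arc = 50 and w_shape ∈ {0, 10, 20, 30}.

```python
def squash(x, b):
    """s(x) = x^2 / (x^2 + b^2): maps [0, inf) into [0, 1); b controls the trough width."""
    return x * x / (x * x + b * b)

def total_energy(e_img_raw, e_arc_raw, chamfer_avg,
                 grad_max, n_points=100, b=10.0,
                 w_img=50.0, w_arc=50.0, w_shape=20.0):
    """Weighted sum of the three energy terms, each first normalized to roughly [0, 1].
    grad_max is the maximum gradient magnitude over the image; b and w_shape are
    illustrative values, not values prescribed by the thesis."""
    e_img = e_img_raw / (grad_max * n_points)   # divide by the a priori bound |grad I|_max * N_gamma
    e_arc = e_arc_raw / n_points                # arclength term, already close to 0
    e_shape = squash(chamfer_avg, b)            # squashed average chamfer transform
    return w_img * e_img + w_arc * e_arc + w_shape * e_shape
```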
Hausdorff Distance (pixels) Scaling St ddev Distance .\'IO( lel u' s h ape Warped Absolute KWT Shape Shape Model Model 0.8 - - - 0 31.53 31.53 48.35 0.8 - - - 10 22.20 31.44 0.8 - - - 20 19.71 31.88 0.8 - - - 30 18.46 33.82 - 10 - - 0 18.37 18.37 17.10 - 10 - - 10 15.69 18.72 - 10 - - 20 14.67 21.12 - 10 - — 30 15.68 26.10 - 20 - - 0 32.02 32.02 34.82 - 20 - - 10 23.58 28.71 - 20 - - 20 20.31 29.08 - 20 - - 30 19.12 30.94 - - -20 - 0 26.86 26.86 43.28 - - -20 - 10 20.05 25.32 _ - -20 - 20 18.05 28.18 - - -20 - 30 17.10 29.98 - - - 4 anchors 0 19.26 19.25 - - - 4 anchors 10 15.49 - - - 4 anchors 20 15.26 - - - 4 anchors 30 15.52 Table 8.3: Interpolation snakes vs. KWT: Hausdorff distance (H). Smaller H means better performance. CJI CJ'I Average Hausdorff Distance (pixels) Scaling Stddev Distance Model “'5 h ape Warped Absolute KWT Shape Shape Model Model 0.8 - - - 0 9.56 9.56 28.47 0.8 — - - 10 6.87 9.96 0.8 - - - 20 6.36 10.49 0.8 - - — 30 5.96 11.33 — 10 - - 0 5.85 5.85 5.93 - 10 - - 10 5.11 5.91 - 10 - - 20 4.98 6.65 - 10 - - 30 5.27 8.10 - 20 - - 0 8.74 8.74 12.06 - 20 - - 10 6.65 8.37 - 20 - - 20 6.07 8.79 - 20 - - 30 5.92 9.47 - - -20 - 0 8.03 8.03 28.83 - - -20 - 10 6.17 7.90 - - ~20 - 20 5.86 8.82 - — -20 — 30 5.73 9.81 — - - 4 anchors 0 5.97 6.21 - — - 4 anchors 10 5.08 - - - 4 anchors 20 5.16 — - - 4 anchors 30 5.28 Table 8.4: Interpolation snakes vs. KWT: Average Hausdorff distance (AH). Smaller AH means better perforn‘iance. 1 6 Perhaps the first thing to notice about the results reported in Tables 8.2, 8.3 and 8.4 is the consistency across the different performance measures, in the sense that for any of the three models tested, the ordering of any two experiments remains the same, regardless of the performance metric. It. suffices, therefore, to comment (mostly) on the NSD table, say. The only two cases when the classical model (KW T) competes with either inter- polation snake counterparts is when the snake is initialized fairly close to the actual boundaries (normal distribution, with standard deviation of 10 pixels), or when ini- tialized from the shape model. Even so. we find the performance of the KWT model quite remarkable. Based on Table 8.4, on average, in these two cases the KWT snake is within some 6 pixels from the ground truth, which is not bad for a snake with no shape memory. The unexpected behavior of the interpolation snakes when the shape prior is not correlated with the image data can be seen in the “Absolute Shape Model” column: 2 the performance decreases with increasing values of the shape weight w In shape' the best case (Gaussian initialization with standard deviation of 10 pixels and no shape prior), the interpolation snake with absolutee shape model has N S D = 9.38%, only marginally better than KWT’S 10.44%. Things change by a few points in favor of interpolation snakes when the “relative” shape prior is used, i.e., when the shape prior is anchored to the image. Another noteworthy fact is that the interpolation snakes have a better performance than KWT’s absolutee best (10.18%) by about 1 percent, even when initialized far from the ground truth: 9.19% and 8.96% NSD when initialized by scaling (0.8), and 20 pixels within, respectively. This means in a real application, the user would have more slack for initialization with interpolation snakes. When the shape prior is used properly (i.e.. contingent. upon the image data), one 2Smaller numbers represent better performance. observes the expected behavior (column 6): for any given initialization method. 
the performance achieves an optinunn with some amount of shape prior, but increasing the shape weight beyond a certain point is detrimental. Last, but not least, we have to remark that. the parameters of the interpolation snakes can be better tuned forjust about any of the images in the ultrasound database, producing better results. The problem is that the optimal parameters are not the same across all images. One image may require a certain amount of shape prior, while for another. no shape prior would produce the best results. Tables 8.2, 8.3 and 8.4 show the best (among non-optimal) parameters over the entire data set, not. for each individual image. The lack of a uniform set of parameters can only be due, logically, to one of two causes: either our model is wrong (or at least incomplete), or there simply cannot be (for any model) a common set of parameters, optimal for all images. This dichotomy opens up a stream of philosophical questions, so for the time being we shall only confine ourselves with simply pointing out this problem, and not try to find an answer. Figure 8.14: Segmentation of kidney ultrasound using the Interpolation snakes model. Left column: output snake. Right column: ground truth (yellow) and output snake (red). Top row: smallest normalized symmetric difference (NSD) error (3.71%). Mid- dle row: NSD = 7.72%. Bottom row: NSD = 13.57%. Average NSD = 7.60%. z). ( < NSD S 5.52 Figure 8.15: Detected contours. 3.71 160 Figure 8.16: Red: detected contours (same as in Figure 8.15). Yellow: GT. 3.71 S NSD g 5.52(%). 161 Figure 8.18: Red: detected contours (same as in Figure 8.17). Yellow: GT. 5.52 S NSD S 6.65(%). 163 Figure 8.19: Detected contours. 6.65 S NSD < 7.87(%). 164 Figure 8.20: Red: detected contours (same as in Figure 8.19). Yellow: GT. 6.65NSD S 7.87(%). Figure 8.22: Red: detected contours (same as in Figure 8.21). Yellow: GT. 7.89 < NSD S 8.79(%). 167 Figure 8.23: Detected contours. 8.79 S NSD < 9.91(%). 168 Figure 8.24: Red: detected contours (same as in Figure 8.23). Yellow: GT. 8.79 S NSD < 9.91(%). 169 Figure 8.25: Detected contours. 9.91 S NSD S 13.57(%). 170 Figure 8.26: Red: detected contours (same as in Figure 8.25). Yellow: GT. 9.91 S NSD g 13.57(%). 171 8.3 Blind study To confer clinical validity to the interpolation snakes algorithm (Section 6.1) we de- vised a blind evaluation experiment. in addition to the objective comparison against the ground truth presented in the previous sections. Specifically, in this experiment a human expert 3 compares the boundaries de- tected by several algorithms, without knowing which is which. For each of the 117 images used in our tests we draw three types of contours, each contour on a separate copy of the image, as follows: 0 interpolation snakes (I-SNAKE) using the manual four-point model initializa- tion (NSD = 7.95%; see Table 8.2). o the classical snakes (KWT) initialized randomly around the ground truth with a standard deviation of 10 pixels (N SD = 10.44%). 0 the ground truth itself (GT). For each kidney ultrasound image I we have three pairs of images: (II—SNAKEJKWT)’ (II—SNAKEJKWT) and (warJorl- We generate the complete list of all 351 possible pairs, shuffle it and display each pair in turn in a graphical interface, shown in Figure 8.27. The interface never makes reference to the names of the algorithms whose results are being displayed. We built the application trying to minimize the judge’s time and effort. 
The user can toggle between the top and bottom images, selecting this way the better of the two, either by clicking with the mouse or by using the up/down keys on the keyboard. The thickness of the contours is adjustable and can be controlled by the user with the left/right keys. A thickness of 0 is permitted, in order to allow the user to view the image without any contours drawn. Also, the contour marked as the better of the two is drawn thicker than the other one, for easy visualization. Once a selection has been made, the user advances to the next pair. When ready, the selections are saved in a plain ASCII file with one row per pair, and with two columns. The winner of each pair is listed in the first column. The task was clearly explained to the human expert. The ease of use of the application did not require any training session prior to the actual test. No time constraints were imposed.

³Dr. Dorry Segev, Division of Transplantation, Department of Surgery, Johns Hopkins Institutions.

Figure 8.27: Graphical user interface for the blind study comparison of three types of kidney boundaries (I-SNAKE, KWT and GT). A human expert chose between outputs of three algorithms, displayed two at a time.

Tables 8.5 and 8.6 show how many times the I-SNAKE output was preferred over KWT and over GT, respectively. Somewhat surprisingly, the interpolation snakes won over GT 80 times out of 117 (Table 8.6). Under a null hypothesis that both I-SNAKE and GT are equally likely to win (p = 0.5), the probability of observing (due to chance) 80 I-SNAKE outcomes out of 117 is given by the binomial distribution

    b(k; n, p) = (n choose k) · p^k · (1 - p)^(n - k).

In our case, with n = 117, k = 80 and p = 0.5, we get b = 0.00004359, or about 4 in 100,000. We are, therefore, justified to reject the null hypothesis, because there are only two logical possibilities:

• either it is true that I-SNAKE and GT are equally likely, and the observed outcome of 80 I-SNAKE winners out of 117 total is one of the lucky 4 in 100,000, or
• the null hypothesis is false to begin with.

Similarly, in the case of I-SNAKE vs. KWT (Table 8.5), the probability of being wrong in rejecting the null hypothesis is even smaller: b(99; 117, 0.5) = 4.862 · 10^-15 (zero within machine precision), so we reject the null hypothesis in that case as well.

                loser
winner          KWT   I-SNAKE   Total
KWT               0        18      18
I-SNAKE          99         0      99
Total            99        18     117

Table 8.5: Blind comparison: KWT vs. I-SNAKE.

                loser
winner           GT   I-SNAKE   Total
GT                0        37      37
I-SNAKE          80         0      80
Total            80        37     117

Table 8.6: Blind comparison: GT vs. I-SNAKE.

The complete results of the blind study are shown in Table 8.7. For each type of contour (GT, KWT and I-SNAKE) there are three rows. To fix the ideas, let us look at the I-SNAKE algorithm as winner. The top row shows how many times I-SNAKE won over GT and over KWT: 80 and 99 times, respectively, making for a total of 179 wins out of 351 comparisons. The middle and the bottom rows represent the row percentage and the column percentage of the entries in the top row in each group of three. Thus, for instance, I-SNAKE won 179 times out of 351 comparisons (51.00%). Of these wins, 80 were over GT (44.69%) and 99 were over KWT (55.31%). Reading on columns now, I-SNAKE lost a total of only 55 comparisons out of 351 (15.67%), of which 37 times it lost to GT (67.27%) and 18 times to KWT (32.73%).
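The significance computation above is easy to reproduce. The short sketch below evaluates the point probability b(k; n, p) exactly; the function name is chosen for the example (an exact tail test would sum b(k; n, p) over k ≥ 80, whereas the text reports the point probability, which is what is computed here).

```python
from math import comb

def binomial_pmf(k, n, p=0.5):
    """b(k; n, p) = C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Probabilities of the observed blind-study outcomes under the null hypothesis p = 0.5.
print(binomial_pmf(80, 117))   # I-SNAKE vs. GT:  80 wins out of 117
print(binomial_pmf(99, 117))   # I-SNAKE vs. KWT: 99 wins out of 117
```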
I winner loser J | GT KWT I-SNAKE TotalJ GT 0 82 37 119 row percentage 0.00 68.91 31.09 100.00 column percentage 0.00 45.30 67.27 33.90 KWT 35 0 18 53 row percentage 66.04 0.00 33.96 100.0 column percentage 30.43 0.00 32.73 15.10 I-SNAKE 80 99 0 179 row percentage 44.69 55.31 0.00 100.00 column percentage 69.57 54.70 0.00 51.00 Total 115 181 55 351 row percentage 32.76 51.57 15.67 100.00 column percentage 100.00 100.00 100.00 100.00 Table 8.7: Blind comparison: complete results. On rows, for each of GT, KWT and I- SNAKE categories: frequencies (top), row percentages (middle), column percentages (bottom). 8.4 Experiments on Synthetic Images \Ve leave now the ultrasound domain and present. the tests on synthetic images. Table 8.8 summarizes the numerics of the four algorithms (CV, GAC, KWT and I-SNAKE) each under four different types of noise and under different initializations. It is similar to Tables 8.2, 8.3 and 8.4, and we shall not describe its elements in detail. In general, the results are far better on synthetic images than on ultrasound, despite high levels of noise. In the case of CV, GAC and I-SNAKE, there is a signifi— cant. decrease in the performance under spread noise, than compared with Gaussian, salt-and-pepper and uniform noise, which can be attributed to the fact that all the algorithms have a built-in smoothing filter, which removes to a large extent the gaus— sian, the salt-and—pepper and the uniform noise, but are not as effective on the spread noise. W’hen initialized at some distance from the target boundaries, KWT snakes per- form very poorly, with or without noise, and this should come as no surprise. When initialized randomly, with standard deviation of 10 pixels from the true boundaries, the classical snakes (KWT) perform almost perfectly under all noise conditions, and we explain this based on the proximity to the ground truth more than anything else. Note that initialization at about 10 pixels from the ground truth produced also the best KWT results in ultrasound images as well (but no better than I-SNAKE). This observation reiterates the fact that the apparently good performance of KWT in ultrasound is due not so much to the merits of the algorithm per se, but rather to initialization. The interpolation snakes have better performance than KWT when initialized within 10 pixels from the ground truth, or when using the 4-anchor model initialization method, but also have comparable performance to KWT even when initialized at a significant distance from the ground truth (either at distance 20 pixels from the ground truth, or by a 80% downscaling of the true boundaries). Compare Table 8.8 with Tables 8.2, 8.3 and 8.4). 176 Algorithm Noise Scaling Rand NSD (%) Hausdorff Av. Hausdorff CV gaussian 0.7 3.4 7.83 3.31 8&1) 3.58 8.46 3.54 uniform 3.10 7.91 3.29 spread 9.81 17.69 6.76 GAC gaussian 0.7 5.97 10.47 4.44 s&p 7.84 11.25 5.52 uniform 7.06 10.10 5.08 spread 16.22 22.67 10.60 KW T gaussian 0.7 68.02 86.82 56.09 step 68.11 86.39 56.18 uniform 68.78 87.02 56.94 spread 62.67 85.11 50.66 KWT gaussian 10 1.62 8.09 2. 71 s&p 1.58 8.10 2.69 uniform 1.55 8.01 2.69 spread 1.41 8.03 2.65 I-SNAKE gaussian 0.7 2.16 5.63 2.82 s&p 2.64 7.60 3.00 uniform 2.36 6.78 2.87 spread 5.94 13.01 4.74 Table 8.8: Results of the active contour algorithms on synthetic images, under differ- ent conditions of noise (column 2), and initial conditions (columns 3 and 4). We present next typical images for each of the 5 categories in Table 8.8. See Figures 8.28, 8.29, 8.30, 8.31 and 8.32. 
In each case, the images are arranged in a 4 x 3 display. On each row there is a different type of noise. The left column shows the initial snake, the middle column shows the final result, and the right column shows also the final result (red) with the ground truth overlaid on top (yellow). 177 Figure 8.28: Chan-Vase model on synthetic images. Left column: initial snake (scaling to 0.7 - GT). Middle column: final snake. Right: final snake (red) and GT (yellow). On rows, top to bottom: Gaussian noise, salt-and—pepper noise. uni— form noise. spread noise. Original images were cropped and scaled down for display purposes, so some noise was lost in the process. 178 Figure 8.29: Geodesic Active Contours model (GAC) on synthetic images. Left column: initial snake (scaling to 0.7 - GT). Middle column: final snake. Right: final snake (red) and GT (yellow). On rows, top to bottom: Gaussian noise, salt-and- pepper noise, uniform noise, spread noise. Original images were cropped and scaled down for display purposes, so some noise was lost in the process. Figure 8.30: Kass-Witkin-Terzopoulos model (KWT) on synthetic images. Left column: initial snake (scaling to 0.7 - GT). Middle column: final snake. Right: final snake (red) and GT (yellow). On rows, top to bottom: Gaussian noise, salt—and- pepper noise, uniform noise, spread noise. Original images were cropped and scaled down for display purposes, so some noise was lost. in the process. 180 Figure 8.31: Kass-Witkin—Terzopoulos model (KWT) on synthetic images. Left column: initial snake (Gaussian randomization with standard deviation 0 = 10 pixels). Middle column: final snake. Right: final snake (red) and GT (yellow). On rows. top to bottom: Gaussian noise. salt-and-pepper noise, uniform noise, spread noise. Original images were cropped and scaled down for display purposes, so some noise was lost in the process. 181 Figure 8.32: Interpolation snakes model (I-SNAKE) on synthetic images. Left column: initial snake (scaling to 0.7 - GT). Middle column: final snake. Right: final snake (red) and (1'! ‘ (yellow). On rows, top to bottom: Gaussian noise. salt—and— pepper noise, uniform noise, spread noise. Original images were cropped and scaled down for display purposes, so some noise was lost in the process. 182 8.5 Border Detection Using 1D Break-Point De- tection. We return in this section to border detection in ultrasound, using this time the 1D break-point detection algorithm from Chapter 7. The process involves the following steps: 1. Manually define a region of interest (ROI), by drawing a circle around the target object, centered roughly at the object center of mass. Move outward along equally spaced rays from the center of the ROI. In our application we used 50 between rays. The intensity profile along any of the rays is obtained by averaging over a sector of a certain angle (user defined). 2. Apply the 1D break point detection algorithm from Chapter 7 with a single break point, along each of the radial intensity profiles (Figure 8.33 (a)). We are only interested low—to-high intensity points (marked with red), as the kidney interior is in general darker than the background. 3. Some of the (correctly) detected break points are not on the target boundary. We try to eliminate them by using a shape model, in several steps. First, fit the shape model to the red points using non-isotropic similarity transforms (translation, rotation, and different scales in the a: and the y direction, but no shear). 
The measure of fit is given by the chamfer transform on the red points. See Figure 8.33 (b). 4. Define a tubular neighborhood along the fitted model (Figure 8.33 (c)) The width of the tube is proportional to the standard deviation of all the red break points from step 1. Extract. intensity profiles within the tube, along segments normal to the fitted model. 183 5. Re—compute the break points on the newly defined intensity profiles (Figure 8.33 (d)). 6. If need be, repeat steps 3 to 5. 7. Use a smoothing spline [29] to approximate the object boundary, based on the detected break points (Figure 8.33 (e)). We employed a smoothing spline al- gorithm that. automatically determines the breaks and the control points. It only depends on a single parameter called smoothing factor. When the smooth- ing factor is zero, the smoothing spline goes through all the data points, being nothing more than an interpolation spline. As the smoothing factor increases, so does the smoothness of the spline, at. the expense of missing the data points. The results are promising in many cases. Sometimes the fitted model (step 3) is so good that the tubular neighborhood refinement and the smoothing spline are not necessary. When the tube is necessary though, it must be sufficiently wide, to cover completely the target boundary. This condition, in turn, depends on how close the model fitted to the break points (step 3) comes to the target boundary (not to the breakpoints!) If in some regions, the model is too far from the kidney boundary, the tube will miss that region, and no break point will be detected there at step 5. Last, but not least, we have observed experimentally that the smoothing factor of the final spline varies greatly from case to case. It may be possible to solve this latter problem by computing a family of splines over an entire discrete range of the smoothing factor, and by choosing the optimal one, with respect to some optimality criterion (yet. to be determined). 184 Figure 8.33: 2D border detection using radial 1D break-point detection. (a) Break points along 72 rays starting from a user—defined center. The actual intensity profiles not drawn. Red: dark-to—light break point. Blue: light—to—dark break point. (b) Shape model (green) fit to the. red break points. (0) Tube (cyan) across the fitted model (green). (d) Break points on the tube intensity profiles. (e) Smoothing spline on the red points from (d). (f) Smoothing spline (beige) and ground truth (yellow). 185 8.6 Miscellaneous Images We. present here an application of the interpolation snakes framework to real-life images. Since we are not. using any full fledged general purpose object detector, manual initialization of the snake is still required. Specifically, we test the interpolation snakes on eight images of a truck with vary- ing background and occlusion. As with the kidney images, we first build a model. Since this is not a comprehensive test, we only build a fast and frugal model: we simply take as our model the distance map built on a manually traced truck contour in just one of the eight images. The reason this is justified is that the shape variations can be described mostly by affine transforms (no non-linear shape variations, as is the case with kidneys), as the truck is a rigid object, photographed in near profile. The affine transforms are factored out by the registration process that takes place at every iteration during the snake evolution. For the results, refer to Figures 8.34 (initialization) and 8.35 (output). 
Observe, in the first place, that detection of a light object on a dark background using a clockwise oriented snake requires negative weights. This is being visualized in the images by marking the control points with a red minus sign. In the case of kidneys, the object was generally darker than the background, and the control points were all positive. Each of the bottom four images contain occlusion, which the snake with shape memory can handle successfully. Also, the snake seems remarkably robust, remaining undisturbed by the trees in the background. This is one application which can lead to a spin-off of our work in ultrasound, aimed at the recognition and the tracking of vehicles. 186 Figure 8.34: Interpolation snakes on truck images. Initialization of interpolation snakes. 187 Figure 8.35: Interpolation snakes on truck images. Output of interpolation snakes. 188 Chapter 9 Model Fitting For Fingerprint Classification we present in this chapter a simplified version of the work that we did in the area of fingerprint classification [78]. It was done prior to our work on active contours, so it does not involve interpolation snakes. However, chronologically and conceptually this work is the precursor of our snakes, as it is still based on the fitting of a shape model to an image. The nature of the problem requires that the shape remain roughly unchanged for classification purposes, so the fitting is done via affine transforms, rather than by free-form deformations as in the case of snakes. 9. 1 Problem Definition Fingerprints are biometric characteristics of humans, that consist of ridges and fur- rows at the tips of the fingers. There is evidence [37] that human awareness of these patterns predates Christianity. Since the last century fingerprints have been used systematically for authentication, particularly in criminal investigation. Based on empirical evidence [37, 38] it is widely accepted that fingerprints uniquely identify an individual. Unlike other authentication methods (such as face or voice recognition, 189 which may not distinguish identical twins), fingerprint authentication is unequivo— cal. lVloreover, fingerprints do not. change over time, which gives them an important advantage over other biometric measurements. Lastly, in comparison to other biomet- rics tecl'miques (such as retinal scan and iris scan), fingerprint acquisition is relatively simple and inexpensive. All these characteristics make the use of fingerprints one of the most pervasive methods of authentication, both in the forensic sector as well as in the commercial sector (e.g. access control). Given the astronomical size1 of fingerprint databases, it became apparent that an indexing methodology which would assign fingerprints into a small number of categories was in order. The system developed at the end of the 19—th century by Sir Edward Henry is the basis of modern automatic fingerprint identification systems (AFIS) in the majority of English-speaking countries, and serves the purpose of what is usually termed as (primary) classification of fingerprints [13, 51, 62, 60]. The system adopted by the FBI contains eight basic fingerprint patterns [37, 38]: arch, tented arch, left loop, right loop, whorl, central pocket loop, double loop and accidental. We shall confine ourselves here to only four major classes of fingerprints 2: arch (A), left loop (L), right loop (R) and whorl (W). See Figure 9.1. 
There have been several methods proposed for fingerprint classification, including (but not limited to) syntactic methods [90], singularity-based methods [62, 64], and neural network methods [12]. For a more detailed literature review see for instance [13, 51, 60] and the references therein. We address the problem of automatic fingerprint classification into one of the 4 types (A, L, R, W) using a novel method which we term kernel fitting, and provide detailed classification results.

¹There were 810,188 records (containing 10 fingerprints each) upon the formation of the Identification Division of the FBI in 1924 [37]. Currently there are 226 million cards on file, representing about 79 million individuals [36].
²However, the method we propose can be readily extended to more classes of fingerprints.

Figure 9.1: Four major classes of fingerprints. (a) Arch. (b) Left loop. (c) Right loop. (d) Whorl.

9.2 Mathematical model

9.2.1 Flow field

We model the ridges of a fingerprint as curves in the plane, and define the flow field as the direction of the tangent to these curves at each point³. Several methods have been proposed in the literature for the extraction of the flow field [17, 63, 64]. We employ the method developed by Hong [59]. See Figure 9.2. Thus, conceptually, we think of the flow field as a unit vector field in the plane or, equivalently, as its argument β ∈ [0, π)⁴ (the angle between the direction of the vector and the x-axis).

9.2.2 Fingerprint kernels

We define class-specific kernel curves based on the original definition of fingerprint classes of Sir Edward R. Henry [49]. We model mathematically Sir Henry's definitions of fingerprint classes as follows. Let the kernel of the whorl be the unit circle. The kernels of the other classes we define empirically, using splines (see Figure 9.3). For each class we learn a kernel model by performing a generalized Procrustes analysis (Section 4.1) on 20 manually traced splines with homologous control points.

9.2.3 Kernel fitting

Suppose we have a smooth vector field V defined over some region in the plane R². Let γ(t) = (x(t), y(t)) be a parametric curve in R², let γ' be the tangent to γ, and θ its argument. As before, let β be the argument of V. See Figure 9.5. Define an energy functional that captures the difference between the direction of γ' and that of the vector field V at the point γ(t) by

³Except at a small number of points, where the curve is not smooth (e.g. bifurcations).
⁴There is no canonical orientation of the ridges, so β ∈ [0, π) rather than [0, 2π).

We drop the original internal energy

    E_int = ∫_0^1 (|γ'|² + |γ''|²) ds,

since the interpolation splines are smooth by construction. We replace it with a term (still denoted by E_int) that keeps the points equidistant on the curve by penalizing for deviations from the arclength parametrization:

    E_int = ∫ | |γ'| - 1 | ds.

This addresses the problem of arclength parametrization effectively and in a principled fashion, quietly treated as an "implementation detail" elsewhere. Regarding shape learning, we propose a fast and natural alignment method for simple connected curves, that does not assume or require point correspondences.
= f | I“. l —1Ids , This addresses the problem of arclength parametrization effectively and in a principial fashion, quietly treated as “implementation detail” elsewhere. Regarding shape learning, we propose a fast and natural alignment method for simple connected curves, that does not assume or require point correspondences. 204 We term this method semiares alignment (Section 3.4). For a discretized plane curve 7, we define the semiaxes to be its modes of variation. that. is, the eigendi- rections of the covariance matrix computed on the (finite) set. of points resulting from the discretization of ’7. We introduce the notion of average chamfer transform (Section 6.2.3), and use it. as a shape model, by measuring the simultaneous proximity of a (test) curve, to an entire set of training contours, hence, implicitly, shape similarity. We propose an algorithm for segmenting a 1-dimensional piecewise linear signal perturbed by noise, unrelated to active contours (Chapter 7). This algorithm can be used as an edge detector, by applying it along rays starting from the center of an object of interest, but we predict it will have important standalone applications. Currently, the number of constant pieces in the input signal must be specified a priori. We do not have an automatic way of detecting this num- ber, but we suspect that there might be a way to do so by using a minimum description length (MDL) criterion. We propose a method for fingerprint classification using again a shape model (Section 9 and [78]) and demonstrate good performance in a first implementa- tion. 10.2 Future Work The insight we have gained through this research raises more questions than it pro- vides answers. We have some understanding on what shape is, and adding one extra term to the energy functional of a boundary seeking contour is but one way to incor— porate the shape memory into the model. But this is by no means the only way, and we believe there may be better ways, which we certainly plan to explore. 205 First and foremost, we intend to pursue the boundary detector based on the 1- dirnensional break-point. detection algorithm (Chapter 7), as we believe this approach would have better chances for success. The idea would be to detect. the jumps in the intensity profiles along rays starting out. from the center of the region of interest, and keep only the strongest. and the most reliable boundary points. We would be left with disparate segments along the target boundaries, which somehow have to be interpolated to encompass the entire object of interest. A shape model would play an important. role. and we anticipate having to deal with the problem of partial matching (which parts on the model correspond to the detected segments along the boundaries). We have already done preliminary experiments using this approach (Section 8.5), but more work and a systematic evaluation of the performance is needed. Secondly, the work on the curvature-based shape representation (Section 2.5) needs to be completed by assessing the discriminatory power of the induced shape similarity metric, in comparison to other known similarity metrics. There are a number of extensions of the interpolation snakes framework proposed for border detection that can be pursued. 0 Apply to other ultrasound problems, specifically left ventricle segmentation. o Extend to 3D and 2D+time (tracking). Motion video consists of a sequence of 2D frames that are perceived as continuous motion by the human eye when played at a sufficiently high rate. 
Detecting an object in one of the images and following it in time across the sequence of frames is the problem of object tracking. When these frames are viewed simply as a stack of 2D images, we are talking about detecting an object in 3-dimensional data. In the case of a video clip, the three dimensions are two spatial dimensions (x and y), with time as the third dimension. One application that doctors would like to have, we are told¹, would be to have the contour of an organ displayed at all times, in real time, during the acquisition of an ultrasound clip as the transducer moves. Needless to say, there are numerous 3D applications in which all three dimensions are spatial. We intend to explore the applicability of our boundary detection methods to 3D data.

• Determine to what extent the globally optimal parameters reported in Tables 8.2, etc. are suboptimal with respect to each individual image.

• Explore the possibility of applying interpolation snakes to the detection and tracking of objects (e.g. cars) in real-life images, prompted by the promising (though insufficiently tested) results on the truck images (Section 8.6). Ideally, the system would be fully automatic, which implies that a selective visual attention module and a recognizer be used in conjunction with the border detector.

¹According to Dr. Dorry Segev, Division of Transplantation, Department of Surgery, Johns Hopkins Institutions.

APPENDICES

Appendix A

Landmark registration by linear transforms

A.1 Scaling transforms

Proposition A.1.1 (Scaling registration with known correspondences) If T(z) = λz + τ minimizes the squared error energy

    E(T) = Σ_j |T(z_j) - w_j|² = Σ_j (λz_j + τ - w_j) · conj(λz_j + τ - w_j),    (A.1)

then its parameters are given by

    λ = <w', z'> / |z'|²,    (A.2)
    τ = -λ z_0 + w_0,    (A.3)
J J 0 J J 0 a A.2 Affine transforms In this case, there are no constraints on the matrix A: In particular, there is no convenient representatitm of the transform T using com- plex numbers, so all computations will be carried out. over the real numbers. Let. X and Y represent the means of the Xi’s and Yis respectively. Let also 211 ~ 1. . — [3 JV; 2 )Cl- — X l:__f (115,11) and Y; = i, — Y (62f (31.21;). We collect all the .117 and y coordinates of the \;s in two n-dimensional column vectors U, = (11,1415 . . . umT and V, = (imiré . . .141)? Similarly, for the Yl-Is we set 2’ = (35.2% . . . 2;,)T and W", : (iv-3.113,2 . . .11';,)T. Proposition A.2.1 (Affine registration with known correspondences) With the above notations, the coeflicients a, b, c, d and r of the afline transform T(X) = A - X + 7’ that minimizes the squared error energy 1 - E(T) = E ZlTW) — o,|2. (A.8) i are given by —1 a b |U’|2 (U’, V’)) (Z’, U’) (w’, U’) c d IV’I2 i andrz—(Y—A-X). Proof: In this case, E = E(a,b, c, (1, 7113.731). Taking the gradient in Equation (A.8), we get. VE = Z< Wee). T(X.) — Ya First, by setting the derivatives with respect to r]; and to Ty equal to zero, we get, as in the scaling registration case, that T(Y) = 7, which is equivalent to r = —(l7 — A - X). The rest of the proposition follows by setting to zero the deriva- tives with respect to a, b, c and d. C] We conclude this appendix with the observation that minimization of the squared error energy (A8) reduces mathematically to a least. squares problem, linear in the coefficients of the unknown transformation T (either scaling or affine). If we identify 212 T with the column vector formed by its coefficients, then we must. solve a system of t he form Y 2.11-'1’ in the least squared sense. This system is nothing but the matrix form of VE = 0. If ."i/ were a square, invertible matrix, then the solution would be simply T = “—1 ~Y. In general though, the least squares solution is given by T 2 Mi - Y where All is the Moore—Penrose pseudoinverse of ll] . Computation of ll] is usually done using the singular value decomposition (SVD) of M , which, given its popularity in the numerical analysis community, can be used as subroutine. However, the special purpose, closed form formulas for T that we derived here are far more elementary, stable and fast. 213 Appendix B N umerics of cubic interpolation splines. B.1 Representation of a cubic polynomial in terms of second derivatives. Con‘iputation of a cubic interpolation spline is based on the following lemma 1: Lemma 8.1.1 Ify = 7(t) is a cubic polynomial, then on any interval [a, b] 7 can be erpressed uniquely as (B.1) where b-t A: . b—a' B=1—A ll\-Ientioned in "Recipes in C” [88] without proof. 214 _V‘Q'?" . $ (1': (.43 — Aw) — “)3‘ o ___ (o3 — B)(b — (1)2 (i 6 Remark B.1.1 The sigmficance of this lemma is as follows: if we take a.b to J .-. 1",, - - -. — C. .- - — . a. y) A. C. — ,, l)(. constcutiie bimlpoints. say. a — g, and b — 51+1, thm, smut, ,(gl) — y, and 7(51- +1) 2 llj+1 arc control points (hence given ) we can compute 7(t) for all t 6 [547.6141]. provided we know the second derimti'ucs ’7” at the break points. Proof: \Vith A and 13 as in the statement of the Lemma let 11s write 7(t) = A“,(a) + 137(1)) + 2(t). Clearly, I. 3 must be a cubic polynomial (as 7 is of degree 3 and A and B are of degree 1). 2. :(a) = () and 2(b) 2 0. and 3. z”(a) = 7”(a) and z”(b) = 7”(b). 
Since 2(a) = 0, we shall seek z of the form 2(t) = A(t — a)3 + ,u(t —— (1)2 + I/(t — a) Then the second (’lerivative is and by property 3 above, we get ,\ = 7”“) - 7’70) fl = “if/((1) (B 2) 6(b —- a) 2 I From 2(b) = O, we also get II II A j 2. V: —(b—a)' I (0:) r (a) (B3) With these coefficients. 2 Z A,,//(()i)()b-_“:l’;(0> (I, _ (1)3 + ’7”2(a) (t __ “)2 __ (b _ a) . 7”(b) 1227/7!le _ a) = ("W—3,133 E—i— - H "W 2 i ‘25” [$2113 H We 2 (b ‘6“)2(—B3 + 3132 -— 28)7”(a) + (Lg—@3133 — B)7”(b) : (” ‘6”l2m3 _ far/”(a + “’ ‘6“)2(B3 — B)~/’(b) [3 Corollary B.1.1 With the same notations as in Lemma B.1.1, the first and second derivatives are A _ la, 2_ 2_ 7’