… the dot product is largest when ν ∥ ∇I. Therefore, a weight w > 0 is needed in this case. Reversing the orientation on γ would require a weight w < 0.

6.4 Interpolation snake. Orientation along the snake is from top to bottom. Blue and red: control points. Green: arbitrary points on the snake. The weight of each control point is indicated by the shape (and color) of its marker: a blue circle means w = 0 (hard constraint), whereas a red marker means w = ±1. Despite starting out on an edge, the particular weights guide the snake towards other edges, as follows: w < 0 selects a dark-to-light gradient, whereas w > 0 selects a light-to-dark gradient, as we move along the curve with the chosen orientation. The points with w = 0 (blue circles) remain fixed during the fitting process, acting as hard constraints.

6.5 Computation of the image gradient.

6.6 Integral map. The integral of I over the rectangle ABCD can be computed by Equation (6.3), knowing the integrals of I over the four rectangles with the lower left corner at the origin O and the opposite corner at A, B, C and D.

6.7 Multi-scale, multiple-component interpolation snake. (a) 20% salt-and-pepper image; (b) initial snake; (c) coarse-scale fit: 30 x 60 rectangles for the computation of ∇I; (d) fine-scale fit: 10 x 20 rectangles; (e) large-scale gradient used in (c) (x direction only); (f) small-scale gradient used in (d) (x direction only).

6.8 The need for arclength parametrization. Green: interpolation snake (control points omitted). Yellow: ground truth. (a) Initial snake (arclength parametrization). (b) Manually drawn border of a human kidney. (c) Snake fitted using image energy only. The points do not remain equidistant, and tend to accumulate along strong boundary segments, producing an artificially good score (E = E_img = -44.7159), better even than the ground-truth score (E = E_img = -16.4465), but a poor fit. Average Hausdorff distance from ground truth: 7.68 pixels. (d) Same as (c), overlaid on the ground truth. (e) Snake fitted using both the image energy E_img and the arclength energy E_arc. Average Hausdorff distance from ground truth: 3.60 pixels. (f) Same as (e), overlaid on the ground truth.

6.9 Squashed average chamfer transform, used as the shape model with interpolation snakes. Compare to Figure 4.3.

6.10 Effect of the shape energy E_shape. (a) Clear kidney image. (b) Occluded kidney, and initial snake (green). (c) Final snake with no shape energy. (d) Final snake with shape energy.

6.11 Limitations of the shape energy. (a) Initial snake. (b) Ground truth. (c) Final snake without shape energy. (d) Comparison of (c) with the ground truth. (e) Final snake with shape energy. (f) Comparison of (e) with the ground truth.

7.1 Boundary detection by finding a break point in an intensity profile across the boundary.

7.2 Break point detection on a randomly generated signal. Blue: noisy data y = y_{H,T} + η with 1000 points and 4 segments. Red: piecewise constant function y_{H,T} used to generate y (ground truth). Green: denoised signal. The noise η was Gaussian with zero mean and standard deviation σ = 20.
The true break points were T = (0, 169, 877, 924, 1000) and the true weights vector was H = (7.17897, 96.6553, -66.3123, -24.9187). The denoised signal has breaks identical to the true breaks, and estimated weights H* = (9.41748, 97.0672, -68.4484, -27.1419).

7.3 Break point detection along an intensity profile in an ultrasound image. (a) We move along the yellow segment from the top green end towards the red end. The detected interior points are marked with cyan (high to low) and magenta (low to high). (b) Blue: 1D intensity profile along the yellow segment from (a). Green: segmented signal.

8.1 (a) Kidney ultrasound. (b) True boundaries (GT) manually traced by a human expert.

8.2 Synthetic images. (a) Clusters of control points. There are 30 points at each site, drawn from a Gaussian distribution of a user-defined standard deviation σ. (b) Interpolation splines using the control points from (a). (c)-(f) Binary images obtained as the interior of the first four splines in (b).

8.3 Various types of noise. (a) Clear image. (b) Gaussian noise with standard deviation σ = 30. (c) Uniform noise with probability p = 0.3. (d) Salt-and-pepper noise with probability p = 0.3. (e) Spread noise with standard deviation σ = 30.

8.4 Different types of initialization. Yellow: ground truth (GT). Red: initial snake. (a) Clear image. The four points marked with a cross were defined by the sonographer when the image was acquired. (b) Scaling initialization by a factor of 0.7. (c) Gaussian randomization with a standard deviation σ = 10 pixels. (d) Gaussian randomization with standard deviation σ = 20 pixels. (e) Distance initialization at distance d = 20 pixels inside the ground truth. (f) Model initialization. The red curve is the image of a template kidney (not shown) via a TPS transform (Section 3.2.1) that maps four pre-defined anchors on the model (not shown) onto the four cyan anchors in the image (user-defined).

8.5 Segmentation of kidney ultrasound using the Chan-Vese model. Left column: binary segmentation of the region of interest defined around the ground truth. Right column: ground truth (yellow) and traced boundary between the segmented regions on the left (red). Top row: smallest normalized symmetric difference (NSD) error (14.9%). Middle row: NSD = 38.4%. Bottom row: NSD = 88.6%. Average NSD = 40.8%.

8.6 Chan-Vese model: twenty random samples. The samples are sorted in increasing order of the normalized symmetric difference (NSD).

8.7 Chan-Vese model: the same images as in Figure 8.6 with the ground truth overlaid onto the image (yellow).

8.8 Segmentation of kidney ultrasound using the GAC model. Left column: output snake. Right column: ground truth (yellow) and output snake (red). Top row: smallest normalized symmetric difference (NSD) error (10.7%). Middle row: NSD = 22.98%. Bottom row: NSD = 49.28%. Average NSD = 25.32%.

8.9 GAC model: twenty random samples. The samples are sorted in increasing order of the normalized symmetric difference (NSD).

8.10 GAC model: the same images as in Figure 8.9 with the ground truth overlaid onto the image (yellow).

8.11 Segmentation of kidney ultrasound using the KWT model. Left column: output snake. Right column: ground truth (yellow) and output snake (red).
Top row: smallest normalized symmetric difference (NSD) error (4.36%). Middle row: NSD = 9.46%. Bottom row: NSD = 27.22%. Average NSD = 10.44%.

8.12 KWT model: twenty random samples. The samples are sorted in increasing order of the normalized symmetric difference (NSD).

8.13 KWT model: the same images as in Figure 8.12 with the ground truth overlaid onto the image (yellow).

8.14 Segmentation of kidney ultrasound using the interpolation snakes model. Left column: output snake. Right column: ground truth (yellow) and output snake (red). Top row: smallest normalized symmetric difference (NSD) error (3.71%). Middle row: NSD = 7.72%. Bottom row: NSD = 13.57%. Average NSD = 7.60%.

8.15 Detected contours. 3.71 ≤ NSD ≤ 5.52 (%).

8.16 Red: detected contours (same as in Figure 8.15). Yellow: GT. 3.71 ≤ NSD ≤ 5.52 (%).

8.17 Detected contours. 5.52 < NSD ≤ 6.65 (%).

8.18 Red: detected contours (same as in Figure 8.17). Yellow: GT. 5.52 < NSD ≤ 6.65 (%).

8.19 Detected contours. 6.65 < NSD ≤ 7.87 (%).

8.20 Red: detected contours (same as in Figure 8.19). Yellow: GT. 6.65 < NSD ≤ 7.87 (%).

8.21 Detected contours. 7.87 < NSD ≤ 8.79 (%).

8.22 Red: detected contours (same as in Figure 8.21). Yellow: GT. 7.87 < NSD ≤ 8.79 (%).

8.23 Detected contours. 8.79 < NSD ≤ 9.91 (%).

8.24 Red: detected contours (same as in Figure 8.23). Yellow: GT. 8.79 < NSD ≤ 9.91 (%).

8.25 Detected contours. 9.91 < NSD ≤ 13.57 (%).

8.26 Red: detected contours (same as in Figure 8.25). Yellow: GT. 9.91 < NSD ≤ 13.57 (%).

8.27 Graphical user interface for the blind study comparison of three types of kidney boundaries (I-SNAKE, KWT and GT). A human expert chose between the outputs of three algorithms, displayed two at a time.

8.28 Chan-Vese model on synthetic images. Left column: initial snake (scaling to 0.7 · GT). Middle column: final snake. Right: final snake (red) and GT (yellow). On rows, top to bottom: Gaussian noise, salt-and-pepper noise, uniform noise, spread noise. Original images were cropped and scaled down for display purposes, so some noise was lost in the process.

8.29 Geodesic active contours model (GAC) on synthetic images. Left column: initial snake (scaling to 0.7 · GT). Middle column: final snake. Right: final snake (red) and GT (yellow). On rows, top to bottom: Gaussian noise, salt-and-pepper noise, uniform noise, spread noise. Original images were cropped and scaled down for display purposes, so some noise was lost in the process.

8.30 Kass-Witkin-Terzopoulos model (KWT) on synthetic images. Left column: initial snake (scaling to 0.7 · GT). Middle column: final snake. Right: final snake (red) and GT (yellow). On rows, top to bottom: Gaussian noise, salt-and-pepper noise, uniform noise, spread noise. Original images were cropped and scaled down for display purposes, so some noise was lost in the process.

8.31 Kass-Witkin-Terzopoulos model (KWT) on synthetic images. Left column: initial snake (Gaussian randomization with standard deviation σ = 10 pixels). Middle column: final snake.
Right: final snake (red) and GT (yellow). On rows, top to bottom: Gaussian noise, salt-and-pepper noise, uniform noise, spread noise. Original images were cropped and scaled down for display purposes, so some noise was lost in the process.

8.32 Interpolation snakes model (I-SNAKE) on synthetic images. Left column: initial snake (scaling to 0.7 · GT). Middle column: final snake. Right: final snake (red) and GT (yellow). On rows, top to bottom: Gaussian noise, salt-and-pepper noise, uniform noise, spread noise. Original images were cropped and scaled down for display purposes, so some noise was lost in the process.

8.33 2D border detection using radial 1D break-point detection. (a) Break points along 72 rays starting from a user-defined center. The actual intensity profiles are not drawn. Red: dark-to-light break point. Blue: light-to-dark break point. (b) Shape model (green) fit to the red break points. (c) Tube (cyan) across the fitted model (green). (d) Break points on the tube intensity profiles. (e) Smoothing spline on the red points from (d). (f) Smoothing spline (beige) and ground truth (yellow).

8.34 Interpolation snakes on truck images. Initialization of the interpolation snakes.

8.35 Interpolation snakes on truck images. Output of the interpolation snakes.

9.1 Four major classes of fingerprints.

9.2 Extraction of the flow field from a fingerprint impression. (a) Original (scaled down to 0.3 of its original size, to fit the page). (b) Flow field.

9.3 Definitions of fingerprint kernels using splines. Yellow: control points. Red: kernel. The kernels model the shape of the ridge pattern around the center of the fingerprint.

9.4 Procrustes model for learning fingerprint kernels [78]. (a)-(d) Manually traced left loops in fingerprint images. The yellow dots are the landmarks, homologous across all images. Only four out of 20 fingerprint impressions are shown here. (e) All 20 training contours, extracted and unregistered. (f) Registered training contours (green) and their mean (purple).

9.5 The angles α and β defining the measure of fit (Eq. (9.1)).

9.6 Elliptic kernel fit to two impressions of the same finger.

9.7 Classification error due to a wrong best-fit kernel.

9.8 Sources of classification errors. (a) Right loop fingerprint with an arch as the kernel of best fit. (b) The same kernel (dark segments) overlaid on top of the flow field of the impression in (a). The fit is good, but the flow field alone does not show a right loop pattern.

Chapter 1

Introduction

1.1 To model, or not to model: A personal perspective

In order to find their way in the world, most - if not all - organisms must sense the environment around them. Most species try to "make sense"¹ of the world in their own way, enough to ensure their survival, and limited (sometimes severely) by the complexity of their nervous system. We humans, probably like many other species, parse the world into physically and functionally distinct entities. We mindlessly refer to these entities as objects. We use nouns like "apple", "tree", "heart", etc. to label these objects. But what is an object? Of course, in our daily life we do just fine without a definition of the notion of "object", or even without a definition of most natural objects: we see one, we can name it effortlessly and instantly, and we can do so from a very young age too. However, the moment we try to build intelligent artificial systems that are
able to act in the world autonomously, in a human-like fashion, we must define or represent objects in some precise, mathematical way, so that the computer can act on them.

¹We use this term here without reference to controversial concepts like "mind", "consciousness", or "cognition".

Sometimes a human refers to things generically ("car", "house", etc.), and other times specifically ("this car", "that house"). Although largely unconscious, this distinction is crucial in trying to formalize in any useful way any concrete concept in the "mind" of the computer. "Car" is an abstraction, a generalization; "this car" is a particular instance which has roughly all the features that describe a car, and possibly certain attributes or details that distinguish it from "another car". Although in daily life we loosely categorize both the abstraction and any of its instances as "objects", in pattern recognition the term class is used to refer to a generic category of objects.

Let us also note that we can readily sketch on paper (from memory) just about any of the categories of objects that we normally perceive. But, since a category of objects is a collection of many particular exemplars, what exactly is it that we are drawing? However little artistic talent we may possess, a drawing of an apple would never be mistaken for a boat, say. It appears, then, that our brain carries some sort of representation, some sort of mental image (acquired from experience) of the various categories of (simple) objects. This representation somehow summarizes the essential characteristics of each of these categories. Whatever is being drawn on paper (or just imagined) is some sort of average, something representative for the entire class - in other words, a model.

So far, we have argued that humans parse the world into (classes of) objects, and build models of them. To be sure, we are far from claiming that the human brain builds a model of the entire world (at once), nor do we claim that the brain builds an internal panoramic image of the field of view and acts upon that, as was proposed in David Marr's theory of vision [72]. Studies on visual cognition, eye movement and visual attention [18, 67, 106, 91, 3, 105, 48, 75, 76, 56, 55] (to name a few) have unambiguously dismantled these ideas, but do not exclude the possibility that the human brain does utilize models of individual objects.

We need to understand, then, how to define (models of) classes and/or objects, for the purpose of using them on an intelligent machine. With few notable exceptions (such as some geometric objects like circles, lines, etc. that can be described by equations), we humans commonly use different attributes or features to describe objects. Naturally, we can try to port these features to the computer world, which entails that each of them must be quantifiable mathematically, so it can be measured and represented as a number. Thus, for instance, size attributes (such as length, width, volume), or color, curvature, etc. make for good feature candidates, while "beauty", "grace" or "malice" do not. Precisely which features should be selected for describing one concept or another is the problem of feature selection. In traditional pattern recognition [33, 98, 57, 41], the programmer selects, often subjectively, whatever features seem to make more sense for the problem at hand, in effect trying to make the computer "perceive" the world the way we do. A system built in this manner is often termed knowledge-based.
This is but one, heavily criticized, but still very popular way of building artificial intelligence (AI). It is by no means the only, or the best, way. Most notably, one could mention genetic algorithms [50, 44], reinforcement learning [61, 101] or developmental learning [108, 109].

The point we are trying to make here is that we see the need for building computational models for AI systems. Based on our personal experience, we think that while the knowledge-based approach may not be powerful enough (as many people argue) to solve "the vision problem"², it is, at least for the time being, the most straightforward and the most reliable way of building an expert system, such as a face recognizer or an object detector.

²Legend has it that some 50 years ago, Marvin Minsky assigned "the vision problem" to an undergraduate student as a summer project.

1.2 Contributions and Overview of the Thesis

Throughout the thesis, our research is confined exclusively to 2D shapes, so unless otherwise specified, by "shape" we mean "2D shape". Our main problem is border detection in ultrasound images. We do so using a new type of active contour. In the process, some important aspects regarding shape representation and modeling are also addressed. Concretely, we propose:

• a curvature-based representation of shape, invariant to similarity transforms (scale, rotation and translation) (Chapter 2, Section 2.5);

• a new algorithm for fast shape alignment without correspondences (Chapter 3, Section 3.4);

• a shape learning method without point correspondences (Chapter 4, Section 4.2);

• a new type of snake - interpolation snakes - based on cubic splines, which, unlike other snakes,
  - have smoothness built in,
  - allow for spatial hard constraints,
  - incorporate a shape model, which makes them suitable for very difficult image domains such as ultrasound.
  These snakes are defined in Chapter 6, the centerpiece of the entire thesis;

• a powerful algorithm for parsing 1D signals as a piecewise constant function, supported by mathematical proof (Chapter 7). We apply it to extract feature points for border detection in Chapter 8, Section 8.5;

• an objective, comprehensive and systematic comparison of several active contour algorithms on ultrasound images (Chapter 8);

• a subjective blind study including our interpolation snakes, the traditional snakes [63] and manually traced ground truth boundaries, in order to demonstrate clinical validity for our methods.

The thesis is organized in three parts. The first is dedicated to shape learning (representation and modeling). The second covers border detection using active contours. The last part consists of applications, experiments and results.

A note on the presentation. It seems customary that a literature review be done at the beginning of a thesis, separate from, and followed by, the author's research contributions. Since this particular thesis deals with related but nevertheless distinct topics - representation, modeling and detection of shape - we find it more natural to split the background literature and seamlessly incorporate the relevant pieces into the appropriate chapters, interleaving the background with the new ideas. This is because we believe it is important to see the entire train of thought. In this sense, the style of this exposition is closer to a book. It goes without saying, of course, that we provide references for everything published elsewhere, so it should be clear when we refer to existing concepts.
This thesis started out as a concrete project on border detection in ultrasound images, at Siemens Corporate Research, in the summer of 2002. The approach was to define and use an active contour with smoothness built in. Splines come with smoothness built in, but it was soon apparent that smoothness was not sufficient for a commercial-grade application: the snake would have undefined, uncontrollable behavior when passing through gaps in the boundary of the object (which were not few). It became apparent that, in order to bring the snake under control at those gaps, we had to introduce shape-prior information into the snake evolution. That is, a shape model was in order. We have explored (and implemented!) a number of registration methods, shape learning techniques, and many different "species" of snakes. All the shape representation, shape alignment and model learning methods that we develop in Part I of this thesis are a build-up towards creating a smooth snake with knowledge of shape. In the process, we have come up with new algorithms. Sometimes, out of frustration, we looked for distant alternative methods for border detection, suitable for ultrasound. Such is the case, for instance, with the 1D segmentation algorithm in Chapter 7.

Part I

Shape Learning

Chapter 2

Shape Representation

Shape is arguably the most important feature in vision. This part of the thesis is about building mathematical representations of shape, and shape models, that can be used for object detection and recognition. We begin this chapter by defining the concept of 2D shape (Section 2.1). We then discuss various mathematical representations of it, and some of their properties. With the exception of the curvature representation of shape (Section 2.5), the current chapter constitutes mostly foundational material, and is included in the thesis for completeness. The representations detailed here will be used at one point or another in later chapters.

2.1 Shape

The foundation block of building a model for a class of shapes is the problem of curve registration. The crux of the matter is captured in the following very general and apparently simple question: how can we compare two (or more) shapes? The goal is, of course, to tell whether or not the two shapes represent the same object. If the two shapes do represent the same object, they are similar, but perhaps not identical. This is the case, for example, if the images were taken at different times, or from (slightly) different viewing angles (Figure 2.1).

Figure 2.1: Different views of the same object produce similar, but not identical, shapes.

Before a similarity measure is built between two shapes, one must understand the nature of the differences between shapes. On the one hand, we can capture different contours of the very same object due, e.g., to different viewing angles, or due to the motion of the object. Thus, we may have apparently different contours arising from the same object. These contours will differ, e.g., by a rigid or affine transformation, if the object is rigid. Things are more complicated with non-rigid objects (e.g., heart, lungs, etc.), which can deform by non-linear transformations. These differences in the apparent shape of an object are not due to the intrinsic geometry of the object itself, so we term them extrinsic differences. It is therefore apparent that,
when comparing shapes, we must factor out a suitable group of 2D transformations (most commonly rigid or scaling), in order to rule out non-essential differences in shape (the extrinsic differences).

On the other hand, there are intrinsic differences between shapes. Intuitively, these are the shape differences that distinguish, say, an apple from a banana, or an airplane from a car. It is the intrinsic differences that a similarity measure between shapes must capture.

Mathematically, then, we compare shapes by means of a similarity metric d(·,·) which is invariant under the appropriate group of transformations (Section 2.1.1). That is, for any two curves γ and δ, the distance between them does not change under the action of arbitrary transforms T and S in the group:

    d(T(\gamma), S(\delta)) = d(\gamma, \delta)

If the metric d is not invariant under the desired group of transformations, then our only chance of comparing curves is to first align them (effectively factoring out the extrinsic differences, usually scale, translation and rotation), and then compute the distance between the aligned curves. Specifically, if γ and δ are the curves, and d is the non-invariant metric, then we seek a transformation T (from the appropriate group of transformations) which minimizes the quantity d(T(γ), δ). Then, with this T, we can define an invariant metric:

    d_{inv}(\gamma, \delta) = d(T(\gamma), \delta)    (2.1)

Finding the transformation T that factors out the extrinsic differences is usually termed curve registration.

Various metrics have been used in the literature for shape comparison: the Hausdorff distance was used by Huttenlocher et al. [54], while Dubuisson and Jain used a modified version of the Hausdorff distance [32]. Ton and Jain [104] use support functions; Sclaroff and Pentland [94] use strain energy. Least-squares (squared error) and Procrustes metrics are popular among statisticians (Goodall [45], Dryden and Mardia [30]); they were also used by Horn [52, 53], Besl and McKay [7], Gold et al. [43], Cootes [21], and Duta et al. [35].

2.1.1 Classes of 2D transformations

The choice of the appropriate group of transformations that describes the extrinsic differences is crucial, and depends on the problem at hand. We describe next some of the classes of transformations most frequently used in computing a model from a set of training samples. Figure 2.2 shows these transforms. Unless otherwise stated, T : R² → R² represents a 2D transformation, X is an arbitrary point in R², and τ is a translation vector.

Linear transformations are, by definition, transformations that preserve colinearity, i.e., map lines onto lines. These are usually the most appropriate transformations when building a model from a set of training samples, but they can also be used in comparing two standalone shapes.

• Rigid transformations. These are transformations that consist of a rotation followed by a translation:

    T(X) = R(\phi) \cdot X + \tau

  where R(φ) is a rotation of angle φ. They preserve colinearity, parallelism, lengths, angles and areas.

• Scaling transformations. These are also called similarity transformations. First a rotation is performed, followed by an isotropic scaling, and last a translation:

    T(X) = \rho \cdot R(\phi) \cdot X + \tau

  where ρ > 0 is the scale factor. The rotation and the scaling can be interchanged with each other, but not with the translation. As with the rigid transformations, scaling preserves colinearity, parallelism and angles, but the lengths are multiplied by the scale ρ, and areas differ by a factor of ρ².
  It is worthwhile to notice that if we identify the real plane R² with the complex plane ℂ in the usual way (X = (a, b) ↔ z = a + i·b), then the scaling transformation can be seen as a map T : ℂ → ℂ given by

    T(z) = w \cdot z + \tau

  with w = ρ·(cos φ + i·sin φ) = ρ·e^{iφ}.

• Affine transformations. These transformations are described by a 2 x 2 matrix A and the translation vector τ, for a total of 6 parameters. The matrix A accounts for rotation, non-isotropic scale (2 parameters), and shear:

    T(X) = A \cdot X + \tau

  The affine transforms still preserve colinearity and parallelism.

• Projective transformations. These are linear transformations of the projective plane. By definition, the projective plane of dimension k is the set of lines in R^{k+1} going through the origin. Explicitly, a 2D projective transformation is of the form

    x' = \frac{a x + b y + c}{g x + h y + 1}, \qquad y' = \frac{d x + e y + f}{g x + h y + 1}

  where a, b, c, d, e, f, g and h are the 8 parameters that define the projective transformation.

In addition to linear transforms, there is obviously the much larger class of non-linear transforms. In general, the non-linear transforms are not used to compute models, but rather for image registration, under the paradigm of deformable templates ([58, 70] and references therein).

Figure 2.2: Linear transforms. Blue: original shape. Red: linear transformations of the blue shape: rigid (rotation + translation), scaling (preserves shape up to scale), affine (introduces shear and non-isotropic scale, but still preserves parallelism) and projective (only preserves colinearity).

2.2 Explicit representation of shape

Mathematically, a 2D shape is represented as a parameterized curve in the plane, i.e., a (smooth) function of one real variable defined on some interval [a, b]:

    \gamma : [a, b] \to \mathbb{R}^2, \qquad \gamma(t) = (x(t), y(t))

For numerical computations, the curve is sampled at discrete time intervals. This yields an array of points (γ₀, …, γ_{n-1}) = (x₀, y₀, …, x_{n-1}, y_{n-1}), which, for convenience, will be denoted by the same symbol γ as the continuous representation. In our computations it will be clear from the context whether we refer to the continuous version or to the discrete version of the curve. Occasionally, some computations can be carried out in a more concise form in complex notation, by viewing γ as a curve in the complex plane ℂ, via the canonical isomorphism R² ≅ ℂ given by (x, y) ↔ z = x + iy. We shall denote complex curves by z(t), w(s), etc., and use the Greek letters γ, δ, etc. for curves in R². With these notations, the discrete version of a (continuous) complex curve z(t) is simply a vector z = (z₀, …, z_{n-1}) ∈ ℂⁿ.

2.2.1 Parametrization

Let t = (t₀, …, t_{n-1}) be the time instants at which γ is sampled, so that γ_i = γ(t_i). In general, the points γ_i are not equally spaced along γ even if the t_i's are equally spaced. It all depends on the particular parametrization of γ.

Example 2.2.1 Two parametrizations of the unit circle. Consider first

    x(t) = \frac{t^2 - 1}{t^2 + 1}, \qquad y(t) = \frac{2t}{t^2 + 1}

Sampling from t = -20 to t = 20 at constant increments of Δt = 0.2, we get 200 points, shown in Figure 2.3(a), which are obviously non-uniform. On the other hand, if we parameterize the circle by

    \gamma(s) = (\cos s, \sin s)

then we get a nice discretization, in which the points are equally spaced, if we sample from s = 0 to s = 2π at constant increments of Δs = 2π/200. See Figure 2.3(b).

Figure 2.3: Two parametrizations of the unit circle. (a) x = (t² - 1)/(t² + 1), y = 2t/(t² + 1). (b) Re-parametrization by arclength: x = cos s, y = sin s.
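To make the contrast in Example 2.2.1 concrete, the short script below (a minimal sketch in Python/NumPy; the function names are ours, not part of the thesis) samples the unit circle with both parametrizations and reports the spread of the distances between consecutive samples. The rational parametrization produces highly uneven spacing; the angle parametrization an essentially constant one.

    import numpy as np

    def rational_circle(n=200, t_min=-20.0, t_max=20.0):
        """Rational parametrization x = (t^2-1)/(t^2+1), y = 2t/(t^2+1)."""
        t = np.linspace(t_min, t_max, n)
        return np.stack([(t**2 - 1) / (t**2 + 1), 2 * t / (t**2 + 1)], axis=1)

    def angle_circle(n=200):
        """Arclength (angle) parametrization x = cos s, y = sin s."""
        s = np.linspace(0.0, 2 * np.pi, n, endpoint=False)
        return np.stack([np.cos(s), np.sin(s)], axis=1)

    def spacing_stats(curve):
        """Smallest and largest distance between consecutive samples."""
        d = np.linalg.norm(np.diff(curve, axis=0), axis=1)
        return d.min(), d.max()

    print("rational:", spacing_stats(rational_circle()))  # very uneven spacing
    print("angle   :", spacing_stats(angle_circle()))     # nearly constant spacing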
Of course, choosing a parametrization with equidistant points is not a matter of aesthetics; as it will turn out, it is essential in border detection. While it is trivial to parametrize the circle in a way that produces a good discretization, things may not be so obvious for arbitrary curves.

2.2.2 Arclength parametrization

Along any smooth curve γ, there are two important vector fields that can be used to describe the local properties of γ: the first and the second derivative. If we imagine a particle moving along the curve, then the first derivative represents the velocity of that particle: it is a vector tangent to γ (at each point) and its magnitude is equal to the speed of the particle. The second derivative measures the rate of change of the velocity, so it represents the acceleration along the curve. It is still a vector defined at all points on γ, but it is no longer tangent to the curve. We shall use the classical notation γ′ and γ″ for the tangent and for the acceleration of γ, respectively. The length of γ is defined as the integral of the magnitude¹ of the velocity:

    \mathrm{length}(\gamma) = \int_a^b |\gamma'(t)| \, dt    (2.2)

¹As a matter of notation and terminology, we denote the magnitude of a vector v by |v|. We also use interchangeably the terms "length", "norm" and "magnitude".

Just as the discretization depends on the parametrization, so do the vector fields γ′ and γ″. Indeed, if the time parameter t is itself a function of a new parameter s, t = t(s), then, although γ(t) and γ(t(s)) have the same geometric image, they have different tangent vectors dγ/dt and dγ/ds, related, obviously, by the chain rule:

    \frac{d\gamma}{ds} = \frac{d\gamma}{dt} \cdot \frac{dt}{ds}

Consequently, the computations that involve the tangent and/or the acceleration depend (a priori) on the parametrization. The length of the curve (Eq. (2.2)) is one exception.

Fortunately, there exists a special parametrization - a "canonical" parametrization, in some sense - that has many desirable properties and which facilitates the computations. Let γ = γ(t) be any parametrization. One defines the arclength parameter on γ as the distance along γ traveled up to time t:

    s(t) = \int_a^t |\gamma'(u)| \, du

A priori, s = s(t) is a function of the original parametrization t. Obviously, s is differentiable and s′(t) = |γ′(t)| ≥ 0. If the velocity γ′ is nowhere zero, then s is strictly increasing, and one can compute the inverse of the function s = s(t) and express t as a function of s: t = t(s). We obtain this way a new parametrized curve γ̃(s) = γ(t(s)) with the following properties:

1. γ and γ̃ have the same geometric image.

2. By the chain rule, the velocity γ̃′ along the re-parameterized curve has norm 1:

    \left| \frac{d\tilde\gamma}{ds} \right| = 1

3. The acceleration γ̃″ is perpendicular to the velocity:

    \tilde\gamma'' \perp \tilde\gamma'    (2.4)

   As γ̃ changes convexity, the normal vector γ̃″ changes sides, always pointing towards the "interior" of the turn.

4. Last, but not least, the discretization of a curve parameterized by arclength is equally spaced, because the arclength parameter s represents distance on the curve itself. Indeed, if the curve is discretized at s₀, …, s_{n-1} with Δs = s_{i+1} - s_i = const and γ_i = γ(s_i), then

    |\gamma_i \gamma_{i+1}| \approx |s_{i+1} - s_i| = \Delta s = \text{const}
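Property 4 is what we rely on in practice: a discrete curve can be re-sampled so that consecutive points are (approximately) equidistant along the curve. The sketch below (Python/NumPy; an illustration under our own naming, not code from the thesis) resamples an arbitrary polyline at n points that are equally spaced in cumulative chord length, a standard discrete stand-in for the arclength parameter.

    import numpy as np

    def resample_by_arclength(curve, n):
        """Resample a polyline (m x 2 array) at n points equally spaced
        in cumulative chord length (discrete arclength)."""
        seg = np.linalg.norm(np.diff(curve, axis=0), axis=1)   # segment lengths
        s = np.concatenate([[0.0], np.cumsum(seg)])            # cumulative arclength
        s_new = np.linspace(0.0, s[-1], n)                     # equally spaced arclengths
        x = np.interp(s_new, s, curve[:, 0])                   # linear interpolation in s
        y = np.interp(s_new, s, curve[:, 1])
        return np.stack([x, y], axis=1)

    # Example: the unevenly sampled circle from Example 2.2.1 becomes evenly spaced.
    t = np.linspace(-20, 20, 200)
    circle = np.stack([(t**2 - 1) / (t**2 + 1), 2 * t / (t**2 + 1)], axis=1)
    even = resample_by_arclength(circle, 200)
    print(np.linalg.norm(np.diff(even, axis=0), axis=1).std())  # close to 0: near-constant spacing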
Remark 2.2.1 A word on notation: because γ and γ̃ have the same geometric image, it is generally accepted to drop the ˜ symbol and use the same letter γ for both curves, indicating explicitly whether one is working with the arclength parameter (γ = γ(s)) or with the time parameter (γ = γ(t)). Differentiation with respect to the time parameter is indicated using the ′ ("prime") symbol, whereas derivatives with respect to the arclength parameter s are indicated using the ˙ ("dot") symbol. Thus, γ′ and γ″ are the first and second derivatives along the curve, in an arbitrary parametrization, while γ̇ and γ̈ are the first and second derivatives with respect to the arclength s.

In Example 2.2.1, without getting into details, the (cos s, sin s) parametrization is in fact the arclength parametrization of the circle. This is the reason the points are equally spaced in Figure 2.3(b). Under the arclength parametrization, integrals along the curve are approximated numerically as finite sums:

    \int_\gamma f \, ds \;\approx\; \sum_i f(\gamma_i) \, \Delta s

2.2.3 Curvature

Let γ be a curve parameterized by arclength, and γ̇ its tangent. Let ν be the unit normal to γ, chosen so that the frame (γ̇, ν) has positive orientation. Clearly, since γ̈ ⊥ γ̇, at each point on γ the acceleration γ̈ is a scalar multiple of the normal ν:

    \ddot\gamma = k_\gamma \, \nu    (2.5)

Definition 2.2.1 The curvature of a curve γ is the scalar k_γ in (2.5).

Trivially, by taking the dot product against ν, we get from (2.5) that

    k_\gamma = \langle \ddot\gamma, \nu \rangle    (2.6)

and

    |k_\gamma| = |\ddot\gamma|    (2.7)

The normal ν does not change from one side to another as γ changes concavity, but the acceleration γ̈ does. Consequently, the sign of the curvature k_γ changes as well, the same way γ̈ changes. The next proposition summarizes some properties that we shall use in this thesis:

Proposition 2.2.1

1. Under an arbitrary parameter t,

    k_\gamma = \frac{x' y'' - x'' y'}{\left( x'^2 + y'^2 \right)^{3/2}}    (2.8)

2. If γ is given implicitly (Section 2.4), as the zero level curve of a surface f(x, y) = 0, then the curvature can be computed by

    k = \frac{f_{xx} f_y^2 + f_{yy} f_x^2 - 2 f_{xy} f_x f_y}{|\nabla f|^3}    (2.9)

3. If aγ is a scaled version of γ, and s is the arclength on γ, then s·a is the arclength on aγ, and

    \mathrm{length}(a\gamma) = a \cdot \mathrm{length}(\gamma)    (2.10)

    k_{a\gamma}(s a) = \frac{1}{a} \, k_\gamma(s)    (2.11)

Figure 2.4: 3D view of a 2D contour (blue) with its curvature (green) drawn on top of the contour.
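As a numerical companion to Proposition 2.2.1, the following sketch (Python/NumPy; our own illustrative code, with finite differences standing in for the derivatives) estimates the signed curvature of a sampled curve via Eq. (2.8). It assumes the curve is reasonably densely and evenly sampled, e.g. after the arclength resampling shown earlier.

    import numpy as np

    def signed_curvature(curve):
        """Estimate k = (x'y'' - x''y') / (x'^2 + y'^2)^(3/2) for an (n x 2) sampled
        curve, using central finite differences as stand-ins for the derivatives."""
        x, y = curve[:, 0], curve[:, 1]
        dx, dy = np.gradient(x), np.gradient(y)        # first derivatives
        ddx, ddy = np.gradient(dx), np.gradient(dy)    # second derivatives
        return (dx * ddy - ddx * dy) / (dx**2 + dy**2) ** 1.5

    # Sanity check: a circle of radius r has curvature 1/r everywhere.
    s = np.linspace(0, 2 * np.pi, 400, endpoint=False)
    r = 5.0
    circle = np.stack([r * np.cos(s), r * np.sin(s)], axis=1)
    print(signed_curvature(circle)[5:-5].mean())   # approximately 1/r = 0.2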
2.3 Cubic spline approximation

Although the presentation of the material in this thesis is top-down (i.e., first the theory and then the applications), the new concepts and algorithms were derived in a bottom-up manner, starting from a concrete practical problem: border detection in ultrasound images (Section 8.2). In general, serious algorithms in medical imaging require clinical validity, and that, in turn, requires that the automatically detected boundaries be compared against boundaries manually traced by a human medical expert, for training and evaluation purposes. Invariably, the expert does not draw the entire contour as a continuous curve, as one might imagine, but rather clicks on some 20-30 points on the contour. The traced contours will most likely have to be matched across patients, so if the organ being traced happens to have distinct anatomical landmarks (usually rigid tissue like bones, teeth, vertebrae, but also some brain regions), then the human expert will mark those, in equal number in each image. This way, point correspondences are established a priori, i.e., before any pre-processing and shape learning, in the most natural way.

If, on the other hand, there are no, or very few, clearly defined anatomical landmarks (e.g., kidney, bladder), then there is no choice but to mark more or less arbitrary points on the contour, in different numbers across patients, leaving it to the computer to recover or establish the point correspondences. Under this inherent non-uniformity, we have determined that the best way to represent the hand-drawn contours is to interpolate the markers using a cubic spline, parameterized by arclength, using a judicious choice of the spline parameters. See Figures 2.5, 2.6 and 2.7. It is central to our applications, so we provide all the necessary details next.

Figure 2.5: Contours marked by a human expert on kidney ultrasound. Top row: markers (two different patients). Observe the sparseness, lack of anatomical landmarks, non-uniformity and the different number of markers across patients. Bottom row: corresponding kidney contour represented as a dense, cubic interpolation spline determined by the markers, parameterized by arclength (equally spaced points). Both splines are discretized using the same number of points (150).

Figure 2.6: Detail from Figure 2.5: patient one (left column).

Figure 2.7: Detail from Figure 2.5: patient two (right column).

Following [28] (Chapter IV) and [29], an interpolation spline is defined in 2D by the following elements:

• break points: a strictly increasing set of real numbers a = ξ₀ < ξ₁ < ⋯ < ξ_l = b;

• control points: a set of data vectors y₀, …, y_l ∈ R², one y_i for each ξ_i;

• degree: a positive integer k.

Given this data, the interpolation spline of degree k is defined as a curve γ : [a, b] → R², t ↦ γ(t) = (x(t), y(t)), with the following properties:

• γ passes through the control points y_i at ξ_i, that is, γ(ξ_i) = y_i for all i;

• componentwise (i.e., for each of x and y), γ is a degree-k polynomial on each interval [ξ_i, ξ_{i+1}];

• the pieces fit smoothly at the break points: γ is of class C^{k-1}, i.e., it has k - 1 continuous derivatives.

Of particular importance are the cubic interpolation splines (k = 3), due to the so-called smoothest interpolation property ([28], Corollary 7, Chapter V):

Proposition 2.3.1 Given breaks a = ξ₀ < ⋯ < ξ_l = b and corresponding control points y₀, …, y_l, consider the following minimization problem with constraints:

    \min_f \; E(f) = \int_a^b |f''(t)|^2 \, dt, \qquad \text{subject to } f(\xi_i) = y_i, \; i = 0, \dots, l    (2.12)

Then, over the space C² of twice continuously differentiable functions on [a, b], the minimum (a) exists, (b) is unique, and (c) is achieved by the (necessarily unique) cubic interpolation spline on the given breaks and control points.

Minimization of the total second derivative (as opposed to other derivatives) is important because of its geometric significance: for any curve (i.e., not just splines), |γ″| measures the amount of bending of the curve. For this reason, the energy (2.12) is called the bending energy. In fact, under the canonical parametrization of γ, the curvature of γ is defined precisely as

    k_\gamma = |\ddot\gamma|    (2.13)

where, following the convention in Remark 2.2.1, γ̈ is the second derivative of γ with respect to the arclength parameter s.

The computation of a cubic interpolation spline can be carried out in linear time in the number l of control points. That is, given t ∈ [ξ_i, ξ_{i+1}], the complexity of computing γ(t) is O(l) (where l is the number of control points), because, as we shall see, the first derivatives at all break points must be determined.
Since these derivatives do not depend on t (but only on the break points), they can be computed once and for all, making the computation of γ(t) an O(1) process. We present in detail the numerics of cubic interpolation splines in Appendix B. In the end, we end up with a piecewise-polynomial function

    \gamma : [a, b] \to \mathbb{R}^2, \qquad t \mapsto \gamma(t) = (x(t), y(t))

    x(t) = \begin{cases} P_1(t) & \xi_0 \le t < \xi_1 \\ \;\;\vdots & \\ P_l(t) & \xi_{l-1} \le t \le \xi_l \end{cases}
    \qquad
    y(t) = \begin{cases} Q_1(t) & \xi_0 \le t < \xi_1 \\ \;\;\vdots & \\ Q_l(t) & \xi_{l-1} \le t \le \xi_l \end{cases}    (2.14)

where P_i and Q_i are cubic polynomials defining the x and the y components, respectively, on the interval [ξ_{i-1}, ξ_i).

Numerically, the spline is discretized by applying Equations (2.14) on an array t = (t₀, …, t_{n-1}), where n is the user-defined number of points on the spline. Most commonly, the t_i's are equally spaced, i.e., the spline is generated by increasing the time parameter uniformly. The resulting points γ(t_i) on the spline are, of course, not uniformly spaced, as t is not the arclength parameter (Figure 2.8, left). We explain how this can be corrected next.

2.3.1 Arclength parametrization of an interpolation spline

We can compute an (approximately) arclength parameterized spline by manipulating the break points ξ_i. If y_i are the user-defined control points, then we choose ξ₀ = 0, and ξ_i - ξ_{i-1} = |y_i y_{i-1}| for i > 0, so that the length along the curve between two consecutive control points y_{i-1} and y_i is approximated (to the extent that the chord y_{i-1}y_i approximates the corresponding arc along the spline) by Δ_i = ξ_i - ξ_{i-1}. By choosing the breaks this way, the usual (uniform) parameter t along the real axis becomes - approximately - the arclength parameter along the resulting spline γ = γ(t).

Figure 2.8: Cubic interpolation spline (blue). Red: control points. Black: break points. The spline on the left has equidistant breaks, and is not parametrized by arclength (the density is higher near points of high curvature). The spline on the right is built on the same control points as the one on the left. Choosing the breaks by ξ_i = ξ_{i-1} + |y_{i-1} y_i| produces an arclength parameterized spline.

Remark 2.3.1 This choice of the break points, which seamlessly produces a curve parametrized approximately by arclength, is not possible with types of splines that do not pass through their control points (e.g., B-splines, Bézier); it is only possible with interpolation splines, because of the particular significance of the control parameters as points on the curve.
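To illustrate Section 2.3.1, the sketch below (Python with NumPy/SciPy; illustrative code of ours, not the thesis implementation of Appendix B) builds a cubic interpolating spline through a handful of control points, using chord-length breaks ξ_i = ξ_{i-1} + |y_{i-1} y_i| so that uniform steps in t give approximately equidistant points on the curve.

    import numpy as np
    from scipy.interpolate import CubicSpline

    def interpolation_snake(control_pts, n=150, closed=True):
        """Cubic interpolation spline through control_pts (m x 2), with chord-length
        break points so that the uniform parameter t is approximately arclength."""
        pts = np.vstack([control_pts, control_pts[:1]]) if closed else np.asarray(control_pts)
        chords = np.linalg.norm(np.diff(pts, axis=0), axis=1)
        breaks = np.concatenate([[0.0], np.cumsum(chords)])     # xi_i = xi_{i-1} + |y_{i-1} y_i|
        bc = 'periodic' if closed else 'natural'
        spline = CubicSpline(breaks, pts, axis=0, bc_type=bc)
        t = np.linspace(0.0, breaks[-1], n, endpoint=not closed)
        return spline(t)                                         # n x 2 points, roughly equidistant

    # Example: a rough kidney-like outline from a few clicked markers (hypothetical data).
    markers = np.array([[0, 0], [4, 1], [6, 4], [4, 7], [0, 6], [-2, 3]], float)
    curve = interpolation_snake(markers, n=150)
    print(np.linalg.norm(np.diff(curve, axis=0), axis=1).round(3))  # nearly constant spacing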
The traditional snakes, as defined by Kass, Witkin and Terzopoulos [63], have several drawbacks, many of which have been addressed in subsequent studies [2, 20, 42, 111], but two of which are that (a) the algorithms are numerically unstable, and (b) the snake cannot change the topology (e.g., if the snake is initialized as a closed simple curve, then it cannot break to converge to a configuration with several connected components). A level-set snake is the zero level set of a moving surface 2 = f (:r, y). Clearly, if the surface has more than one hump, the snake will change its topology (Figure 2.9). The problem, of course, is to control the movement of the surface, so that its zero level set converges to the salient. features in an image. We consider this revolutionary approach to curve evolution so important that we describe it in some detail in Section 5.2. 28 2.4.1 Distance map representation of shape We have seen that. cutting a surface :: = .r; with a )lanc z = const )roduces n y a curve. Conversely, given a parametric representation of a plane curve 7(t), it is possible to find a function : = f(.1:. y) whose zero level set is precisely 7. Specifically, for any point X = (.r. y) in the plane, define f(.z:, y) to be the distance from X to 7: f(\) 2 “11m [X — 7(t)] (2.15) Clearly, a point X is at distance 0 from 7 if and only if X is on 7, so 7 is precisely the zero level set. of f. The function f is called the distance map representation (or the chamfer transform) of 7 [4. 9] (Figure 2.10). If the curve 7 is simple and closed, then 7 is usually represented by the signed distance map, which is the same as the usual distance map, except. that the points in the exterior have negative sign (but the same magnitude as before). Remark 2.4.1 A word on the sign convention. If 95 is any surface, then the gradient Vd) is orthogonal to the level sets, and points in the direction in which to increases most rapidly. As such, if 7 is the zero-level set ofd) and if 7 is a. simple, closed curve, then the vector V := Vcb/[Vcb] is the inward unit normal to 7 ifo < 0 outside 7 and d) > 0 inside. With this sign convention, the unit circle, with the positive orientation (counterclockwise) has positive curvature ( as it should). Regarding the complexity of computing the distance map, if I is an m x n image, and 7 is a curve of length L in the image, then a straightforward, exact computation of the distance map takes 0(m - n - L) because for each image pixel one must compute the distance to each curve pixel. If L is of the order of a few hundred points, then the algorithm is too slow. Borgefors [9, 10] introduced an algorithm which requires only two passes through the image (hence with complexity ()(m - n), which computes a. good approximation 29 (C) Figure 2.10: Distance map representation of a plane curve. (a) Explicit representation as a (discrete) parametrized curve. (b) Distance map representation. (c) Signed distance map representation. 30 of the distance map. Her work was directed toward image matching using the dis— tance map (termed chamfer mutt-lung). Chamfer matching pre-dates Borgefors. being introduced in 1977 by Barrow et al. [4]. Yet another way of computing the distance map of a given curve. is to have a. moving front evolve along its own normal. with constant speed of 1. This method is most readily implemented using level-sets, and, naturally, it is the method of choice in the level-set. community. 
2.5 Curvature-based representation of shape The metrics we have encountered so far were not invariant to plane transformations (Section 2.1.1), for the very reason that. - implicitly or explicitly - the curves were represented in terms of their cartesian coordinates (:L', y). This gives rise to the entire problem of registration (Chapter 3): bring the curves together, to some standard pose, and then compare them. We develop in this section a measure of similarity between curves which is invari- ant to scale, rotations, and translations, i.e., invariant to similarity transforms. To show the intuition behind this new representation, we reason in the discrete case first. Let, as before, 7 = (70,...,7n_1) be a curve. Assume 7 is parameter- ized by arclength, which translates mathematically that the length between any two consecutive points on 7 is 1: [Vii/1+1] = 1 Since all the lengths of the constituent segments of 7 are known, then, all it takes to move along 7 is to know how much to steer at each of the 7is. See Figure 2.11. This suggests a representation of a curve using the angles between the vectors 7.,-_17,- and 72-7241, adjacent to 7,. Let A6, = 6,1 —— 6i_1 be this angle. As shown 31 7. 1+1, /<~\ Mk??? \ / \ Y. ”39;" _______ l _’_ -___\_e_.l—.1_ Figure 2.11: Moving along a. discrete, equally spaced curve 7 only requires to know the angle between consecutive segments. in Figure 2.11, for each i, 0,- is the angle between the ar-axis and the vector 7,i7,j+1. All these angles can obviously be computed from the cartesian representation of 7. Conversely, knowing an array A6 = (A01, . . . ,A6n_1) permits the recovery of the cartesian representation of 7, modulo a translation and a rotation (uniquely deter- mined once a starting point 70 and an initial direction 60 are specified). lntuitively, the “amount of steering”, must have something to do with curvature. It is indeed the case, and we shall explain this in the continuous setting. Let now 7 : [0, L] —-> R2 be a smooth curve (e.g., of class C2). At an arbitrary point 7(3), let 0(8) be the angle between the tangent. 7 and the r-axis, that is 7 = (cos 0,sind) (2.16) Proposition 2.5.1 Ifk is the curvature of 7, then i=9 (2N) Proof: Take the derivative in (2.16), then the dot. product against the normal V = 7—L = (— sin 0,cos 0). Apply Equation (2.5). C] From this proposition, it is clear that the curve 7 is detern'iined by its curvature 32 alone, modulo translations and rotations. Indeed. knowing 1:, then by (2.17) and (2.16) we have 9(s) : 90 _+_ ,2: f v n q: .- + A (\D p-i 00 V l.‘(o)d0 cos 6(o)do sin 9(o)(lo Remark 2.5.1 A representation of a curve 7 in terms of its curvature k would be invariant to translations (as k. is defined in terms of derivatives), and to rotations (as they preserve dot products). However, the curvature is inversely proportional to scale. For instance, the curvature of a circle of radius r is I/r. The larger the circle, the smaller the curvature. All these facts are well known to geometers. What we bring new to computer vision here is an invariant metric between curves, based on curvature. To this end, let 7 and 6 be two curves parameterized by arclength, and let L7 and L5 be their respective lengths, and k7 and 196 their respective curvatures. We would like to define the similarity between 7 and 6 by comparing their curvatures. e.g., by their L:2 distance: fur, - k6|2 Of course, since k7 is defined on [0, L7] and ho- is defined on [0,1,5], this integral does not make sense unless the curves have the same length. 
This problem is solved if we first normalize 7 and 6 to have length 1. For any arclength parameterized curve a : [0, L] ———> R2 we define (11 : [0,1] —> R2 (i(lL) L (11(t) = (2-19) The next lemma expresses the derivatives, arclength, length. and curvature of 01 in terms of those. of (1. Lemma 2.5.1 If (r and (11 are as above, then: dal dt (l) = (tar). (x) = L - sat) (2.20) 2. ifs is the arclength on (r. then t = s/L is the arclength on (11. Also, L01 = 1. 3. kal(t) = L ~ ka(tL). Proof: Trivial. [:1 Using the length-normalized curves, we can make the following Definition 2.5.1 (Curvature distance) 2 lea/1U) — k61(t)] dt (1(745) ==/()1 1 2 =1) [Ly-h‘n/(ILAI) — lid-lideLo‘) (ll Since curvature is invariant under rotations and trz-inslations, so is d. Moreover, due to the unit length normalization of the curves, d is also scale-invariant. Clearly d is symmetric, d 2 0 (positive), and also satisfies the triangle inequality. However, precisely because of its invariance, it is not positive-('lefinite. This means that. (1(7,(5) = 0 does not imply 7 = 6 (this latter equality does not even makes sense, as 7 and 6 have 34 a priori different lengths). For instance. scale invariance means that (1(7. (1. - ’7) = 0, i.e., this distance does not distinguish between a. curve and any scaled version of it. The f(')llowing proposition shows what l'ia.[‘>pens when (l = 0. Proposition 2.5.2 (“of-.6) : () if and only if 71(1) 2 (51(1‘). Vt E [0.1]. That is, the curvature ’Illfll‘flf (‘2.21) is a positiee-definite metric on. the space of curves of unit twat/i. Q Proof: Since by definition (1(7. (5) = (“71.01) we have to prove the equivalence for curves of length 1. Assume therefore that L7 = [‘6 = 1. Of course, (K736) = 0 means the curves have the. same curvature. The proof follows now from (2.18) C] There is an additional complication in the case of closed curves, arising from the possibility of a phase shift. We want the metric d to be invariant to shifts in the arclength parameter along the curve. That is, if 7 is a closed curve, and (3(8) 2 7(s+a) is a phase-shifted copy of '7', we want. (1(7, 6) = 0. This requirement is natural, since 7 and (5 have the same geometric image. However, their curvatures are also shifted by a: l.‘(5(s) = lea/(s + a) so, if we let L be the length (of both 7 and 6) 0 = (“7.6) 1 2 2/0 [Mam—idem (It 1 2/0 M's/UL)—Ara,(tL+a)|2dt which implies A'~(s) = A.'~’,A(s + (1.) VS 6 [0, L] I That is, a is a. )eriod of the curvature of course, both I" and its curvature kn, are . . I , periodic, and L is a period because ’7' is closed, of length L, but a is a smaller period than L). Let T > 0 be the smallest period of k7. Then. necessarily a = nT for some integer n. This is a necessary condition for the distance to be 0 on phase—shifted curves. Most commonly though, tlns condition 18 not satisfied , so curves w1th identical geometric image will, have non-zero distance due to shift. in the parameter along the curve. The remedy is, of course, to apply all possible phase shifts to one of the curves, and take the minimum. Definition 2.5.2 (Curvature distance for closed curves) ( ("/50) 3: min d("/1,51(' + 0)) new,” 1 = min / a.€[0,1] 0 The quantity defined by Equation (2.22) measures shape similarity between two closed curves, and is invariant to similarity transforms and to phase shifts. Example 2.5.1 We consider in this example two classes of shapes: a circular shape, and a “bow tie” shape. 
Example 2.5.1 We consider in this example two classes of shapes: a circular shape and a "bow tie" shape. Each of the two classes consists of 30 synthetically generated samples, drawn from a Gaussian distribution around their respective mean. The mean and the samples were defined as arclength-parameterized, cubic interpolation splines. We then computed the intra-class distances (d(γ_i, γ_j) for all γ_i, γ_j within the same class) and the inter-class distances (d(γ_i, δ_j) with γ_i and δ_j in different classes). We obtain this way three distributions of distances, shown in Figure 2.12(c). Curves from within the same class tend to be closer to each other than curves from different classes.

Figure 2.12: Two classes of shapes, and the distributions of inter- and intra-class curvature distances. (a) Circular class: 30 samples (blue) normally distributed around a user-defined mean (red). (b) "Bow-tie" class: 30 samples (green) normally distributed around a user-defined mean (red). (c) Distributions of curvature distances d(γ_i, δ_j). Blue: both γ_i and δ_j are from the circular class. Green: both γ_i and δ_j are from the bow-tie class. Red: γ_i is circular, and δ_j is a bow-tie shape. Note the separability of each of the within-class distributions (blue and green) from the inter-class distribution of distances (red).

Chapter 3

Shape Alignment

Curve alignment (or registration) and shape similarity are two ingredients towards building a shape model. We define the problem of registration in Section 3.1 and provide explicit formulas for the transform that aligns two curves, in the case of similarity, affine and thin-plate-spline (TPS) transforms, under the assumption of known point correspondences (Section 3.2). We review some of the representative existing registration algorithms for the case of unknown point correspondences (Section 3.2.2), and discuss distance map registration as a special case (Section 3.3). In the last section, we propose a fast and practical registration algorithm, still for the case of unknown correspondences, which works best for elongated contours in 2D.

3.1 Curve registration

Suppose now that we have two curves γ and δ which we want to register (for instance, for the purpose of defining an invariant metric on the curves, i.e., a shape distance). That is, we want to find a transformation T : R² → R² (e.g., from one of the groups of linear transformations in Section 2.1.1) such that

    T(\gamma) = \delta    (3.1)

Clearly, there is little chance that Equation (3.1) has an exact solution in T. For one, in the discrete setting, (3.1) entails that γ and δ have the same number of points, which may not be the case. The next best thing, then, is to solve this equation approximately, by specifying an optimality criterion E = E_{γ,δ}(T) and by finding the transform T optimal with respect to E:

    T = \arg\min_T E_{\gamma,\delta}(T)    (3.2)

Computation of the optimality metric E may, or may not, require known point correspondences between the curves to be registered (Section 3.2). Most notably, the most straightforward metric between two parametric curves - the L² norm - involves pointwise subtraction of the two curves, so one must know, for each point on one curve, what its counterpart on the other curve is.
On the other hand, there are metrics that do not require known point correspondences. These are metrics that compare each point on one curve against all points on the other curve. We present some of these below. To begin with, the distance between a point x and a set Y (in any Euclidean space) can be naturally defined as

    d(x, Y) = inf_{y∈Y} |x − y|.   (3.3)

Next, the distance between two sets X and Y (in our case these will be curves in the plane) can be defined as the largest distance from any point in X to Y:

    d(X, Y) = sup_{x∈X} d(x, Y).   (3.4)

This notion of distance between two sets (Equation (3.4)) was introduced for the very first time in 1905 by Dimitrie Pompeiu, a distinguished Romanian mathematician [87]. It was originally termed the écart (literally: pit, chasm) of X and Y, and it is sometimes referred to today as the directed Hausdorff distance between X and Y. The distance between a point and a set (3.3) is known today in the computer vision community as the chamfer distance (transform).

Certainly, the distance (3.4) is not symmetric. In the same work [87], Pompeiu defined the mutual distance (l'écart mutuel) of the two sets as the sum

    d(X, Y) = d(X, Y) + d(Y, X).   (3.5)

The Pompeiu distance (3.5) was modified by Felix Hausdorff in his classical 1914 book Grundzüge der Mengenlehre (Basics of Set Theory) [46] (republished and translated from German multiple times; for an English translation of the third edition see, for instance, [47]). Hausdorff defined what is now the very popular Hausdorff distance by replacing the sum in (3.5) with the supremum:

    H(X, Y) = sup{ d(X, Y), d(Y, X) }.   (3.6)

The supremum in Equation (3.4) makes the écart distance sensitive to outliers. This can be alleviated by replacing the supremum with the average [32]:

    d_av(X, Y) = (1/|X|) ∫_X d(x, Y) dx.   (3.7)

Accordingly, the average Hausdorff distance can now be defined in a way similar to the regular Hausdorff distance (3.6):

    H_av(X, Y) = sup{ d_av(X, Y), d_av(Y, X) }.   (3.8)

We shall use the term Hausdorff-type metrics for any of the distances (3.4), (3.5), (3.6), (3.7) and (3.8). If D(X, Y) is any of them, then an optimality criterion for registering two curves γ and δ can be defined by

    E_{γ,δ}(T) = D(T(γ), δ).   (3.9)

One issue with Hausdorff-type metrics is their complexity. For discrete curves of about the same size, the complexity is quadratic in the number of points. This complexity is prohibitive for an iterative minimization of (3.9), but there are ways to reduce it to linear (at some cost), discussed in some detail in Section 3.3. We present some known registration techniques in Sections 3.2 and 3.3.
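As a small illustration of the quadratic-cost baseline, the Hausdorff-type measures (3.6) and (3.8) can be evaluated directly from the point samples; this is a minimal sketch with illustrative names, not the implementation used for the experiments.

import numpy as np

def directed_distances(X, Y):
    """d(x, Y) of Eq. (3.3) for every sample x in X; X, Y are (n, 2) arrays of curve points.
    The all-pairs computation below is quadratic in the number of points."""
    diff = X[:, None, :] - Y[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2)).min(axis=1)

def hausdorff(X, Y):
    """Equation (3.6): symmetric Hausdorff distance."""
    return max(directed_distances(X, Y).max(), directed_distances(Y, X).max())

def average_hausdorff(X, Y):
    """Equation (3.8): the supremum in each direction replaced by the average of Eq. (3.7)."""
    return max(directed_distances(X, Y).mean(), directed_distances(Y, X).mean())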
3.2 Registration of parameterized curves

Mathematically, if γ(s) and δ(t) are two curves, with a priori different parameters s and t, then the point correspondence problem consists of finding a re-parametrization s = h(t) in some optimal way, i.e., establishing a meaningful correspondence between the points of the two curves. There are several proposed solutions for finding the correspondences (Section 3.2.2), each with its strengths and weaknesses and with its own optimality metric, but there is no dominant method, nor a demonstrably best method over the others. Only after introducing the re-parametrization h does it make sense to define the squared error energy:

    E(T, h) = ∫ |T(γ(h(t))) − δ(t)|² dt.   (3.10)

The discrete version of this is

    E(T, h) = Σ_i |T(γ_{h(i)}) − δ_i|².   (3.11)

An invariant similarity measure (not a metric per se, because it lacks symmetry, but it can be symmetrized) between the two curves γ and δ can now be defined by

    d²(γ, δ) = min_{T, h} E(T, h).   (3.12)

3.2.1 Known correspondences

Landmark registration is one of the simplest cases of registration of parametric curves, because it takes place under the following assumptions:
• All curves to be registered have the same number of points.
• The point correspondences are known: all curves are labeled so that, for each index i, the i-th point on one curve is homologous with the i-th point on any other curve, that is, h(i) = i.
The squared error energy (Equation (3.11)) now depends only on T. In the case of scaling and affine transforms, it is possible to find the parameters of the minimizing T algebraically (Propositions 3.2.1 and 3.2.2 below).

Scaling registration and the Procrustes distance

This problem was first solved by Horn [52, 53] for points in 3D, and then revisited by other authors (e.g., Cootes [21], Dryden and Mardia [30]) in the context of plane curve registration and model learning. We present here a 2D version using complex numbers, which is one instance where working over the field C of complex numbers simplifies computations.

Let z = (z₁, ..., z_n) and w = (w₁, ..., w_n) be two (discrete, complex) curves with the same number of points. These are vectors in Cⁿ, so the usual L² norm in Cⁿ can be used to define a distance between the shapes z and w which is non-invariant under scaling transforms (and a fortiori non-invariant under more general transforms):

    d(z, w) = |z − w|².   (3.13)

Equation (2.1) and the arguments leading to it show how one can turn E into an invariant metric between curves, i.e., into a distance on shapes. We have seen in Section 2.1.1 that a similarity transformation is of the form T(w) = λ·w + τ for all w ∈ C, for some unknown complex parameters λ and τ. We must find these parameters by minimizing the squared error energy

    E_sqr(λ, τ) = Σ_j |T(z_j) − w_j|² = Σ_j (λ z_j + τ − w_j) · conj(λ z_j + τ − w_j).   (3.14)

Proposition 3.2.1 If T(z) = λz + τ minimizes the squared error energy (Equation (3.14)), then its parameters are given by

    λ = ⟨w', z'⟩ / |z'|²   (3.15)
    τ = −λ z₀ + w₀   (3.16)

where ⟨·,·⟩ denotes the complex inner product on Cⁿ, z₀ and w₀ are the centroids of z and w, respectively, and z' = (z₁ − z₀, ..., z_n − z₀) and w' = (w₁ − w₀, ..., w_n − w₀) are the zero-mean curves. The minimum value of the energy is

    E_min = |w'|² − |⟨w', z'⟩|² / |z'|².   (3.17)

Proof: Straightforward. See Appendix A.1. □

So far we have a non-invariant metric E between curves (Equation (3.13)), which by Equation (2.1) and Equation (3.17) can be turned into an invariant metric by defining

    d²_inv(z, w) = |w'|² − |⟨w', z'⟩|² / |z'|².

Obviously this is not symmetric, but it becomes symmetric if we normalize the vectors z and w to have norm one, in addition to zero mean.

Definition 3.2.1 The Procrustes distance between two curves z, w ∈ Cⁿ is

    d(z, w) = min_T | T( (z − z₀)/|z − z₀| ) − (w − w₀)/|w − w₀| |².   (3.18)

By Proposition 3.2.1, the Procrustes distance between curves z and w is the squared error between the registered curves, each normalized to have zero mean and size one prior to registration.
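The closed form of Proposition 3.2.1 translates directly into a few lines of code; the sketch below (illustrative names, complex NumPy arrays assumed) computes the optimal similarity transform and the Procrustes distance of Definition 3.2.1.

import numpy as np

def scaling_registration(z, w):
    """Proposition 3.2.1: least-squares fit of T(z) = lam*z + tau to w, with known
    correspondences; z and w are complex arrays of the same length."""
    z0, w0 = z.mean(), w.mean()
    zp, wp = z - z0, w - w0                      # zero-mean curves z', w'
    lam = np.vdot(zp, wp) / np.vdot(zp, zp)      # <w', z'> / |z'|^2, Eq. (3.15)
    tau = w0 - lam * z0                          # Eq. (3.16)
    residual = np.sum(np.abs(lam * z + tau - w) ** 2)
    return lam, tau, residual

def procrustes_distance(z, w):
    """Definition 3.2.1: register after normalizing each curve to zero mean and unit norm."""
    zn = (z - z.mean()) / np.linalg.norm(z - z.mean())
    wn = (w - w.mean()) / np.linalg.norm(w - w.mean())
    _, _, residual = scaling_registration(zn, wn)
    return residual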
Affine registration

If we try to minimize the squared error (Equation (3.14)) with the more general affine transforms T(X) = A·X + τ, it is still possible to avoid iterative minimization methods and find a closed-form solution. However, since the affine transformations no longer have a representation in terms of complex numbers, the formulas are somewhat more complicated than in the case of scaling transforms. Let us first introduce some notation. Suppose the curves are now γ = (γ₁, ..., γ_n) and δ = (δ₁, ..., δ_n). Let γ₀ and δ₀ be their respective means, and let γ' = (γ₁ − γ₀, ..., γ_n − γ₀) and δ' = (δ₁ − δ₀, ..., δ_n − δ₀) be the zero-mean curves. Denote the coordinates of each point γ'_i by (u_i, v_i), and let u' = (u₁, ..., u_n) and v' = (v₁, ..., v_n). Let z' and w' be the analogues of u' and v', respectively, for δ'. Last, let a, b, c and d be the coefficients of the matrix A.

Proposition 3.2.2 With the above notations, the coefficients a, b, c, d and τ of the affine transform T(X) = A·X + τ that minimizes the squared error energy (3.11) are given by

    [ a  b ]   [ ⟨z', u'⟩  ⟨z', v'⟩ ] [ |u'|²      ⟨u', v'⟩ ]⁻¹
    [ c  d ] = [ ⟨w', u'⟩  ⟨w', v'⟩ ] [ ⟨u', v'⟩   |v'|²    ]

and τ = δ₀ − A·γ₀.

Proof: See Appendix A.2. □

Non-linear registration: Thin Plate Splines

Landmark registration under the assumption of known correspondences can be viewed as a 2D interpolation problem: we have two sets of matched landmarks γ₁, ..., γ_n ∈ R² and δ₁, ..., δ_n ∈ R², and we would like to find a transformation T : R² → R² which satisfies the constraints T(γ_i) = δ_i. Without additional constraints, there is always a trivial solution:

    T(X) = δ_i if X = γ_i, and T(X) = X otherwise.

This solution maps each γ_i onto the corresponding δ_i, but leaves all the other points unchanged. This solution, however, is not smooth. We must find a transformation that satisfies additional smoothness constraints. A thin plate spline (TPS) solution [8] satisfies such a requirement.

The intuition behind TPS is as follows. Imagine the plane as a rubber sheet, with a square grid drawn on it. If a point x is pinned onto another location x' in the plane, an elastic deformation occurs: the lines of the grid bend in some optimal way, trying to follow the translation x → x'. From physical considerations, if T(X) = (u(X), v(X)) is a (non-linear) deformation of the plane, one defines the bending energy of T by

    E(T) = ∫∫_{R²} |Δ T|² dX.   (3.19)

The natural mathematical formulation of the 2D interpolation problem is then to find a transformation T that minimizes the bending energy (Equation (3.19)), subject to the constraints T(γ_i) = δ_i. To minimize the bending energy, one must compute the gradient of (3.19) and solve the partial differential equation that results from setting the gradient equal to zero. The fundamental solution of this equation is the radial basis function

    B(X) = −|X|² log |X|²   (3.20)

which happens to satisfy B(0) = 0. Clearly, the translated functions B_i(X) = B(X − γ_i) still minimize the bending energy, and satisfy B_i(γ_i) = 0, i = 1, ..., n. To find the solution T of the interpolation problem, we seek T as a linear combination of the basis functions B_i, plus an affine part:

    T(X) = Σ_j w_j · B_j(X) + A·X + τ.   (3.21)

The unknown weights w_j, together with the columns a = (a_x, a_y)ᵀ and b = (b_x, b_y)ᵀ of A and the translation τ = (τ_x, τ_y)ᵀ of the affine component, can be found by solving the linear system obtained by imposing the constraints T(γ_i) = δ_i in Equation (3.21). In matrix form, this system can be written as

    [ B₁(γ₁) ... B_n(γ₁)  γ₁ᵀ  1 ] [ w₁ ]   [ δ₁ᵀ ]
    [ B₁(γ₂) ... B_n(γ₂)  γ₂ᵀ  1 ] [ w₂ ]   [ δ₂ᵀ ]
    [   ...                      ] [ ...] = [ ... ]
    [ B₁(γ_n) ... B_n(γ_n) γ_nᵀ 1 ] [ w_n ]   [ δ_nᵀ ]
    [ γ₁      ...  γ_n     0    0 ] [ Aᵀ ]   [  0  ]
    [ 1       ...  1       0    0 ] [ τᵀ ]   [  0  ]
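A compact way to assemble and solve this system numerically is sketched below; names and the handling of the zero-distance diagonal are illustrative choices under the kernel (3.20).

import numpy as np

def tps_fit(src, dst):
    """Solve the TPS interpolation system of Eq. (3.21) for matched landmarks.
    src, dst: (n, 2) arrays with known correspondences src[i] -> dst[i]."""
    n = len(src)
    d2 = ((src[:, None, :] - src[None, :, :]) ** 2).sum(axis=2)
    K = -d2 * np.log(np.where(d2 > 0, d2, 1.0))          # B(X) = -|X|^2 log |X|^2, B(0) = 0
    P = np.hstack([src, np.ones((n, 1))])
    L = np.zeros((n + 3, n + 3))
    L[:n, :n] = K
    L[:n, n:] = P
    L[n:, :n] = P.T
    rhs = np.zeros((n + 3, 2))
    rhs[:n] = dst
    sol = np.linalg.solve(L, rhs)
    return sol[:n], sol[n:]                              # kernel weights, affine part

def tps_apply(points, src, weights, affine):
    """Evaluate the fitted transform T at arbitrary points."""
    d2 = ((points[:, None, :] - src[None, :, :]) ** 2).sum(axis=2)
    U = -d2 * np.log(np.where(d2 > 0, d2, 1.0))
    P = np.hstack([points, np.ones((len(points), 1))])
    return U @ weights + P @ affine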
Figure 3.1 shows an example of TPS registration. In Figure 3.1 (a), the red points are the source points, and the blue points are the target points. We must deform the entire plane smoothly, so that the source points overlap exactly with the target points. The resulting deformation is shown in Figure 3.1 (b).

Figure 3.1: (a) Red: source points. Blue: intended target. (b) Red: the source points from (a), transformed by a TPS. They overlap exactly with the target points (blue points in (a)).

3.2.2 Unknown correspondences

Registration of parametric curves with known correspondences occurs, e.g., in medical imaging, where a human expert manually annotates certain anatomical structures which should be matched across several images of the same organ (e.g., the brain). These homologous structures are called landmarks (hence the term "landmark registration").

However, if the contours are extracted automatically (e.g., edge detection followed by morphological transforms such as erosion or dilation [98]), the point correspondences across images are not known. In this case, the real effort in shape registration is in solving the point correspondence problem, either separately or in conjunction with the registration proper. The existing proposed solutions for registration and model computation in the case of unknown correspondences are inherently more difficult, and more elaborate, than the algorithms for landmark registration [85, 79, 69, 102, 26, 19, 113, 103, 6, 89, 34, 95, 7, 114]. It is therefore not feasible to present here all the mathematics behind these methods. Nevertheless, we shall summarize some of the main ideas.

One class of algorithms that register curves under unknown correspondences makes use of a match matrix M = (M_ij) [89, 34]. In the ideal case, M_ij = 1 if γ_i corresponds to δ_j and M_ij = 0 otherwise. However, to establish correspondences, M can be allowed to have continuous coefficients between 0 and 1. Rangarajan et al. [89] introduced the softassign algorithm to register two curves γ₁, ..., γ_{n_γ} and δ₁, ..., δ_{n_δ}, which have different numbers of points, n_γ ≠ n_δ, and in which the correspondences are not known. They generalize the Procrustes metric (Section 4.1), which makes use of known correspondences, to a metric which handles the unknown correspondences by means of the match matrix M: each pair (γ_i, δ_j) contributes to a weighted squared-error energy in proportion to M_ij, with the curves first centered and scaled by their means and variances. When correspondences are known, μ_γ, σ_γ and μ_δ, σ_δ are the respective means and covariances of γ and δ. In the unknown correspondences case, these are computed by making use of the match matrix, e.g.,

    μ_γ = ( Σ_{i=1}^{n_γ} Σ_{j=1}^{n_δ} M_ij γ_i ) / ( Σ_{i=1}^{n_γ} Σ_{j=1}^{n_δ} M_ij ),

and similarly for σ_γ and for δ. The curves are then registered iteratively, in a way similar to the EM algorithm: given an estimate of the parameters, one registers γ and δ (using the Procrustes method, Section 4.1). This changes μ_γ, σ_γ, μ_δ and σ_δ. The parameters are re-estimated using the new values of the means and variances, and the procedure is repeated until convergence.

Another class of algorithms tries to establish the correspondences based on the local geometric properties of the two curves [19, 95, 26, 6]. The idea is that homologous points on the two curves must have similar shape contexts. To find the points with similar shape contexts, one defines an energy functional based on the local descriptors of the two curves.
For instance, when the curvature is used to describe the local geometry of two parameterized curves γ(s) and δ(t), one can define and minimize the curvature energy

    E(h) = ∫ |κ_γ(s) − κ_δ(h(s))|² ds

where κ_γ(s) and κ_δ(t) are the respective curvatures of the two curves, and h is the (unknown) diffeomorphism which provides the correspondences, t = h(s). Alternative local shape descriptors are used in [6, 102].

Yet another class of techniques for establishing correspondences is the class of iterative closest point (ICP) algorithms [7, 114]. The idea is to match each point on the source curve γ to the closest point on the target curve δ. This establishes (non-optimal) correspondences, and a registration (with known correspondences) can be performed. Then the closest points are computed again to re-establish correspondences, and a new registration is performed. This two-step iteration is repeated until convergence.

3.3 Distance map registration

We have seen that with parametric curves and the squared error measure, the registration is complicated by the point correspondence problem: we must find an optimal way of linking the different parametrizations along the curves to be registered. In the discrete case, this means finding the homologous points on the two curves.

Using the distance map representation of a curve (Section 2.4.1, and also Equation (3.3)) and the directed average Hausdorff similarity measure (3.7), the point correspondence problem can be avoided entirely (at some cost, as we shall see). To fix ideas, suppose γ and δ are two plane curves that we want to register. Let T(·) = T(Θ, ·) be a 2D transformation defined by a set of parameters Θ = (θ₁, ..., θ_k). The set of parameters can be any one of the sets described in Section 2.1.1, or more general. We define an energy function (the chamfer energy)

    E(T) = d_av(T(γ), δ) = (1/|T(γ)|) ∫_{T(γ)} d(T(γ(s)), δ) ds.   (3.22)

In this equation, d(·, δ) is the distance map representation of the target curve δ, and |T(γ)| is the length of T(γ). Division by the length |T(γ)| penalizes transforms that tend to map everything onto a single point. It should be noted that Borgefors [9] reports better results with the root mean squared error,

    E(T) = sqrt( (1/|T(γ)|) ∫_{T(γ)} d(T(γ(s)), δ)² ds ).

Since we only apply T to the source curve γ, keeping δ fixed, the chamfer transform of δ can be computed once, before the minimization proper, using for instance Borgefors' algorithm [9] or the level set method (Section 2.4.1). This reduces the time complexity for computing E to linear (in the number of points on γ). This trick works precisely because we do not have to compute the distance transform of the moving curve T(γ), because we are using the directed version of the chamfer metric (3.7). This would not be possible with any of the symmetric versions (3.5), (3.6) or (3.8) of the Hausdorff-type measures, as these entail computation of both d(·, δ) and d(·, T(γ)) (and T(γ) varies).

Observe also that the distance map energy (Equation (3.22)) depends on the transform T, but does not depend on any unknown mapping h : γ → δ that establishes point correspondences (compare with Equation (3.10)). The important consequence of this is that distance map registration does not require a solution to the nontrivial problem of point correspondences. All that needs to be done now is to compute the transformation T that registers γ and δ, as

    T_opt = arg min_T E(T).
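A minimal sketch of the chamfer energy (3.22) is given below, assuming the target curve has been rasterized into a binary mask and that the source samples are roughly uniformly spaced, so that the mean over the samples approximates the arclength average; the transform interface is illustrative.

import numpy as np
from scipy.ndimage import distance_transform_edt

def chamfer_map(target_mask):
    """Distance map d(., delta): distance from every pixel to the target curve pixels.
    target_mask is a boolean image that is True exactly on the curve."""
    return distance_transform_edt(~target_mask)

def chamfer_energy(params, source, dist_map, transform):
    """Equation (3.22), discretized: mean of the target's distance map sampled along T(gamma).
    `transform(params, points)` applies the parametric 2D transform to an (n, 2) array."""
    moved = transform(params, source)
    rows = np.clip(np.round(moved[:, 1]).astype(int), 0, dist_map.shape[0] - 1)
    cols = np.clip(np.round(moved[:, 0]).astype(int), 0, dist_map.shape[1] - 1)
    return dist_map[rows, cols].mean()

Because the energy is an ordinary function of the parameter vector, it can be handed to a generic optimizer, e.g. scipy.optimize.minimize(chamfer_energy, theta0, args=(source, dist_map, transform), method="Powell").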
Since T is uniquely identified with its parameter vector Θ, the energy (3.22) is a function of finitely many variables, and so it can be minimized using well-established, off-the-shelf algorithms such as the Nelder-Mead simplex method [66, 81, 88] or, our favorite, Powell's conjugate directions method [88].

Figure 3.2 shows the results of distance map registration. In part (a), the unregistered source curve γ (blue) and the target curve δ (red) are shown. The initial value of the energy (Equation (3.22)) is E = 2584. Minimization using scaling transforms yields an energy of E = 1563. The results of the registration are shown to the right, in Figure 3.2 (b).

It is interesting to see what happens if we use, e.g., affine transforms instead of scaling transforms. Since the affine transforms allow for more variation in shape (e.g., shear), the two curves can come closer to each other, so it is expected that the minimum in Equation (3.22) be smaller in the case of affine transforms. This is indeed the case in practice. If we register the same two curves from Figure 3.2 using affine transforms, we get a value E = 989, significantly lower than E = 1563 in the case of scaling transforms. The affine counterpart of Figures 3.2 (a) and (b) is shown in Figures 3.2 (c) and (d). The two curves come much closer to each other (Figure 3.2 (d)) than in the case of scaling transforms (Figure 3.2 (b)), but the results are hardly what was intended. This shows the importance of two factors:
• the choice of the appropriate class of transformations;
• the non-symmetry of the chamfer metric (3.7) and (3.22): we have to use the non-symmetric version of the distance between T(γ) and δ to reduce the time complexity, but the price we pay for this is the possibility of artificially good, but disproportionate, alignments of the source onto the target, like the one in Figure 3.2 (d).

Figure 3.2: (a), (b): Distance map registration with scaling transforms. The measure of fit is E = 1563. (c), (d): Distance map registration with affine transforms. Measure of fit: E = 989. Left column: unregistered. Right: registered. Blue: source. Red: target.

3.4 Semiaxes registration

We describe in this section a novel registration algorithm with unknown correspondences, most suitable for objects with one axis longer than the other. We call this semiaxes registration.

Let γ = (γ₁, ..., γ_n) and δ = (δ₁, ..., δ_m) be two sets of points (possibly, but not necessarily, discretized continuous curves). If each of γ and δ has roughly the shape of an ellipse, then it makes sense to try to align the two sets of points by aligning the underlying ellipses. Here is the algorithm:

Normalization. First, we normalize the two sets to have zero mean and size one. The normalized sets of points are

    γ̄ = (γ − γ₀)/|γ − γ₀|,   δ̄ = (δ − δ₀)/|δ − δ₀|.

Rotation. Compute the unit eigenvectors (e₁, e₂) of the 2 × 2 covariance matrix of the set γ, with corresponding eigenvalues λ₁ > λ₂. Similarly for δ, let (f₁, f₂) be the unit eigenvectors, with corresponding eigenvalues μ₁ > μ₂. We choose the signs of the eigenvectors so that both the (e₁, e₂) and the (f₁, f₂) frames have positive orientation. In general, computation of eigenvalues and eigenvectors is non-trivial, but powerful iterative algorithms exist [88]. However, in dimension 2 it is trivial to derive closed-form formulas for both the eigenvalues and the eigenvectors, because the characteristic polynomial of the 2 × 2 covariance matrix is quadratic.
Thus, computation of the eigenvectors and the eigenvalues is done in constant time if the covariance matrix is given. Computation of the covariance matrix is obviously linear in the number of points in the set.

If the eigenvectors play the role of semiaxes, then we must align them: we must compute the rotation (around the origin, as γ and δ now have zero mean) which brings e₁ over f₁ and e₂ over f₂. This rotation exists because the two frames are orthonormal and have the same (positive) orientation. Of course, when computing the eigenvectors of γ, say, there are two choices for the positively oriented eigenvector frame: ±(e₁, e₂). Each of these frames must be rotated over each of ±(f₁, f₂), resulting in four rotations R₁, ..., R₄. We have to choose the best one. To define "best", we can use any metric between sets of points in R² that does not assume point correspondences (e.g., the Hausdorff distance (3.6)).

The transform T that aligns the source set γ onto the target set δ is a composition of translation, scaling and rotation, possibly also composed with a symmetry. It is, therefore, a scaling transform (possibly with negative scale). A short computational sketch of this procedure is given after Figure 3.3.

1. Advantages:
• it does not assume correspondences, nor does it try to find them;
• speed and simplicity: no minimization is required;
• it can be combined with distance map registration (Section 3.3), by providing a very good initialization to the distance map minimization procedure.
2. Limitations:
• ambiguity (ill-posedness) increases when registering circular sets (i.e., when the eigenvalues are approximately equal). This would be the case, for instance, when trying to align two circular saw blades: even if they were identical, the teeth might be offset by a random amount;
• there is no obvious metric with respect to which the aligning transform T is optimal.

Figure 3.3: Semiaxes alignment of two contours. (a) and (b): Two contours and their corresponding eigenvectors drawn at their respective centers. Each eigenvector is proportional to its own eigenvalue. (c) Alignment. The blue contour (source) was brought over the red contour (target) by aligning the eigenvectors.
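The sketch below implements the normalization and rotation steps in the normalized frame; it reuses the `hausdorff` helper sketched in Section 3.1 to pick the best of the candidate rotations, and the function names are illustrative.

import numpy as np

def principal_frame(pts):
    """Positively oriented orthonormal eigenvector frame of the 2x2 covariance matrix."""
    vals, vecs = np.linalg.eigh(np.cov(pts.T))
    e1, e2 = vecs[:, 1], vecs[:, 0]                    # major axis first (ascending eigenvalues)
    if e1[0] * e2[1] - e1[1] * e2[0] < 0:              # enforce positive orientation
        e2 = -e2
    return e1, e2

def semiaxes_align(src, dst):
    """Align src onto dst by normalizing (zero mean, unit size) and matching principal axes.
    The four sign choices +/-(e1, e2) reduce to two distinct rotations, both tried here."""
    def normalize(p):
        q = p - p.mean(axis=0)
        return q / np.linalg.norm(q)
    s, d = normalize(src), normalize(dst)
    e1, e2 = principal_frame(s)
    f1, f2 = principal_frame(d)
    F = np.column_stack([f1, f2])
    best, best_pts = np.inf, None
    for sign in (1.0, -1.0):
        E = np.column_stack([sign * e1, sign * e2])
        R = F @ E.T                                    # rotation taking (e1, e2) onto (f1, f2)
        cand = s @ R.T
        err = hausdorff(cand, d)                       # correspondence-free metric, Eq. (3.6)
        if err < best:
            best, best_pts = err, cand
    return best_pts, best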
3.5 Summary

Here are the main points of this chapter:
• Given two curves γ and δ, we would like to measure their shape similarity. To this end, we need a metric d_inv = d_inv(γ, δ) invariant under, e.g., scaling transforms. It is often easier to define a non-invariant metric d = d(γ, δ) first, and turn it into an invariant metric by minimizing the energy functional E_{γ,δ}(T), where T ranges over a set of parametric transformations (Section 2.1.1) and E is an optimality criterion (such as the squared error energy (Equation (3.10)) or the distance map energy (Equation (3.22))). Minimizing the energy functional E(T) is called curve registration. The invariant shape metric can then be defined by

    d_inv(γ, δ) = min_T E_{γ,δ}(T).

• Curves extracted from digital images can be represented
  – explicitly, as parametric curves γ = γ(t) (continuous case), or as n-dimensional complex vectors z = (z₁, ..., z_n) ∈ Cⁿ. Registration of parametric curves depends on the point correspondences between the curves. These can be
    * known (Section 3.2.1). Closed-form formulas exist in the case of scaling and affine registration (Propositions 3.2.1 and 3.2.2), as well as in the case of TPS registration (Equation (3.21));
    * unknown (Section 3.2.2). Registration is much more difficult than in the case of known point correspondences;
  – implicitly, as the zero set of their distance map:
    * distance map registration avoids establishing point correspondences.

Let γ : [0, 1] → R² be an arbitrary snake, and let γ_t be a 1-parameter deformation of γ. That is,

    γ̃ : [0, 1] × [0, 1] → R²,   (t, s) ↦ γ̃(t, s) =: γ_t(s).

(This technique of adding an extra term for constraining the solutions is called Tykhonov regularization; see for instance http://en.wikipedia.org/wiki/Ridge_regression and references therein.) The derivative of E at γ is then defined as

    dE(γ_t)/dt |_{t=0} = ∂/∂t E(γ̃(t, s)) |_{t=0}.   (5.5)

In general, computation of the derivative (5.5) is not always straightforward, depending on the definition of E. The resulting system of equations dE/dt = 0 is called the Euler-Lagrange equations of E and, in our case, it is a system of two ordinary differential equations (ODEs), because the variable γ = (x, y) is a function of only one real variable s. (In general, the Euler-Lagrange equations are partial differential equations.) For the explicit derivation of the Euler-Lagrange equations in the case of snakes, and a numerical implementation of a solution, see the original paper [63].

This model is known to have a number of shortcomings, some of which are:
• The snake energy (5.4) depends on the parametrization of γ. One could change the energy without changing the shape of the snake, simply by changing the parametrization.
• Numerical stability: as the snake evolves, the points on it can slide along the snake, coming too close together in some regions (e.g., being "attracted" by an energy pit) while leaving the rest of the snake sparse. For numerical stability, the points on the snake must be kept relatively equidistant, which requires meticulous point insertions and deletions at each snake iteration.
• In homogeneous regions (i.e., where ∇I ≈ 0), the snake does not move, as the forces acting upon it are (close to) zero.
• The snake does not change topology. Although in general this is viewed as a negative feature, it can become useful when detecting objects whose topology is known a priori, such as in medical imaging. See Section 5.3 for an example.
The active contour model underwent major improvements (themselves variational models), e.g., in [20, 2, 110, 112, 73], trying to address the above and other issues.

5.2 Geometric Models

The geometric models were introduced independently in [14] and [71], based on front propagation theory [82, 83], and were a revolutionary way of viewing active contours. These authors proposed a non-variational approach, namely mean curvature motion along the normal direction. More precisely, let γ_t(s) = γ(t, s) be the evolving curve at moment t. Here s is the parameter along the curve and t shows the dependence of the curve on time. Let ν be the inward unit normal along the curve (see Remark 2.4.1 for this choice of ν). The requirement that any point on γ_t move along the normal is described mathematically by the following differential equation:

    ∂_t γ = F · ν   (5.6)

where F : R² → R is a real-valued function (the speed). Note that Equation (5.6) does not necessarily arise as an Euler-Lagrange equation of some energy functional; it is simply a geometric requirement. The evolution of γ is controlled by a judicious choice of the speed function F, based on the following observation: when F is small, γ changes very slowly, and when F is large, γ changes rapidly.
Though not mandatory, a good (and well studied) choice for the speed is to take F to be a function of the curvature κ of γ, i.e., F = F(κ). The first and simplest example turns out to be surprisingly important and well understood.

Example 5.2.1 F = c + κ, where c > 0 is a real constant. The corresponding flow is

    ∂_t γ = (c + κ) · ν.

The c·ν component moves the curve inward (c > 0) or outward (c < 0), whereas the κ·ν component has been shown to shorten as well as smooth out γ, and to decrease the total curvature of γ [97].

Let us now discuss the implementation of geometric models. As explained in Section 2.2, there are two ways to represent the curve γ_t: explicitly, as γ_t = γ_t(s) = (x_t(s), y_t(s)), or implicitly, as the level set of some surface. In the first case, the implementation is straightforward: discretize γ, compute the (unit) normal ν at each point and move the point in the direction of ν. However, this naive implementation is known to be very unstable numerically. Fortunately, a far more elegant, powerful and stable method arises if, at any time t during the evolution, we view the active contour as the zero level set of an (also evolving) surface φ_t(x, y) = φ(x, y, t). That is, γ(s, t) = (x(s, t), y(s, t)) satisfies the equation

    φ(x(s, t), y(s, t), t) = 0,   ∀t ≥ 0.   (5.7)

We now have a surface φ_t(x, y) whose zero level set γ_t must evolve according to (5.6). Naturally, the question is: how must φ_t evolve in order that γ_t satisfy (5.6)? By differentiating (5.7) with respect to t, we get

    0 = d/dt φ(x, y, t) = ∂_x φ · ∂_t x + ∂_y φ · ∂_t y + ∂_t φ
      = ⟨∇φ, ∂_t γ⟩ + ∂_t φ
      = F · ⟨∇φ, ν⟩ + ∂_t φ   (by (5.6))
      = F · |∇φ| + ∂_t φ.   (5.8)

In this last equality we have used the fact that, since γ_t is a level curve of φ_t, its unit normal ν is given by ν = ∇φ / |∇φ|. Consequently, as with any geodesic model, the evolution of γ can be carried out numerically using a level set implementation. The surface evolution equation corresponding to (5.14) is ∂_t φ = g · …

… > 0, and the dot product is largest when ∇I ⊥ γ. Therefore, a weight w > 0 is needed in this case. Reversing the orientation on γ would require a weight w < 0. An illustrative example of the general case is shown in Figure 6.4. The green curves represent the initial (a) and final (b) interpolation snake. A blue circle represents a control point with w = 0, which will be kept fixed during the minimization. The variable points are marked in red, either with a plus sign (w = +1) or with a minus sign (w = −1). The energy (6.1) is a function of the red control points only. The weights at any other point on the spline (green points) are computed by interpolation of the known weights. In this example, the orientation of the spline is from top to bottom. Although initialized on an edge, the snake moves away from it, picking the edges with the polarity dictated by the weights: as we walk downward along the final snake (b), the w < 0 points are attracted by a dark → light edge, whereas points with w > 0 are attracted by a light → dark edge. This edge polarity selection by the snake is due to the fact that, unlike the image energy E_img in (5.4), which only depends on the magnitude of the gradient, the E_img in (6.1) depends on the magnitude of the gradient and on the angle between the gradient ∇I and the (unit) tangent to γ. We have found that this latter image energy performs better than the original magnitude-only image energy. A final but important detail about E_img is the computation of the image gradient ∇I.
Computation of the intensity gradient requires smoothing of the initial image, and the amount of smoothing determines the scale at which the snake evolution takes place. We compute the gradient in a way that allows us to seamlessly vary the scale, from coarse to fine, during the fitting process. Instead of the usual Gaussian smoothing, or the more sophisticated anisotropic smoothing methods [86], we compute the directional derivatives at each pixel as

    ∂_x I = Ī_right − Ī_left,   ∂_y I = Ī_bottom − Ī_top,

where each average intensity is calculated over a small rectangle, as shown in Figure 6.5.

Figure 6.4: Interpolation snake. Orientation along the snake is from top to bottom. Blue and red: control points. Green: arbitrary points on the snake. The weights of each of the control points are indicated by the shape (and color) of the marker: a blue circle means w = 0 (hard constraint), whereas a red ± means w = ±1. Despite starting out on an edge, the particular weights guide the snake towards other edges, as follows: w < 0 selects a dark → light gradient, whereas w > 0 selects a light → dark gradient, as we move along the curve with the chosen orientation. The points with w = 0 (blue circles) remain fixed during the fitting process, acting as hard constraints.

Figure 6.5: Computation of the image gradient.

Computation of the average intensity over a rectangle of any size can be carried out in constant time, regardless of the size of the rectangle, using only four look-up operations, by using a clever representation of the image I in the form of the integral map [107]:

    Ĩ(x, y) = ∫₀^x ∫₀^y I(u, v) du dv.

Clearly (Figure 6.6), for any rectangle [x_l, x_r] × [y_l, y_r] we have

    ∫_{x_l}^{x_r} ∫_{y_l}^{y_r} I(x, y) dx dy = Ĩ(x_r, y_r) − Ĩ(x_r, y_l) − Ĩ(x_l, y_r) + Ĩ(x_l, y_l),   (6.3)

which makes possible the computation in constant time of the average intensity over any rectangle.

Figure 6.6: Integral map. The integral of I over the rectangle ABCD can be computed by Equation (6.3), knowing the integrals of I over the four rectangles with the lower left corner at the origin O and the upper right corner at A, B, C and D.

In the discrete setting, the integrals are replaced by finite sums. The integral map itself can be computed in quadratic time using a straightforward, bottom-up dynamic programming algorithm, based on the following recurrence, obtained by taking x_r = x_l + 1 and y_r = y_l + 1 in (6.3):

    Ĩ(x+1, y+1) = Ĩ(x, y+1) + Ĩ(x+1, y) − Ĩ(x, y) + I(x+1, y+1).

Note that this last relation also gives I(x+1, y+1) in terms of the values of Ĩ, so knowing the integral map of I we can recover exactly the image itself. In other words, the integral map is an invertible operator.

The size of the rectangular patches on which the gradient is computed determines the amount of smoothing, and hence the amount of detail that will be used during the fitting: a larger scale (≈ 30 × 60 rectangles) makes the snake sensitive to more distant edges but produces a rather coarse fit, whereas a small scale (e.g., smaller than 10 × 20) makes the snake sensitive only to nearby edges and produces a closer fit.
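The following is a minimal sketch of the integral map and of a box-averaged directional derivative at an interior pixel; the placement of the left/right boxes relative to the pixel is an illustrative choice in the spirit of Figure 6.5, not the exact layout used in the experiments.

import numpy as np

def integral_map(img):
    """Integral map of Eq. (6.3): cumulative sums along both axes."""
    return img.cumsum(axis=0).cumsum(axis=1)

def box_sum(imap, r0, c0, r1, c1):
    """Sum of the image over rows [r0, r1) and columns [c0, c1), using four look-ups."""
    total = imap[r1 - 1, c1 - 1]
    if r0 > 0:
        total -= imap[r0 - 1, c1 - 1]
    if c0 > 0:
        total -= imap[r1 - 1, c0 - 1]
    if r0 > 0 and c0 > 0:
        total += imap[r0 - 1, c0 - 1]
    return total

def box_gradient_x(imap, r, c, h, w):
    """Directional derivative at pixel (r, c): mean over a right box minus a left box of
    size h x w; indices are assumed to stay inside the image."""
    area = float(h * w)
    top = r - h // 2
    right = box_sum(imap, top, c + 1, top + h, c + 1 + w) / area
    left = box_sum(imap, top, c - w, top + h, c) / area
    return right - left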
We illustrate multi-scale fitting of a multiple-component interpolation snake in Figure 6.7. The original image is noisy, with 20% salt-and-pepper noise. Figure 6.7 (b) shows the initial multi-component snake. There is a single (closed) component around the circle, and four open components around the square, concatenated. As before, the red "+" signs represent variable control points with positive weight, and the blue circles represent fixed control points. Figure 6.7 (c) shows the detected boundary at a coarse scale (30 × 60 rectangles used in the computation of ∇I). The fitting continues at scales 20 × 40 and 10 × 20. The final result is shown in (d). Figures 6.7 (e) and (f) show the actual gradient (x-component only) used in (c) and (d), respectively. Since computation of the integral map need only be done once, in one pass through the image, and since afterwards computation of the gradient ∇I can be done in constant time, regardless of scale, we can and do vary the scale during the fitting process, gradually, from coarse to fine.

Figure 6.7: Multi-scale, multiple-component interpolation snake. (a) 20% salt-and-pepper image; (b) initial snake; (c) coarse scale fit: 30 × 60 rectangles for computation of ∇I; (d) fine scale fit: 10 × 20 rectangles; (e) large scale gradient used in (c) (x direction only); (f) small scale gradient used in (d) (x direction only).

6.2.2 Arclength energy

The arclength energy E_arc is designed to keep the points of the snake equidistant throughout the minimization process, by penalizing non-arclength parametrizations:

    E_arc(γ) = ∫₀^L | |γ'(t)| − 1 | dt.   (6.4)

Since under the canonical parameter s the speed along the curve is 1 (i.e., |γ'(s)| = 1), the amount | |γ'(t)| − 1 | indicates to what extent t fails to be the canonical parameter.

The necessity of the arclength term is due to the following argument. Without exception, a curve γ is discretized by sampling it at equally spaced grid points t_i = i·Δt. However, as we have seen in Section 2.2.1, the points γ_i = γ(t_i) need not be equidistant, so this procedure does not correspond to the arclength parametrization. Of course, if a non-uniform distribution of the points along the snake is allowed, then the image energy will indicate a misleading measure of fit if, e.g., a densely populated segment along the snake happens to fall into an energy pit (strong edge). It is therefore crucial to start out with, and maintain, equally spaced points on the snake.

This situation is illustrated in Figure 6.8, which shows a typical case. We fit an interpolation snake twice, starting from the same initial configuration: once using E_img only, hence without the arclength energy (6.4) (Figure 6.8 (c)), and one more time using both E_img and E_arc (Figure 6.8 (e)). For visual comparison against the ground truth (Figure 6.8 (b)) we overlay each result on top of the ground truth (Figures 6.8 (d) and (f), respectively). In the absence of the arclength energy, the goodness of fit is measured by the image energy alone, and has a value of −44.7159, "better" even than the image energy along the ground truth itself (−16.4465), erroneously indicating a very good fit. When the arclength energy is combined with the image energy, the fit is visibly better than in the former case. Quantitatively, in each case we measure the difference between the detected boundary and the ground truth using, e.g., the modified (average) Hausdorff distance [32]. Without the arclength energy, this measure is 7.68 pixels, whereas with the arclength energy the average Hausdorff distance is only 3.60 pixels.

Somewhat surprisingly, the problem of preserving the arclength parametrization along the snake during the evolution, although noticeable both theoretically and practically, has been silently overlooked and left as an implementation detail.
Different active contour implementations deal with this problem in an ad-hoc manner, e.g., by removing points from regions along the snake where the points have come too close together, and inserting points in regions where the snake has become too sparse. The first authors to address this problem were, to our knowledge, Williams and Shah [110], who replace the first-order term in the internal energy E_int (5.3) by a term which penalizes deviations of the consecutive-point distances |γ_{i+1} − γ_i| from their mean d̄ = (1/n) Σ_i |γ_{i+1} − γ_i| along the contour.

In the discrete setting, we take an equidistant subdivision of the interval [0, L], namely τ_i = i·L/n, and let γ_i = γ_C(τ_i). We approximate the derivative at τ_i as γ'(τ_i) ≈ (γ_{i+1} − γ_i)/Δτ, and approximate the arclength energy E_arc by a Riemann sum:

    E_arc ≈ Σ_i | |γ_{i+1} − γ_i| / Δτ − 1 | · Δτ   (6.5)
          = Σ_i | |γ_{i+1} − γ_i| − Δτ |.   (6.6)

Figure 6.8: The need for arclength parametrization. Green: interpolation snake (control points omitted). Yellow: ground truth. (a) Initial snake (arclength parametrization). (b) Manually drawn border of a human kidney. (c) Snake fitted using image energy only. The points do not remain equidistant and tend to accumulate along strong boundary segments, producing an artificially good score (E = E_img = −44.7159), better even than the ground truth score (E = E_img = −16.4465), but a poor fit. Average Hausdorff distance from ground truth: 7.68 pixels. (d) Same as (c), overlaid over the ground truth. (e) Snake fitted using both the image energy E_img and the arclength energy E_arc. Average Hausdorff distance from ground truth: 3.60 pixels. (f) Same as (e), overlaid over the ground truth.

If L is chosen to approximate the length of the (initial) contour, and if this length remains roughly the same during the evolution, then

    Σ_{i=0}^{n−1} |γ_{i+1} − γ_i| ≈ L = n·Δτ,

so Δτ is the mean of all distances between consecutive points along the snake. We arrive this way at the elastic energy proposed by Williams and Shah [110]. The initial snake is also constructed so that the sampling at uniform time intervals corresponds to (an approximate) arclength parametrization. This was explained in Section 2.3.
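The discrete arclength energy (6.5)-(6.6) amounts to a few lines; this sketch assumes an open snake given as an (n, 2) array (for a closed snake one would append the closing segment).

import numpy as np

def arclength_energy(snake, L=None):
    """Equations (6.5)-(6.6): sum of | |gamma_{i+1} - gamma_i| - dtau |, where
    dtau = L / (number of segments); L defaults to the current snake length."""
    seg = np.linalg.norm(np.diff(snake, axis=0), axis=1)
    if L is None:
        L = seg.sum()
    dtau = L / len(seg)
    return np.abs(seg - dtau).sum()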
6.2.3 Shape energy

The shape energy E_shape represents the shape prior knowledge and can be imposed, e.g., as a distribution on the vector of coefficients. Any of a number of techniques for distribution estimation can be used, such as Principal Component Analysis (PCA) on a number of training contours, Parzen windows, or more general kernel methods [24, 25]. In our experiments we use the average chamfer transform model, discussed in detail in Chapter 4. Specifically, let γ₁, ..., γ_k be registered training samples, represented respectively by their chamfer transforms C₁, ..., C_k. Let

    C̄ = (1/k) Σ_i C_i

be the average chamfer transform, and let μ be the curve that minimizes it:

    μ = arg min_γ C̄(γ).   (6.7)

If γ_C is an interpolation snake, then we let T_C be the similarity transform that does semiaxes alignment (Section 3.4) of γ_C onto μ, and define the shape energy as

    E_shape(C) = s( C̄( T_C(γ_C) ) )   (6.8)

where the function s = s(x) has the form

    s(x) = a · x² / (x² + b²)

for some constants a and b. The reason for "squashing" the average chamfer transform by composing with s is to bring C̄ within a fixed, controlled range (Figure 6.9).

Figure 6.9: Squashed average chamfer transform, used as the shape model with interpolation snakes. Compare to Figure 4.3.

Note that even though the shape energy is defined as a function of the control vector C, at each iteration we register the entire spline γ_C to μ (not just the control points!). The actual similarity to the training samples is computed again on the entire γ_C. Both the registration of γ_C to μ and the evaluation of C̄(T_C(γ_C)) have time complexity linear in the number of points on γ_C, the same as the complexity of the other two energy terms, E_img and E_arc.

The benefits and limitations of the shape energy should be well understood and not overestimated: the shape energy, in general, can correct for significant deviations from the general shape of the training samples, but cannot account for details in any particular test case. The example in Figure 6.10 illustrates the benefits of using the shape energy. A clear kidney ultrasound image (Figure 6.10 (a)) is occluded by applying a white stripe with dark islands (Figure 6.10 (b)), in an attempt to "confuse" the snake. Figures 6.10 (c) and (d) show the detected boundary without the shape energy term (c), and with E_shape (d). The "distractor" stripe tends to deform the snake toward a shape that is not part of the model, and the shape energy term manages to correct for that deviation.

What the shape energy cannot do is to correct the shape of a snake (towards the desired boundaries) if the snake already happens to be similar to the training shapes. We illustrate this point in Figure 6.11. Starting from the same initial contour (Figure 6.11 (a)), we fit the snake with shape energy (Figures 6.11 (e) and (f)) and without it (Figures 6.11 (c) and (d)). The results in the two cases are similar, and similar to the training shapes (on average), yet different from the ground truth (yellow). The reason is that, unlike in the previous example, there is no distractor to alter the shape of the snake significantly from the learned model. The shape energy has nothing to correct for, and has a minimal effect, because near convergence the snake is already within the distribution of the training samples. In fact, without the shape energy, the average Hausdorff distance between the detected boundary and the ground truth was 5.13 pixels, whereas with the shape energy the average Hausdorff distance was slightly higher: 5.85 pixels.

Figure 6.10: Effect of the shape energy E_shape. (a) Clear kidney image. (b) Occluded kidney, and initial snake (green). (c) Final snake with no shape energy. (d) Final snake with shape energy.

Figure 6.11: Limitations of the shape energy. (a) Initial snake. (b) Ground truth. (c) Final snake without shape energy. (d) Comparison of (c) with the ground truth. (e) Final snake with shape energy. (f) Comparison of (e) with the ground truth.
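A hedged sketch of (6.8) is given below: it assumes the squashed average chamfer map is read off along the registered snake and averaged over its points, in the spirit of the chamfer energy (3.22); the constants a and b and the alignment interface are illustrative.

import numpy as np

def squash(x, a, b):
    """The squashing function s(x) = a * x**2 / (x**2 + b**2)."""
    return a * x**2 / (x**2 + b**2)

def shape_energy(snake, mean_chamfer, align, a=1.0, b=10.0):
    """Sketch of E_shape, Eq. (6.8): semiaxes-align the snake to the mean shape (`align`,
    Section 3.4) and average the squashed mean chamfer transform along the result."""
    moved = align(snake)
    rows = np.clip(np.round(moved[:, 1]).astype(int), 0, mean_chamfer.shape[0] - 1)
    cols = np.clip(np.round(moved[:, 0]).astype(int), 0, mean_chamfer.shape[1] - 1)
    return squash(mean_chamfer[rows, cols], a, b).mean()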
6.3 Conclusion to Chapter 6

We have defined here a type of active contour, the interpolation snake, with shape memory, smooth by construction, and allowing hard spatial constraints. This is achieved by means of an interpolation spline, preferred over other splines because the spline passes through all of its control points. The snake can have multiple connected components (Figure 6.7), and each component preserves its topology during the boundary finding process. In applications such as medical imaging, where the topology of the object of interest is generally known a priori, this is not a drawback; it is almost a "must-have" feature.

A three-term energy functional is minimized (Equation (6.1)). The main term (E_img, Equation (6.2)) is the boundary-seeking component. Unlike the image term in the original snakes algorithm [63], which only seeks pixels with a large intensity gradient, our image energy term guides the snake based on the angle between the tangent to the snake and the image gradient (Figure 6.3). In particular, this makes the snake sensitive to the edge orientation (Figure 6.4); this feature can be turned off, however, by taking the absolute value in the definition of E_img (Equation (6.2)). The second term (E_arc, Equation (6.4)) helps the snake remain parametrized by arclength, thus preventing the constituent points of the snake from accumulating near very strong edges in the case of objects with non-uniform boundaries (Figure 6.8). Last but not least, there is a shape prior term (E_shape, Equation (6.8)) that prevents significant deviations of the snake shape from a known, predefined shape model. The three terms must be commensurate, so some normalization is in order.

The choice of splines for the active contour reduces the search space from an infinite-dimensional space of C² curves in R² to a low-dimensional R^{2k}, where k is the number of control points. As such, the energy E is a function of 2k independent variables and can be minimized by standard routines, some of which do not require computation of the gradient, thus avoiding computation of the Euler-Lagrange equations.

Chapter 7
A Different Direction: Break Point Detection

We devote this chapter to a non-variational approach to boundary detection. The new approach finds the boundary by detecting jumps in the intensity profile along segments across the boundary of the object of interest. The core of this approach is a general, mathematically sound algorithm for detection of jumps in a 1D signal. We expect this algorithm to have applications well beyond border detection. It is one of the contributions of the thesis.

7.1 One-Dimensional Break Detection

We arrived at the problem of break (or jump) detection in 1D signals while trying to find the kidney boundaries in ultrasound images. By and large, the kidney has overall lower intensity than its surroundings. Thus, if AB is a segment that goes across the boundary of the kidney, for any point P ∈ AB we can compute the average intensity on AP and on PB. Intuitively, the border point we seek should be the one for which one average is as low as possible while the other is as high as possible. See Figure 7.1 and also Example 7.1.2.

Figure 7.1: Boundary detection by finding a break point in an intensity profile across the boundary.

It is a trivial matter to detect a single break point: simply move along the profile and choose as the break point the point which maximizes the difference between the left and right average intensities. From the very beginning, it is clear that the problem of break point detection should be formulated in the more general case, that of multiple break points.
It should also be clear that we cannot expect a break point detection algorithm to work well for arbitrary 1D signals: in this motivational example, the intensity on either side of the break point is relatively constant. After some thought, we have come up with the right mathematical model of the situation we just described. We present this model next.

For any interval [a, b] ⊂ R, let the set of n-segment partitions of [a, b] be

    P_n = { T = (a = t₀ < t₁ < ... < t_n = b) }.

If we fix some λ > 0 and only require that η integrate to zero on subintervals of length no less than λ, then we can find many such functions. Divide the interval [a, b] into a large number of equal-length subintervals, and call their common, nonzero length λ. On each such subinterval consider arbitrary Haar-like functions, i.e., functions antisymmetric (odd) with respect to the midpoint of the interval. Because of this antisymmetry, any such function integrates to zero on its respective subinterval. If η is now any combination of such functions, then η integrates to zero on any union of length-λ subintervals. Accordingly, we can no longer work with arbitrary partitions of the whole interval [a, b], but only with partitions whose break points are multiples of λ. With these modifications, the proof of Proposition 7.1.1 remains unchanged. This modification amounts precisely to the discretization of the problem. Thus, in summary, Proposition 7.1.1 is true in the continuous case, but not interesting, as the noise η must be zero to begin with. Discretization allows for non-trivial noise and, with the proper modifications, the result remains true. For clarity and simplicity, however, we state the results and present the proofs in the continuous framework.

In segmentation problems or 2D boundary detection problems, one defines an energy functional heuristically, in such a way that, e.g., the boundaries of interest are expected to be among the local minima of that energy functional. Usually there is no proof that the minima of the energy agree with human perception, and it is often difficult to control which minima the minimization procedure is going to find. By contrast, Proposition 7.1.1 asserts that if y = y_{H,T} + η is a noisy signal, then its defining piecewise constant core y_{H,T}, which we would like to recover, is the absolute minimum of the L² energy (7.3); so by minimizing the energy (7.3) over all G ∈ Rⁿ and S ∈ P_n, we recover the weights H and the partition T.

We can do even better, though. If we assume the noise η has zero mean on any subinterval of [a, b] (again, in the sense of Remark 7.1.1), then the weight h_i is precisely the mean of the signal y on [t_{i−1}, t_i). In other words, the weights H are not independent of the partition T; rather, they are uniquely determined by T (and by the signal y itself). This suggests that we can consider the energy E as a function of the partition only (rather than of the partition and weights). To prove this fact, let us first introduce some notation. Let S = (a = s₀ < s₁ < ... < s_n = b) be a partition of [a, b]. Let Y_S = (ȳ₁, ..., ȳ_n) be the weight vector whose components ȳ_i are the mean of y on [s_{i−1}, s_i).

Proposition 7.1.2 E(G, S) ≥ E(Y_S, S) for all G ∈ Rⁿ.
Proof:

    E(G, S) = ∫_a^b |y − y_{G,S}|² dt
            = Σ_j ∫_{s_{j−1}}^{s_j} (y − ȳ_j + ȳ_j − g_j)² dt
            = Σ_j ( ∫_{s_{j−1}}^{s_j} (y − ȳ_j)² dt + 2(ȳ_j − g_j) ∫_{s_{j−1}}^{s_j} (y − ȳ_j) dt + ∫_{s_{j−1}}^{s_j} (ȳ_j − g_j)² dt )
            = E(Y_S, S) + ‖y_{Y_S,S} − y_{G,S}‖²_{L²}
            ≥ E(Y_S, S),

where the middle term vanishes because ȳ_j = ∫_{s_{j−1}}^{s_j} y(t) dt / (s_j − s_{j−1}) is the mean of y over [s_{j−1}, s_j).
□

It follows from this proposition that, in order to segment the signal y, it suffices to minimize the energy S ↦ E(Y_S, S) rather than (G, S) ↦ E(G, S). To summarize:

Corollary 7.1.1 Suppose y = y_{H,T} + η, with ∫_u^v η = 0 for any subinterval [u, v] ⊂ [a, b]. For any partition S ∈ P_n, let ȳ_j be the mean of y on [s_{j−1}, s_j), and let Y_S = (ȳ₁, ..., ȳ_n) (recall that y_{Y_S,S} is the piecewise constant function defined by Y_S and S). Define the energy

    E(S) = ∫_a^b (y − y_{Y_S,S})².   (7.4)

Then

    T = arg min_{S ∈ P_n} E(S),   H = Y_T.

That is, the partition T is the absolute minimum of E, and the weights vector H consists of the means of y on each of the segments of T.

Certainly, the problem now is to minimize the energy (7.4) with respect to S ∈ P_n. Equivalently, we must minimize E over S ∈ Rⁿ, subject to the constraints a = s₀ < s₁ < ... < s_n = b. In the discrete case, if y is given as a sequence (y₀, ..., y_N), then for large N and n an exhaustive search for the optimal T would have complexity O(Nⁿ) ("N choose n"). Fortunately, a very elegant dynamic programming solution exists, based on the following observation (stated here in the continuous case):

Proposition 7.1.3 Suppose T = (a = t₀ < t₁ < ... < t_{n−1} = b < t_n = c) is the optimal partition on [a, c] (i.e., T minimizes E (Equation (7.4)) over [a, c]). Let T' = (a = t₀ < t₁ < ... < t_{n−1} = b) be the truncation of T up to the last interior break point b = t_{n−1}. Then T' is the optimal partition over the subinterval [a, b] ⊂ [a, c].

Proof: Since we are working with different intervals, we must specify the dependence of the energy E on the interval and on the number of segments. Thus E_{a,c,n} is the energy for partitions over [a, c] with n segments, and E_{a,b,n−1} is the energy for partitions over [a, b] with n − 1 segments. Here b = t_{n−1}. As a matter of notation, for any interval [u, v] let ȳ_{[u,v]} denote the mean of y over [u, v]. Obviously, by the very definition of E (Equation (7.4)), we have

    E_{a,c,n}(T) = E_{a,b,n−1}(T') + ∫_b^c (y − ȳ_{[b,c]})² dt.

Let S' = (a = s₀ < s₁ < ... < s_{n−1} = b) ∈ P_{n−1} be the optimal partition over [a, b]. Then the partition S = (S', c) has n segments and spans [a, c]. Since among such partitions T is optimal, we have

    E_{a,c,n}(T) ≤ E_{a,c,n}(S),

and by subtracting ∫_b^c (y − ȳ_{[b,c]})² dt on both sides we get

    E_{a,b,n−1}(T') ≤ E_{a,b,n−1}(S').

But since S' is optimal on [a, b], we also have the opposite inequality:

    E_{a,b,n−1}(T') ≥ E_{a,b,n−1}(S'). □

Corollary 7.1.2 Let E*_{a,c,n} be the minimum of E_{a,c,n}, and E*_{a,b,n−1} be the minimum of E_{a,b,n−1}, for any a < b < c. Then

    E*_{a,c,n} = min_{b ∈ [a,c]} ( E*_{a,b,n−1} + ∫_b^c (y − ȳ_{[b,c]})² dt ).   (7.5)

Once we have established the recurrence relation (7.5), the bottom-up dynamic programming algorithm is straightforward: given a one-dimensional signal y = (y₀, ..., y_N) and a desired number of segments n, we fill incrementally a matrix E*(b, k), for integers 0 ≤ b ≤ N and 0 ≤ k ≤ n, using Equation (7.5). The value E*(N, n) is the minimum energy for partitions over the entire interval [0, N] with n segments. The actual optimal partition T is then obtained by backtracking the indices b which achieve the minimum on the right-hand side of (7.5). A small computational sketch of this procedure is given below.
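The sketch below fills the table E*(·, ·) using the recurrence (7.5) and backtracks the break points; per-segment squared errors are obtained in constant time from cumulative sums, and the names are illustrative. For the signal of Example 7.1.1 below, segment_signal(y, 4) returns the break indices and the per-segment means.

import numpy as np

def segment_signal(y, n):
    """Dynamic programming for Eq. (7.5). Returns the optimal break indices
    (including both endpoints) and the per-segment means."""
    y = np.asarray(y, dtype=float)
    N = len(y)
    csum = np.concatenate([[0.0], np.cumsum(y)])
    csum2 = np.concatenate([[0.0], np.cumsum(y * y)])

    def sse(i, j):
        """Sum of squared deviations of y[i:j] from its mean (the integral term in (7.5))."""
        s, s2, m = csum[j] - csum[i], csum2[j] - csum2[i], j - i
        return s2 - s * s / m

    E = np.full((n + 1, N + 1), np.inf)      # E[k, j]: best energy for k segments on y[0:j]
    back = np.zeros((n + 1, N + 1), dtype=int)
    E[0, 0] = 0.0
    for k in range(1, n + 1):
        for j in range(k, N + 1):
            costs = [E[k - 1, b] + sse(b, j) for b in range(k - 1, j)]
            best = int(np.argmin(costs))
            E[k, j] = costs[best]
            back[k, j] = best + (k - 1)      # start index of the last segment
    breaks = [N]
    for k in range(n, 0, -1):                # backtrack the arg-min indices
        breaks.append(back[k, breaks[-1]])
    breaks = breaks[::-1]
    means = [y[breaks[i]:breaks[i + 1]].mean() for i in range(n)]
    return breaks, means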
Example 7.1.1 For this example, refer to Figure 7.2. We generate a random 4-segment integer partition of the interval [0, 1000],

    T = (0, 169, 877, 924, 1000),

and a random weight vector

    H = (7.17897, 96.6553, −66.3123, −24.9187).

The piecewise constant function y_{H,T} determined by H and T is shown in red. This is the signal before adding noise, and can be seen as ground truth. Next, we construct the noisy signal y = y_{H,T} + η, where η is Gaussian noise with zero mean and standard deviation σ = 20. The resulting noisy signal is shown in blue. The result of the dynamic programming segmentation algorithm is shown in green.

Figure 7.2: Break point detection of a randomly generated signal. Blue: noisy data y = y_{H,T} + η with 1000 points and 4 segments. Red: piecewise constant function y_{H,T} used to generate y (ground truth). Green: denoised signal. The noise η was Gaussian with zero mean and standard deviation σ = 20. The true break points were T = (0, 169, 877, 924, 1000) and the true weights vector was H = (7.17897, 96.6553, −66.3123, −24.9187). The denoised signal has breaks identical to the true breaks, and weights H' = (9.41748, 97.0672, −68.4484, −27.1419).

The detected partition T' and weights vector H' are
That could not be by chance: there had to be a mathematical explanation for it. We were able to provide a rigorous proof to support the experimental findings. Moreover, the mathematics showed that there was nothing special about two-segment partitions, as we originally started out with, and that the results were in fact valid for partitions with any number of segments. Last, but not least, we were able to find a dynamic programming algorithm that would detect these partitions.

The only limitation of this work is that the number of segments has to be specified a priori. It would be highly desirable to be able to choose the number of segments automatically, but we currently do not have a solution for that. We contend, however, that an MDL³ solution to this problem is very likely.

³Minimum Description Length.

Part III

Applications

Chapter 8

Experimental Results

In this chapter we present some experiments designed to test the performance of the interpolation snakes introduced in Chapter 6. We do a systematic and objective study of four active contour models, including the interpolation snakes, on synthetic noisy images (Section 8.4) and on kidney ultrasound images (Sections 8.2.1, 8.2.2, 8.2.3 and 8.2.4), by measuring the shape similarity of the detected boundaries against manually traced ground truth. For clinical validity of our algorithm we also perform a subjective blind study (Section 8.3), in which a human expert chooses the better of two algorithms' outputs, displayed simultaneously in separate copies of the same image. In addition to the results on ultrasound, we demonstrate in Section 8.6 good performance of interpolation snakes for shape detection in non-medical images. We follow up the 1D break point detection algorithm from Chapter 7 with some partial results in Section 8.5.

8.1 Introduction

We have developed, so far, a framework for boundary detection, specifically targeting ultrasound images. In this chapter of the thesis we assess the performance of our algorithms. We do this objectively and systematically, by comparing the boundaries detected using our methods against the true boundaries (ground truth). In the case of real images (such as ultrasound), the ground truth usually consists of contours manually traced by human experts, while for synthetic images, the ground truth is the curve used to generate the synthetic image, i.e., the curve whose interior is the object of interest. We describe the types of images used in Section 8.1.3.

8.1.1 Performance Measures

One way to evaluate the performance of active contours is to compare them against ground truth curves. We use three measures to compare the similarity of two curves. Let X and Y be two simple closed curves in the plane.

Hausdorff Distance. This has been defined in Section 3.1 (3.6) as

    H(X, Y) = sup{d(X, Y), d(Y, X)},

where d(X, Y) = sup_{x ∈ X} d(x, Y) is the directed distance between the sets X and Y. Here, X and Y can be arbitrary sets in the plane, not necessarily curves. The measure satisfies the mathematical definition of a distance, i.e., it has the following properties [47]:

1. H(X, Y) ≥ 0, with equality if and only if X = Y.
2. H(X, Y) = H(Y, X) (symmetry property).
3. H(X, Z) ≤ H(X, Y) + H(Y, Z) (triangle inequality).

This measure is known to be highly sensitive to outliers. To exemplify, suppose the two sets X and Y are identical, except for one point p. That is, Y = X ∪ {p}.
Then obviously d(X, Y) = 0, since X ⊂ Y, but d(Y, X) = d(p, X), so the Hausdorff distance is H(X, Y) = d(p, X). As such, the Hausdorff distance can be made arbitrarily large by moving a single point towards infinity, even though the two sets may otherwise be identical. In image processing, when comparing two figures (e.g., each the result of an edge detector) using the Hausdorff measure, the presence of outliers may deem the two figures different, even though they may in fact represent the same object.

One way to make the Hausdorff distance less sensitive to outliers is to replace the supremum with the average when computing the directed distance d(X, Y). This leads to the following similarity measure (introduced in [32] as the "modified Hausdorff distance").

Average Hausdorff Distance. This too has been defined in Section 3.1 (3.8):

    H_av(X, Y) = sup{d_av(X, Y), d_av(Y, X)},

where d_av(X, Y) = ∫_X d(x, Y) dx / |X| is the directed average distance. As in the case of the Hausdorff distance, X and Y can be arbitrary sets. Here |X| denotes the length of the curve X. It has been observed [32] that this measure does not satisfy the triangle inequality, so it is not a distance in the mathematical sense (properties 1, 2 and 3 defined above).

Normalized Symmetric Difference.

    NSD(X, Y) = |(X \ Y) ∪ (Y \ X)| / |X ∪ Y|    (8.1)

This is a measure that has not been mentioned in this thesis so far. It measures the amount of overlap between the sets X and Y. Strictly speaking, if X and Y are curves, on the right-hand side of (8.1) we have the interiors of X and Y, respectively, but for simplicity we denote these by the same letters as their boundaries. |X| stands for the area of the set (enclosed by) X. The numerator is the usual symmetric difference, and the denominator is a normalization factor. Obviously, 0 ≤ NSD(X, Y) ≤ 1 for any sets X and Y; NSD(X, Y) = 0 if and only if X = Y, and NSD(X, Y) = 1 if and only if X and Y are disjoint.

Claim: The normalized symmetric difference does not satisfy the triangle inequality.

Proof: To prove the claim it suffices to find a counterexample. Let X and Y be arbitrary sets with non-empty intersection, X ∩ Y ≠ ∅, and such that neither set is included in the other. The triangle inequality for X, X \ Y and Y reads

    NSD(X, Y) ≤ NSD(X, X \ Y) + NSD(X \ Y, Y),

that is,

    (|X \ Y| + |Y \ X|) / (|X| + |Y \ X|) ≤ |X ∩ Y| / |X| + NSD(X \ Y, Y).

Now if we keep X unchanged and expand Y so that |Y \ X| → ∞ while maintaining X ∩ Y constant, then the left-hand side in the last inequality approaches 1 asymptotically, so it will exceed the right-hand side, which remains fixed, strictly less than 1. □

Observe that none of these metrics measures shape similarity per se. For instance, suppose the two sets X and Y are two circles of the same radius. These two figures definitely have the same shape, so a true shape similarity measure should reflect that. However, all of the above measures are 0 on X and Y if and only if the two circles are concentric. If the two circles are disjoint, say, like the wheels of a bicycle, then NSD(X, Y) = 1 (largest possible), whereas the two Hausdorff measures can become arbitrarily large. The point is, therefore, that to use these distances as measures of shape similarity, the sets X and Y must first be brought into the same region of space, e.g., by registration. In the case of our experiments, however, this registration is not necessary, as we compare the proximity of the snake to the ground truth, rather than shape similarity.
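For readers who want to reproduce the evaluation, the following sketch computes the three measures on discretized curves and binary region masks. It is an illustration only, not the implementation used in the thesis: the function names are chosen for the example, curve distances are evaluated by brute force on sampled points, and the NSD follows Equation (8.1) as reconstructed above, normalizing by the area of the union.

```python
import numpy as np

def directed_distance(X, Y, average=False):
    """Directed distance from curve X to curve Y, each an (m, 2) array of sampled points.
    With average=False this is sup_{x in X} d(x, Y); with average=True it is the mean."""
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    d = np.sqrt(((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)).min(axis=1)
    return d.mean() if average else d.max()

def hausdorff(X, Y):
    return max(directed_distance(X, Y), directed_distance(Y, X))

def average_hausdorff(X, Y):
    return max(directed_distance(X, Y, average=True),
               directed_distance(Y, X, average=True))

def nsd(A, B):
    """Normalized symmetric difference of two binary masks (interiors of the curves)."""
    A, B = A.astype(bool), B.astype(bool)
    return np.logical_xor(A, B).sum() / np.logical_or(A, B).sum()
```

In practice the two curves are rasterized (or their polygon interiors filled) to obtain the masks passed to nsd, while the point samples of the contours are passed directly to the Hausdorff routines.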
8.1.2 Algorithms

We measure the performance of the interpolation snakes model (Chapter 6) by measuring the proximity of the output snake to the target boundaries (ground truth), using the measures described in the previous section. The significance of the results can be better understood relative to other models. As such, we compare head-to-head the following four algorithms:

CV: the Chan-Vese model, described in Section 5.2.2.
GAC: the Geodesic Active Contours model, described in Section 5.2.1.
KWT: the classical Kass-Witkin-Terzopoulos model, described in Section 5.1.
I-SNAKE: the Interpolation Snakes model, introduced in Chapter 6.

8.1.3 Data Sets

We test all four algorithms on two types of data: ultrasound images of human kidneys, and synthetic images with different amounts and different types of noise.

Ultrasound. Our kidney database consists of 117 images, each of size 733 x 471 pixels. Each kidney in this database was manually traced by a human expert¹. These manually traced boundaries represent the ground truth (GT) for the ultrasound images. It has to be noted that, reliable as it may be, the kidney GT is not 100% objective, because in several images the boundaries are so weak that the true boundaries are not distinguishable, even to the trained eye of a kidney surgeon.

¹Dr. Dorry Segev, Division of Transplantation, Department of Surgery, Johns Hopkins Institutions.

Figure 8.1: (a) Kidney ultrasound. (b) True boundaries (GT) manually traced by a human expert.

Synthetic images. We test our algorithms on synthetic images (in addition to ultrasound), in which the ground truth is known and is entirely objective. To this end, we create a family of binary images representing smooth, similar shapes as follows. We generate an i.i.d. family of vectors C_j = (c_1^j, ..., c_n^j) ∈ R^{2n}, j = 1, ..., K, from a Gaussian distribution of mean C_0 and user-defined standard deviation σ. Each component c_i^j is a point in R² (Figure 8.2 (a)). The mean C_0 is either manually defined by the user, or randomly generated. We interpolate each vector C_j by a closed spline S_j (Figure 8.2 (b)). The only condition we impose on S_j is that it does not self-intersect, so that the interior of the spline is well defined. A binary image I_j is then constructed for each contour S_j, in which the foreground is the interior of S_j, and represents the object of interest (Figures 8.2 (c)-(f)). Each of the clear binary images I_j is contaminated with several types of noise, shown in Figure 8.3. We describe each of them below. In the sequel, the index j in I_j is irrelevant, so we drop it.

Figure 8.2: Synthetic images. (a) Clusters of control points. There are 30 points at each site, drawn from a Gaussian distribution of a user-defined standard deviation σ. (b) Interpolation splines using the control points from (a). (c)-(f) Binary images obtained as the interior of the first four splines in (b).

Gaussian noise. I_gauss = I + N_σ, where at each pixel (x, y) the value η(x, y) is drawn from a normal distribution of zero mean and standard deviation σ. See Figure 8.3 (b).

Uniform noise. We create the noisy image I_uniform (Figure 8.3 (c)) by changing with probability p each pixel in the clear image I to a random gray-level value. That is, for every pixel z = (x, y) in I:

  - draw a number ρ ∈ [0, 1] from a uniform distribution;
  - define I_uniform(z) to be a random integer in [0, 255] if ρ < p, and I(z) otherwise.
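As an illustration of the corruption models just described, the sketch below generates noisy copies of a clear 8-bit image. It is not the thesis's code: the function names, the clipping to the 8-bit range, and the salt-and-pepper variant (pixels flipped to pure black or white, as in Figure 8.3 (d)) are assumptions made for the example rather than definitions quoted from the text.

```python
import numpy as np

def add_gaussian_noise(I, sigma):
    """I_gauss = I + N_sigma: add zero-mean Gaussian noise of std sigma to each pixel."""
    noisy = I.astype(float) + np.random.normal(0.0, sigma, size=I.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)   # clipping is an assumption

def add_uniform_noise(I, p):
    """With probability p, replace a pixel by a random gray level in [0, 255]."""
    noisy = I.copy()
    mask = np.random.rand(*I.shape) < p
    noisy[mask] = np.random.randint(0, 256, size=mask.sum())
    return noisy

def add_salt_and_pepper(I, p):
    """With probability p, replace a pixel by 0 or 255 (equally likely) -- standard convention."""
    noisy = I.copy()
    mask = np.random.rand(*I.shape) < p
    noisy[mask] = np.random.choice([0, 255], size=mask.sum())
    return noisy
```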
|E_img| ≤ ∫_γ |∇I| ds ≤ |∇I|_max · N_γ, where N_γ is the number of points on γ. In all our experiments, N_γ = 100 points. Dividing by this a priori upper bound, E_img can be rescaled to the interval [0, 1].

The internal energy

    E_arc = ∫_0^L | |γ'(t)| - 1 | dt

is supposed to maintain the arclength parametrization of the snake. As such, it stays close to 0 throughout the minimization, so it can be assumed to already have range in [0, 1], after dividing by N_γ.

Finally, the shape energy E_shape is given by the average chamfer transform (Equation (8.6)) squashed by the function s(x) = a²x²/(x² + b²). With a = 1, the range of s becomes precisely [0, 1). The parameter b determines the width of the trough in the squashed average chamfer transform (Figure 6.9).

With all energy terms normalized to [0, 1], we can take the weights all in one range, say [0, 100]. We have determined by trial and error (on a small number of test images) that, in the absence of the shape energy (w_shape = 0), the choice of w_img = w_arc produces promising results. As long as w_shape remains zero, it does not matter what the common value for w_img and w_arc is. This value does matter, though, when we have a non-zero shape term. In our experiments we set w_img = w_arc = 50. We determine the value of w_shape semi-exhaustively, by trying w_shape = 0, 10, 20, 30. Larger values produce a snake too similar to the model, hence a poor fit to the image.

The outcomes of the interpolation snakes experiments are summarized in Tables 8.2, 8.3 and 8.4, one for each of the three performance measures. Included in these tables are only the values for the interpolation snakes (I-SNAKE) and the classical snakes (KWT), as the two level set algorithms (GAC and CV) were far behind due to border leakage (see Table 8.1).

• Initialization (columns 1-4):
  - Scaling. Controlling parameter: scaling factor.
  - Gaussian randomization of the control points. Controlling parameter: standard deviation.
  - Distance initialization. Controlling parameter: distance from the ground truth. The sign of the distance parameter indicates either interior initialization (negative sign) or exterior initialization (positive sign).
  - Thin-plate-spline model initialization using four pairs of anchor points. The shape prior may or may not be used further during the minimization, according to whether w_shape > 0 or w_shape = 0, respectively.
• w_shape (column 5): weight of the shape prior term E_shape.
• I-SNAKE (columns 6-7):
  - shape model a priori warped onto the image using a thin-plate spline and four anchor points (shape prior contingent upon the image data);
  - un-warped shape model (shape prior independent of the image data).
• KWT (column 8).

Normalized Symmetric Difference (%)
Scaling  Stddev  Distance  Model      w_shape  Warped Shape Model  Absolute Shape Model  KWT
0.8      -       -         -          0        14.80               14.80                 46.10
0.8      -       -         -          10       10.37               15.05
0.8      -       -         -          20        9.63               15.72
0.8      -       -         -          30        9.19               16.62
-        10      -         -          0         9.38                9.38                 10.44
-        10      -         -          10        7.95                9.42
-        10      -         -          20        7.80               10.65
-        10      -         -          30        8.29               12.71
-        20      -         -          0        14.21               14.21                 22.01
-        20      -         -          10       10.64               13.42
-        20      -         -          20        9.57               14.08
-        20      -         -          30        9.37               14.78
-        -       -20       -          0        13.22               13.22                 49.35
-        -       -20       -          10        9.58               12.73
-        -       -20       -          20        9.07               14.06
-        -       -20       -          30        8.96               15.31
-        -       -         4 anchors  0         9.52               10.18
-        -       -         4 anchors  10        7.60
-        -       -         4 anchors  20        8.08
-        -       -         4 anchors  30        8.31

Table 8.2: Interpolation snakes vs. KWT: normalized symmetric difference (NSD). Smaller NSD means better performance.
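To make the normalization described before Table 8.2 concrete, here is a small sketch of the squashing function and of a weighted combination of the three normalized terms. It is illustrative only: the helper names, the value of b and the default w_shape are assumptions for the example; the text fixes only w_img = w_arc = 50 and w_shape ∈ {0, 10, 20, 30}.

```python
def squash(x, b):
    """s(x) = x^2 / (x^2 + b^2): maps [0, inf) into [0, 1); b controls the trough width."""
    return x * x / (x * x + b * b)

def total_energy(e_img_raw, e_arc_raw, chamfer_avg,
                 grad_max, n_points=100, b=10.0,
                 w_img=50.0, w_arc=50.0, w_shape=20.0):
    """Weighted sum of the three energy terms, each first normalized to roughly [0, 1].
    grad_max is the maximum gradient magnitude over the image; b and w_shape are
    illustrative values, not values prescribed by the thesis."""
    e_img = e_img_raw / (grad_max * n_points)   # divide by the a priori bound |grad I|_max * N_gamma
    e_arc = e_arc_raw / n_points                # arclength term, already close to 0
    e_shape = squash(chamfer_avg, b)            # squashed average chamfer transform
    return w_img * e_img + w_arc * e_arc + w_shape * e_shape
```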
Hausdorff Distance (pixels) Scaling St ddev Distance .\'IO( lel u' s h ape Warped Absolute KWT Shape Shape Model Model 0.8 - - - 0 31.53 31.53 48.35 0.8 - - - 10 22.20 31.44 0.8 - - - 20 19.71 31.88 0.8 - - - 30 18.46 33.82 - 10 - - 0 18.37 18.37 17.10 - 10 - - 10 15.69 18.72 - 10 - - 20 14.67 21.12 - 10 - — 30 15.68 26.10 - 20 - - 0 32.02 32.02 34.82 - 20 - - 10 23.58 28.71 - 20 - - 20 20.31 29.08 - 20 - - 30 19.12 30.94 - - -20 - 0 26.86 26.86 43.28 - - -20 - 10 20.05 25.32 _ - -20 - 20 18.05 28.18 - - -20 - 30 17.10 29.98 - - - 4 anchors 0 19.26 19.25 - - - 4 anchors 10 15.49 - - - 4 anchors 20 15.26 - - - 4 anchors 30 15.52 Table 8.3: Interpolation snakes vs. KWT: Hausdorff distance (H). Smaller H means better performance. CJI CJ'I Average Hausdorff Distance (pixels) Scaling Stddev Distance Model “'5 h ape Warped Absolute KWT Shape Shape Model Model 0.8 - - - 0 9.56 9.56 28.47 0.8 — - - 10 6.87 9.96 0.8 - - - 20 6.36 10.49 0.8 - - — 30 5.96 11.33 — 10 - - 0 5.85 5.85 5.93 - 10 - - 10 5.11 5.91 - 10 - - 20 4.98 6.65 - 10 - - 30 5.27 8.10 - 20 - - 0 8.74 8.74 12.06 - 20 - - 10 6.65 8.37 - 20 - - 20 6.07 8.79 - 20 - - 30 5.92 9.47 - - -20 - 0 8.03 8.03 28.83 - - -20 - 10 6.17 7.90 - - ~20 - 20 5.86 8.82 - — -20 — 30 5.73 9.81 — - - 4 anchors 0 5.97 6.21 - — - 4 anchors 10 5.08 - - - 4 anchors 20 5.16 — - - 4 anchors 30 5.28 Table 8.4: Interpolation snakes vs. KWT: Average Hausdorff distance (AH). Smaller AH means better perforn‘iance. 1 6 Perhaps the first thing to notice about the results reported in Tables 8.2, 8.3 and 8.4 is the consistency across the different performance measures, in the sense that for any of the three models tested, the ordering of any two experiments remains the same, regardless of the performance metric. It. suffices, therefore, to comment (mostly) on the NSD table, say. The only two cases when the classical model (KW T) competes with either inter- polation snake counterparts is when the snake is initialized fairly close to the actual boundaries (normal distribution, with standard deviation of 10 pixels), or when ini- tialized from the shape model. Even so. we find the performance of the KWT model quite remarkable. Based on Table 8.4, on average, in these two cases the KWT snake is within some 6 pixels from the ground truth, which is not bad for a snake with no shape memory. The unexpected behavior of the interpolation snakes when the shape prior is not correlated with the image data can be seen in the “Absolute Shape Model” column: 2 the performance decreases with increasing values of the shape weight w In shape' the best case (Gaussian initialization with standard deviation of 10 pixels and no shape prior), the interpolation snake with absolutee shape model has N S D = 9.38%, only marginally better than KWT’S 10.44%. Things change by a few points in favor of interpolation snakes when the “relative” shape prior is used, i.e., when the shape prior is anchored to the image. Another noteworthy fact is that the interpolation snakes have a better performance than KWT’s absolutee best (10.18%) by about 1 percent, even when initialized far from the ground truth: 9.19% and 8.96% NSD when initialized by scaling (0.8), and 20 pixels within, respectively. This means in a real application, the user would have more slack for initialization with interpolation snakes. When the shape prior is used properly (i.e.. contingent. upon the image data), one 2Smaller numbers represent better performance. observes the expected behavior (column 6): for any given initialization method. 
the performance achieves an optinunn with some amount of shape prior, but increasing the shape weight beyond a certain point is detrimental. Last, but not least, we have to remark that. the parameters of the interpolation snakes can be better tuned forjust about any of the images in the ultrasound database, producing better results. The problem is that the optimal parameters are not the same across all images. One image may require a certain amount of shape prior, while for another. no shape prior would produce the best results. Tables 8.2, 8.3 and 8.4 show the best (among non-optimal) parameters over the entire data set, not. for each individual image. The lack of a uniform set of parameters can only be due, logically, to one of two causes: either our model is wrong (or at least incomplete), or there simply cannot be (for any model) a common set of parameters, optimal for all images. This dichotomy opens up a stream of philosophical questions, so for the time being we shall only confine ourselves with simply pointing out this problem, and not try to find an answer. Figure 8.14: Segmentation of kidney ultrasound using the Interpolation snakes model. Left column: output snake. Right column: ground truth (yellow) and output snake (red). Top row: smallest normalized symmetric difference (NSD) error (3.71%). Mid- dle row: NSD = 7.72%. Bottom row: NSD = 13.57%. Average NSD = 7.60%. z). ( < NSD S 5.52 Figure 8.15: Detected contours. 3.71 160 Figure 8.16: Red: detected contours (same as in Figure 8.15). Yellow: GT. 3.71 S NSD g 5.52(%). 161 Figure 8.18: Red: detected contours (same as in Figure 8.17). Yellow: GT. 5.52 S NSD S 6.65(%). 163 Figure 8.19: Detected contours. 6.65 S NSD < 7.87(%). 164 Figure 8.20: Red: detected contours (same as in Figure 8.19). Yellow: GT. 6.65NSD S 7.87(%). Figure 8.22: Red: detected contours (same as in Figure 8.21). Yellow: GT. 7.89 < NSD S 8.79(%). 167 Figure 8.23: Detected contours. 8.79 S NSD < 9.91(%). 168 Figure 8.24: Red: detected contours (same as in Figure 8.23). Yellow: GT. 8.79 S NSD < 9.91(%). 169 Figure 8.25: Detected contours. 9.91 S NSD S 13.57(%). 170 Figure 8.26: Red: detected contours (same as in Figure 8.25). Yellow: GT. 9.91 S NSD g 13.57(%). 171 8.3 Blind study To confer clinical validity to the interpolation snakes algorithm (Section 6.1) we de- vised a blind evaluation experiment. in addition to the objective comparison against the ground truth presented in the previous sections. Specifically, in this experiment a human expert 3 compares the boundaries de- tected by several algorithms, without knowing which is which. For each of the 117 images used in our tests we draw three types of contours, each contour on a separate copy of the image, as follows: 0 interpolation snakes (I-SNAKE) using the manual four-point model initializa- tion (NSD = 7.95%; see Table 8.2). o the classical snakes (KWT) initialized randomly around the ground truth with a standard deviation of 10 pixels (N SD = 10.44%). 0 the ground truth itself (GT). For each kidney ultrasound image I we have three pairs of images: (II—SNAKEJKWT)’ (II—SNAKEJKWT) and (warJorl- We generate the complete list of all 351 possible pairs, shuffle it and display each pair in turn in a graphical interface, shown in Figure 8.27. The interface never makes reference to the names of the algorithms whose results are being displayed. We built the application trying to minimize the judge’s time and effort. 
The user can toggle between the top and bottom images, selecting this way the better of the two, either by clicking with the mouse or by using the up/down keys on the keyboard. The thickness of the contours is adjustable and can be controlled by the user with the left/right keys. A thickness of 0 is permitted, in order to allow the user to view the image without any contours drawn. Also, the contour marked as the better of the two is drawn thicker than the other one, for easy visualization. Once a selection has been made, the user advances to the next pair. When ready, the selections are saved in a plain ASCII file with one row per pair, and with two columns. The winner of each pair is listed in the first column. The task was clearly explained to the human expert. The ease of use of the application did not require any training session prior to the actual test. No time constraints were imposed.

³Dr. Dorry Segev, Division of Transplantation, Department of Surgery, Johns Hopkins Institutions.

Figure 8.27: Graphical user interface for the blind study comparison of three types of kidney boundaries (I-SNAKE, KWT and GT). A human expert chose between outputs of three algorithms, displayed two at a time.

Tables 8.5 and 8.6 show how many times the I-SNAKE output was preferred over KWT and over GT, respectively. Somewhat surprisingly, the interpolation snakes won over GT 80 times out of 117 (Table 8.6). Under a null hypothesis that both I-SNAKE and GT are equally likely to win (p = 0.5), the probability of observing (due to chance) 80 I-SNAKE outcomes out of 117 is given by the binomial distribution

    b(k; n, p) = (n choose k) · p^k · (1 - p)^(n - k).

In our case, with n = 117, k = 80 and p = 0.5, we get b = 0.00004359, or about 4 in 100,000. We are, therefore, justified to reject the null hypothesis, because there are only two logical possibilities:

• either it is true that I-SNAKE and GT are equally likely, and the observed outcome of 80 I-SNAKE winners out of 117 total is one of the lucky 4 in 100,000, or
• the null hypothesis is false to begin with.

Similarly, in the case of I-SNAKE vs. KWT (Table 8.5), the probability of being wrong in rejecting the null hypothesis is even smaller: b(99; 117, 0.5) = 4.862 · 10^-15 (zero within machine precision), so we reject the null hypothesis in that case as well.

                loser
winner          KWT   I-SNAKE   Total
KWT               0        18      18
I-SNAKE          99         0      99
Total            99        18     117

Table 8.5: Blind comparison: KWT vs. I-SNAKE.

                loser
winner           GT   I-SNAKE   Total
GT                0        37      37
I-SNAKE          80         0      80
Total            80        37     117

Table 8.6: Blind comparison: GT vs. I-SNAKE.

The complete results of the blind study are shown in Table 8.7. For each type of contour (GT, KWT and I-SNAKE) there are three rows. To fix the ideas, let us look at the I-SNAKE algorithm as winner. The top row shows how many times I-SNAKE won over GT and over KWT: 80 and 99 times, respectively, making for a total of 179 wins out of 351 comparisons. The middle and the bottom rows represent the row percentage and the column percentage of the entries in the top row in each group of three. Thus, for instance, I-SNAKE won 179 times out of 351 comparisons (51.00%). Of these wins, 80 were over GT (44.69%) and 99 were over KWT (55.31%). Reading on columns now, I-SNAKE lost a total of only 55 comparisons out of 351 (15.67%), of which 37 times it lost to GT (67.27%) and 18 times to KWT (32.73%).
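The significance computation above is easy to reproduce. The short sketch below evaluates the point probability b(k; n, p) exactly; the function name is chosen for the example (an exact tail test would sum b(k; n, p) over k ≥ 80, whereas the text reports the point probability, which is what is computed here).

```python
from math import comb

def binomial_pmf(k, n, p=0.5):
    """b(k; n, p) = C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Probabilities of the observed blind-study outcomes under the null hypothesis p = 0.5.
print(binomial_pmf(80, 117))   # I-SNAKE vs. GT:  80 wins out of 117
print(binomial_pmf(99, 117))   # I-SNAKE vs. KWT: 99 wins out of 117
```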
I winner loser J | GT KWT I-SNAKE TotalJ GT 0 82 37 119 row percentage 0.00 68.91 31.09 100.00 column percentage 0.00 45.30 67.27 33.90 KWT 35 0 18 53 row percentage 66.04 0.00 33.96 100.0 column percentage 30.43 0.00 32.73 15.10 I-SNAKE 80 99 0 179 row percentage 44.69 55.31 0.00 100.00 column percentage 69.57 54.70 0.00 51.00 Total 115 181 55 351 row percentage 32.76 51.57 15.67 100.00 column percentage 100.00 100.00 100.00 100.00 Table 8.7: Blind comparison: complete results. On rows, for each of GT, KWT and I- SNAKE categories: frequencies (top), row percentages (middle), column percentages (bottom). 8.4 Experiments on Synthetic Images \Ve leave now the ultrasound domain and present. the tests on synthetic images. Table 8.8 summarizes the numerics of the four algorithms (CV, GAC, KWT and I-SNAKE) each under four different types of noise and under different initializations. It is similar to Tables 8.2, 8.3 and 8.4, and we shall not describe its elements in detail. In general, the results are far better on synthetic images than on ultrasound, despite high levels of noise. In the case of CV, GAC and I-SNAKE, there is a signifi— cant. decrease in the performance under spread noise, than compared with Gaussian, salt-and-pepper and uniform noise, which can be attributed to the fact that all the algorithms have a built-in smoothing filter, which removes to a large extent the gaus— sian, the salt-and—pepper and the uniform noise, but are not as effective on the spread noise. W’hen initialized at some distance from the target boundaries, KWT snakes per- form very poorly, with or without noise, and this should come as no surprise. When initialized randomly, with standard deviation of 10 pixels from the true boundaries, the classical snakes (KWT) perform almost perfectly under all noise conditions, and we explain this based on the proximity to the ground truth more than anything else. Note that initialization at about 10 pixels from the ground truth produced also the best KWT results in ultrasound images as well (but no better than I-SNAKE). This observation reiterates the fact that the apparently good performance of KWT in ultrasound is due not so much to the merits of the algorithm per se, but rather to initialization. The interpolation snakes have better performance than KWT when initialized within 10 pixels from the ground truth, or when using the 4-anchor model initialization method, but also have comparable performance to KWT even when initialized at a significant distance from the ground truth (either at distance 20 pixels from the ground truth, or by a 80% downscaling of the true boundaries). Compare Table 8.8 with Tables 8.2, 8.3 and 8.4). 176 Algorithm Noise Scaling Rand NSD (%) Hausdorff Av. Hausdorff CV gaussian 0.7 3.4 7.83 3.31 8&1) 3.58 8.46 3.54 uniform 3.10 7.91 3.29 spread 9.81 17.69 6.76 GAC gaussian 0.7 5.97 10.47 4.44 s&p 7.84 11.25 5.52 uniform 7.06 10.10 5.08 spread 16.22 22.67 10.60 KW T gaussian 0.7 68.02 86.82 56.09 step 68.11 86.39 56.18 uniform 68.78 87.02 56.94 spread 62.67 85.11 50.66 KWT gaussian 10 1.62 8.09 2. 71 s&p 1.58 8.10 2.69 uniform 1.55 8.01 2.69 spread 1.41 8.03 2.65 I-SNAKE gaussian 0.7 2.16 5.63 2.82 s&p 2.64 7.60 3.00 uniform 2.36 6.78 2.87 spread 5.94 13.01 4.74 Table 8.8: Results of the active contour algorithms on synthetic images, under differ- ent conditions of noise (column 2), and initial conditions (columns 3 and 4). We present next typical images for each of the 5 categories in Table 8.8. See Figures 8.28, 8.29, 8.30, 8.31 and 8.32. 
In each case, the images are arranged in a 4 x 3 display. On each row there is a different type of noise. The left column shows the initial snake, the middle column shows the final result, and the right column shows also the final result (red) with the ground truth overlaid on top (yellow). 177 Figure 8.28: Chan-Vase model on synthetic images. Left column: initial snake (scaling to 0.7 - GT). Middle column: final snake. Right: final snake (red) and GT (yellow). On rows, top to bottom: Gaussian noise, salt-and—pepper noise. uni— form noise. spread noise. Original images were cropped and scaled down for display purposes, so some noise was lost in the process. 178 Figure 8.29: Geodesic Active Contours model (GAC) on synthetic images. Left column: initial snake (scaling to 0.7 - GT). Middle column: final snake. Right: final snake (red) and GT (yellow). On rows, top to bottom: Gaussian noise, salt-and- pepper noise, uniform noise, spread noise. Original images were cropped and scaled down for display purposes, so some noise was lost in the process. Figure 8.30: Kass-Witkin-Terzopoulos model (KWT) on synthetic images. Left column: initial snake (scaling to 0.7 - GT). Middle column: final snake. Right: final snake (red) and GT (yellow). On rows, top to bottom: Gaussian noise, salt—and- pepper noise, uniform noise, spread noise. Original images were cropped and scaled down for display purposes, so some noise was lost. in the process. 180 Figure 8.31: Kass-Witkin—Terzopoulos model (KWT) on synthetic images. Left column: initial snake (Gaussian randomization with standard deviation 0 = 10 pixels). Middle column: final snake. Right: final snake (red) and GT (yellow). On rows. top to bottom: Gaussian noise. salt-and-pepper noise, uniform noise, spread noise. Original images were cropped and scaled down for display purposes, so some noise was lost in the process. 181 Figure 8.32: Interpolation snakes model (I-SNAKE) on synthetic images. Left column: initial snake (scaling to 0.7 - GT). Middle column: final snake. Right: final snake (red) and (1'! ‘ (yellow). On rows, top to bottom: Gaussian noise. salt—and— pepper noise, uniform noise, spread noise. Original images were cropped and scaled down for display purposes, so some noise was lost in the process. 182 8.5 Border Detection Using 1D Break-Point De- tection. We return in this section to border detection in ultrasound, using this time the 1D break-point detection algorithm from Chapter 7. The process involves the following steps: 1. Manually define a region of interest (ROI), by drawing a circle around the target object, centered roughly at the object center of mass. Move outward along equally spaced rays from the center of the ROI. In our application we used 50 between rays. The intensity profile along any of the rays is obtained by averaging over a sector of a certain angle (user defined). 2. Apply the 1D break point detection algorithm from Chapter 7 with a single break point, along each of the radial intensity profiles (Figure 8.33 (a)). We are only interested low—to-high intensity points (marked with red), as the kidney interior is in general darker than the background. 3. Some of the (correctly) detected break points are not on the target boundary. We try to eliminate them by using a shape model, in several steps. First, fit the shape model to the red points using non-isotropic similarity transforms (translation, rotation, and different scales in the a: and the y direction, but no shear). 
The measure of fit is given by the chamfer transform on the red points. See Figure 8.33 (b). 4. Define a tubular neighborhood along the fitted model (Figure 8.33 (c)) The width of the tube is proportional to the standard deviation of all the red break points from step 1. Extract. intensity profiles within the tube, along segments normal to the fitted model. 183 5. Re—compute the break points on the newly defined intensity profiles (Figure 8.33 (d)). 6. If need be, repeat steps 3 to 5. 7. Use a smoothing spline [29] to approximate the object boundary, based on the detected break points (Figure 8.33 (e)). We employed a smoothing spline al- gorithm that. automatically determines the breaks and the control points. It only depends on a single parameter called smoothing factor. When the smooth- ing factor is zero, the smoothing spline goes through all the data points, being nothing more than an interpolation spline. As the smoothing factor increases, so does the smoothness of the spline, at. the expense of missing the data points. The results are promising in many cases. Sometimes the fitted model (step 3) is so good that the tubular neighborhood refinement and the smoothing spline are not necessary. When the tube is necessary though, it must be sufficiently wide, to cover completely the target boundary. This condition, in turn, depends on how close the model fitted to the break points (step 3) comes to the target boundary (not to the breakpoints!) If in some regions, the model is too far from the kidney boundary, the tube will miss that region, and no break point will be detected there at step 5. Last, but not least, we have observed experimentally that the smoothing factor of the final spline varies greatly from case to case. It may be possible to solve this latter problem by computing a family of splines over an entire discrete range of the smoothing factor, and by choosing the optimal one, with respect to some optimality criterion (yet. to be determined). 184 Figure 8.33: 2D border detection using radial 1D break-point detection. (a) Break points along 72 rays starting from a user—defined center. The actual intensity profiles not drawn. Red: dark-to—light break point. Blue: light—to—dark break point. (b) Shape model (green) fit to the. red break points. (0) Tube (cyan) across the fitted model (green). (d) Break points on the tube intensity profiles. (e) Smoothing spline on the red points from (d). (f) Smoothing spline (beige) and ground truth (yellow). 185 8.6 Miscellaneous Images We. present here an application of the interpolation snakes framework to real-life images. Since we are not. using any full fledged general purpose object detector, manual initialization of the snake is still required. Specifically, we test the interpolation snakes on eight images of a truck with vary- ing background and occlusion. As with the kidney images, we first build a model. Since this is not a comprehensive test, we only build a fast and frugal model: we simply take as our model the distance map built on a manually traced truck contour in just one of the eight images. The reason this is justified is that the shape variations can be described mostly by affine transforms (no non-linear shape variations, as is the case with kidneys), as the truck is a rigid object, photographed in near profile. The affine transforms are factored out by the registration process that takes place at every iteration during the snake evolution. For the results, refer to Figures 8.34 (initialization) and 8.35 (output). 
Observe, in the first place, that detection of a light object on a dark background using a clockwise oriented snake requires negative weights. This is being visualized in the images by marking the control points with a red minus sign. In the case of kidneys, the object was generally darker than the background, and the control points were all positive. Each of the bottom four images contain occlusion, which the snake with shape memory can handle successfully. Also, the snake seems remarkably robust, remaining undisturbed by the trees in the background. This is one application which can lead to a spin-off of our work in ultrasound, aimed at the recognition and the tracking of vehicles. 186 Figure 8.34: Interpolation snakes on truck images. Initialization of interpolation snakes. 187 Figure 8.35: Interpolation snakes on truck images. Output of interpolation snakes. 188 Chapter 9 Model Fitting For Fingerprint Classification we present in this chapter a simplified version of the work that we did in the area of fingerprint classification [78]. It was done prior to our work on active contours, so it does not involve interpolation snakes. However, chronologically and conceptually this work is the precursor of our snakes, as it is still based on the fitting of a shape model to an image. The nature of the problem requires that the shape remain roughly unchanged for classification purposes, so the fitting is done via affine transforms, rather than by free-form deformations as in the case of snakes. 9. 1 Problem Definition Fingerprints are biometric characteristics of humans, that consist of ridges and fur- rows at the tips of the fingers. There is evidence [37] that human awareness of these patterns predates Christianity. Since the last century fingerprints have been used systematically for authentication, particularly in criminal investigation. Based on empirical evidence [37, 38] it is widely accepted that fingerprints uniquely identify an individual. Unlike other authentication methods (such as face or voice recognition, 189 which may not distinguish identical twins), fingerprint authentication is unequivo— cal. lVloreover, fingerprints do not. change over time, which gives them an important advantage over other biometric measurements. Lastly, in comparison to other biomet- rics tecl'miques (such as retinal scan and iris scan), fingerprint acquisition is relatively simple and inexpensive. All these characteristics make the use of fingerprints one of the most pervasive methods of authentication, both in the forensic sector as well as in the commercial sector (e.g. access control). Given the astronomical size1 of fingerprint databases, it became apparent that an indexing methodology which would assign fingerprints into a small number of categories was in order. The system developed at the end of the 19—th century by Sir Edward Henry is the basis of modern automatic fingerprint identification systems (AFIS) in the majority of English-speaking countries, and serves the purpose of what is usually termed as (primary) classification of fingerprints [13, 51, 62, 60]. The system adopted by the FBI contains eight basic fingerprint patterns [37, 38]: arch, tented arch, left loop, right loop, whorl, central pocket loop, double loop and accidental. We shall confine ourselves here to only four major classes of fingerprints 2: arch (A), left loop (L), right loop (R) and whorl (W). See Figure 9.1. 
There have been several methods proposed for fingerprint classification, including (but not limited to) syntactic methods [90], singularity-based methods [62, 64], and neural network methods [12]. For a more detailed literature review see for instance [13, 51, 60] and the references therein. We address the problem of automatic fingerprint classification into one of the 4 types (A, L, R, W) using a novel method which we term kernel fitting, and provide detailed classification results.

¹There were 810,188 records (containing 10 fingerprints each) upon the formation of the Identification Division of the FBI in 1924 [37]. Currently there are 226 million cards on file, representing about 79 million individuals [36].
²However, the method we propose can be readily extended to more classes of fingerprints.

Figure 9.1: Four major classes of fingerprints. (a) Arch. (b) Left loop. (c) Right loop. (d) Whorl.

9.2 Mathematical model

9.2.1 Flow field

We model the ridges of a fingerprint as curves in the plane, and define the flow field as the direction of the tangent to these curves at each point³. Several methods have been proposed in the literature for the extraction of the flow field [17, 63, 64]. We employ the method developed by Hong [59]. See Figure 9.2. Thus, conceptually, we think of the flow field as a unit vector field in the plane or, equivalently, as its argument β ∈ [0, π)⁴ (the angle between the direction of the vector and the x-axis).

9.2.2 Fingerprint kernels

We define class-specific kernel curves based on the original definition of fingerprint classes of Sir Edward R. Henry [49]. We model mathematically Sir Henry's definitions of fingerprint classes as follows. Let the kernel of the whorl be the unit circle. The kernels of the other classes we define empirically, using splines (see Figure 9.3). For each class we learn a kernel model by performing a generalized Procrustes analysis (Section 4.1) on 20 manually traced splines with homologous control points.

9.2.3 Kernel fitting

Suppose we have a smooth vector field V defined over some region in the plane R². Let γ(t) = (x(t), y(t)) be a parametric curve in R², let γ' be the tangent to γ, and θ its argument. As before, let β be the argument of V. See Figure 9.5. Define an energy functional that captures the difference between the direction of γ' and that of the vector field V at the point γ(t) by

³Except at a small number of points, where the curve is not smooth (e.g. bifurcations).
⁴There is no canonical orientation of the ridges, so β ∈ [0, π) rather than [0, 2π).

We drop the original internal energy

    E_int = ∫_0^1 (|γ'|² + |γ''|²) ds,

since the interpolation splines are smooth by construction. We replace it with a term (still denoted by E_int) that keeps the points equidistant on the curve by penalizing for deviations from the arclength parametrization:

    E_int = ∫ | |γ'| - 1 | ds.

This addresses the problem of arclength parametrization effectively and in a principled fashion, quietly treated as an "implementation detail" elsewhere. Regarding shape learning, we propose a fast and natural alignment method for simple connected curves, that does not assume or require point correspondences.
= f | I“. l —1Ids , This addresses the problem of arclength parametrization effectively and in a principial fashion, quietly treated as “implementation detail” elsewhere. Regarding shape learning, we propose a fast and natural alignment method for simple connected curves, that does not assume or require point correspondences. 204 We term this method semiares alignment (Section 3.4). For a discretized plane curve 7, we define the semiaxes to be its modes of variation. that. is, the eigendi- rections of the covariance matrix computed on the (finite) set. of points resulting from the discretization of ’7. We introduce the notion of average chamfer transform (Section 6.2.3), and use it. as a shape model, by measuring the simultaneous proximity of a (test) curve, to an entire set of training contours, hence, implicitly, shape similarity. We propose an algorithm for segmenting a 1-dimensional piecewise linear signal perturbed by noise, unrelated to active contours (Chapter 7). This algorithm can be used as an edge detector, by applying it along rays starting from the center of an object of interest, but we predict it will have important standalone applications. Currently, the number of constant pieces in the input signal must be specified a priori. We do not have an automatic way of detecting this num- ber, but we suspect that there might be a way to do so by using a minimum description length (MDL) criterion. We propose a method for fingerprint classification using again a shape model (Section 9 and [78]) and demonstrate good performance in a first implementa- tion. 10.2 Future Work The insight we have gained through this research raises more questions than it pro- vides answers. We have some understanding on what shape is, and adding one extra term to the energy functional of a boundary seeking contour is but one way to incor— porate the shape memory into the model. But this is by no means the only way, and we believe there may be better ways, which we certainly plan to explore. 205 First and foremost, we intend to pursue the boundary detector based on the 1- dirnensional break-point. detection algorithm (Chapter 7), as we believe this approach would have better chances for success. The idea would be to detect. the jumps in the intensity profiles along rays starting out. from the center of the region of interest, and keep only the strongest. and the most reliable boundary points. We would be left with disparate segments along the target boundaries, which somehow have to be interpolated to encompass the entire object of interest. A shape model would play an important. role. and we anticipate having to deal with the problem of partial matching (which parts on the model correspond to the detected segments along the boundaries). We have already done preliminary experiments using this approach (Section 8.5), but more work and a systematic evaluation of the performance is needed. Secondly, the work on the curvature-based shape representation (Section 2.5) needs to be completed by assessing the discriminatory power of the induced shape similarity metric, in comparison to other known similarity metrics. There are a number of extensions of the interpolation snakes framework proposed for border detection that can be pursued. 0 Apply to other ultrasound problems, specifically left ventricle segmentation. o Extend to 3D and 2D+time (tracking). Motion video consists of a sequence of 2D frames that are perceived as continuous motion by the human eye when played at a sufficiently high rate. 
Detecting an object in one of the images and following it in time across the sequence of frames is the problem of object tracking. When these frames are viewed simply as a stack of 2D images, we are talking about detecting an object in 3-dimensional data. In the case of a video clip, the three dimensions are two spatial dimensions (x and y), with time as the third dimension. One application that doctors would like to have, we are told¹, would be to have the contour of an organ displayed at all times, in real time, during the acquisition of an ultrasound clip as the transducer moves. Needless to say, there are numerous 3D applications in which all three dimensions are spatial. We intend to explore the applicability of our boundary detection methods to 3D data.

• Determine to what extent the globally optimal parameters reported in Tables 8.2, etc. are suboptimal with respect to each individual image.

• Explore the possibility of applying interpolation snakes to the detection and tracking of objects (e.g. cars) in real-life images, prompted by the promising (though insufficiently tested) results on the truck images (Section 8.6). Ideally, the system would be fully automatic, which implies that a selective visual attention module and a recognizer be used in conjunction with the border detector.

¹According to Dr. Dorry Segev, Division of Transplantation, Department of Surgery, Johns Hopkins Institutions.

APPENDICES

Appendix A

Landmark registration by linear transforms

A.1 Scaling transforms

Proposition A.1.1 (Scaling registration with known correspondences) If T(z) = λz + τ minimizes the squared error energy

    E(T) = Σ_j |T(z_j) - w_j|² = Σ_j (λz_j + τ - w_j) · conj(λz_j + τ - w_j),    (A.1)

then its parameters are given by

    λ = <w', z'> / |z'|²,    (A.2)
    τ = -λ z_0 + w_0,    (A.3)
J J 0 J J 0 a A.2 Affine transforms In this case, there are no constraints on the matrix A: In particular, there is no convenient representatitm of the transform T using com- plex numbers, so all computations will be carried out. over the real numbers. Let. X and Y represent the means of the Xi’s and Yis respectively. Let also 211 ~ 1. . — [3 JV; 2 )Cl- — X l:__f (115,11) and Y; = i, — Y (62f (31.21;). We collect all the .117 and y coordinates of the \;s in two n-dimensional column vectors U, = (11,1415 . . . umT and V, = (imiré . . .141)? Similarly, for the Yl-Is we set 2’ = (35.2% . . . 2;,)T and W", : (iv-3.113,2 . . .11';,)T. Proposition A.2.1 (Affine registration with known correspondences) With the above notations, the coeflicients a, b, c, d and r of the afline transform T(X) = A - X + 7’ that minimizes the squared error energy 1 - E(T) = E ZlTW) — o,|2. (A.8) i are given by —1 a b |U’|2 (U’, V’)) (Z’, U’) (w’, U’) c d IV’I2 i andrz—(Y—A-X). Proof: In this case, E = E(a,b, c, (1, 7113.731). Taking the gradient in Equation (A.8), we get. VE = Z< Wee). T(X.) — Ya First, by setting the derivatives with respect to r]; and to Ty equal to zero, we get, as in the scaling registration case, that T(Y) = 7, which is equivalent to r = —(l7 — A - X). The rest of the proposition follows by setting to zero the deriva- tives with respect to a, b, c and d. C] We conclude this appendix with the observation that minimization of the squared error energy (A8) reduces mathematically to a least. squares problem, linear in the coefficients of the unknown transformation T (either scaling or affine). If we identify 212 T with the column vector formed by its coefficients, then we must. solve a system of t he form Y 2.11-'1’ in the least squared sense. This system is nothing but the matrix form of VE = 0. If ."i/ were a square, invertible matrix, then the solution would be simply T = “—1 ~Y. In general though, the least squares solution is given by T 2 Mi - Y where All is the Moore—Penrose pseudoinverse of ll] . Computation of ll] is usually done using the singular value decomposition (SVD) of M , which, given its popularity in the numerical analysis community, can be used as subroutine. However, the special purpose, closed form formulas for T that we derived here are far more elementary, stable and fast. 213 Appendix B N umerics of cubic interpolation splines. B.1 Representation of a cubic polynomial in terms of second derivatives. Con‘iputation of a cubic interpolation spline is based on the following lemma 1: Lemma 8.1.1 Ify = 7(t) is a cubic polynomial, then on any interval [a, b] 7 can be erpressed uniquely as (B.1) where b-t A: . b—a' B=1—A ll\-Ientioned in "Recipes in C” [88] without proof. 214 _V‘Q'?" . $ (1': (.43 — Aw) — “)3‘ o ___ (o3 — B)(b — (1)2 (i 6 Remark B.1.1 The sigmficance of this lemma is as follows: if we take a.b to J .-. 1",, - - -. — C. .- - — . a. y) A. C. — ,, l)(. constcutiie bimlpoints. say. a — g, and b — 51+1, thm, smut, ,(gl) — y, and 7(51- +1) 2 llj+1 arc control points (hence given ) we can compute 7(t) for all t 6 [547.6141]. provided we know the second derimti'ucs ’7” at the break points. Proof: \Vith A and 13 as in the statement of the Lemma let 11s write 7(t) = A“,(a) + 137(1)) + 2(t). Clearly, I. 3 must be a cubic polynomial (as 7 is of degree 3 and A and B are of degree 1). 2. :(a) = () and 2(b) 2 0. and 3. z”(a) = 7”(a) and z”(b) = 7”(b). 
Since 2(a) = 0, we shall seek z of the form 2(t) = A(t — a)3 + ,u(t —— (1)2 + I/(t — a) Then the second (’lerivative is and by property 3 above, we get ,\ = 7”“) - 7’70) fl = “if/((1) (B 2) 6(b —- a) 2 I From 2(b) = O, we also get II II A j 2. V: —(b—a)' I (0:) r (a) (B3) With these coefficients. 2 Z A,,//(()i)()b-_“:l’;(0> (I, _ (1)3 + ’7”2(a) (t __ “)2 __ (b _ a) . 7”(b) 1227/7!le _ a) = ("W—3,133 E—i— - H "W 2 i ‘25” [$2113 H We 2 (b ‘6“)2(—B3 + 3132 -— 28)7”(a) + (Lg—@3133 — B)7”(b) : (” ‘6”l2m3 _ far/”(a + “’ ‘6“)2(B3 — B)~/’(b) [3 Corollary B.1.1 With the same notations as in Lemma B.1.1, the first and second derivatives are A _ la, 2_ 2_ 7’